This option was never enabled in GENERIC and does not appear to work
(the cdevsw is stored in a global array but never passed to make_dev
to be associated with a character device).
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D35008
These files no longer depend on the macros required when these checks
were added.
PR: 263102 (exp-run)
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34804
All supported compilers (modern versions of GCC and clang) support
this.
PR: 263102 (exp-run)
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D34802
All supported compilers (modern versions of GCC and clang) support
this.
Many places didn't have an #else so would just silently do the wrong
thing. Ancient versions of icc (the original motivation for this) are
no longer a compiler FreeBSD supports.
PR: 263102 (exp-run)
Reviewed by: brooks, imp
Differential Revision: https://reviews.freebsd.org/D34797
These headers #include <sys/cdefs.h> right after checking if it has
already been #included. The nested #include already existed when the
check for _SYS_CDEFS_H_ was added, so the check shouldn't have been
added in the first place.
PR: 263102 (exp-run)
Reported by: brooks
Reviewed by: brooks, imp, emaste
Differential Revision: https://reviews.freebsd.org/D34796
This register set contains the values of the fsbase and gsbase
registers. Note that these registers can already be controlled
individually via ptrace(2) via MD operations, so the main reason for
adding this is to include these register values in core dumps. In
particular, this will enable looking up the value of TLS variables
from core dumps in gdb.
The value of NT_X86_SEGBASES was chosen to match the value of
NT_386_TLS on Linux. The notes serve similar purposes, but FreeBSD
will never dump a note equivalent to NT_386_TLS (which dumps a single
segment descriptor rather than a pair of addresses) and picking a
currently-unused value in the NT_X86_* range could result in a future
conflict.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D34650
Introduce a helper to fetch the TSC frequency from CPUID when running
under Xen.
Since the TSC can also be initialized early when running as a Xen
guest pull out the call to tsc_init() from the
early_clock_source_init() handlers and place it in clock_init(), as
otherwise all handlers would call tsc_init() anyway.
Reviewed by: markj
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D34581
This lets us use the TSC to implement early DELAY, limiting the use of
the sometimes-unreliable 8254 PIT.
PR: 262155
Reviewed by: emaste
Tested by: emaste, mike tancsa <mike@sentex.net>, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34367
In a subsequent commit clock_init() will attempt to determine the TSC
frequency, and this requires that CPU identification is finalized.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D34211
These headers originate with the Xen project and shouldn't be mixed with
the main portion of the FreeBSD kernel. Notably they shouldn't be the
target of clean-up commits.
Switch to use the headers in sys/contrib/xen.
Reviewed by: royger
Modules no longer call kernel functions for atomic ops, and since the
previous commit, we always use lock prefix.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: jhb, markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
Atomics have significant other use besides providing in-system
primitives for safe memory updates. They are used for implementing
communication with out of system software or hardware following some
protocols.
For instance, even UP kernel might require a protocol using atomics to
communicate with the software-emulated device on SMP hypervisor. Or
real hardware might need atomic accesses as part of the proper
management protocol.
Another point is that UP configurations on x86 are extinct, so slight
performance hit by unconditionally use proper atomics is not important.
It is compensated by less code clutter, which in fact improves the
UP/i386 lifetime expectations.
Requested by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: Elliott Mitchell, imp, jhb, markj, royger
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34153
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.
The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.
Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19831
The approach taken by the stack gap implementation was to insert a
random gap between the top of the fixed stack mapping and the true top
of the main process stack. This approach was chosen so as to avoid
randomizing the previously fixed address of certain process metadata
stored at the top of the stack, but had some shortcomings. In
particular, mlockall(2) calls would wire the gap, bloating the process'
memory usage, and RLIMIT_STACK included the size of the gap so small
(< several MB) limits could not be used.
There is little value in storing each process' ps_strings at a fixed
location, as only very old programs hard-code this address; consumers
were converted decades ago to use a sysctl-based interface for this
purpose. Thus, this change re-implements stack address randomization by
simply breaking the convention of storing ps_strings at a fixed
location, and randomizing the location of the entire stack mapping.
This implementation is simpler and avoids the problems mentioned above,
while being unlikely to break compatibility anywhere the default ASLR
settings are used.
The kern.elfN.aslr.stack_gap sysctl is renamed to kern.elfN.aslr.stack,
and is re-enabled by default.
PR: 260303
Reviewed by: kib
Discussed with: emaste, mw
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro. With stack address randomization the
ps_strings address is no longer fixed.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704
Requested by: kib
Reviewed by: brooks, imp, kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33886
This matches the recent renaming of struct freebsd4_ucontext.
Reviewed by: brooks, imp, kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33880
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.
This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
Simplify control flow around handling of the execpath length and signal
trampoline. Cache the sysentvec pointer in a local variable.
No functional change intended.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33703
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.
Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.
The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).
The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.
Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.
The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages. For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm). However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi. I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux. However, in the current scheme there's no need for the mux.
Instead, remove swi_vm and vm_ih. Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly. This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.
One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.
Reviewed by: imp, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33447
Notably, the current compat_options only makes sense for native and
freebsd32 ABIs. For the others, it just adds cruft. Switch to having
sets of compat options, and default to the native set. Setup the other
ABIs where it doesn't make sense to opt-out of the native set.
This removes some redundant COMPAT_FREEBSD* stuff from Linuxolator bits.
line_expr in makesyscalls.lua is fixed to allow empty strings to be
specified, since they're harmless.
Reviewed by: brooks, kib (both earlier version)
Differential Revision: https://reviews.freebsd.org/D33356
Suggested by: Michael Pratt <mpratt@google.com>
Reviewed by: Michael Pratt <mpratt@google.com>, emaste
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D33423
The header exports the following:
- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)
Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).
Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr. libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).
A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.
Reviewed by: kib (older version of x86/tls.h), jrtc27
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33351
After a round of cleanups in late 2020, all definitions are
functionally identical.
This removes a rotted __aligned(8) on arm. It was added in
b7112ead32 and was intended to align the
args member so that 64-bit types (off_t, etc) could be safely read on
armeb compiled with clang. With the removal of armev, this is no
longer needed (armv7 requires that 32-bit aligned reads of 64-bit
values be supported and we enable such support on armv6). As further
evidence this is unnecessary, cleanups to struct syscall_args have
resulted in args being 32-bit aligned on 32-bit systems. The sole
effect is to bloat the struct by 4 bytes.
Reviewed by: kib, jhb, imp
Differential Revision: https://reviews.freebsd.org/D33308
The headers were mostly identical on amd64 and i386.
No functional change intended.
Reviewed by: cperciva, mav, imp, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33205
It stopped being used in 3c256f5395, when trap() was reorganized to
have separate switch statements for user and kernel traps. Remove the
two leftover references and the flag itself.
Reviewed by: kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D33253
On this platform, the pcb and FPU save area are allocated from the top
of each kernel stack, so they should be excluded from the calculation of
the total and used stack sizes.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32581
The vga and splash devices are part of the sc(4) system console. vt(4)
uses the vt_vga driver instead, and has some limited splash support
directly in vt_core.c. Leave the sc(4) options in GENERIC/MINIMAL (for
now) but group them together under an sc(4) comment.
Sponsored by: The FreeBSD Foundation
Followup to b8cf1c5c30, remove from MINIMAL in addition to GENERIC.
options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
the legacy sc(4) console. It is not used by the default console, vt(4).
PR: 253733
Fixes: b8cf1c5c30 ("Remove options VESA from x86 GENERIC")
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
the legacy sc(4) console. It is not used by the default console, vt(4).
There is a report[1] of an incompatibility between VESA and the Nvidia
driver breaking suspend/resume. Since VESA is not used by the default
configuration anyway, just remove options VESA from GENERIC. The kernel
module is still available and may be loaded by sc(4) users who want to
select a VBE mode.
(Note that vt(4) does not support selecting a VBE mode. The loader can
set a VBE mode and vt(4) will use it via the vt_vbefb driver.)
[1] https://lists.freebsd.org/archives/freebsd-hackers/2021-November/000469.html
PR: 253733
Reported by: Stefan Blachmann [1]
Reviewed by: imp, manu, tsoome
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33141
Belatedly remove twa(4). It was supposed to go before 13.0, but was
overlooked.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33114
Belatedly remove esp(4). It was tagged as gone in 13, but was overlooked
until now.
Sponsored by: Netflix
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33115
Belatedly remove amr(4). It was slated to depart before 13.0 but was
overlooked until now.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33113
Belatedly remove iir(4). It was slated to go before 13, but was
overlooked.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33112
We'd said this was going away in 13, but was overlooked. Belatedly
remove.
Sponsored by: Netflix
Relnotes: yes
Reviewed by: scottl
Differential Revision: https://reviews.freebsd.org/D33111
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions. Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.
Deduplicate the implementations for the rest of the platforms. This
will make it easier to implement in_cksum() for unmapped mbufs. On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.
No functional change intended.
Reviewed by: kp, glebius
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33095
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998. No functional change intended.
Reviewed by: kp, glebius, cy
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33093
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31991
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm. Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.
Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision: https://reviews.freebsd.org/D32964
The MINIMAL configs were overlooked. They are compiled as part of
universe, so this broke universe builds. Add the same defafults as for
GENERIC.
Sponsored by: Netflix
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.
This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
Change the 'period' argument to 'duration' and change its type to
sbintime_t so we can more easily express different durations.
Reviewed by: tsoome, glebius
Differential Revision: https://reviews.freebsd.org/D32619
In preparation for machine-independent sys/compat/linux/linux_ptrace.c,
rename the i386-specific Linux ptrace(2) implementation. No functional
changes.
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D32757
The last two drivers that required sppp are cp(4) and ce(4).
These devices are still produced and can be purchased
at Cronyx <http://cronyx.ru/hardware/wan.html>.
Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no
longer support FreeBSD officially. Later they have dropped
support for Linux drivers to. As of mid-2020 they don't even
have a developer to maintain their Windows driver. However,
their support verbally told me that they could provide aid to
a FreeBSD developer with documentaion in case if there appears
a new customer for their devices.
These drivers have a feature to not use sppp(4) and create an
interface, but instead expose the device as netgraph(4) node.
Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on
top of the node and get your synchronous PPP. Alternatively
you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC.
Actually, last time I used cp(4) back in 2004, using netgraph(4)
instead of sppp(4) was already the right way to do.
Thus, remove the sppp(4) related part of the drivers and enable
by default the negraph(4) part. Further maintenance of these
drivers in the tree shouldn't be a big deal.
While doing that, remove some cruft and enable cp(4) compilation
on amd64. The ce(4) for some unknown reason marks its internal
DDK functions with __attribute__ fastcall, which most likely is
safe to remove, but without hardware I'm not going to do that, so
ce(4) remains i386-only.
Reviewed by: emaste, imp, donner
Differential Revision: https://reviews.freebsd.org/D32590
See also: https://reviews.freebsd.org/D23928
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.
Similarly, convert vm_page_alloc_domain() callers.
Note that callers are now responsible for assigning the pindex.
Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
This is a re-application of commit
9068f6ea69.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
also move ptrace-related helpers to ptrace_machdep.c
Apply some style. Use ANSI C function definitions.
Remove MPSAFE annotations.
Reviewed by: emaste, imp
Discussed with: jrtc27
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32310
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31885
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().
Reviewed by: kib, markj, imp (earlier version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31884
We don't need to assert we have Giant here. All machines that require
the F00F workaround are UP and interrupts are disabled. Since we are
single threaded, it's safe to allocate the IDT area with pmap_trm_alloc,
interact with the current idt table and replace the IDT table without
any Giant locking.
Sponsored by: Netflix
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D31839
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.
There is no reason to keep it in FreeBSD.
Approved by: ed (private mail)
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D31923
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.
Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029
The file as is is the maze of #ifdef passages, all slightly different.
Divorcing i386 and amd64 version actually makes changing the code
easier, also no changes for i386 are planned.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31931
- Re-implement pcib interface to use standard pci bus driver on top of
vmd(4) instead of custom one.
- Re-implement memory/bus resource allocation to properly handle even
complicated configurations.
- Re-implement interrupt handling to evenly distribute children's MSI/
MSI-X interrupts between available vmd(4) MSI-X vectors and setup them
to be handled by standard OS mechanisms with minimal overhead, except
sharing when unavoidable.
Successfully tested on Dell XPS 13 laptop with Core i7-1185G7 CPU (VMD
device ID 0x9a0b) and single NVMe SSD, dual-booting with Windows 10.
Successfully tested on Supermicro X11DPI-NT motherboard with Xeon(R)
Gold 6242R CPUs (VMD device ID 0x201d), simultaneously handling NVMe
SSD on one PCIe port and PLX bridge with 3 NVMe and 1 AHCI SSDs on
another. Handles SSD hot-plug (except Optane 905p for some reason,
which are not detected until manual bus rescan) and enabled IOMMU
(directly connected SSDs work, but ones connected to the PLX fail
without errors from IOMMU).
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential revision: https://reviews.freebsd.org/D31762
Move the common kernel function signatures from machine/reg.h to a new
sys/reg.h. This is in preperation for adding PT_GETREGSET to ptrace(2).
Reviewed by: imp, markj
Sponsored by: DARPA, AFRL (original work)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19830
Add support for the KVM paravirtual clock device.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D29733
Add a variant of 'rdtsc()' that performs the ordered version of 'rdtsc'
appropriate for the invoking x86 variant.
Also, expose the 'lfence'-ed and 'mfence'-ed 'rdtsc()' variants needed
by 'rdtsc_ordered()' for general use.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31416
Add a variant of 'rdtscp()' that retains and returns the 'IA32_TSC_AUX'
value read by 'rdtscp'.
Sponsored By: Juniper Networks, Inc.
Sponsored By: Klara, Inc.
Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D31415
In preparation for clone3 system call add struct clone_args and use it in
clone implementation.
Move all of clone related bits to the newly created linux_fork.h header.
Differential revision: https://reviews.freebsd.org/D31474
MFC after: 2 weeks
At least Linux x86 ABI's does not use carry bit and expects that the dx register
is preserved. For this add a new sv_set_fork_retval hook and call it from cpu_fork().
Add a short comment about touching dx in x86_set_fork_retval(), for more details
see phab comments from kib@ and imp@.
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D31472
MFC after: 2 weeks
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
On amd64 XENHVM depends on the xentimer device for PVH early startup,
so both should be added or removed together (like the current
dependency with xenpci). Fix this by adding xentimer to NOTES and
updating the comments on the config files. Note that on i386 there's
no such dependency between xentimer and XENHVM, since there's no PVH
support.
While there also fix the MINIMAL i386 build to include the xentimer,
so it keeps the same functionality as before xentimer was split from
XENHVM.
Reported by: lwhsu
PR: 257549
Fixes: ae59812748 ('xen/timer: make xen timer optional')
which is the place to put MD asserts about allocated pages.
On amd64, verify that allocated page does not belong to the kernel
(text, data) or early allocated pages.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31121
There is no reason now why do we need to allocate trampoline page very
early in the boot process. The only requirement for the page is that
it is below 1M to be usable by the real mode during init. This can be
handled by vm_alloc_contig() when we do the startup.
Also assert that startup trampoline fits into single page. In principle
we can do multi-page allocation if needed, but it is not.
Move the alloc_ap_trampoline() function and the boot_address variable to
i386/mp_machdep.c. Keep existing mechanism of early alloc on i386.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D31343
gcc failed as it didn't inlined the builtins and generates calls to
the libgcc, ld can't find libgcc as cross-toolchain libgcc is not installed.
To avoid this add internal vDSO ffs functions without optimized builtins.
Reported by: jhb
MFC after: 2 weeks
The timer is not used on ARM.
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D29041
Initial patch from submitter was adapted by me to prevent unconditional
FUTEX_REQUEUE use.
PR: 255947
Submitted by: Philippe Michaud-Boudreault
Differential Revision: https://reviews.freebsd.org/D30332
The vDSO initialisation order should be as follows:
- native abi init via exec_sysvec_init();
- vDSO symbols queued to the linux_vdso_syms list;
- linux_vdso_install();
- linux_exec_sysvec_init();
As the exec_sysvec_init() called with SI_ORDER_ANY (last) at SI_SUB_EXEC
order, move linux_vdso_install() and linux_exec_sysvec_init() to the
SI_SUB_EXEC+1 order.
Reviewed by: trasz
Differential Revision: https://reviews.freebsd.org/D30902
MFC after 2 weeks
In order to reduce diff between arches constify vdso install/deinstall
functions like arm64.
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D30901
MFC after: 2 weeks
The vDSO (virtual dynamic shared object) is a small shared library that the
kernel maps R/O into the address space of all Linux processes on image
activation. The vDSO is a fully formed ELF image, shared by all processes
with the same ABI, has no process private data.
The primary purpose of the vDSO:
- non-executable stack, signal trampolines not copied to the stack;
- signal trampolines unwind, mandatory for the NPTL;
- to avoid contex-switch overhead frequently used system calls can be
implemented in the vDSO: for now gettimeofday, clock_gettime.
The first two have been implemented, so add the implementation of system
calls.
System calls implemenation based on a native timekeeping code with some
limitations:
- ifunc can't be used, as vDSO r/o mapped to the process VA and rtld
can't relocate symbols;
- reading HPET memory is not implemented for now (TODO).
In case on any error vDSO system calls fallback to the kernel system
calls. For unimplemented vDSO system calls added prototypes which call
corresponding kernel system call.
Tested by: trasz (arm64)
Differential revision: https://reviews.freebsd.org/D30900
MFC after: 2 weeks
Temporary add stubs to the Linux emulation layer which calls the existing hook.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D30911
MFC after: 2 weeks
The underlying types for both are the same so arguably this doesn't
really matter, but using the wrong type is still confusing and
technically incorrect.
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
This reapplies 3a522ba1bc with a fix for
the static assertion failure on i386.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
amd64 and 32-bit ARM already had assertions to this effect. Add them to
other pmaps.
Reviewed by: alc, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31171
This controller supports 2.5G/1G/100MB/10MB speeds, and allows
tx/rx checksum offload, TSO, LRO, and multi-queue operation.
The driver was derived from code contributed by Intel, and modified
by Netgate to fit into the iflib framework.
Thanks to Mike Karels for testing and feedback on the driver.
Reviewed by: bcr (manpages), kbowling, scottl, erj
MFC after: 1 month
Relnotes: yes
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D30668
The syscall number is stored in the same register as the syscall return
on amd64 (and possibly other architectures) and so it is impossible to
recover in the signal handler after the call has returned. This small
tweak delivers it in the `si_value` field of the signal, which is
sufficient to catch capability violations and emulate them with a call
to a more-privileged process in the signal handler.
Approved by: markj (mentor)
Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D29185
Use sysentvec hooks to only call umtx_thread_exit/umtx_exec, which handle
robust mutexes, for native FreeBSD ABI. Similarly, there is no sense
in calling sigfastblock_clear() for non-native ABIs.
Requested by: dchagin
Reviewed by: dchagin, markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D30987
This adds `sv_elf_core_osabi`, `sv_elf_core_abi_vendor`,
and `sv_elf_core_prepare_notes` fields to `struct sysentvec`,
and modifies imgact_elf.c to make use of them instead
of hardcoding FreeBSD-specific values. It also updates all
of the ABI definitions to preserve current behaviour.
This makes it possible to implement non-native ELF coredump
support without unnecessary code duplication. It will be used
for Linux coredumps.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30921
In Linux vDSO is a small shared ELF library, so it is not intended
for aout binaries. This was added on 64-bit Linuxulator import by mistake.
MFC after: 2 weeks
Assuming we can't run on i486, i586 class cpu, retire linux_kplatform var
and use hardcoded 'machine' value in linux_newuname().
I have added linux_kplatform for consistency with linux_platform which is
placed in to vdso to avoid excess copyout it on stack for AT_PLATFORM at
exec time.
This is the first stage of Linuxulator's vdso revision.
Reviewed by: trasz, imp
Differential Revision: https://reviews.freebsd.org/D30774
MFC after: 2 weeks
Stop confusing people, retire COMPAT_LINUX and COMPAT_LINUX32 kernel
build options. Since we have 32 and 64 bit Linux emulators, we can't build both
emulators together into the kernel. I don't think it matters, Linux emulation
depends on loadable modules (via rc).
Cut LINPROCFS and LINSYSFS for consistency.
PR: 215061
Reviewed by: bcr (manpages), trasz
Differential Revision: https://reviews.freebsd.org/D30751
MFC after: 2 weeks
For now the Linux emulation layer uses in kernel ppoll(2) without
conversion of user supplied fd 'events', and does not convert the
kernel supplied fd 'revents'.
At least POLLRDHUP is handled by FreeBSD differently than by
Linux. Seems that Linux silencly ignores POLLRDHUP on non socket fd's
unlike FreeBSD, which does more strictly check and fails.
Rework the Linux ppoll, using kern_poll and converting 'events'
and 'revents' values.
While here, move poll events defines to the MI part of code as they
mostly identical on all arches except arm.
Differential Revision: https://reviews.freebsd.org/D30716
MFC after: 2 weeks
Require queueing of the signals with default action, and disable
dequeueing SIGCHLD on wait for live process.
Reported and tested by: dchagin
Reviewed by: dchagin, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30675
The vmbus ISR needs to live in a trampoline. Dynamically allocating a
trampoline at driver initialization time poses some difficulties due to
the fact that the KENTER macro assumes that the offset relative to
tramp_idleptd is fixed at static link time. Another problem is that
native_lapic_ipi_alloc() uses setidt(), which assumes a fixed trampoline
offset.
Rather than fight this, move the Hyper-V ISR to i386/exception.s. Add a
new HYPERV kernel option to make this optional, and configure it by
default on i386. This is sufficient to make use of vmbus(4) after the
4/4 split. Note that vmbus cannot be loaded dynamically and both the
HYPERV option and device must be configured together. I think this is
not too onerous a requirement, since vmbus(4) was previously
non-functional.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: whu, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30577
Retire ksiginfo_to_lsiginfo function, use siginfo_to_lsiginfo instead.
Convert rt_sigtimedwait siginfo variables to well known names.
MFC after: 2 weeks
setidt_disp is the offset of the ISR trampoline relative to the address
of the routines in exception.s, so uintptr_t is not quite right.
Also remove a bogus declaration I added in commit 18f55c67f7, it is not
required after all.
Reported by: jrtc27
Reviewed by: jrtc27, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30590
Make it under SI_SUB_CPU sysinit, instead of much later SI_SUB_DRIVERS.
The SI_SUB_DRIVERS survived from times when FPU used real ISA attachment,
now it is only pnp stub claiming id.
PR: 255997
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30512
While here, fix all links to older en_US.ISO8859-1 documentation
in the src/ tree.
PR: 255026
Reported by: Michael Büker <freebsd@michael-bueker.de>
Reviewed by: dbaio
Approved by: blackend (mentor), re (gjb)
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D30265
PVHv1 was officially removed from Xen in 4.9, so just axe the related
code from FreeBSD.
Note FreeBSD supports PVHv2, which is the replacement for PVHv1.
Sponsored by: Citrix Systems R&D
Reviewed by: kib, Elliott Mitchell
Differential Revision: https://reviews.freebsd.org/D30228