Commit Graph

9037 Commits

Author SHA1 Message Date
Mihai Burcea
9aa02d5120 vmm: Fix snapshots for AMD CPUs
This patch fixes the AMD implementation for snapshotting.  It removes
unnecessary vmcb fields that should not be saved and duplicates.

Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D33431
2022-06-30 16:13:55 -07:00
Vitaliy Gusev
5afcca138f vmm: Cherry pick illumos commit '13361 bhyve should mask RDT cpuid info'
Summary:
    commit  1a5f1879be09d3de900b2510692dd12003784d84
    Author: Patrick Mooney <pmooney@pfmooney.com>
    Date:   2020-12-16T20:02:23.000Z

        13361 bhyve should mask RDT cpuid info
        Reviewed by: Andy Fiddaman <andy@omnios.org>
        Reviewed by: Toomas Soome <tsoome@me.com>
        Approved by: Robert Mustacchi <rm@fingolfin.org>

    https://github.com/illumos/illumos-gate/commit/1a5f1879be09d3de900b2510692dd12003784d8

----

We saw similar warning of GP (on Intel Xeon CPU E5-2630 v4 and VM with Ubuntu 20.04 5.4.0-113-generic)  until this commit is applied:

```
[    1.658880] kernel: unchecked MSR access error: WRMSR to 0xc8f (tried to write 0x0000000000000000) at rIP: 0xffffffffacc735b4 (native_write_msr+0x4/0x30)
[    1.662734] kernel: Call Trace:
[    1.663885] kernel:  ? clear_closid_rmid.isra.0+0x36/0x40
[    1.665501] kernel:  resctrl_online_cpu+0xdc/0x3f0
[    1.666952] kernel:  ? __switch_to_asm+0x40/0x70
[    1.668358] kernel:  ? __switch_to+0x7f/0x480
[    1.669693] kernel:  ? cat_wrmsr+0x70/0x70
[    1.670970] kernel:  cpuhp_invoke_callback+0x9b/0x580
[    1.672541] kernel:  ? __schedule+0x2eb/0x740
[    1.673893] kernel:  cpuhp_thread_fun+0xb8/0x120
[    1.675304] kernel:  smpboot_thread_fn+0xd0/0x170
[    1.676685] kernel:  kthread+0x104/0x140
[    1.677948] kernel:  ? sort_range+0x30/0x30
[    1.679299] kernel:  ? kthread_park+0x90/0x90
[    1.680570] kernel:  ret_from_fork+0x35/0x40
[    1.682000] kernel: *** VALIDATE rdt ***
[    1.683454] kernel: resctrl: L3 monitoring detected
```

Reviewed by:	markj, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35442
2022-06-30 10:27:27 -07:00
Dmitry Chagin
050f5a8405 amd64: Reload CPU ext features after resume or cr4 changes
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D35555
MFC after:		2 weeks
2022-06-29 10:34:43 +03:00
Roger Pau Monné
881c145431 elfnote: place note in a PT_NOTE program header
Some tools (firecraker loader) only check for notes in PT_NOTE program
headers, so make sure the notes added using the ELFNOTE macro end up
in such header.

Output from readelf -Wl for and amd64 kernel after the change:

Elf file type is EXEC (Executable file)
Entry point 0xffffffff8038a000
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R   0x8
  INTERP         0x0002a8 0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R   0x1
      [Requesting program interpreter: /red/herring]
  LOAD           0x000000 0xffffffff80200000 0x0000000000200000 0x189e28 0x189e28 R   0x200000
  LOAD           0x18a000 0xffffffff8038a000 0x000000000038a000 0xe447e8 0xe447e8 R E 0x200000
  LOAD           0xfce7f0 0xffffffff811ce7f0 0x00000000011ce7f0 0x6b955c 0x6b955c R   0x200000
  LOAD           0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW  0x200000
  LOAD           0x1801000 0xffffffff81a01000 0x0000000001a01000 0x1c8480 0x5ff000 RW  0x200000
  DYNAMIC        0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW  0x8
  GNU_RELRO      0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x1687ae0 0xffffffff81887ae0 0x0000000001887ae0 0x0001c0 0x0001c0 R   0x4

 Section to Segment mapping:
  Segment Sections...
[...]
   10     .note.gnu.build-id .note.Xen

Reported by: cperciva
Fixes: 1a9cdd373a ('xen: add PV/PVH kernel entry point')
Fixes: 93ee134a24 ('Integrate support for xen in to i386 common code.')
Sponsored by: Citrix Systems R&D
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D35611
2022-06-28 09:51:57 +02:00
Dmitry Chagin
d416ee86c7 linux(4): To reuse MD linux.h hide kernel dependencies unde _KERNEL constraint
MFC after:		2 weeks
2022-06-22 14:28:24 +03:00
Alexander Motin
6210ac95a1 amd64: Stop using REP MOVSB for backward memmove()s.
Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes
REP MOVSB the fastest way to copy memory in most of cases. However
Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will expe-
rience significant performance degradation". Measurements on Intel
Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can
drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s
of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs.

This patch keeps ERMS use for forward ordered memory copies, but
removes it for backward overlapped moves where it does not work.

Reviewed by:	mjg
MFC after:	2 weeks
2022-06-16 13:46:34 -04:00
Mark Johnston
756bc3adc5 kasan: Create a shadow for the bootstack prior to hammer_time()
When the kernel is compiled with -asan-stack=true, the address sanitizer
will emit inline accesses to the shadow map.  In other words, some
shadow map accesses are not intercepted by the KASAN runtime, so they
cannot be disabled even if the runtime is not yet initialized by
kasan_init() at the end of hammer_time().

This went unnoticed because the loader will initialize all PML4 entries
of the bootstrap page table to point to the same PDP page, so early
shadow map accesses do not raise a page fault, though they are silently
corrupting memory.  In fact, when the loader does not copy the staging
area, we do get a page fault since in that case only the first and last
PML4Es are populated by the loader.  But due to another bug, the loader
always treated KASAN kernels as non-relocatable and thus always copied
the staging area.

It is not really practical to annotate hammer_time() and all callees
with __nosanitizeaddress, so instead add some early initialization which
creates a shadow for the boot stack used by hammer_time().  This is only
needed by KASAN, not by KMSAN, but the shared pmap code handles both.

Reported by:	mhorne
Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35449
2022-06-15 11:39:10 -04:00
Mark Johnston
c6d092b510 pmap: Keep PTI page table pages busy
PTI page table pages are allocated from a VM object, so must be
exclusively busied when they are freed, e.g., when a thread loses a race
in pmap_pti_pde().  Simply keep PTPs busy at all times, as was done for
some other kernel allocators in commit
e9ceb9dd11.

Also remove some redundant assertions on "ref_count":
vm_page_unwire_noq() already asserts that the page's reference count is
greater than zero.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35466
2022-06-15 11:38:04 -04:00
Brooks Davis
5ea3094e6a amd64: -m32 support for machine/md_var.h
Install the i386 md_var.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a mostly kernel-only header required by procstat's ZFS support.
It is pulled in by the i386 machine/counter.h.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
8dc3fdfe69 amd64: -m32 support for machine/counter.h
Install the i386 counter.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header required by procstat's ZFS support.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
9f7588dd93 amd64: -m32 support for machine/pcpu_aux.h
Install the i386 pcpu_aux.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header that is required by procstat's ZFS support.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
f6fada5eed amd64: -m32 support for machine/pcpu.h
Install the i386 pcpu.h under /usr/include/i386 on amd64 and include
when targeting i386.

This is a kernel-only header and should not be required, but
procstat's zfs support includes this with _KERNEL defined.

Reviewed by:	jhb, imp
2022-06-13 18:35:40 +01:00
Brooks Davis
a29263b656 amd64: -m32 support for machine/sb_buf.h
The contents of the amd64 version are kernel-only and incompatible with
other headers when compiled for i386 userspace with _KERNEL defined.
Just ifdef the whole file out in that case rather than giving this file
the full x86 treatment since it's not needed for current use cases
(procstat zfs support).

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
a69511db18 amd64: -m32 support for machine/vmparam.h
Install the i386 vmparam.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
b3120c0aeb amd64: -m32 support for machine/proc.h
Install the i386 proc.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
3cd1b382c6 amd64: -m32 support for machine/pmap.h
Install the i386 pmap.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
c2c8157ebe amd64: -m32 support for machine/segments.h
Install the i386 segments.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
08f16287a5 amd64: -m32 support for machine/atomic.h
Install the i386 atomic.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
92a98611ca amd64: -m32 support for machine/asm(macros).h
Install the i386 versions under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
68049f6da8 amd64: -m32 support for machine/profile.h
Install the i386 profile.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:39 +01:00
Brooks Davis
cca1927261 amd64: -m32 support for machine/cpufunc.h
Install the i386 cpufunc.h under /usr/include/i386 on amd64 and include
when targeting i386.

Reviewed by:	jhb, imp
2022-06-13 18:35:38 +01:00
Brooks Davis
24983043bf x86: cleanup in machine/cpufunc.h
Reduce diffs between amd64 and i386 and improve whitespace

Reviewed by:	jhb, imp
2022-06-13 18:35:38 +01:00
Hans Petter Selasky
9fd0d9b16e ktls: Remove the KERN_TLS option from the i386 and amd64 LINT-NOIP kernel configurations.
Kernel TLS depends on INET or INET6 being enabled.

Reported by:	bz@
MFC after:	1 week
Sponsored by:	NVIDIA Networking
2022-06-11 21:31:28 +02:00
Vitaliy Gusev
e7d34aeda4 vmm: move bumping VMEXIT_USERSPACE stat to the right place
Statistic for "number of vm exits handled in userspace" should be
increased in vm_run() instead of vmx_run() because in some cases
vm_run() doesn't exit to userspace and keeps entering the guest.

Also svm_run's implementation even wrongly misses that stat.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35350
2022-06-09 08:57:25 -04:00
Gordon Bergling
faff37be46 fpu: Fix a typo in a source code comment
- s/choise/choice/

Obtained from:	NetBSD
MFC after:	3 days
2022-06-04 13:15:53 +02:00
Dmitry Chagin
4a6c2d075d linux(4): Properly restore the thread signal mask after signal delivery on i386
Replace sigframe sf_extramask by native sigset_t and use it to
store/restore the thread signal mask without conversion to/from
Linux signal mask.

Pointy hat to:		dchagin
MFC after:		2 weeks
2022-05-30 20:03:49 +03:00
Dmitry Chagin
109fd18ad9 linux(4): Properly build argument list for the signal handler
Provide arguments 2 and 3 if signal handler installed with SA_SIGINFO.

MFC after:		2 weeks
2022-05-30 19:53:12 +03:00
Dmitry Chagin
c30a767c6f linux(4): Microoptimize rt_sendsig(), convert signal mask once
On amd64 Linux saves the thread signal mask in both contexts, in the machine
dependent and in the machine independent. Both contexts are user accessible.
Convert the mask once, then copy it.

MFC after:		2 weeks
2022-05-30 19:49:45 +03:00
Dmitry Chagin
2ab9b59faa linux(4): Avoid direct manipulation of td_sigmask
Use kern_sigprocmask() instead of direct manipulation of td_sigmask
to reschedule newly blocked signals.

MFC after:		2 weeks
2022-05-30 19:48:20 +03:00
Dmitry Chagin
2ca34847e7 linux(4): Reduce duplication between MD parts of the Linuxulator
Move sigprocmask actions defines under compat/linux,
they are identical across all Linux architectures.

MFC after:		2 weeks
2022-05-30 19:47:26 +03:00
Corvin Köhne
3ba952e1a2 vmm: add tunable to trap WBINVD
x86 is cache coherent. However, there are special cases where cache
coherency isn't ensured (e.g. when switching the caching mode). In these
cases, WBINVD can be used. WBINVD writes all cache lines back into main
memory and invalidates the whole cache.

Due to the invalidation of the whole cache, WBINVD is a very heavy
instruction and degrades the performance on all cores. So, we should
minimize the use of WBINVD as much as possible.

In a virtual environment, the WBINVD call is mostly useless. The guest
isn't able to break cache coherency because he can't switch the physical
cache mode. When using pci passthrough WBINVD might be useful.

Nevertheless, trapping and ignoring WBINVD is an unsafe operation. For
that reason, we implement it as tunable.

Reviewed by:	jhb
Sponsored by:	Beckhoff Automation GmbH & Co. KG
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35253
2022-05-30 10:04:22 +02:00
Dmitry Chagin
0e26e54bdf linux(4): Handle 64-bit SO_TIMESTAMP for 32-bit binaries
To solve y2k38 problem in the recvmsg syscall the new SO_TIMESTAMP
constant were added on v5.1 Linux kernel. So, old 32-bit binaries
that knows only 32-bit time_t uses the old value of the constant,
and binaries that knows 64-bit time_t uses the new constant.

To determine what size of time_t type is expected by the user-space,
store requested value (SO_TIMESTAMP) in the process emuldata structure.

MFC after:		2 weeks
2022-05-28 23:45:39 +03:00
Bartosz Sobczak
cdcd52d41e
irdma: Add RDMA driver for Intel(R) Ethernet Controller E810
This is an initial commit for RDMA FreeBSD driver for Intel(R) Ethernet
Controller E810, called irdma.  Supporting both RoCEv2 and iWARP
protocols in per-PF manner, RoCEv2 being the default.

Testing has been done using krping tool, perftest, ucmatose, rping,
ud_pingpong, rc_pingpong and others.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

Reviewed by:	#manpages (pauamma_gundo.com) [documentation]
MFC after:	1 week
Relnotes:	yes
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D34690
2022-05-23 16:52:49 -07:00
Dmitry Chagin
26700ac0c4 linux(4): Deduplicate execve
As linux_execve is common across archs, except amd64 32-bit Linuxulator,
move it under compat/linux.

Noted by:		andrew@
MFC after:		2 weeks
2022-05-23 13:18:41 +03:00
Dmitry Chagin
53726a1f1e linux(4): Fix execve() on amd64/linux32 after a125ed50
MFC after:		2 weeks
2022-05-23 13:17:39 +03:00
Dmitry Chagin
9016ec056a linux(4): Deduplicate bsd_to_linux_trapcode()
As bsd_to_linux_trapcode() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:58 +03:00
Dmitry Chagin
2434137f69 linux(4): Deduplicate translate_traps()
As translate_traps() is common for x86 Linuxulators,
move it under x86/linux.

MFC after:		2 weeks
2022-05-23 13:16:26 +03:00
John Baldwin
7a0c23da4e vmm: Destroy character devices synchronously.
This fixes a userland race where bhyveload or bhyve can fail to reuse
a VM name after bhyvectl --destroy has returned.

Reported by:	Michael Dexter
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D35186
2022-05-20 09:53:43 -07:00
Dmitry Chagin
eca368ecb6 Retire sv_transtrap
Call translate_traps directly from sendsig().

MFC after:		2 weeks
2022-05-20 14:54:03 +03:00
Dmitry Chagin
8f9635dc99 linux(4): Retire handmade DWARF annotations from signal trampolines
The Linux exports __kernel_sigreturn and __kernel_rt_sigreturn from the
vdso. Modern glibc's sigaction sets the sa_restorer field of sigaction
to the corresponding vdso __sigreturn, and sets the SA_RESTORER.
Our signal trampolines uses the FreeBSD-way to call a signal handler,
so does not use the sigaction's sa_restorer.

However, as glibc's runtime linker depends on the existment of the vdso
__sigreturn symbols, for all Linuxulators was added separate trampolines
named __sigcode with DWARF anotations and left separate __sigreturn
methods, which are exported.

MFC after:		2 weeks
2022-05-15 21:08:12 +03:00
Dmitry Chagin
6e826d27c3 linux(4): Better naming for ucontext field of struct rt_sigframe
To reduce sendsig code difference and to avoid confusing me,
rename sf_sc to sf_uc to match the content.

MFC after:		2 weeks
2022-05-15 21:06:47 +03:00
Dmitry Chagin
af557e649c linux(4): Rework the definition of struct siginfo to match Linux actual one
Rework the defintion of struct siginfo so that the array padding
struct siginfo to SI_MAX_SIZE can be placed in a union along side of the
rest of the struct siginfo members.  The result is that we no longer need
the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions.

Move struct siginfo definition under /compat/linux to reduce MD part.
To avoid headers polution include linux_siginfo.h in the MD linux.h

MFC after:		2 weeks
2022-05-15 21:05:01 +03:00
Dmitry Chagin
21f2461741 linux(4): Move sigframe definitions to separate headers
The signal trampoine-related definitions are used only in the MD part
of code, wherefore moved from everywhere used linux.h to separate MD
headers.

MFC after:		2 weeks
2022-05-15 21:03:01 +03:00
Dmitry Chagin
ba279bcd6d linux(4): Cleanup signal trampolines
This is the first stage of a signal trampolines refactoring.

From trampolines retired emulation of the 'call' instruction, which is
replaced by direct call of a signal handler. The signal handler address
is in the register.

The previous trampoline implemenatation used semi-Linux-way to call
a signal handler via the 'jmp' instruction. Wherefore the trampoline
emulated a 'call' instruction to into the stack the return address for
signal handler's 'ret' instruction.  Wherefore handmade DWARD annotations
was used.

While here rephrased and removed excessive comments.

MFC after:		2 weeks
2022-05-15 21:00:05 +03:00
Dmitry Chagin
0b5d5dc376 linux(4): Retire unneeded initialization
Both uc_flags and uc_link are zeroed above. On amd64 and i386 the
uc_link field is not used at all. The UC_FP_XSTATE bit should be set
in the uc_flags if OS xsave knob is turned on (and xsave is implemented).

MFC after:		2 weeks
2022-05-15 20:58:47 +03:00
Mitchell Horne
db71383b88 kerneldump: remove physical from dump routines
It is unused, especially now that the underlying d_dumper methods do not
accept the argument.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D35174
2022-05-13 10:43:19 -03:00
Dmitry Chagin
7b76c79b0b linux(4): Regen for prctl fix.
MFC after:		2 weeks
2022-05-09 21:48:15 +03:00
Dmitry Chagin
bfae7fbaa2 linux(4): Fix 039e98e6.
The patch was about an year in my local queue and I still screwed up...

MFC after:		2 weeks
2022-05-09 21:45:51 +03:00
Dmitry Chagin
07d108932a linux(4): Return native error from futex_atomic_op to avoid conversion by the caller.
MFC after:		2 weeks
2022-05-09 21:16:31 +03:00
Dmitry Chagin
e768576718 linux(4): Regen for prctl syscall.
MFC after:		2 weeks
2022-05-09 21:10:48 +03:00
Dmitry Chagin
039e98e60c linux(4): Change prctl syscall definition to match Linux actual one.
Otherwise argX conversion leads to an unexpected behaviour.

MFC after:		2 weeks
2022-05-09 21:09:39 +03:00
Dmitry Chagin
5a6a4fb284 linux(4): Implement vdso getcpu for x86.
This is modeled after f2395455 (by kib@).

MFC after:		2 weeks
2022-05-08 17:20:52 +03:00
Dmitry Chagin
332eca05b5 linux(4): Refactor vdso_gettc_x86 includes.
Factor out includes from common vdso_gettc_x86 file to the corresponding
MD files.

MFC after:		2 weeks
2022-05-08 17:20:51 +03:00
Dmitry Chagin
61f45f6733 linux(4): Regen for ppoll_time64 syscall.
MFC after:		2 weeks
2022-05-08 13:38:17 +03:00
Dmitry Chagin
94f5f150ef linux(4): Fix ppoll_time64 syscall definition.
Fixed my typo in ed61e0ce1d. Here tsp is a pointer to the 64-bit timespec.

MFC after:		2 weeks
2022-05-08 13:37:48 +03:00
John Baldwin
badcc74fe9 amd64: Remove unused devclass arguments to DRIVER_MODULE. 2022-05-06 15:46:59 -07:00
Dmitry Chagin
3245a2ecea linux(4): Implement semtimedop syscalls.
On i386 are two semtimedop. The old one is called via multiplexor and
uses 32-bit timespec, and new semtimedop_tim64, which is uses 64-bit
timespec.

MFC after:		2 weeks
2022-05-06 20:02:59 +03:00
Dmitry Chagin
430460d717 linux(4): Regen for semtimedop syscalls.
MFC after:		2 weeks
2022-05-06 20:02:16 +03:00
Dmitry Chagin
f19c4e2341 linux(4): Change semtimedop syscall definition to match Linux actual one.
MFC after:		2 weeks
2022-05-06 20:01:43 +03:00
Dmitry Chagin
f48a68874b linux(4): Retire linux_semop implementation.
In i386 Linux semop called via ipc() multiplexor, so use kern_semop
directly from multiplexor.

MFC after:		2 weeks
2022-05-06 20:00:13 +03:00
Dmitry Chagin
cd875998dc linux(4): Regen for semop syscall.
MFC after:		2 weeks
2022-05-06 19:59:33 +03:00
Dmitry Chagin
f686092664 linux(4): Call semop directly.
As the Linux semop syscall is not defined in i386, and as it is equal
to the native semop syscall, call it directly.
Fix semop definition to match Linux actual one - nsops is size_t type.

MFC after:		2 weeks
2022-05-06 19:58:53 +03:00
Dan Carpenter
e99c0c8b79 xen: Prevent buffer overflow in privcmd ioctl
The "call" variable comes from the user in privcmd_ioctl_hypercall().
It's an offset into the hypercall_page[] which has (PAGE_SIZE / 32)
elements.  We need to put an upper bound on it to prevent an out of
bounds access.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>

Obtained from: Linux
Linux commit: 42d8644bd77dd2d747e004e367cb0c895a606f39
Fixes: bf7313e3b7 ("xen: implement the privcmd user-space device")
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by: royger
2022-05-06 09:31:32 +02:00
Dmitry Chagin
1744f14e26 linux(4): Implement recvmmsg_time64 syscall.
MFC after:		2 weeks
2022-05-04 13:06:53 +03:00
Dmitry Chagin
79695e9585 linux(4): Regen for recvmmsg_time64 syscall.
MFC after:	2 weeks
2022-05-04 13:06:52 +03:00
Dmitry Chagin
17ccda0039 linux(4): Change recvmmsg_time64 syscall definition to match Linux actual one.
MFC after:		2 weeks
2022-05-04 13:06:52 +03:00
Dmitry Chagin
ce9f8d6ab0 linux(4): Implement timerfd_gettime64 syscall.
MFC after:		2 weeks
2022-05-04 13:06:52 +03:00
Dmitry Chagin
ac80ae9313 linux(4): Regen for timerfd_gettime64 syscall.
MFC after:	2 weeks
2022-05-04 13:06:51 +03:00
Dmitry Chagin
16aefe5ba3 linux(4): Change timerfd_gettime64 syscall definition to match Linux actual one.
MFC after:		2 weeks
2022-05-04 13:06:51 +03:00
Dmitry Chagin
b1f0b08d93 linux(4): Implement timerfd_settime64 syscall.
MFC after:		2weeks
2022-05-04 13:06:50 +03:00
Dmitry Chagin
f4228fbb4e linux(4): Regen for timerfd_settime64 syscall.
MFC after:	2 weeks
2022-05-04 13:06:50 +03:00
Dmitry Chagin
8545bcff31 linux(4): Change timerfd_settime64 syscall definition to match Linux actual one.
MFC after:		2 weeks
2022-05-04 13:06:50 +03:00
Dmitry Chagin
a1fd2911dd linux(4): Implement timer_settime64 syscall.
MFC after:		2 weeks
2022-05-04 13:06:49 +03:00
Dmitry Chagin
9038a0b74c linux(4): Regen for timer_settime64 syscall.
MFC after:	2 weeks
2022-05-04 13:06:49 +03:00
Dmitry Chagin
1508b1b6a0 linux(4): Change timer_settime64 syscall definition to match Linux actual one.
MFC after:		2 weeks
2022-05-04 13:06:48 +03:00
Dmitry Chagin
783c1bd8cb linux(4): Implement timer_gettime64 syscall.
MFC after:		2 weeks
2022-05-04 13:06:48 +03:00
Dmitry Chagin
1cccef6dff linux(4): Regen for timer_gettime64 syscall.
MFC after:	2 weeks
2022-05-04 13:06:48 +03:00
Dmitry Chagin
ccec96033c linux(4): Change timer_gettime64 syscall definition to match Linux actual one.
MFC after:		2 weeks
2022-05-04 13:06:47 +03:00
Dmitry Chagin
8c84ca657b linux(4): Implement sched_rr_get_interval_time64 syscall.
MFC after:		2 weeks
2022-05-04 13:06:47 +03:00
Dmitry Chagin
cdddbb77c3 linux(4): Regen for sched_rr_get_interval_time64 syscall.
MFC after:	2 weeks
2022-05-04 13:06:46 +03:00
Dmitry Chagin
7b520c0b3c linux(4): Change sched_rr_get_interval_time64 syscall definition to match Linux actual one.
MFC after:		2 weeks
2022-05-04 13:06:45 +03:00
Dmitry Chagin
fe2c9f83a6 Remove dead code.
is_physical_memory() dead since 235a54de.

Reviewed by:		markj
Differential revision:	https://reviews.freebsd.org/D35056
MFC after:		2 weeks
2022-04-26 19:40:59 +03:00
Dmitry Chagin
df377f1fb8 linux(4): Regen for epoll_pwait2 syscall. 2022-04-26 19:35:58 +03:00
Dmitry Chagin
81b0b7dc0c linux(4): Change epoll_pwait2 syscall definition to match Linux actual one.
MFC after:	2 weeks
2022-04-26 19:35:57 +03:00
Dmitry Chagin
75e409495f linux(4): Regen for rseq syscall. 2022-04-26 19:35:55 +03:00
Dmitry Chagin
f202f35db0 linux(4): Change rseq syscall definition to match Linux actual one.
MFC after:	2 weeks
2022-04-26 19:35:54 +03:00
John Baldwin
f2d166d532 amd64 NOTES: Add entries for qlxgb, glxgbe, and glxge. 2022-04-22 15:18:06 -07:00
John Baldwin
5bf623bbcd amd64 NOTES: Sort the axp entry. 2022-04-22 15:18:06 -07:00
John Baldwin
0b377a49fa FB_INSTALL_CDEV: Remove this option and related code.
This option was never enabled in GENERIC and does not appear to work
(the cdevsw is stored in a global array but never passed to make_dev
to be associated with a character device).

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D35008
2022-04-21 10:29:14 -07:00
Brooks Davis
c2f6aae007 machine/in_cksum.h: don't include sys/cdefs.h
All consumers already do it and it was required on amd64 and i386
until recently (1c1bf5bd7c).

Reviewed by:	emaste
Differential Revision:	https://reviews.freebsd.org/D34932
2022-04-18 21:02:19 +01:00
John Baldwin
3d6f4411e4 Remove checks for <sys/cdefs.h> being included.
These files no longer depend on the macros required when these checks
were added.

PR:		263102 (exp-run)
Reviewed by:	brooks, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D34804
2022-04-12 10:06:18 -07:00
John Baldwin
5f9c9ae2f2 Remove checks for __CC_SUPPORTS_WARNING assuming it is always true.
All supported compilers (modern versions of GCC and clang) support
this.

PR:		263102 (exp-run)
Reviewed by:	brooks, imp
Differential Revision:	https://reviews.freebsd.org/D34803
2022-04-12 10:06:13 -07:00
John Baldwin
f08613087a Remove checks for __CC_SUPPORTS__INLINE assuming it is always true.
All supported compilers (modern versions of GCC and clang) support
this.

PR:		263102 (exp-run)
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D34802
2022-04-12 10:06:09 -07:00
John Baldwin
5ab33279ad Remove checks for __GNUCLIKE___TYPEOF assuming it is always true.
All supported compilers (modern versions of GCC and clang) support
this.

PR:		263102 (exp-run)
Reviewed by:	brooks, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D34798
2022-04-12 10:05:50 -07:00
John Baldwin
56f5947a71 Remove checks for __GNUCLIKE_ASM assuming it is always true.
All supported compilers (modern versions of GCC and clang) support
this.

Many places didn't have an #else so would just silently do the wrong
thing.  Ancient versions of icc (the original motivation for this) are
no longer a compiler FreeBSD supports.

PR:		263102 (exp-run)
Reviewed by:	brooks, imp
Differential Revision:	https://reviews.freebsd.org/D34797
2022-04-12 10:05:45 -07:00
John Baldwin
1c1bf5bd7c x86: Remove silly checks for <sys/cdefs.h>.
These headers #include <sys/cdefs.h> right after checking if it has
already been #included.  The nested #include already existed when the
check for _SYS_CDEFS_H_ was added, so the check shouldn't have been
added in the first place.

PR:		263102 (exp-run)
Reported by:	brooks
Reviewed by:	brooks, imp, emaste
Differential Revision:	https://reviews.freebsd.org/D34796
2022-04-12 10:05:39 -07:00
Robert Wing
d4e8207317 vmm_instruction_emul.c: fix bhyve build
The __diagused macro was used to cure a "set but not used" warning. This
broke the build for bhyve since __diagused is only defined in the
kernel. Define __diagused when not building the kernel.

Fixes:          5241577a22 ("vmm: fix set but not used warning")
Reported by:    Jenkins
2022-04-10 13:37:24 -08:00
Robert Wing
5a17f489d5 vmm: fix set but not used warning 2022-04-10 10:30:19 -08:00
Robert Wing
5241577a22 vmm: fix set but not used warning 2022-04-10 10:30:16 -08:00
Robert Wing
3587bfa797 vmm: fix set but not used warning 2022-04-10 10:30:14 -08:00
Robert Wing
5c272efaba vmm: fix set but not used warnings 2022-04-10 10:30:11 -08:00
Robert Wing
f877977a03 vmm: fix set but not used warnings 2022-04-10 10:30:08 -08:00
Robert Wing
893a3dd697 vmm: fix set but not used warning 2022-04-10 10:30:05 -08:00
Gordon Bergling
ffedb32dd3 pax(1): Remove a few double words in source code comments
- s/an an/an/

MFC after:	3 days
2022-04-09 14:27:39 +02:00
Gordon Bergling
efd0fdfe28 NOTES: Remove a double word in comments
- s/for for/for/

MFC after:	3 days
2022-04-09 10:31:49 +02:00
John Baldwin
6a33ecdc2f Fix a typo in previous commit.
Reported by:	npn
Pointy hat to:	jhb
Fixes:		572edd3dae vmm: Re-quiet set but unused warnings.
2022-04-08 12:01:33 -07:00
John Baldwin
a7d876f701 vmm amdvi: Move ctrl under #ifdef AMDVI_DEBUG_CMD. 2022-04-07 17:01:29 -07:00
John Baldwin
572edd3dae vmm: Re-quiet set but unused warnings.
__diagused is no longer used for KTR, so instead use #ifdef KTR or
expand KTR-only variables into their values in traces.
2022-04-07 17:01:29 -07:00
John Baldwin
3797f43e91 sgx: Remove unused variable. 2022-04-07 17:01:28 -07:00
Dmitry Chagin
d5dc757e84 linux(4): Add compat.linux32.emulate_i386 knob.
Historically 32-bit Linuxulator under amd64 emulated the real i386
behavior. Since 3d8dd983 the old i386 Linux world can't be used under
amd64 Linuxulator as it don't know anything about amd64 machine (which
is returned now by newuname() syscall). So, add a knob to allow to swith
the behavior and use i386 Linux binaries on amd64.
Set knob to the new behavior as I think this is common to the modern
Linux distros.

Reviewed by:		Pau Amma (doc), emaste
Differential revision:	https://reviews.freebsd.org/D34708
MFC after:		2 weeks
2022-03-31 21:01:09 +03:00
Brooks Davis
8601fca789 sysent: regen for syscallarg_t 2022-03-28 19:43:03 +01:00
Brooks Davis
b1ad6a9000 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-03-28 19:43:03 +01:00
John Baldwin
931983ee08 x86: Add a NT_X86_SEGBASES register set.
This register set contains the values of the fsbase and gsbase
registers.  Note that these registers can already be controlled
individually via ptrace(2) via MD operations, so the main reason for
adding this is to include these register values in core dumps.  In
particular, this will enable looking up the value of TLS variables
from core dumps in gdb.

The value of NT_X86_SEGBASES was chosen to match the value of
NT_386_TLS on Linux.  The notes serve similar purposes, but FreeBSD
will never dump a note equivalent to NT_386_TLS (which dumps a single
segment descriptor rather than a pair of addresses) and picking a
currently-unused value in the NT_X86_* range could result in a future
conflict.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D34650
2022-03-24 11:36:19 -07:00
Bjoern A. Zeeb
246c398145 bhyve: Do not remove guest physical addresses from IOMMU host domain
This permits I/O devices on the host to directly access wired memory
dedicated to guests using passthru devices.  Note that wired memory
belonging to guests that do not use passthru devices has always been
accessible by I/O devices on the host.

bhyve maps guest physical addresses into the user address space of
the bhyve process by mmap'ing /dev/vmm/<vmname>.  Device models pass
pointers derived from this mapping directly to system calls such as
preadv() to minimize copies when emulating DMA.  If the backing store
for a device model is a raw host device (e.g. when exporting a raw disk
device such as /dev/ada<n> as a drive in the guest), the host device
driver (e.g. ahci for /dev/ada<n>) can itself use DMA on the host
directly to the guest's memory.  However, if the guest's memory is
not present in the host IOMMU domain, these DMA requests by the host
device will fail without raising an error visible to the host device
driver or to the guest resulting in non-working I/O in the guest.

It is unclear why guest addresses were removed from the IOMMU host domain
initially, especially only for VM's with a passthru device as the
host IOMMU domain does not affect the permissions of passthru devices,
only devices on the host.

A considered alternative was using bounce buffers instead (D34535
is a proof of concept), but that adds additional overhead for unclear
benefit.

This solves a long-standing problem when using passthru devices and
physical disks in the same VM.

Thanks to:	grehan (patience and help)
Thanks to:	jhb (for improving the commit message)
PR:		260178
Reviewed by:	grehan, jhb
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D34607
2022-03-24 15:21:24 +00:00
Roger Pau Monné
1ca34862dc x86/tsc: fetch frequency from CPUID when running on Xen
Introduce a helper to fetch the TSC frequency from CPUID when running
under Xen.

Since the TSC can also be initialized early when running as a Xen
guest pull out the call to tsc_init() from the
early_clock_source_init() handlers and place it in clock_init(), as
otherwise all handlers would call tsc_init() anyway.

Reviewed by: markj
Sponsored by: Citrix Systems R&D
Differential revision: https://reviews.freebsd.org/D34581
2022-03-18 10:21:04 +01:00
Corvin Köhne
e47fe3183e bhyve: add ROM emulation
Some PCI devices especially GPUs require a ROM to work properly.
The ROM is executed by boot firmware to initialize the device.
To add a ROM to a device use the new ROM option for passthru device
(e.g. -s passthru,0/2/0,rom=<path>/<to>/<rom>).

It's necessary that the ROM is executed by the boot firmware.
It won't be executed by any OS.
Additionally, the boot firmware should be configured to execute the
ROM file.
For that reason, it's only possible to use a ROM when using
OVMF with enabled bus enumeration.

Differential Revision:	https://reviews.freebsd.org/D33129
Sponsored by:   Beckhoff Automation GmbH & Co. KG
MFC after:      1 month
2022-03-10 12:30:37 +01:00
Andrew Turner
9cf15aefb9 Fix the spelling of EFI_PAGE_SIZE
We assume EFI_PAGE_SIZE is the same as PAGE_SIZE, however this may not
be the case. Use the former when working with a list of pages from the
UEFI firmware so the correct size is used.

This will be needed on arm64 where PAGE_SIZE could be 16k or 64k in the
future. The other architectures have been updated to be consistent.

Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34510
2022-03-10 10:43:54 +00:00
John Baldwin
f1d450ddee bhyve: Remove VM_MAXCPU from the userspace API/ABI.
Reviewed by:	grehan
Differential Revision:	https://reviews.freebsd.org/D34494
2022-03-09 15:39:28 -08:00
Mark Johnston
f7a6dccf42 amd64: Call clock_init() after finishidentcpu()
As in commit c3d830cf7c, we should finalize CPU identification before
probing the TSC frequency.

Fixes:		84369dd523 ("x86: Probe the TSC frequency earlier")
Reported by:	khng
2022-03-04 19:32:39 -05:00
Mark Johnston
84369dd523 x86: Probe the TSC frequency earlier
This lets us use the TSC to implement early DELAY, limiting the use of
the sometimes-unreliable 8254 PIT.

PR:		262155
Reviewed by:	emaste
Tested by:	emaste, mike tancsa <mike@sentex.net>, Stefan Hegnauer <stefan.hegnauer@gmx.ch>
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D34367
2022-03-01 09:39:35 -05:00
Robert Wing
2062ce996d vmm: fix "set but not used" warnings 2022-02-28 15:09:32 -09:00
Robert Wing
39d87a0235 vmm: fix "set but not used" warnings 2022-02-28 14:55:37 -09:00
Robert Wing
73505a1076 vmm: fix "set but not used" warnings 2022-02-28 14:46:08 -09:00
Mateusz Guzik
b53133a778 proc: load/store p_cowgen using atomic primitives 2022-02-13 13:07:08 +00:00
Warner Losh
690601f0b4 amd64: Add static asssert for context size
Add a static assert for the siginfo_t, mcontext_t and ucontext_t
sizes. These are de-facto ABI options and cannot change size ever.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D34212
2022-02-10 14:32:20 -07:00
John Baldwin
6426978617 Extend the VMM stats interface to support a dynamic count of statistics.
- Add a starting index to 'struct vmstats' and change the
  VM_STATS ioctl to fetch the 64 stats starting at that index.
  A compat shim for <= 13 continues to fetch only the first 64
  stats.

- Extend vm_get_stats() in libvmmapi to use a loop and a static
  thread local buffer which grows to hold the stats needed.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D27463
2022-02-07 14:11:10 -08:00
Elliott Mitchell
ad7dd51499 xen: switch to use headers in contrib
These headers originate with the Xen project and shouldn't be mixed with
the main portion of the FreeBSD kernel. Notably they shouldn't be the
target of clean-up commits.

Switch to use the headers in sys/contrib/xen.

Reviewed by: royger
2022-02-07 10:11:56 +01:00
John Baldwin
6bea696af2 linux_copyout_strings: Use PROC_PS_STRINGS().
Reviewed by:	markj
Obtained from:	CheriBSD
Differential Revision:	https://reviews.freebsd.org/D34173
2022-02-04 15:57:57 -08:00
Konstantin Belousov
9596b349bb x86 atomic.h: remove obsoleted comment
Modules no longer call kernel functions for atomic ops, and since the
previous commit, we always use lock prefix.

Submitted by:	Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by:	jhb, markj
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34153
2022-02-04 14:01:39 +02:00
Konstantin Belousov
9c0b759bf9 x86 atomics: use lock prefix unconditionally
Atomics have significant other use besides providing in-system
primitives for safe memory updates.  They are used for implementing
communication with out of system software or hardware following some
protocols.

For instance, even UP kernel might require a protocol using atomics to
communicate with the software-emulated device on SMP hypervisor.  Or
real hardware might need atomic accesses as part of the proper
management protocol.

Another point is that UP configurations on x86 are extinct, so slight
performance hit by unconditionally use proper atomics is not important.
It is compensated by less code clutter, which in fact improves the
UP/i386 lifetime expectations.

Requested by:	Elliott Mitchell <ehem+freebsd@m5p.com>
Reviewed by:	Elliott Mitchell, imp, jhb, markj, royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34153
2022-02-04 14:01:39 +02:00
Konstantin Belousov
cbf999e75d x86 atomic.h: cleanup comments for preprocessor directives
Reviewed by:	Elliott Mitchell, imp, jhb, markj, royger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D34153
2022-02-04 14:01:39 +02:00
Konstantin Belousov
e7c5442162 amd64: micro-optimize vptopte()/vtopde() further
Eliminate shlq $3,address shift after masking of the va is done, which
is needed to convert pt_entry_t[] array index into byte offset.
Do it by preshifting the mask, and compensating the right shift of va.

Suggested by:	alc
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33786
2022-02-02 11:40:04 +02:00
Andrew Turner
548a2ec49b Add PT_GETREGSET
This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be
used to access all the registers from a specified core dump note type.
The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other
machine-dependant types are expected to be added in the future.

The ptrace addr points to a struct iovec pointing at memory to hold the
registers along with its length. On success the length in the iovec is
updated to tell userspace the actual length the kernel wrote or, if the
base address is NULL, the length the kernel would have written.

Because the data field is an int the arguments are backwards when
compared to the Linux PTRACE_GETREGSET call.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19831
2022-01-27 11:40:34 +00:00
Mark Johnston
758d98debe exec: Remove the stack gap implementation
ASLR stack randomization will reappear in a forthcoming commit.  Rather
than inserting a random gap into the stack mapping, the entire stack
mapping itself will be randomized in the same way that other mappings
are when ASLR is enabled.

No functional change intended, as the stack gap implementation is
currently disabled by default.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:11:54 -05:00
Mark Johnston
706f4a81a8 exec: Introduce the PROC_PS_STRINGS() macro
Rather than fetching the ps_strings address directly from a process'
sysentvec, use this macro.  With stack address randomization the
ps_strings address is no longer fixed.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 16:11:54 -05:00
Mark Johnston
3fc21fdd5f sysent: Add a sv_psstringssz field to struct sysentvec
The size of the ps_strings structure varies between ABIs, so this is
useful for computing the address of the ps_strings structure relative to
the top of the stack when stack address randomization is enabled.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33704
2022-01-17 11:42:07 -05:00
Corvin Köhne
6171e026be bhyve: add support for MTRR
Some guests or driver might depend on MTRR to work properly. E.g. the
nvidia gpu driver won't work without MTRR.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D33333
2022-01-14 12:41:44 +01:00
John Baldwin
bd7630ef61 ia32: Sync signal context type names with i386.
- Use ia32_freebsd4_* instead of ia32_*4.
- Use ia32_o* instead of ia32_*3.

Reviewed by:	brooks, imp, kib
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33882
2022-01-13 17:17:21 -08:00
Brooks Davis
0910a41ef3 Revert "syscallarg_t: Add a type for system call arguments"
Missed issues in truss on at least armv7 and powerpcspe need to be
resolved before recommit.

This reverts commit 3889fb8af0.
This reverts commit 1544e0f5d1.
2022-01-12 23:29:20 +00:00
Brooks Davis
3889fb8af0 sysent: regen for syscallarg_t 2022-01-12 22:51:25 +00:00
Brooks Davis
1544e0f5d1 syscallarg_t: Add a type for system call arguments
This more clearly differentiates system call arguments from integer
registers and return values. On current architectures it has no effect,
but on architectures where pointers are not integers (CHERI) and may
not even share registers (CHERI-MIPS) it is necessiary to differentiate
between system call arguments (syscallarg_t) and integer register values
(register_t).

Obtained from:	CheriBSD

Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D33780
2022-01-12 22:51:25 +00:00
Vitaliy Gusev
c72e914cf1 vmm: vlapic resume can eat 100% CPU by vlapic_callout_handler
Suspend/Resume of Win10 leads that CPU0 is busy on handling interrupts.

Win10 does not use LAPIC timer to often and in most cases, and I see it
is disabled by writing 0 to Initial Count Register (for Timer).

During resume, restart timer only for enabled LAPIC and enabled timer
for that LAPIC.

Reviewed by:	markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D33448
2022-01-11 09:27:45 -05:00
Konstantin Belousov
15964f1cb3 amd64 pmap: preset A and M bits for pmap_qenter() and pmap_kenter() mappings
This removes one or two atomic update of the pte on access. Also it
makes the pte constant during its lifecycle.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33778
2022-01-08 06:34:18 +02:00
Konstantin Belousov
720a892ac6 amd64 pmap: simplify vtopte() and vtopde()
Pre-calculate masks and page table/page directory bases, for LA48 and
LA57 cases, trading a conditional for the memory accesses.

This shaves 672 bytes of .text, by the cost of 32 .data bytes.  Note
that eliminated code contained one conditional, and several costly
movabs instructions, which are traded by two memory fetches per
vtop{t,d}e() inlines.

Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D33776
2022-01-08 06:34:18 +02:00
Warner Losh
9a498f9a14 lio: remove from NOIP
It doesn't build here. This was masked earlier by other build breakage.

Sponsored by:		Netflix
2022-01-05 14:19:34 -07:00
Edward Tomasz Napierala
f7b04c53de linux(4): Reduce diffs between linux_rt_sendsig() and sendsig()
No functional changes (except for the uprintf).

Discussed With:	kib
Sponsored By:	EPSRC
2022-01-04 14:39:19 +00:00
Edward Tomasz Napierala
562bc0a943 Unstaticize {get,set}_fpcontext() on amd64
This will be used to fix Linux signal delivery.

Discussed With:	kib
Sponsored By:	EPSRC
2022-01-04 13:25:12 +00:00
Konstantin Belousov
642f77be1d amd64 sigtramp: comment-out annotations for registers with DWARF number >= 32
Sponsored by:	The FreeBSD Foundation
2022-01-02 21:00:52 +02:00
Konstantin Belousov
76ef4f6348 amd64 get_mcontext(): third argument to get_fpcontext() is a pointer
Use NULL instead of raw 0

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2022-01-01 05:58:21 +02:00
Mark Johnston
f04a096049 exec: Simplify sv_copyout_strings implementations a bit
Simplify control flow around handling of the execpath length and signal
trampoline.  Cache the sysentvec pointer in a local variable.

No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33703
2021-12-31 12:50:15 -05:00
Stefan Eßer
e2650af157 Make CPU_SET macros compliant with other implementations
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.

Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.

The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).

The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.

This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.

One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.

Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.

The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.

Reviewed by:	kib
MFC after:	2 weeks
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D33451
2021-12-30 12:20:32 +01:00
John Baldwin
254e4e5b77 Simplify swi for bus_dma.
When a DMA request using bounce pages completes, a swi is triggered to
schedule pending DMA requests using the just-freed bounce pages.  For
a long time this bus_dma swi has been tied to a "virtual memory" swi
(swi_vm).  However, all of the swi_vm implementations are the same and
consist of checking a flag (busdma_swi_pending) which is always true
and if set calling busdma_swi.  I suspect this dates back to the
pre-SMPng days and that the intention was for swi_vm to serve as a
mux.  However, in the current scheme there's no need for the mux.

Instead, remove swi_vm and vm_ih.  Each bus_dma implementation that
uses bounce pages is responsible for creating its own swi (busdma_ih)
which it now schedules directly.  This swi invokes busdma_swi directly
removing the need for busdma_swi_pending.

One consequence is that the swi now works on RISC-V which had previously
failed to invoke busdma_swi from swi_vm.

Reviewed by:	imp, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D33447
2021-12-28 13:51:25 -08:00
Alan Cox
e161dfa918 Fix pmap_is_prefaultable() on arm64 and riscv
The current implementations never correctly return TRUE. In all cases,
when they currently return TRUE, they should have returned FALSE.  And,
in some cases, when they currently return FALSE, they should have
returned TRUE.  Except for its effects on performance, specifically,
additional page faults and pointless calls to pmap_enter_quick() that
abort, this error is harmless.  That is why it has gone unnoticed.

Add a comment to the amd64, arm64, and riscv implementations
describing how their return values are computed.

Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D33659
2021-12-27 19:17:14 -06:00
Kyle Evans
e6f760f0e8 sysent: regenerate 2021-12-16 20:56:28 -06:00
Kyle Evans
8494666658 sysent: move away from allowing all compat options for other ABIs
Notably, the current compat_options only makes sense for native and
freebsd32 ABIs.  For the others, it just adds cruft. Switch to having
sets of compat options, and default to the native set.  Setup the other
ABIs where it doesn't make sense to opt-out of the native set.

This removes some redundant COMPAT_FREEBSD* stuff from Linuxolator bits.

line_expr in makesyscalls.lua is fixed to allow empty strings to be
specified, since they're harmless.

Reviewed by:	brooks, kib (both earlier version)
Differential Revision:	https://reviews.freebsd.org/D33356
2021-12-16 20:56:28 -06:00
Warner Losh
b4fba31b63 Remove references to PCMCIA
Remove more references to PCMCIA in kernel config files. We no longer
support PC Card devices.

Sponsored by:		Netflix
2021-12-14 15:27:47 -07:00
Konstantin Belousov
0e6b06d5c8 x86: add a comment providing source for numbers in legacy XSAVE area layout
Suggested by:	Michael Pratt <mpratt@google.com>
Reviewed by:	Michael Pratt <mpratt@google.com>, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D33423
2021-12-14 19:14:40 +02:00
Konstantin Belousov
73b357be92 amd64: correct size of the SSE area in the xsave layout
PR:	https://github.com/golang/go/issues/46272
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-12-12 20:20:31 +02:00
Warner Losh
21e22be91a ed: Remove options
ed(4) was removed some time ago, but these options relevant to only it
weren't GC'd at the time. Remove them.

Sponsored by:		Netflix
2021-12-09 17:41:39 -07:00
John Baldwin
1a62e9bc00 Add <machine/tls.h> header to hold MD constants and helpers for TLS.
The header exports the following:

- Definition of struct tcb.
- Helpers to get/set the tcb for the current thread.
- TLS_TCB_SIZE (size of TCB)
- TLS_TCB_ALIGN (alignment of TCB)
- TLS_VARIANT_I or TLS_VARIANT_II
- TLS_DTV_OFFSET (bias of pointers in dtv[])
- TLS_TP_OFFSET (bias of "thread pointer" relative to TCB)

Note that TLS_TP_OFFSET does not account for if the unbiased thread
pointer points to the start of the TCB (arm and x86) or the end of the
TCB (MIPS, PowerPC, and RISC-V).

Note also that for amd64, the struct tcb does not include the unused
tcb_spare field included in the current structure in libthr.  libthr
does not use this field, and the existing calls in libc and rtld that
allocate a TCB for amd64 assume it is the size of 3 Elf_Addr's (and
thus do not allocate room for tcb_spare).

A <sys/_tls_variant_i.h> header is used by architectures using
Variant I TLS which uses a common struct tcb.

Reviewed by:	kib (older version of x86/tls.h), jrtc27
Sponsored by:	The University of Cambridge, Google Inc.
Differential Revision:	https://reviews.freebsd.org/D33351
2021-12-09 13:17:13 -08:00
Mateusz Guzik
8c3149be54 amd64: plug set-but-not-used vars in pmap
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2021-12-09 16:37:26 +00:00
Konstantin Belousov
b7c55487ff Regen 2021-12-09 02:49:10 +02:00
Brooks Davis
547566526f Make struct syscall_args machine independent
After a round of cleanups in late 2020, all definitions are
functionally identical.

This removes a rotted __aligned(8) on arm. It was added in
b7112ead32 and was intended to align the
args member so that 64-bit types (off_t, etc) could be safely read on
armeb compiled with clang. With the removal of armev, this is no
longer needed (armv7 requires that 32-bit aligned reads of 64-bit
values be supported and we enable such support on armv6).  As further
evidence this is unnecessary, cleanups to struct syscall_args have
resulted in args being 32-bit aligned on 32-bit systems.  The sole
effect is to bloat the struct by 4 bytes.

Reviewed by:	kib, jhb, imp
Differential Revision:	https://reviews.freebsd.org/D33308
2021-12-08 18:45:33 +00:00
Konstantin Belousov
8a4bd7f818 amd64 ia32 vdso: add unwind annotations to the signal trampoline
Reviewed by:	emaste
Discussed with:	jhb, jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:47:24 +02:00
Konstantin Belousov
5b8918fac6 amd64 native vdso: add unwind annotations to the signal trampoline
Reviewed by:	emaste
Discussed with:	jhb, jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:47:17 +02:00
Konstantin Belousov
98c8b62524 vdso for ia32 on amd64
Reviewed by:	emaste
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Konstantin Belousov
ab4524b3d7 amd64: wrap 64bit sigtramp into vdso
Reviewed by:	emaste
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D32960
2021-12-06 20:46:49 +02:00
Mark Johnston
f06f1d1fdb x86: Deduplicate clock.h
The headers were mostly identical on amd64 and i386.

No functional change intended.

Reviewed by:	cperciva, mav, imp, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33205
2021-12-06 10:39:08 -05:00
Alan Cox
4c57d6d555 amd64/pmap: fix user page table page accounting
When a superpage mapping is destroyed and the original page table page
containing 4KB mappings that was being held in reserve is deallocated,
the recently introduced user page table page count was not being
decremented.  Consequentially, the count was wrong and would grow over
time.  For example, after multiple iterations of "buildworld", I was
seeing implausible counts, like the following:

vm.pmap.kernel_pt_page_count: 2184
vm.pmap.user_pt_page_count: 2280849
vm.pmap.pv_page_count: 106

With this change, I now see:

vm.pmap.kernel_pt_page_count: 2183
vm.pmap.user_pt_page_count: 344
vm.pmap.pv_page_count: 105

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D33276
2021-12-05 19:13:43 -06:00
Mitchell Horne
03b3d7bbec x86: remove unused T_USER flag
It stopped being used in 3c256f5395, when trap() was reorganized to
have separate switch statements for user and kernel traps. Remove the
two leftover references and the flag itself.

Reviewed by:	kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D33253
2021-12-05 11:12:40 -04:00
Ed Maste
28dcccc129 x86 GENERIC/MINIMAL: group sc(4) devices together
The vga and splash devices are part of the sc(4) system console. vt(4)
uses the vt_vga driver instead, and has some limited splash support
directly in vt_core.c.  Leave the sc(4) options in GENERIC/MINIMAL (for
now) but group them together under an sc(4) comment.

Sponsored by:	The FreeBSD Foundation
2021-11-28 14:38:41 -05:00
Ed Maste
777526ed83 Remove options VESA from x86 MINIMAL
Followup to b8cf1c5c30, remove from MINIMAL in addition to GENERIC.

options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
the legacy sc(4) console.  It is not used by the default console, vt(4).

PR:		253733
Fixes:		b8cf1c5c30 ("Remove options VESA from x86 GENERIC")
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
2021-11-28 14:37:46 -05:00
Ed Maste
b8cf1c5c30 Remove options VESA from x86 GENERIC
options VESA / vesa.ko provides VESA Bios Extensions (VBE) support for
the legacy sc(4) console.  It is not used by the default console, vt(4).

There is a report[1] of an incompatibility between VESA and the Nvidia
driver breaking suspend/resume.  Since VESA is not used by the default
configuration anyway, just remove options VESA from GENERIC.  The kernel
module is still available and may be loaded by sc(4) users who want to
select a VBE mode.

(Note that vt(4) does not support selecting a VBE mode.  The loader can
set a VBE mode and vt(4) will use it via the vt_vbefb driver.)

[1] https://lists.freebsd.org/archives/freebsd-hackers/2021-November/000469.html

PR:		253733
Reported by:	Stefan Blachmann [1]
Reviewed by:	imp, manu, tsoome
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33141
2021-11-28 11:29:17 -05:00
Ed Maste
228e020a3b Correct syscons description in i386 and amd64 configs
Commit 2d6f6d6373 switched to vt(4) as the default console.

Sponsored by:	The FreeBSD Foundation
2021-11-27 16:22:42 -05:00
Mateusz Guzik
af4051d250 linux: remove the always curthread argument from lconvpath 2021-11-25 22:50:42 +00:00
Warner Losh
8722e05ae1 twa: Remove
Belatedly remove twa(4). It was supposed to go before 13.0, but was
overlooked.

Sponsored by:		Netflix
Relnotes:		yes
Reviewed by:		scottl
Differential Revision:	https://reviews.freebsd.org/D33114
2021-11-25 00:45:13 -07:00
Warner Losh
0d5935af8f esp: Remove
Belatedly remove esp(4). It was tagged as gone in 13, but was overlooked
until now.

Sponsored by:		Netflix
Reviewed by:		scottl
Differential Revision:	https://reviews.freebsd.org/D33115
2021-11-25 00:45:12 -07:00
Warner Losh
60de2867c9 amr: remove
Belatedly remove amr(4). It was slated to depart before 13.0 but was
overlooked until now.

Sponsored by:		Netflix
Relnotes:		yes
Reviewed by:		scottl
Differential Revision:	https://reviews.freebsd.org/D33113
2021-11-25 00:45:12 -07:00
Warner Losh
399188a2c6 iir: Remove
Belatedly remove iir(4). It was slated to go before 13, but was
overlooked.

Sponsored by:		Netflix
Relnotes:		yes
Reviewed by:		scottl
Differential Revision:	https://reviews.freebsd.org/D33112
2021-11-25 00:45:12 -07:00
Warner Losh
a9620045a5 mly: Remove.
We'd said this was going away in 13, but was overlooked. Belatedly
remove.

Sponsored by:		Netflix
Relnotes:		yes
Reviewed by:		scottl
Differential Revision:	https://reviews.freebsd.org/D33111
2021-11-25 00:45:12 -07:00
Mark Johnston
ecbbe83144 netinet: Deduplicate most in_cksum() implementations
in_cksum() and related routines are implemented separately for each
platform, but only i386 and arm have optimized versions.  Other
platforms' copies of in_cksum.c are identical except for style
differences and support for big-endian CPUs.

Deduplicate the implementations for the rest of the platforms.  This
will make it easier to implement in_cksum() for unmapped mbufs.  On arm
and i386, define HAVE_MD_IN_CKSUM to mean that the MI implementation is
not to be compiled.

No functional change intended.

Reviewed by:	kp, glebius
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33095
2021-11-24 13:31:16 -05:00
Mark Johnston
09100f936b netinet: Remove in_cksum_update()
It was never implemented on powerpc or riscv and appears to have been
unused since it was added in 1998.  No functional change intended.

Reviewed by:	kp, glebius, cy
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D33093
2021-11-24 13:31:15 -05:00
Brooks Davis
6b7c23a026 syscalls: regen 2021-11-22 22:36:57 +00:00
Brooks Davis
f0cfbffc36 syscalls: regen 2021-11-22 22:36:56 +00:00
Mitchell Horne
10fe6f80a6 minidump: Use the provided dump bitset
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.

To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.

Reviewed by:	kib, markj, jhb
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31992
2021-11-19 15:05:52 -04:00
Mitchell Horne
1d2d1418b4 minidump: Use provided msgbuf pointer
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.

Reviewed by:	markj, jhb
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31991
2021-11-19 15:05:52 -04:00
Mitchell Horne
681bd71047 minidump: reduce the amount direct accesses to page tables
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.

Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.

Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.

Reviewed by:	kib, markj
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31990
2021-11-19 15:05:52 -04:00
Mitchell Horne
90d4da6225 amd64: provide PHYS_IN_DMAP() and VIRT_IN_DMAP()
It is useful for quickly checking an address against the DMAP region.
These definitions exist already on arm64 and riscv.

Reviewed by:	kib, markj
MFC after:	3 days
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D32962
2021-11-19 15:05:52 -04:00
Mitchell Horne
1adebe3cd6 minidump: Parameterize minidumpsys()
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.

This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.

Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.

The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.

Reviewed by:	kib, markj, jhb
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D31989
2021-11-19 15:05:52 -04:00
Brooks Davis
ad58266704 freebsd32: remove redundant no-arg syscalls
pipe requires no special handling.

ofreebsd32_sigpending did differ from osigpending in that it acted
on the siglist rather than the sigqueue, but this appears to be an
oversight in 3fbdb3c215.

ogetpagesize could theoretically have ABI-dependent results, but in
practice does not. If it does it would be easy handle in the central
implementation and be the least of the problems in changing the value of
PAGE_SIZE.

Reviewed by:	kevans
2021-11-17 20:12:24 +00:00
Kristof Provost
4e85b64890 Add a COMPAT_FREEBSD13 kernel option
Use it wherever COMPAT_FREEBSD11 is currently specified.

Reviewed by:	jhb (previous version)
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D33005
2021-11-17 03:08:40 +01:00
Mark Johnston
ab12e8db29 amd64: Reduce the amount of cpuset copying done for TLB shootdowns
We use pmap_invalidate_cpu_mask() to get the set of active CPUs.  This
(32-byte) set is copied by value through multiple frames until we get to
smp_targeted_tlb_shootdown(), where it is copied yet again.

Avoid this copying by having smp_targeted_tlb_shootdown() make a local
copy of the active CPUs for the pmap, and drop the cpuset parameter,
simplifying callers.  Also leverage the use of the non-destructive
CPU_FOREACH_ISSET to avoid unneeded copying within
smp_targeted_tlb_shootdown().

Reviewed by:	alc, kib
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32792
2021-11-15 13:01:31 -05:00
Mark Johnston
71e6e9da22 amd64: Initialize kernel_pmap's active CPU set to all_cpus
This is in preference to simply filling the cpuset, and allows the
conditional in pmap_invalidate_cpu_mask() to be elided.

Also export pmap_invalidate_cpu_mask() outside of pmap.c for use in a
subsequent commit.

Suggested by:	kib
Reviewed by:	alc, kib
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32792
2021-11-15 13:01:30 -05:00
Mark Johnston
42c2cd1ffb amd64: Annotate an unlikely condition in smp_targeted_tlb_shootdown()
Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-11-15 13:01:30 -05:00
Konstantin Belousov
c5658876b4 amd64/ia32/ia32_signal.c: Use ANSI C functions definitions
Remove MPSAFE annotations.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-11-13 19:31:53 +02:00
Warner Losh
7e3c9ec906 tcp: better congestion control defaults
Define CC_NEWRENO in all the appropriate DEFAULTS and std.* config
files. It's the default congestion control algorithm.  Add code to cc.c
so that CC_DEFAULT is "newreno" if it's not overriden in the config
file.

Sponsored by: Netflix
Fixes: b8d60729de ("tcp: Congestion control cleanup.")
Revired by: manu, hselasky, jhb, glebius, tuexen
Differential Revision:	https://reviews.freebsd.org/D32964
2021-11-12 12:16:11 -07:00
Warner Losh
eb6b6622fe Update MINIMAL to have CC options
The MINIMAL configs were overlooked. They are compiled as part of
universe, so this broke universe builds. Add the same defafults as for
GENERIC.

Sponsored by:		Netflix
2021-11-12 09:33:18 -07:00
Randall Stewart
b8d60729de tcp: Congestion control cleanup.
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!

This patch does a bit of cleanup on TCP congestion control modules. There were some rather
interesting surprises that one could get i.e. where you use a socket option to change
from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get
a memory failure and end up on cc_newreno. This is not what one would expect. The
new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK
and pass in to the init function preallocated memory. The CC init is expected in this
case *not* to fail but if it does and a module does break the
"no fail with memory given" contract we do fall back to the CC that was in place at the time.

This also fixes up a set of common newreno utilities that can be shared amongst other
CC modules instead of the other CC modules reaching into newreno and executing
what they think is a "common and understood" function. Lets put these functions in
cc.c and that way we have a common place that is easily findable by future developers or
bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE
and HYSTART++ without having to dance through hoops for other CC modules, instead
both newreno and the other modules just call into the common functions if they desire
that behavior or roll there own if that makes more sense.

Note: This commit changes the kernel configuration!! If you are not using GENERIC in
some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC,
CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined
as well if you desire. Note that if you create a kernel configuration that does not
define a congestion control module and includes INET or INET6 the kernel compile will
break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\"
but you can specify any string that represents the name of the CC module (same names
that show up in the CC module list under net.inet.tcp.cc). If you fail to add the
options CC_DEFAULT in your kernel configuration the kernel build will also break.

Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
RELNOTES:YES
Differential Revision: https://reviews.freebsd.org/D32693
2021-11-11 06:28:18 -05:00
Edward Tomasz Napierala
0bf8d5d5f4 linux: Replace ifdefs in ptrace with per-architecture callbacks
It's a cleanup; no (intended) functional changes.

Sponsored By:	EPSRC
Reviewed By:	kib
Differential Revision:	https://reviews.freebsd.org/D32888
2021-11-09 11:59:17 +00:00
Edward Tomasz Napierala
a90ff3c4bc linux: Add ptrace(2) support on arm64
This moves linux_ptrace.c from sys/amd64/linux/ to sys/compat/linux/,
making it possible to use it on architectures other than amd64.
It also enables Linux ptrace(2) on arm64.

Relnotes:	yes
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32868
2021-11-07 08:39:24 +00:00
Edward Tomasz Napierala
3be6e606d7 linux: Fix another amd64-specific piece of linux_ptrace.c
This was missed in c91d0e59be.  No functional changes.

Sponsored By:	EPSRC
2021-11-06 08:28:11 +00:00
Mark Johnston
175d3380a3 amd64: Deduplicate routines for expanding KASAN/KMSAN shadow maps
When working on the ports these functions were slightly different, but
now there's no reason for them to be separate.

No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-11-03 12:36:02 -04:00
Edward Tomasz Napierala
c91d0e59be linux: Make linux_ptrace.c portable
Make sys/amd64/linux/linux_ptrace.c machine-independent,
in preparation for moving it into sys/compat/linux/.
No functional changes.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32756
2021-11-03 08:54:35 +00:00
Edward Tomasz Napierala
f0d9a6a781 linux: make PTRACE_SETREGS use a correct struct
Note that this is largely untested at this point, as was
the previous version; I'm committing this mostly to get
rid of `struct linux_pt_reg`.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32735
2021-10-30 10:13:37 +01:00
Edward Tomasz Napierala
ad0379660d linux: make PTRACE_GETREGS return correct struct
Previously it returned a shorter struct.  I can't find any
modern software that uses it, but tests/ptrace from strace(1)
repo complained.

Differential Revision: https://reviews.freebsd.org/D32601
2021-10-29 16:18:28 +01:00
Edward Tomasz Napierala
f939dccfd7 linux: Make PTRACE_GETREGSET return proper buffer size
This fixes Chrome warning:

[1022/152319.328632:ERROR:ptracer.cc(476)] Unexpected registers size 0 != 216, 68

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision: https://reviews.freebsd.org/D32616
2021-10-29 15:31:33 +01:00
Edward Tomasz Napierala
6547153e46 linux: Fix ptrace panic with ERESTART
Translate ERESTART into Linux "internal" errno ERESTARTSYS.
This fixes the erestartsys.gen.test from strace(1).

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32623
2021-10-29 14:55:59 +01:00
Konstantin Belousov
0b3bc72889 amd64 pmap: adjust the empty pmap optimization in pmap_remove()
to match the added accounting of the top-level page table pages.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32569
2021-10-28 22:01:58 +03:00
Konstantin Belousov
e93b5adb6b amd64 pmap: account for the top-level pages
both for kernel and user page tables, the later exist in the PTI case.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32569
2021-10-28 22:01:58 +03:00
Edward Tomasz Napierala
2ec26ae402 linux: Improve debug for PTRACE_GETEVENTMSG
No functional changes.

Sponsored By:	EPSRC
2021-10-23 19:53:12 +01:00
Edward Tomasz Napierala
6e66030c4c linux: implement PTRACE_EVENT_EXEC
This fixes strace(1) from Ubuntu Focal.

Reviewed By:	jhb
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32367
2021-10-23 19:46:26 +01:00
Edward Tomasz Napierala
2558bb8e91 linux: Make PTRACE_GET_SYSCALL_INFO handle EJUSTRETURN
This fixes panic when trying to run strace(8) from Focal.

Reviewed By:	kib
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32355
2021-10-23 18:56:39 +01:00
Edward Tomasz Napierala
e3a83df119 linux: Improve debug for PTRACE_GETREGSET
No functional changes.

Sponsored By:	EPSRC
2021-10-23 09:30:06 +01:00
Edward Tomasz Napierala
3417c29851 linux: Constify bsd_to_linux_regset()
No functional changes.

Reviewed By:	emaste
Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32599
2021-10-23 08:33:58 +01:00
Mark Johnston
4c812fe61b vlapic: Schedule callouts on the local CPU
The virtual LAPIC driver uses callouts to implement the LAPIC timer.
Callouts are armed using callout_reset_sbt(), which currently puts
everything on CPU 0.  On systems running many bhyve VMs this results in
a large amount of contention for CPU 0's callout lock.

Modify vlapic to schedule callouts on the local CPU instead.  This
allows timer interrupts to be scheduled more evenly among CPUs where
bhyve is running.

Reviewed by:	grehan, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32559
2021-10-19 21:22:57 -04:00
Mark Johnston
34fac29e98 amd64: Add comments to pmap_pinit_type()
... explaining why we don't pass the pmap pointer to
pmap_alloc_pt_page().

Reported by:	alc
Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32528
2021-10-19 21:22:57 -04:00
Mark Johnston
ff93447d8e Use the vm_radix_init() helper when initializing pmaps
No functional change intended.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32527
2021-10-19 21:22:56 -04:00
Mark Johnston
84c3922243 Convert consumers to vm_page_alloc_noobj_contig()
Remove now-unneeded page zeroing.  No functional change intended.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32006
2021-10-19 21:22:56 -04:00
Mark Johnston
a4667e09e6 Convert vm_page_alloc() callers to use vm_page_alloc_noobj().
Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ.  In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by:	alc, hselasky, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D31986
2021-10-19 21:22:56 -04:00
Mark Johnston
de8554295b cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well.  No functional change
intended.

This is a re-application of commit
9068f6ea69.

Reviewed by:	cem, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32029
2021-10-18 09:56:58 -04:00
Mark Johnston
b0423d0f5e amd64: Zero the PML5 PTI page when initializing a pmap
The root page is not zeroed at allocation time since with 4-level tables
each entry is copied from a template.  However, with 5-level tables only
a single entry is filled, so the rest need to be cleared.

Reported by:	alc
Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32525
2021-10-18 09:50:42 -04:00
Edward Tomasz Napierala
a03d4d73e4 linux: Improve debugging for PTRACE_GETREGSET
It's triggered by gdb(1).

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32456
2021-10-17 12:53:16 +01:00
Edward Tomasz Napierala
f9246e1484 linux: Implement some bits of PTRACE_PEEKUSER
This makes Linux gdb from Bionic a little less broken.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32455
2021-10-17 12:20:21 +01:00
Edward Tomasz Napierala
75a9d95b4d linux: Adjust PTRACE_GET_SYSCALL_INFO buffer size semantics
The tests/ptrace_syscall_info test from strace(1) complained
about this.

Sponsored By:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D32368
2021-10-17 11:49:46 +01:00
Konstantin Belousov
e81e77c5a0 Enable PPS_SYNC on amd64, arm64 and armv7
Remove the option from NOTES/LINT, and add to NOTES for powerpc and
riscv.

PR:	259036
Requested by:	John Hay <john@sanren.ac.za>
Discussed with:	ian, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2021-10-10 22:34:40 +03:00
Konstantin Belousov
ce21d4bff1 amd64 efirt: do not flush cache for runtime pages
We actually do not know is it safe or not to flush cache for random
BAR/register page existing in the system.  It is well-known that for
instance LAPICs cannot tolerate cache flush.  As report indicates,
there are more such devices.

This issue typically affects AMD machines which do not report self-snoop,
causing real CLFLUSH invocation on the mapped pages.  Intels do self-snoop,
so this change should be nop for them, and unsafe devices, if any, are
already ignored.

Reported and tested by:	manu
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32318
2021-10-06 05:53:20 +03:00
Konstantin Belousov
33c17670af amd64: add pmap_page_set_memattr_noflush()
Similar to pmap_page_set_memattr() by setting MD page cache attribute
to the argument.  Unlike pmap_page_set_memattr(), does not flush cache
for the direct mapping of the page.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32318
2021-10-06 05:53:12 +03:00
Mitchell Horne
ab4ed843a3 minidump: De-duplicate the progress bar
The implementation of the progress bar is simple, but duplicated for
most minidump implementations. Extract the common bits to kern_dump.c.
Ensure that the bar is reset with each subsequent dump; this was only
done on some platforms previously.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31885
2021-09-29 16:42:21 -03:00
Mitchell Horne
31991a5a45 minidump: De-duplicate is_dumpable()
The function is identical in each minidump implementation, so move it to
vm_phys.c. The only slight exception is powerpc where the function was
public, for use in moea64_scan_pmap().

Reviewed by:	kib, markj, imp (earlier version)
MFC after:	2 weeks
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D31884
2021-09-29 16:41:52 -03:00
Kirk McKusick
1c8d670cb6 Bring the tags and links entries for amd64 up to date.
MFC after:    1 week
Sponsored by: Netflix
2021-09-27 20:04:51 -07:00
Konstantin Belousov
b1e2f063ae amd64 sendsig: fix context corruption
Drop fpstate only after copying out xfpustate from the thread usermode
save area. Otherwise a context switch between get_fpcontext(), which now
returns the pointer directly into user save area, and copyout, would
cause reinit of the save area, loosing user registers.

Reported, reviewed, and tested by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D32159
2021-09-27 20:12:46 +03:00
Mark Johnston
f766826fe3 amd64: Remove proc0_tf, the bootstrap trapframe
It no longer serves any purpose as thread0's td_frame field is now
initialized during fpuinitstate().  No functional change intended.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32057
2021-09-25 10:18:52 -04:00
Mark Johnston
ca1e447b10 amd64: Avoid copying td_frame from kernel procs
When creating a new thread, we unconditionally copy td_frame from the
creating thread.  For threads which never return to user mode, this is
unnecessary since td_frame just points to the base of the stack or a
random interrupt frame.

If KASAN is configured this copying may also trigger false positives
since the td_frame region may contain poisoned stack regions.  It was
not noticed before since thread0 used a dummy proc0_tf trapframe, and
kernel procs are generally created by thread0.  Since commit
df8dd6025a, though, we call
cpu_thread_alloc(&thread0) when initializing FPU state, which
reinitializes thread0.td_frame.

Work around the problem by not copying the frame unless the copying
thread came from user mode.  While here, de-duplicate the copying and
remove redundant re(initialization) of td_frame.

Reported by:	syzbot+2ec89312bffbf38d9aec@syzkaller.appspotmail.com
Reviewed by:	kib
Fixes:		df8dd6025a
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32057
2021-09-25 10:18:30 -04:00
Konstantin Belousov
e36d0e86e3 Revert "linux32: add a hack to avoid redefining the type of the savefpu tag"
This reverts commit 0f6829488e.
Also it changes the type of md_usr_fpu_save struct mdthread member
to void *, which is what uncovered this trouble.  Now the save area
is untyped, but since it is hidden behind accessors, it is not too
significant.  Since apparently there are consumers affected outside
the tree, this hack is better than one from the reverted revision.

PR:	258678
Reported by:	cy
Reviewed by:	cy, kevans, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D32060
2021-09-22 23:17:47 +03:00
Konstantin Belousov
cf0ee8738e Drop cloudabi
According to https://github.com/NuxiNL/cloudlibc:
CloudABI is no longer being maintained. It was an awesome experiment,
but it never got enough traction to be sustainable.

There is no reason to keep it in FreeBSD.

Approved by:	ed (private mail)
Reviewed by:	emaste
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31923
2021-09-22 00:18:44 +03:00
Konstantin Belousov
c2ee4dfd04 ia32_get_fpcontext(): xfpusave can be legitimately NULL
Reported by:	cy
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Fixes:	bd9e0f5df6
2021-09-22 00:17:06 +03:00
Mark Johnston
bcdc599dc2 Revert "cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it"
This reverts commit 9068f6ea69.

The underlying macro needs to be reworked to avoid problems with control
flow statements.

Reported by:	rlibby
2021-09-21 13:51:42 -04:00
Konstantin Belousov
2e79a21632 amd64: consistently use uprintf() to report weird situations in sigreturn
Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
bd9e0f5df6 amd64: eliminate td_md.md_fpu_scratch
For signal send, copyout from the user FPU save area directly.

For sigreturn, we are in sleepable context and can do temporal
allocation of the transient save area.  We cannot copying from userspace
directly to user save area because XSAVE state needs to be validated,
also partial copyins can corrupt it.

Requested by:	jhb
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
df8dd6025a amd64: stop using top of the thread' kernel stack for FPU user save area
Instead do one more allocation at the thread creation time.  This frees
a lot of space on the stack.

Also do not use alloca() for temporal storage in signal delivery sendsig()
function and signal return syscall sys_sigreturn().  This saves equal
amount of space, again by the cost of one more allocation at the thread
creation time.

A useful experiment now would be to reduce KSTACK_PAGES.

Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
9151abe323 exec_machdep.c: some style, use ANSI C definition for sys_sigreturn()
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
12ca33f44f amd64: move signal handling and register structures manipulations into exec_machdep.c
from machdep.c which is too large pile of unrelated things.
Some ptrace functions are moved from machdep.c to ptrace_machdep.c.

Now machdep.c contains code mostly related to the low level initialization
and regular low level operation of the architecture, while signal MD code
and registers handling is placed in exec_machdep.c.

Reviewed by:	jhb, markj
Discussed with:	jrtc27
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:15 +03:00
Konstantin Belousov
a42d362bb5 amd64: centralize definitions of CS_SECURE and EFL_SECURE
Requested by	markj
Reviewed by:	jhb, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31954
2021-09-21 20:20:14 +03:00
Mark Johnston
9068f6ea69 cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it
This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well.  No functional change
intended.

Reviewed by:	cem, kib, jhb
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D32029
2021-09-21 12:07:47 -04:00
Konstantin Belousov
f575573ca5 Remove PT_GET_SC_ARGS_ALL
Reimplement bdf0f24bb1 by checking for the caller' ABI in
the implementation of PT_GET_SC_ARGS, and copying out everything if
it is Linuxolator.

Also fix a minor information leak: if PT_GET_SC_ARGS_ALL is done on the
thread reused after other process, it allows to read some number of that
thread last syscall arguments. Clear td_sa.args in thread_alloc().

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D31968
2021-09-16 20:11:27 +03:00
Edward Tomasz Napierala
bdf0f24bb1 linux: implement PTRACE_GET_SYSCALL_INFO
This is one of the pieces required to make modern (ie Focal)
strace(1) work.

Reviewed By:	jhb (earlier version)
Sponsored by:	EPSRC
Differential Revision:	https://reviews.freebsd.org/D28212
2021-09-14 20:19:55 +00:00
Konstantin Belousov
1c56781cc9 amd64 wakeup: rework trampoline page allocation
There is no need to restrict trampoline page table to low 1M, it
should work with any pages below 4G.  Only wakeup code itself should
be below 1M.

Do not waste level 5 page when LA48 mode is used.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31931
2021-09-14 00:23:15 +03:00
Konstantin Belousov
2b6eec531a x86: duplicate acpi_wakeup.c per i386 and amd64
The file as is is the maze of #ifdef passages, all slightly different.
Divorcing i386 and amd64 version actually makes changing the code
easier, also no changes for i386 are planned.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D31931
2021-09-14 00:23:14 +03:00
Alexander Motin
7af4475a6e vmd(4): Major driver refactoring
- Re-implement pcib interface to use standard pci bus driver on top of
vmd(4) instead of custom one.
 - Re-implement memory/bus resource allocation to properly handle even
complicated configurations.
 - Re-implement interrupt handling to evenly distribute children's MSI/
MSI-X interrupts between available vmd(4) MSI-X vectors and setup them
to be handled by standard OS mechanisms with minimal overhead, except
sharing when unavoidable.

Successfully tested on Dell XPS 13 laptop with Core i7-1185G7 CPU (VMD
device ID 0x9a0b) and single NVMe SSD, dual-booting with Windows 10.

Successfully tested on Supermicro X11DPI-NT motherboard with Xeon(R)
Gold 6242R CPUs (VMD device ID 0x201d), simultaneously handling NVMe
SSD on one PCIe port and PLX bridge with 3 NVMe and 1 AHCI SSDs on
another.  Handles SSD hot-plug (except Optane 905p for some reason,
which are not detected until manual bus rescan) and enabled IOMMU
(directly connected SSDs work, but ones connected to the PLX fail
without errors from IOMMU).

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential revision:	https://reviews.freebsd.org/D31762
2021-09-02 20:58:02 -04:00
Konstantin Belousov
9939af1a16 amd64: correctly calculate KVA of the preloaded ucode blob
when kernphys != 2M

Reported and tested by:	kbowling
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2021-08-31 04:46:12 +03:00