Commit Graph

8879 Commits

Author SHA1 Message Date
Konstantin Belousov
e0612ed490 amd64 pmap: add comment explaining why INVLPG is functional for PCID config
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36919
2022-10-11 00:33:17 +03:00
Konstantin Belousov
273d0715f6 amd64: remove useless addr2 variables in page range invalidation handlers
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36919
2022-10-11 00:33:12 +03:00
Mark Johnston
98d920d9cf bhyve: Annotate unused function parameters
MFC after:	1 week
2022-10-08 11:33:21 -04:00
John Baldwin
4d90a5afc5 sys: Consolidate common implementation details of PV entries.
Add a <sys/_pv_entry.h> intended for use in <machine/pmap.h> to
define struct pv_entry, pv_chunk, and related macros and inline
functions.

Note that powerpc does not yet use this as while the mmu_radix pmap
in powerpc uses the new scheme (albeit with fewer PV entries in a
chunk than normal due to an used pv_pmap field in struct pv_entry),
the Book-E pmaps for powerpc use the older style PV entries without
chunks (and thus require the pv_pmap field).

Suggested by:	kib
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36685
2022-10-07 10:14:03 -07:00
Mitchell Horne
b05b1ecbef amd64, arm64 pmap: fix a comment typo
There is no such error code.

Fixes:	1d5ebad06c ("pmap: optimize MADV_WILLNEED on existing superpages")
2022-10-06 19:04:54 -03:00
Konstantin Belousov
85b715baae amd64/db_trace.c: remove stray prototype
Sponsored by:	NVIDIA networking
MFC after:	1 week
2022-10-04 01:50:30 +03:00
Mitchell Horne
754cb545b6 ddb: de-duplicate decode_syscall()
Only i386 and amd64 print the decoded syscall name in the backtrace.
This de-duplication facilitates further changes and adoption by other
platforms.

Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36565
2022-10-03 13:49:54 -03:00
Alan Cox
1d5ebad06c pmap: optimize MADV_WILLNEED on existing superpages
Specifically, avoid pointless calls to pmap_enter_quick_locked() when
madvise(MADV_WILLNEED) is applied to an existing superpage mapping.

Reported by:	mhorne
Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36801
2022-09-30 12:14:05 -05:00
John Baldwin
a35572b16e linux32: binutils as requires %eflags instead of %flags for CFI.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D36781
2022-09-29 15:06:01 -07:00
Konstantin Belousov
648fa3558c amd64: Initialize IPI scoreboard earlier
Scoreboard is needed a moment when smp_started == true.  If some kernel
daemon thread is started before scoreboard is inited, and does some pmap
operation that requires TLB maintanence, which races with SMP startup,
we might dereference NULL invl_scoreboard.  This is particularly easy
to trigger when EARLY_AP_STARTUP is not defined.

Reported by:	glebius
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36766
2022-09-28 16:23:52 +03:00
Mark Johnston
4551cbbe99 amd64: Ignore 1GB mappings in pmap_advise()
This assertion can be triggered by usermode since vm_map_madvise()
doesn't force advice to be applied to an entire largepage mapping.  I
can't see any reason not to permit it, however, since MADV_DONTNEED and
_FREE are advisory and we can simply do nothing when a 1GB mapping is
encountered.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36675
2022-09-24 09:28:41 -04:00
Mark Johnston
6c2e9f4c32 amd64: Handle 1GB mappings in pmap_enter_quick_locked()
This code path can be triggered by applying MADV_WILLNEED to a 1GB
mapping.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36674
2022-09-24 09:28:41 -04:00
Mark Johnston
0b29f5efcc amd64: Make it possible to grow the KERNBASE region of KVA
pmap_growkernel() may be called when mapping a region above KERNBASE,
typically for a kernel module.  If we have enough PTPs left over from
bootstrap, pmap_growkernel() does nothing.  However, it's possible to
run out, and in this case pmap_growkernel() will try to grow the kernel
map all the way from kernel_vm_end to somewhere past KERNBASE, which can
easily run the system out of memory.  This happens with large kernel
modules such as the nvidia GPU driver.  There is also a WIP dtrace
provider which needs to map KVA in the region above KERNBASE (to provide
trampolines which allow a copy of traced kernel instruction to be
executed), and its allocations could potentially trigger this scenario.

This change modifies pmap_growkernel() to manage the two regions
separately, allowing them to grow independently.  The end of the
KERNBASE region is tracked by modifying "nkpt".

PR:		265019
Reviewed by:	alc, imp, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36673
2022-09-24 09:27:50 -04:00
John Baldwin
f49fd63a6a kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.
Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36549
2022-09-22 15:09:19 -07:00
John Baldwin
7ae99f80b6 pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.
This matches the return type of pmap_mapdev/bios.

Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36548
2022-09-22 15:08:52 -07:00
Richard Scheffenegger
bb1d472d79 tcp: make CUBIC the default congestion control mechanism.
This changes the default TCP Congestion Control (CC) to CUBIC.
For small, transactional exchanges (e.g. web objects <15kB), this
will not have a material effect. However, for long duration data
transfers, CUBIC allocates a slightly higher fraction of the
available bandwidth, when competing against NewReno CC.

Reviewed By: tuexen, mav, #transport, guest-ccui, emaste
Relnotes: Yes
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36537
2022-09-13 12:09:21 +02:00
Alan Cox
8d7ee2047c pmap: don't recompute mpte during promotion
When attempting to promote 4KB user-space mappings to a 2MB user-space
mapping, the address of the struct vm_page representing the page table
page that contains the 4KB mappings is already known to the caller.
Pass that address to the promotion function rather than making the
promotion function recompute it, which on arm64 entails iteration over
the vm_phys_segs array by PHYS_TO_VM_PAGE().  And, while I'm here,
eliminate unnecessary arithmetic from the calculation of the first PTE's
address on arm64.

MFC after:	1 week
2022-09-11 01:19:22 -05:00
Emmanuel Vadot
3fc174845c Revert "vmm: permit some IPIs to be handled by userspace"
This reverts commit a5a918b7a9.

This cause some problem with vm using bhyveload.

Reported by:	pho, kp
2022-09-09 15:55:01 +02:00
Emmanuel Vadot
83b65d0ae1 Revert "vmm: Remove unneeded variable maxcpus"
This reverts commit 653c36179d.
2022-09-09 15:54:56 +02:00
Emmanuel Vadot
653c36179d vmm: Remove unneeded variable maxcpus
Reported by:	FreeBSD User <freebsd@walstatt-de.de>
Fixes:	a5a918b7a9 ("vmm: permit some IPIs to be handled by userspace")
2022-09-07 11:41:16 +02:00
Corvin Köhne
a5a918b7a9 vmm: permit some IPIs to be handled by userspace
Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and Startup IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

MFC after:              2 weeks
Differential Revision:  https://reviews.freebsd.org/D35623
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-09-07 09:07:03 +02:00
Warner Losh
991aef9795 acpi: Move some errors with RSDP and XSLT out from under bootverbose
Failure to map RSDP, XSLT and checksum failures are events that can't
happen unless something has gone wrong. As such, they should be reported
always, and not in bootverbose. This has been this way since it was
originally brought in to parse APIC tables.

Sponsored by:		Netflix
Reviewed by:		andrew
Differential Revision:	https://reviews.freebsd.org/D36406
2022-09-01 10:40:15 -06:00
Warner Losh
a14b26a6bd acpi: Unmap RSDP in more error cases
Add missing pmap_unmapbios() calls for when we return 0. Otherwise we
can leave the table mapped when it is of no use.

Sponsored by:		Netflix
Reviewed by:		andrew
Differential Revision:	https://reviews.freebsd.org/D36405
2022-09-01 10:39:20 -06:00
Konstantin Belousov
c1a0ab5ec5 amd64: update comment for casueword/casueword32, mentioning return value 1
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2022-08-26 02:41:48 +03:00
Mateusz Guzik
e621cb0be2 amd64: dump standard registers when crashing
Sample output:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x2
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff80556853
stack pointer           = 0x28:0xffffffff8141bf50
frame pointer           = 0x28:0xffffffff8141bfa0
code segment            = base 0x0, limit 0xfffff, type 0x1b
		        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
rdi: fffff80002c9c400 rsi: ffffffff80b89183 rdx:                0
rcx:                2  r8:               fe  r9:                1
rax: fffff80002c9c400 rbx:                1 rbp: ffffffff8141bfa0
r10:                0 r11: ffffffff80b97f8c r12:                0
r13:                0 r14:                0 r15:                0
trap number             = 12
panic: page fault
cpuid = 1
time = 1

Reviewed by:	kib
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D36348
2022-08-25 17:33:07 +00:00
Konstantin Belousov
ff32a05554 x86: improve machdep.uprintf_signal
Print %eax/%rax.
Use better format strings, like %#x.

Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36302
2022-08-24 22:12:45 +03:00
Konstantin Belousov
01a33b2af5 x86: print trap name in addition of trap number
for the "trap with interrupts disabled" warning.

Reviewed by:	jhb
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36302
2022-08-24 22:12:37 +03:00
John Baldwin
e663907366 Define _NPCM and the last PC_FREEn constant in terms of _NPCPV.
This applies one of the changes from
5567d6b441 to other architectures
besides arm64.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36263
2022-08-23 13:31:02 -07:00
John Baldwin
c94f30ea85 bhyve: Validate host PAs used to map passthrough BARs.
Reject attempts to map host physical address ranges that are not
subsets of a passthrough device's BAR into a guest.

Reviewed by:	markj, emaste
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D36238
2022-08-19 15:03:01 -07:00
Maxim Sobolev
6a70a0c8bf Document implicit dependencies of the mlx5(4) & friends.
MFC after:      2 weeks
2022-08-11 16:33:09 -07:00
Mateusz Guzik
648edd6378 x86: remove MP_WATCHDOG
It does not work with ULE, which is the default scheduler for over a
decade.

Reviewed by:	emaste, kib
Differential Revision:	https://reviews.freebsd.org/D36094
2022-08-11 21:35:32 +00:00
Konstantin Belousov
c6d31b8306 AST: rework
Make most AST handlers dynamically registered.  This allows to have
subsystem-specific handler source located in the subsystem files,
instead of making subr_trap.c aware of it.  For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows to have some handlers designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit.  For
instance, ast(), exit1(), and NFS server no longer need to be aware
about UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed.  There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads processed through AST handler in it.  In fact, this
is already present behavior for hwpmc.ko and ufs.ko.  I do not think it
is worth the efforts and the runtime overhead to try to fix it.

Reviewed by:	markj
Tested by:	emaste (arm64), pho
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35888
2022-08-02 21:11:09 +03:00
Konstantin Belousov
4a5ec55af6 amd64: expicitly re-init td_frame in copy_thread()
Otherwise we are using whatever the value was left from the previous
thread run on kernel entry from usermode. Typically it would be the
desired value as is, but it is not guaranteed.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D35888
2022-08-02 21:11:09 +03:00
Corvin Köhne
4eadbef924 vmm: emulate INVD by ignoring it
On physical systems the ram isn't initialized on boot. So, coreboot uses
the cache as ram in this boot phase. When exiting cache as ram, coreboot
calls INVD for making the cache consistent.

In a virtual environment ram is always initialized and the cache is
always consistent. So, we can safely ignore this call.

Reviewed by:	jhb, imp
Differential Revision:	https://reviews.freebsd.org/D35620
Sponsored by:	Beckhoff Automation GmbH & Co. KG
2022-07-27 18:20:47 +02:00
Mark Johnston
f4f56ff43d qat: Rename to qat_c2xxx and remove support for modern chipsets
A replacement QAT driver will be imported, but this replacement does not
support Atom C2xxx hardware.  So, the existing driver will be kept
around to provide opencrypto offload support for those chipsets.

Reviewed by:	pauamma, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35817
2022-07-27 11:10:52 -04:00
Dimitry Andric
7a1f289bd2 Fix unused variable warning in amd64's pmap.c
With clang 15, the following -Werror warning is produced:

    sys/amd64/amd64/pmap.c:8274:22: error: variable 'freed' set but not used [-Werror,-Wunused-but-set-variable]
            int allfree, field, freed, i, idx;
                                ^

The 'freed' variable is only used when PV_STATS is defined. Ensure it is
only declared and set in that case.

MFC after:	3 days
2022-07-26 22:08:10 +02:00
Mitchell Horne
c84c5e00ac ddb: annotate some commands with DB_CMD_MEMSAFE
This is not completely exhaustive, but covers a large majority of
commands in the tree.

Reviewed by:	markj
Sponsored by:	Juniper Networks, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D35583
2022-07-18 22:06:09 +00:00
Kornel Dulęba
361971fbca Rework how shared page related data is stored
Store the shared page address in struct vmspace.
Also instead of storing absolute addresses of various shared page
segments save their offsets with respect to the shared page address.
This will be more useful when the shared page address is randomized.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D35393
2022-07-18 16:27:32 +02:00
Kornel Dulęba
f6ac79fb12 Introduce the PROC_SIGCODE() macro
Use a getter macro instead of fetching the sigcode address directly
from a sysent of a given process. It assumes that the sigcode is stored
in the shared page, which is true in all cases, except for a.out
binaries. This will be later useful when the shared page address
randomization is introduced.
No functional change intended.

Approved by:	mw(mentor)
Sponsored by:	Stormshield
Obtained from:	Semihalf
Reviewed by:	kib
Differential Revision: https://reviews.freebsd.org/D35392
2022-07-18 16:27:26 +02:00
Colin Percival
07007f3147 uart: Don't check SPCR tables if !late_console
On x86 systems, the debug.late_console tunable makes it possible to set
up the console before we call pmap_bootstrap.  (The tunable is turned
on by default; setting late_console=0 results in consoles being probed
early.)

Unfortunately this is not compatible with using the ACPI SPCR table to
find the console, since consulting ACPI tables requires mapping memory
addresses.  As such, we skip the call to uart_cpu_acpi_spcr from
uart_cpu_x86 in the !late_console case.

Reviewed by:	imp
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D35794
2022-07-13 23:17:44 -07:00
Mark Johnston
6b38974085 vm_object: Modify various drivers to allocate OBJT_SWAP objects
This is in preparation for removal of OBJT_DEFAULT.  In particular, it
is now cheap to check whether an OBJT_SWAP object has any swap blocks
allocated, so the benefit of having a separate OBJT_DEFAULT type is
quite marginal, and the OBJT_DEFAULT->SWAP transition is a source of
bugs.

Reviewed by:	alc, hselasky, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35779
2022-07-12 09:10:15 -04:00
Warner Losh
ea5b2d6242 MIMIMAL: add uart
While uart could be detected completely through plug and play means, add
it here for two reasons. First, we don't do that from the loader, so
it's not available as a console. Second, even if we did do it from the
loader, there's a limitation in the system today that console drivers
must be compiled into the kernel because the console is selected before
external modules are linked into the kernel. Adding it only increases
the kernel size by ~14k as well.

Sponsored by:		Netflix
Idea liked by:		des, rpokala, brooks, jhb
2022-07-01 11:24:51 -06:00
Mihai Burcea
9aa02d5120 vmm: Fix snapshots for AMD CPUs
This patch fixes the AMD implementation for snapshotting.  It removes
unnecessary vmcb fields that should not be saved and duplicates.

Reviewed by:	jhb
Differential Revision: https://reviews.freebsd.org/D33431
2022-06-30 16:13:55 -07:00
Vitaliy Gusev
5afcca138f vmm: Cherry pick illumos commit '13361 bhyve should mask RDT cpuid info'
Summary:
    commit  1a5f1879be09d3de900b2510692dd12003784d84
    Author: Patrick Mooney <pmooney@pfmooney.com>
    Date:   2020-12-16T20:02:23.000Z

        13361 bhyve should mask RDT cpuid info
        Reviewed by: Andy Fiddaman <andy@omnios.org>
        Reviewed by: Toomas Soome <tsoome@me.com>
        Approved by: Robert Mustacchi <rm@fingolfin.org>

    https://github.com/illumos/illumos-gate/commit/1a5f1879be09d3de900b2510692dd12003784d8

----

We saw similar warning of GP (on Intel Xeon CPU E5-2630 v4 and VM with Ubuntu 20.04 5.4.0-113-generic)  until this commit is applied:

```
[    1.658880] kernel: unchecked MSR access error: WRMSR to 0xc8f (tried to write 0x0000000000000000) at rIP: 0xffffffffacc735b4 (native_write_msr+0x4/0x30)
[    1.662734] kernel: Call Trace:
[    1.663885] kernel:  ? clear_closid_rmid.isra.0+0x36/0x40
[    1.665501] kernel:  resctrl_online_cpu+0xdc/0x3f0
[    1.666952] kernel:  ? __switch_to_asm+0x40/0x70
[    1.668358] kernel:  ? __switch_to+0x7f/0x480
[    1.669693] kernel:  ? cat_wrmsr+0x70/0x70
[    1.670970] kernel:  cpuhp_invoke_callback+0x9b/0x580
[    1.672541] kernel:  ? __schedule+0x2eb/0x740
[    1.673893] kernel:  cpuhp_thread_fun+0xb8/0x120
[    1.675304] kernel:  smpboot_thread_fn+0xd0/0x170
[    1.676685] kernel:  kthread+0x104/0x140
[    1.677948] kernel:  ? sort_range+0x30/0x30
[    1.679299] kernel:  ? kthread_park+0x90/0x90
[    1.680570] kernel:  ret_from_fork+0x35/0x40
[    1.682000] kernel: *** VALIDATE rdt ***
[    1.683454] kernel: resctrl: L3 monitoring detected
```

Reviewed by:	markj, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D35442
2022-06-30 10:27:27 -07:00
Dmitry Chagin
050f5a8405 amd64: Reload CPU ext features after resume or cr4 changes
Reviewed by:		kib
Differential revision:	https://reviews.freebsd.org/D35555
MFC after:		2 weeks
2022-06-29 10:34:43 +03:00
Roger Pau Monné
881c145431 elfnote: place note in a PT_NOTE program header
Some tools (firecraker loader) only check for notes in PT_NOTE program
headers, so make sure the notes added using the ELFNOTE macro end up
in such header.

Output from readelf -Wl for and amd64 kernel after the change:

Elf file type is EXEC (Executable file)
Entry point 0xffffffff8038a000
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R   0x8
  INTERP         0x0002a8 0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R   0x1
      [Requesting program interpreter: /red/herring]
  LOAD           0x000000 0xffffffff80200000 0x0000000000200000 0x189e28 0x189e28 R   0x200000
  LOAD           0x18a000 0xffffffff8038a000 0x000000000038a000 0xe447e8 0xe447e8 R E 0x200000
  LOAD           0xfce7f0 0xffffffff811ce7f0 0x00000000011ce7f0 0x6b955c 0x6b955c R   0x200000
  LOAD           0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW  0x200000
  LOAD           0x1801000 0xffffffff81a01000 0x0000000001a01000 0x1c8480 0x5ff000 RW  0x200000
  DYNAMIC        0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 RW  0x8
  GNU_RELRO      0x1800000 0xffffffff81a00000 0x0000000001a00000 0x000140 0x000140 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x1687ae0 0xffffffff81887ae0 0x0000000001887ae0 0x0001c0 0x0001c0 R   0x4

 Section to Segment mapping:
  Segment Sections...
[...]
   10     .note.gnu.build-id .note.Xen

Reported by: cperciva
Fixes: 1a9cdd373a ('xen: add PV/PVH kernel entry point')
Fixes: 93ee134a24 ('Integrate support for xen in to i386 common code.')
Sponsored by: Citrix Systems R&D
Reviewed by: emaste
Differential revision: https://reviews.freebsd.org/D35611
2022-06-28 09:51:57 +02:00
Dmitry Chagin
d416ee86c7 linux(4): To reuse MD linux.h hide kernel dependencies unde _KERNEL constraint
MFC after:		2 weeks
2022-06-22 14:28:24 +03:00
Alexander Motin
6210ac95a1 amd64: Stop using REP MOVSB for backward memmove()s.
Enhanced REP MOVSB feature of CPUs starting from Ivy Bridge makes
REP MOVSB the fastest way to copy memory in most of cases. However
Intel Optimization Reference Manual says: "setting the DF to force
REP MOVSB to copy bytes from high towards low addresses will expe-
rience significant performance degradation". Measurements on Intel
Cascade Lake and Alder Lake, same as on AMD Zen3 show that it can
drop throughput to as low as 2.5-3.5GB/s, comparing to ~10-30GB/s
of REP MOVSQ or hand-rolled loop, used for non-ERMS CPUs.

This patch keeps ERMS use for forward ordered memory copies, but
removes it for backward overlapped moves where it does not work.

Reviewed by:	mjg
MFC after:	2 weeks
2022-06-16 13:46:34 -04:00
Mark Johnston
756bc3adc5 kasan: Create a shadow for the bootstack prior to hammer_time()
When the kernel is compiled with -asan-stack=true, the address sanitizer
will emit inline accesses to the shadow map.  In other words, some
shadow map accesses are not intercepted by the KASAN runtime, so they
cannot be disabled even if the runtime is not yet initialized by
kasan_init() at the end of hammer_time().

This went unnoticed because the loader will initialize all PML4 entries
of the bootstrap page table to point to the same PDP page, so early
shadow map accesses do not raise a page fault, though they are silently
corrupting memory.  In fact, when the loader does not copy the staging
area, we do get a page fault since in that case only the first and last
PML4Es are populated by the loader.  But due to another bug, the loader
always treated KASAN kernels as non-relocatable and thus always copied
the staging area.

It is not really practical to annotate hammer_time() and all callees
with __nosanitizeaddress, so instead add some early initialization which
creates a shadow for the boot stack used by hammer_time().  This is only
needed by KASAN, not by KMSAN, but the shared pmap code handles both.

Reported by:	mhorne
Reviewed by:	kib
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35449
2022-06-15 11:39:10 -04:00
Mark Johnston
c6d092b510 pmap: Keep PTI page table pages busy
PTI page table pages are allocated from a VM object, so must be
exclusively busied when they are freed, e.g., when a thread loses a race
in pmap_pti_pde().  Simply keep PTPs busy at all times, as was done for
some other kernel allocators in commit
e9ceb9dd11.

Also remove some redundant assertions on "ref_count":
vm_page_unwire_noq() already asserts that the page's reference count is
greater than zero.

Reported by:	syzkaller
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D35466
2022-06-15 11:38:04 -04:00