VM_ALLOC_WAITOK and vm_page_unwire_noq(), have eliminated the need for
many of the #includes.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D27052
It is almost never needed and adds an avoidable branch.
While here do minior clean ups in preparation for larger changes.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D27019
Foundation copyrights, approved by emaste@. It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.
Reviewed by: emaste, imp, gbe (manpages)
Differential Revision: https://reviews.freebsd.org/D26980
Linux execve() gets audited as AUE_EXECVE as well, we should also interpret
the return from this correctly for the same reasoning as in r367002.
MFC with: r367002
Improve the code reconstructing en_tw in struct fpreg32 from FXSAVE
results so that all register states are indicated correctly. The
previous code unconditionally mapped non-empty register state to
'normalized value' constant. The new code explicitly distinguishes
the 'zero value' and 'special value' constants as well. This improves
consistency between real FSAVE and translation from FXSAVE, and
ensures that tests using PT_GETFPREGS can rely on a single correct
value independently of the underlying implementation.
PR: 250454
Sponsored by: The FreeBSD Foundation
Obtained from: Moritz Systems
Submitted by: Michał Górny <mgorny@moritz.systems>
Discussed with: emaste
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26856
Currently, this supports SHA1 and SHA2-{224,256,384,512} both as plain
hashes and in HMAC mode on both amd64 and i386. It uses the SHA
intrinsics when present similar to aesni(4), but uses SSE/AVX
instructions when they are not.
Note that some files from OpenSSL that normally wrap the assembly
routines have been adapted to export methods usable by 'struct
auth_xform' as is used by existing software crypto routines.
Reviewed by: gallatin, jkim, delphij, gnn
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26821
Rewrite the code that maintains pm_active and invalidates EPTP-tagged
TLB entries in C. Previously this work was done in vmx_enter_guest(),
in assembly, but there is no good reason for that and it makes the TLB
invalidation algorithm for nested page tables harder to review.
No functional change intended. Now, an error from the invept
instruction results in a kernel panic rather than a vmexit. Such errors
should occur only as a result of VMM bugs.
Reviewed by: grehan, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26830
The former clobbers some registers that shouldn't be touched.
Reviewed by: kib (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26406
on some more advanced C features.
This fixes gcc-toolchain build of exception.S.
Reported and tested by: kevans
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Hiding this feature behind RB_VERBOSE is gratuitous. The tunable is enough
to limit its use to only those who explicitly request it.
Suggested by: kevans
RIght now PCB_KERNFPU is used both as indication that kernel prepared
hardware FPU context to use and that the thread is fpu-kern
thread. This also breaks fpu_kern_enter(FPU_KERN_NOCTX), since
fpu_kern_leave() then clears PCB_KERNFPU.
Introduce new flag PCB_KERNFPU_THR which indicates that the thread is
fpu-kern. Do not clear PCB_KERNFPU if fpu-kern thread leaves noctx
fpu region.
Reported and tested by: jhb (amd64)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25511
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10. For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved. Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.
While there, style the code:
move MSR defines to specialreg.h
move identification to initcpu.c
Reported by: whu
Reviewed by: avg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26470
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.
Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741
On the target side TLB shootdown IPI handler, prevent the compiler
from performing a forward store optimization which may mask a
subsequent update to the scoreboard by the initiator.
Reported by: Max Laier, Anton Rang
Discussed with: kib
Sponsored by: Dell EMC Isilon
This patch has the driver for 10Gigabit Ethernet controller in AMD
SoC. This driver is written compatible to the Iflib framework. The
existing driver is for the old version of hardware. The submitted
driver here is for the recent versions of the hardware where the Ethernet
controller is PCI-E based.
Submitted by: Rajesh Kumar <rajesh1.kumar@amd.com>
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25793
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled. Otherwise, there is no
functional change. The mechanism can be disabled with
debug.fxrng_vdso_enable=0.
arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time. Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed. On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.
This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator. That change is
left for future discussion/work.
Reviewed by: kib (previous version)
Approved by: csprng (me -- only touching FXRNG here)
Differential Revision: https://reviews.freebsd.org/D22839
Now that config(8) has supported include for 19 years, transition to
including the NOTES files. include support didn't exist at the time,
nor did the envvar stuff recently added. Now that it does, eliminate
the building of LINT files by just including everything you need.
Note: This may cause conflicts with updating in some cases.
find sys -name LINT\* -rm
is suggested across this commit to remove the generated LINT
files.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D26540
The boot metadata (also referred to as modinfo, or preload metadata)
provides information about the size and location of the kernel,
pre-loaded modules, and other metadata (e.g. the EFI framebuffer) to be
consumed during by the kernel during early boot. It is encoded as a
series of type-length-value entries and is usually constructed by
loader(8) and passed to the kernel. It is also faked on some
architectures when booted by other means.
Although much of the module information is available via kldstat(8),
there is no easy way to debug the metadata in its entirety. Add some
routines to parse this data and allow it to be printed to the console
during early boot or output via a sysctl.
Since the output can be lengthly, printing to the console is gated
behind the debug.dump_modinfo_at_boot kenv variable as well as the
BOOTVERBOSE flag. The sysctl to print the metadata is named
debug.dump_modinfo.
Reviewed by: tsoome
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26687
It is unlikely, but possible, that an unrecognized or unsupported
relocation type is encountered while trying to load a kernel module. If
this occurs we should offer the symbol index as a hint to the user.
While here, fix some small style issues.
Reviewed by: markj, kib (amd64 part, in D26701)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64). If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.
Reported and tested by: Michał Górny <mgorny@mgorny@moritz.systems>
PR: 250043
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26643
After r354889 stack got struct nmi_pcpu at top, which makes IST top
not page-aligned. Since pmap_pti_add_kva() truncates/rounds up
addresses, it erronously entered a page mapped before double fault
stack into the pti page table.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared. While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.
This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.
SmartOS bug: https://smartos.org/bugview/OS-8168
Submitted by: Patrick Mooney
Reviewed by: rgrimes
Differential Revision: https://reviews.freebsd.org/D24727
This is mostly needed for a common arm64/amd64 iommu code.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26587
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.
Suggested by: kevans
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
Possible in LA57 pmap config.
Noted by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26492
- Move assertions out of the main loop to avoid duplicate conditional
expressions, and improve assertion messages.
- Fix va_next updates. In some cases we were not doing the wraparound
check before continuing the loop.
- Use the right va_next. In pmap_advise() and pmap_copy() we would step
through 1G pages 2M at a time.
- Copy 1G mappings in pmap_copy().
Reviewed by: alc, kib
MFC with: r365518
Sponsored by: Juniper Networks, Inc., Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26463
Fix the logic so it works as it appears.
Reported by: Coverity
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: D26211 (in progress, so omitting full URL)
Intercept and report #UD to VM on SVM/AMD in case VM tried to execute an
SVM instruction. Otherwise, SVM allows execution of them, and instructions
operate on host physical addresses despite being executed in guest mode.
Reported by: Maxime Villard <max@m00nbsd.net>
admbug: 972
CVE: CVE-2020-7467
Reviewed by: grehan, markj
Differential revision: https://reviews.freebsd.org/D26313
The flag requests entry of non-managed superpage mapping of size
pagesizes[psind] into the page table.
Pmap supports fake wiring of the largepage mappings. Only attributes
of the largepage mapping can be changed by calling pmap_enter(9) over
existing mapping, physical address of the page must be unchanged.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24652
Currently we use a single bit to indicate whether the virtual page is
part of a superpage. To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.
The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.
For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.
Reviewed by: alc, kib
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26238
This allows privileged userspace processes to find information about the
physical page backing a given mapping. It is useful in applications
such as DPDK which perform some of their own memory management.
Reviewed by: kib, jhb (previous version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26237
If the call to _pmap_allocpte() is not sleepable, it is possible that
allocation of PML4 or PDP page is successful but either PDP or PD page
is not. Restructured code in _pmap_allocpte() leaves zero-referenced
page in the paging structure.
Handle it by checking refcount of the page one level above failed
alloc and free that page if its reference count is zero.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26293
We can switch into long mode directly with LA57 enabled.
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273
Since LA57 was moved to the main SDM document with revision 072, it
seems that we should have a support for it, and silicons are coming.
This patch makes pmap support both LA48 and LA57 hardware. The
selection of page table level is done at startup, kernel always
receives control from loader with 4-level paging. It is not clear how
UEFI spec would adapt LA57, for instance it could hand out control in
LA57 mode sometimes.
To switch from LA48 to LA57 requires turning off long mode, requesting
LA57 in CR4, then re-entering long mode. This is somewhat delicate
and done in pmap_bootstrap_la57(). AP startup in LA57 mode is much
easier, we only need to toggle a bit in CR4 and load right value in CR3.
I decided to not change kernel map for now. Single PML5 entry is
created that points to the existing kernel_pml4 (KML4Phys) page, and a
pml5 entry to create our recursive mapping for vtopte()/vtopde().
This decision is motivated by the fact that we cannot overcommit for
KVA, so large space there is unusable until machines start providing
wider physical memory addressing. Another reason is that I do not
want to break our fragile autotuning, so the KVA expansion is not
included into this first step. Nice side effect is that minidumps are
compatible.
On the other hand, (very) large address space is definitely
immediately useful for some userspace applications.
For userspace, numbering of pte entries (or page table pages) is
always done for 5-level structures even if we operate in 4-level mode.
The pmap_is_la57() function is added to report the mode of the
specified pmap, this is done not to allow simultaneous 4-/5-levels
(which is not allowed by hw), but to accomodate for EPT which has
separate level control and in principle might not allow 5-leve EPT
despite x86 paging supports it. Anyway, it does not seems critical to
have 5-level EPT support now.
Tested by: pho (LA48 hardware)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D25273
Coverity has identified the line in this change as "Potential integer
overflowing expression" due to the variable i declared as an int
and used in an expression with vm_paddr_t, a 64bit variable.
This change has very little effect as when this line is execute
nkpt is small and phys_addr is a the beginning of physical memory.
But there is no explicit protection that the above is true.
Submitted by: bret_ketchum@dell.com
Reported by: Coverity
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D26141
The ACPI table-mapping code used pmap_kenter_temporary() to create
mappings, which in turn uses the fixed-size crashdump map. Moreover,
the code was not verifying that the table fits in this map, so when
mapping large tables we could clobber adjacent mappings. This use of
pmap_kenter_temporary() appears to predate support in pmap_mapbios() for
creating early mappings, but that restriction no longer applies.
PR: 248746
Reviewed by: kib, mav
Tested by: gallatin, Curtis Villamizar <curtis@ipv6.occnc.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26125
Those messages were printed hundreds of times during boot, often multiple
times for each table. We already print information about the tables in
more organized form once to not duplicate it when random ACPI drivers are
attaching.
MFC after: 1 week
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.
It defaults to on (== current behavior).
make -C /root/linux-5.3-rc8 -s -j 1 bzImage:
use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
Recent versions of UEFI have moved local APIC timer initialization into
the early SEC phase which runs out of ROM, prior to self-relocating
into RAM. This results in a hypervisor exit.
Currently bhyve prevents instruction emulation from segments that aren't
marked as "sysmem" aka guest RAM, with the vm_gpa_hold() routine failing.
However, there is no reason for this restriction: the hypervisor already
controls whether EPT mappings are marked as executable.
Fix by dropping the redundant check of sysmem.
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D25955
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.
Requested by: andrew
Reviewed by: kib
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D25957
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
Being able to use tmpfs without kernel modules is very useful when building
small MFS_ROOT kernels without a real file system.
Including TMPFS also matches arm/GENERIC and the MIPS std.MALTA configs.
Compiling TMPFS only adds 4 .c files so this should not make much of a
difference to NO_MODULES build times (as we do for our minimal RISC-V
images).
Reviewed By: br (earlier version for riscv), brooks, emaste
Differential Revision: https://reviews.freebsd.org/D25317
The only part of nmi_handle_intr() depending on ISA is isa_nmi(), which is
already wrapped. Entering debugger on NMI does not really depend on ISA.
MFC after: 2 weeks
Limit manipulations to use %rax as scratch to the pti portion of the
syscall entry code.
Submitted by: alc
Reviewed by: markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25722
When pmap operates in PTI mode, we must reload %cr3 on return to
userspace. In non-PCID mode the reload always flushes all non-global
TLB entries and we take advantage of it by only invalidating the KPT
TLB entries (there is no cached UPT entries at all).
In PCID mode, we flush both KPT and UPT TLB explicitly, but we can
take advantage of the fact that PCID mode command to reload %cr3
includes a flag to flush/not flush target TLB. In particular, we can
avoid the flush for UPT, instead record that load of pc_ucr3 into %cr3
on return to usermode should be flushing. This is done by providing
either all-1s or ~CR3_PCID_MASK in pc_ucr3_load_mask. The mask is
automatically reset to all-1s on return to usermode.
Similarly, we can avoid flushing UPT TLB on context switch, replacing
it by setting pc_ucr3_load_mask. This unifies INVPCID and non-INVPCID
PTI ifunc, leaving only 4 cases instead of 6. This trick is also
applicable both to the TLB shootdown IPI handlers, since handlers
interrupt the target thread.
But then we need to check pc_curpmap in handlers, and this would
reopen the same race for INVPCID machines as was fixed in r306350 for
non-INVPCID. To not introduce the same bug, unconditionally do
spinlock_enter() in pmap_activate().
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25483
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
This removes SCTP from in-tree kernel configuration files. Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25611
This shortens fdalloc by over 60 bytes. Correctness verified by running both
variants at the same time and comparing the result of each call.
Note someone(tm) should make a pass at converting everything else feasible.
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently from other CPUs. Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu. After that IPI is sent to all targets
which scan for zeroed scoreboard generation words. Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set. Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.
Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.
Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.
Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
this is fine because we in fact do not send IPIs from interrupt
handlers. More for !x2APIC mode where ICR access for write requires
two registers write, we disable interrupts around it. If considered
incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
cache line, to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
On architectures that use RELA relocations it is safe to rerun the ifunc
resolvers on after all CPUs have started, but while they are sill parked.
On arm64 with big.LITTLE this is needed as some SoCs have shipped with
different ID register values the big and little clusters meaning we were
unable to rely on the register values from the boot CPU.
Add support for rerunning the resolvers on arm64 and amd64 as these are
both RELA using architectures.
Reviewed by: kib
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25455
Like other types of allocation, fpu_kern_ctx are frequently allocated per-cpu.
Provide the API and sketch some example consumers.
fpu_kern_alloc_ctx_domain() preferentially allocates memory from the
provided domain, and falls back to other domains if that one is empty
(DOMAINSET_PREF(domain) policy).
Maybe it makes more sense to just shove one of these in the DPCPU area
sooner or later -- left for future work.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D22053
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.
Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234." However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.
Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0. This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered. That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.
(This change does not add support for the 'bootcode' verb to EBR.)
PR: 232463
Reported by: Manish Jain <bourne.identity AT hotmail.com>
Discussed with: ae (no objection)
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D24939
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.
The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.
Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:
- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
(?)
Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.
PR: 240874
Reviewed by: kib, trasz
Differential Revision: https://reviews.freebsd.org/D21845
in vanilla Linux git tree.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25385
If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. In this scenario,
reset decoder state and try again.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24464
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.
This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.
Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.
This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)
* Remove the rest of the stoffs hack.
* Remove my half-baked displace_symbol_table() function.
* Extend ddb initialization to cope with having a relocation offset on the
kernel symbol table.
* Fix my kernel-as-initrd hack to work with booke64 by using a temporary
mapping to access the data.
* Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
* Change the behavior or X_db_symbol_values to apply the relocation base
when updating valp, to match link_elf_symbol_values() behavior.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25223