Improve the code reconstructing en_tw in struct fpreg32 from FXSAVE
results so that all register states are indicated correctly. The
previous code unconditionally mapped non-empty register state to
'normalized value' constant. The new code explicitly distinguishes
the 'zero value' and 'special value' constants as well. This improves
consistency between real FSAVE and translation from FXSAVE, and
ensures that tests using PT_GETFPREGS can rely on a single correct
value independently of the underlying implementation.
PR: 250454
Sponsored by: The FreeBSD Foundation
Obtained from: Moritz Systems
Submitted by: Michał Górny <mgorny@moritz.systems>
Discussed with: emaste
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26856
Currently, this supports SHA1 and SHA2-{224,256,384,512} both as plain
hashes and in HMAC mode on both amd64 and i386. It uses the SHA
intrinsics when present similar to aesni(4), but uses SSE/AVX
instructions when they are not.
Note that some files from OpenSSL that normally wrap the assembly
routines have been adapted to export methods usable by 'struct
auth_xform' as is used by existing software crypto routines.
Reviewed by: gallatin, jkim, delphij, gnn
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26821
RIght now PCB_KERNFPU is used both as indication that kernel prepared
hardware FPU context to use and that the thread is fpu-kern
thread. This also breaks fpu_kern_enter(FPU_KERN_NOCTX), since
fpu_kern_leave() then clears PCB_KERNFPU.
Introduce new flag PCB_KERNFPU_THR which indicates that the thread is
fpu-kern. Do not clear PCB_KERNFPU if fpu-kern thread leaves noctx
fpu region.
Reported and tested by: jhb (amd64)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25511
From Linux sources and several datasheets I looked at, it seems that
the workaround is only needed on families 0xf and 0x10. For instance,
Ryzens do not implement the accessed MSR at all, it is documented as
reserved. Also, hypervisors should not allow guest to put CPU into
idle state, so activate workaround only when on bare hardware.
While there, style the code:
move MSR defines to specialreg.h
move identification to initcpu.c
Reported by: whu
Reviewed by: avg
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D26470
Move dump_avail[] extern declaration and inlines into a new header
vm/vm_dumpset.h. This fixes default gcc build for mips.
Reviewed by: alc, scottph
Tested by: kevans (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26741
arm64 has a similar wrapper. This permits defining <machine/fpu.h> as
the standard header for fpu_kern_*.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D26753
Push the root seed version to userspace through the VDSO page, if
the RANDOM_FENESTRASX algorithm is enabled. Otherwise, there is no
functional change. The mechanism can be disabled with
debug.fxrng_vdso_enable=0.
arc4random(3) obtains a pointer to the root seed version published by
the kernel in the shared page at allocation time. Like arc4random(9),
it maintains its own per-process copy of the seed version corresponding
to the root seed version at the time it last rekeyed. On read requests,
the process seed version is compared with the version published in the
shared page; if they do not match, arc4random(3) reseeds from the
kernel before providing generated output.
This change does not implement the FenestrasX concept of PCPU userspace
generators seeded from a per-process base generator. That change is
left for future discussion/work.
Reviewed by: kib (previous version)
Approved by: csprng (me -- only touching FXRNG here)
Differential Revision: https://reviews.freebsd.org/D22839
Now that config(8) has supported include for 19 years, transition to
including the NOTES files. include support didn't exist at the time,
nor did the envvar stuff recently added. Now that it does, eliminate
the building of LINT files by just including everything you need.
Note: This may cause conflicts with updating in some cases.
find sys -name LINT\* -rm
is suggested across this commit to remove the generated LINT
files.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D26540
APM BIOS was relevant only to early laptops (approximately P166 or
P200 and slower). These have not been relevant for a long time, and
this code has been untested for a long time (as far as I can
tell). The APM compat code in ACPI and the apm(8) command is not being
retired. Both of these items are still in use (apm(8) is more
scriptable than the replacement acpiconf, for the most part). This has
been commented out of i386 GENERIC since 2002. This code is not
relevant to any other port.
Discussed on: arch@
It is unlikely, but possible, that an unrecognized or unsupported
relocation type is encountered while trying to load a kernel module. If
this occurs we should offer the symbol index as a hint to the user.
While here, fix some small style issues.
Reviewed by: markj, kib (amd64 part, in D26701)
Sponsored by: NetApp, Inc.
Sponsored by: Klara, Inc.
This is mostly needed for a common arm64/amd64 iommu code.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D26587
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.
Suggested by: kevans
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 Mib to > 2 Gib
(and overflowing `int` for the size).
Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.
Reviewed by: jhb
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26131
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitons in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26129
Currently we use a single bit to indicate whether the virtual page is
part of a superpage. To support a forthcoming implementation of
non-transparent 1GB superpages, it is useful to provide more detailed
information about large page sizes.
The change converts MINCORE_SUPER into a mask for MINCORE_PSIND(psind)
values, indicating a mapping of size psind, where psind is an index into
the pagesizes array returned by getpagesizes(3), which in turn comes
from the hw.pagesizes sysctl. MINCORE_PSIND(1) is equal to the old
value of MINCORE_SUPER.
For now, two bits are used to record the page size, permitting values
of MAXPAGESIZES up to 4.
Reviewed by: alc, kib
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D26238
This allows privileged userspace processes to find information about the
physical page backing a given mapping. It is useful in applications
such as DPDK which perform some of their own memory management.
Reviewed by: kib, jhb (previous version)
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D26237
Add deprecation notice for apm bios, aka the apm(4) device. The apm(8)
command will remain, at least for a while, since ACPI emulates the apm
ioctl interface.
Discussed on: arch@
Relnotes: yes
MFC After: 3 days
The ACPI table-mapping code used pmap_kenter_temporary() to create
mappings, which in turn uses the fixed-size crashdump map. Moreover,
the code was not verifying that the table fits in this map, so when
mapping large tables we could clobber adjacent mappings. This use of
pmap_kenter_temporary() appears to predate support in pmap_mapbios() for
creating early mappings, but that restriction no longer applies.
PR: 248746
Reviewed by: kib, mav
Tested by: gallatin, Curtis Villamizar <curtis@ipv6.occnc.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D26125
Those messages were printed hundreds of times during boot, often multiple
times for each table. We already print information about the tables in
more organized form once to not duplicate it when random ACPI drivers are
attaching.
MFC after: 1 week
This is a step towards facilitating jails with only Linux binaries.
Supporting emul_path adds path lookups which are completely spurious
if the binary at hand runs in a Linux-based root directory.
It defaults to on (== current behavior).
make -C /root/linux-5.3-rc8 -s -j 1 bzImage:
use_emul_path=1: 101.65s user 68.68s system 100% cpu 2:49.62 total
use_emul_path=0: 101.41s user 64.32s system 100% cpu 2:45.02 total
so we don't ifdef for every arch in busdma_iommu.c;
o No need to include specialreg.h for x86, remove it.
Requested by: andrew
Reviewed by: kib
Sponsored by: DARPA/AFRL
Differential Revision: https://reviews.freebsd.org/D25957
For purposes of handling hardware error reported via NMIs I need a way to
escape NMI context, being too restrictive to do something significant.
To do it this change introduces new swi_sched() flag SWI_FROMNMI, making
it careful about used KPIs. On platforms allowing IPI sending from NMI
context (x86 for now) it immediately wakes clk_intr_event via new IPI_SWI,
otherwise it works just like SWI_DELAY. To handle the delayed SWIs this
patch calls clk_intr_event on every hardclock() tick.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25754
Being able to use tmpfs without kernel modules is very useful when building
small MFS_ROOT kernels without a real file system.
Including TMPFS also matches arm/GENERIC and the MIPS std.MALTA configs.
Compiling TMPFS only adds 4 .c files so this should not make much of a
difference to NO_MODULES build times (as we do for our minimal RISC-V
images).
Reviewed By: br (earlier version for riscv), brooks, emaste
Differential Revision: https://reviews.freebsd.org/D25317
The only part of nmi_handle_intr() depending on ISA is isa_nmi(), which is
already wrapped. Entering debugger on NMI does not really depend on ISA.
MFC after: 2 weeks
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
This removes SCTP from in-tree kernel configuration files. Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25611
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently from other CPUs. Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu. After that IPI is sent to all targets
which scan for zeroed scoreboard generation words. Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set. Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.
Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.
Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.
Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
this is fine because we in fact do not send IPIs from interrupt
handlers. More for !x2APIC mode where ICR access for write requires
two registers write, we disable interrupts around it. If considered
incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
cache line, to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.
Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234." However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.
Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0. This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered. That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.
(This change does not add support for the 'bootcode' verb to EBR.)
PR: 232463
Reported by: Manish Jain <bourne.identity AT hotmail.com>
Discussed with: ae (no objection)
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D24939
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.
The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.
Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:
- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
(?)
Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.
PR: 240874
Reviewed by: kib, trasz
Differential Revision: https://reviews.freebsd.org/D21845