This reverts r177661. The change is no longer very useful since
out-of-tree KLDs will be built to target SMP kernels anyway. Moveover
it breaks the KBI in !SMP builds since cpuset_t's layout depends on the
value of MAXCPU, and several kernel interfaces, notably
smp_rendezvous_cpus(), take a cpuset_t as a parameter.
PR: 243711
Reviewed by: jhb, kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23512
Submitted by: Bora Özarslan <borako.ozarslan@gmail.com>
Submitted by: Yang Wang <2333@outlook.jp>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19917
messages for some of the unimplemented syscalls, in particular
the AIO-related ones.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23231
After r355784 the td_oncpu field is no longer synchronized by the thread
lock, so the stack capture interrupt cannot be delievered precisely.
Fix this using a loop which drops the thread lock and restarts if the
wrong thread was sampled from the stack capture interrupt handler.
Change the implementation to use a regular interrupt instead of an NMI.
Now that we drop the thread lock, there is no advantage to the latter.
Simplify the KPIs. Remove stack_save_td_running() and add a return
value to stack_save_td(). On platforms that do not support stack
capture of running threads, stack_save_td() returns EOPNOTSUPP. If the
target thread is running in user mode, stack_save_td() returns EBUSY.
Reviewed by: kib
Reported by: mjg, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23355
Borrow the trick from memset and memmove and use the scale/index/base addressing
to avoid branches.
If a mismatch is found, the routine has to calculate the difference. Make sure
there is always up to 8 bytes to inspect. This replaces the previous loop which
would operate over up to 16 bytes with an unrolled list of 8 tests.
Speed varies a lot, but this is a net win over the previous routine with probably
a lot more to gain.
Validated with glibc test suite.
The Linux32 system call argument fetcher places each argument (passed in
registers in the Linux x86 system call convention) into an entry in the
generic system call args array. Each member of this array is 8 bytes
wide, so this approach is broken for system calls that take off_t
arguments.
Fix the problem by splitting l_loff_t arguments in the 32-bit system
call descriptions, the same as we do for FreeBSD32. Change entry points
to handle this using the PAIR32TO64 macro.
Move linux_ftruncate64() into compat/linux.
PR: 243155
Reported by: Alex S <iwtcex@gmail.com>
Reviewed by: kib (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D23210
As a new x86 CPU vendor, Chengdu Haiguang IC Design Co., Ltd (Hygon)
is a joint venture between AMD and Haiguang Information Technology Co.,
Ltd., aims at providing x86 processors for China server market.
The first generation Hygon processor(Dhyana) shares most architecture
with AMD's family 17h, but with different CPU vendor ID("HygonGenuine")
and PCI vendor ID(0x1d94) and family series number 18h(Hygon negotiated
with AMD to confirm that only Hygon use family 18h).
To enable Hygon Dhyana support in FreeBSD, add new definitions
HYGON_VENDOR_ID("HygonGenuine") and X86_VENDOR_HYGON(0x1d94) to identify
Hygon Dhyana CPU.
Initialize the CPU features(topology, local APIC ext, MSI, TSC, hwpstate,
MCA, DEBUG_CTL, etc) for amd64 and i386 mode by sharing the code path of
AMD family 17h.
The changes have been applied on FreeBSD 13.0-CURRENT and tested
successfully on Hygon Dhyana processor.
References:
[1] Linux kernel patches for Hygon Dhyana, merged in 4.20:
https://git.kernel.org/tip/c9661c1e80b609cd038db7c908e061f0535804ef
[2] MSR and CPUID definition:
https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
Submitted by: Pu Wen <puwen@hygon.cn>
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D23163
r355473 vastly improved the readability and cleanliness of these Makefiles.
Every single one of them follows the same pattern and duplicates the exact
same logic.
Now that we have GENERATED/SRCS, split SRCS up into the two parameters we'll
use for ${MAKESYSCALLS} rather than assuming a specific ordering of SRCS and
include a common sysent.mk to handle the rest. This makes it less tedious to
make sweeping changes.
Some default values are provided for GENERATED/SYSENT_*; almost all of these
just use a 'syscalls.master' and 'syscalls.conf' in cwd, and they all use
effectively the same filenames with an arbitrary prefix. Most ABIs will be
able to get away with just setting GENERATED_PREFIX and including
^/sys/conf/sysent.mk, while others only need light additions. kern/Makefile
is the notable exception, as it doesn't take a SYSENT_CONF and the generated
files are spread out between ^/sys/kern and ^/sys/sys, but it otherwise fits
the pattern enough to use the common version.
Reviewed by: brooks, imp
Nice!: emaste
Differential Revision: https://reviews.freebsd.org/D23197
When either makesyscalls.lua or syscalls.master changes, all of the
${GENERATED} targets are now out-of-date. With make jobs > 1, this means we
will run the makesyscalls script in parallel for the same ABI, generating
the same set of output files.
Prior to r356603 , there is a large window for interlacing output for some
of the generated files that we were generating in-place rather than staging
in a temp dir. After that, we still should't need to run the script more
than once per-ABI as the first invocation should update all of them. Add
.ORDER to do so cleanly.
Reviewed by: brooks
Discussed with: sjg
Differential Revision: https://reviews.freebsd.org/D23099
vm.kvm_size and vm.kvm_free are read only and marked as MPSAFE on i386
already. Mark them as that on amd64 and arm64 too to avoid locking Giant.
Reviewed by: kib (mentor)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D23039
mapping to the old read-only page with a mapping to the new read-write page.
To destroy the old mapping, pmap_enter() must destroy its page table and PV
entries and invalidate its TLB entry. This change simply invalidates that
TLB entry a little earlier, specifically, on amd64 and arm64, before the PV
list lock is held.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23027
syscall is to query the CPU number and the NUMA domain the calling
thread is currently running on. The third argument is ignored.
It doesn't do anything regarding scheduling - it's literally
just a way to query the current state, without any guarantees
you won't get rescheduled an opcode later.
This unbreaks Java from CentOS 8
(java-11-openjdk-11.0.5.10-0.el8_0.x86_64).
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22972
copy_file_range(2) is implemented natively since r350315, make it available
for Linux binaries too.
Reviewed by: kib (mentor), trasz (previous version)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D22959
incrementing (and decrementing) the ref_count on kernel page table pages.
They should not do this. Kernel page table pages are expected to have a
fixed ref_count. Address this problem by refactoring pmap_alloc{_l2,pde}()
and their callers. This also eliminates some duplicated code from the
callers.
Correctly implement PMAP_ENTER_NOREPLACE in pmap_enter_{l2,pde}() on kernel
mappings.
Reduce code duplication by defining a function, pmap_abort_ptp(), for
handling a common error case.
Handle a possible page table page leak in pmap_copy(). Suppose that we are
determining whether to copy a superpage mapping. If we abort because there
is already a mapping in the destination pmap at the current address, then
simply decrementing the page table page's ref_count is correct, because the
page table page must have a ref_count > 1. However, if we abort because we
failed to allocate a PV entry, this might be a just allocated page table
page that has a ref_count = 1, so we should call pmap_abort_ptp().
Simplify error handling in pmap_enter_quick_locked().
Reviewed by: kib, markj (an earlier)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22763
than "/compat/linux". Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.
Reviewed by: bcr (manpages)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22574
over the usual fsync(2).
This silences some warnings when running "apt-get upgrade".
Reviewed by: brooks, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22371
- Allow the userland hypervisor to intercept breakpoint exceptions
(BP#) in the guest. A new capability (VM_CAP_BPT_EXIT) is used to
enable this feature. These exceptions are reported to userland via
a new VM_EXITCODE_BPT that includes the length of the original
breakpoint instruction. If userland wishes to pass the exception
through to the guest, it must be explicitly re-injected via
vm_inject_exception().
- Export VMCS_ENTRY_INST_LENGTH as a VM_REG_GUEST_ENTRY_INST_LENGTH
pseudo-register. Injecting a BP# on Intel requires setting this to
the length of the breakpoint instruction. AMD SVM currently ignores
writes to this register (but reports success) and fails to read it.
- Rework the per-vCPU state tracked by the debug server. Rather than
a single 'stepping_vcpu' global, add a structure for each vCPU that
tracks state about that vCPU ('stepping', 'stepped', and
'hit_swbreak'). A global 'stopped_vcpu' tracks which vCPU is
currently reporting an event. Event handlers for MTRAP and
breakpoint exits loop until the associated event is reported to the
debugger.
Breakpoint events are discarded if the breakpoint is not present
when a vCPU resumes in the breakpoint handler to retry submitting
the breakpoint event.
- Maintain a linked-list of active breakpoints in response to the GDB
'Z0' and 'z0' packets.
Reviewed by: markj (earlier version)
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D20309
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector. Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack. It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.
This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).
Reviewed by: kib
Tested on: amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22695
... after the initial common TSS is copied into its final location
during PCPU reallocation.
Reported by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Use the power of variable to avoid spelling out source and generated
files too many times. The previous Makefiles were hard to read, hard to
edit, and badly formatted.
Reviewed by: kevans, emaste
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22714
They're in both the old and new places in HEAD for the moment for
discussion and transition. The old locations will be garbage collected
in 4 weeks. MFCs to 12 an 11 will keep the old and new for transition
purposes.
Reviewed by: kib
MFC after: 4 weeks
Sponsored by: Intel
Differential Revision: https://reviews.freebsd.org/D22590
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
All Rights Reserved on same line as other copyright holders (but not
me). Other such holders are also listed last where it's clear.
- Use ustringp for the location of the argv and environment strings
and allow destp to travel further down the stack for the stackgap
and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs. This
used to hold translated system call arguments, but hasn't been used
since r159992.
Reviewed by: kib
Tested on: md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22501
tightening constraints on busy as a precursor to lockless page lookup and
should largely be a NOP for these cases.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D22611
No need to log all the commands in command ring but only the last one for which completion failed.
Reported by: np@freebsd.org
Reviewed by: np, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22566
Update the NetBSD Kernel Concurrency Sanitizer (KCSAN) runtime to work in
the FreeBSD kernel. It is a useful tool for finding data races between
threads executing on different CPUs.
This can be enabled by enabling KCSAN in the kernel config, or by using the
GENERIC-KCSAN amd64 kernel. It works on amd64 and arm64, however the later
needs a compiler change to allow -fsanitize=thread that KCSAN uses.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22315
Typical reasons for doublefault faults are either kernel stack
overflow or bugs in the code that manipulates protection CPU state.
The later code is the code which often has to set up gsbase for
kernel. Switching to explicit load of GSBASE MSR in the fault handler
makes it more probable to output a useful information.
Now all IST handlers have nmi_pcpu structure on top of their stacks.
It would be even more useful to save gsbase value at the moment of the
fault. I did not this because I do not want to modify PCB layout now.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D21894
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory. However, it
increases fragmentation and is disabled in release kernels.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
This CVE has already been announced in FreeBSD SA-19:26.mcu.
Mitigation for TAA involves either turning off TSX or turning on the
VERW mitigation used for MDS. Some CPUs will also be self-mitigating
for TAA and require no software workaround.
Control knobs are:
machdep.mitigations.taa.enable:
0 - no software mitigation is enabled
1 - attempt to disable TSX
2 - use the VERW mitigation
3 - automatically select the mitigation based on processor
features.
machdep.mitigations.taa.state:
inactive - no mitigation is active/enabled
TSX disable - TSX is disabled in the bare metal CPU as well as
- any virtualized CPUs
VERW - VERW instruction clears CPU buffers
not vulnerable - The CPU has identified itself as not being
vulnerable
Nothing in the base FreeBSD system uses TSX. However, the instructions
are straight-forward to add to custom applications and require no kernel
support, so the mitigation is provided for users with untrusted
applications and tenants.
Reviewed by: emaste, imp, kib, scottph
Sponsored by: Intel
Differential Revision: 22374
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook. In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland. This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.
Reviewed by: brooks, emaste
Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22355
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi. The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.
Error handling in this driver has not been extensively tested
yet.
Submitted by: vbhakta@vmware.com
Relnotes: yes
Sponsored by: VMware, Panzura
Differential Revision: D18613
from usermode.
If CPU supports RDFSBASE, the flag also means that userspace fsbase
and gsbase are already written into pcb, which might be not true when
we handle #gp from kernel.
The offender is rdmsr_safe(), and the visible result is corrupted
userspace TLS base.
Reported by: pstef
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Disable the use of executable 2M page mappings in EPT-format page
tables on affected CPUs. For bhyve virtual machines, this effectively
disables all use of superpage mappings on affected CPUs. The
vm.pmap.allow_2m_x_ept sysctl can be set to override the default and
enable mappings on affected CPUs.
Alternate approaches have been suggested, but at present we do not
believe the complexity is warranted for typical bhyve's use cases.
Reviewed by: alc, emaste, markj, scottl
Security: CVE-2018-12207
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21884
to the size of hardware gdt.
Reviewed by: jhb, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22302
On some AMD CPUs, in particular, machines that do not implement
CLFLUSHOPT but do provide CLFLUSH, the CLFLUSH instruction is only
synchronized with MFENCE.
Code using CLFLUSH typicall needs to brace it with MFENCE both before
and after flush, see for instance pmap_invalidate_cache_range(). If
context switch occurs while inside the protected region, we need to
ensure visibility of flushes done on the old CPU, to new CPU.
For all other machines, locked operation done to lock switched thread,
should be enough. For case of different address spaces, reload of
%cr3 is serializing.
Reviewed by: cem, jhb, scottph
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22007
Besides the confusing name, this type is effectively unused.
In all cases where it could be set, the INTERRUPT type is set by the
earlier code. The conditions for TRAP_INTERRUPT are a subset of the
conditions for INTERRUPT.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22305
This saves some memory, around 256K I think. It removes some code,
e.g. KPTI does not need to specially map common_tss anymore. Also,
common_tss become domain-local.
Reviewed by: jhb
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22231
This saves 320 bytes of the precious stack space.
The only negative aspect of the change I can think of is that the
struct thread increased by 320 bytes obviously, and that 320 bytes are
not swapped out anymore. I believe the freed stack space is much more
important than that. Also, current struct thread size is 1392 bytes
on amd64, so UMA will allocate two thread structures per (4KB) slab,
which leaves a space for pcb without increasing zone memory use.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D22138
This significantly reduces contention since chunks get created and removed
all the time. See the review for sample results.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21976
No functional change (in program code). Additional DWARF metadata is
generated in the .eh_frame section. Also, it is now a compile-time
requirement that machine/asm.h ENTRY() and END() macros are paired. (This
is subject to ongoing discussion and may change.)
This DWARF metadata allows llvm-libunwind to unwind program stacks when the
program is executing the function. The goal is to collect accurate
userspace stacktraces when programs have entered syscalls.
(The motivation for "Call Frame Information," or CFI for short -- not to be
confused with Control Flow Integrity -- is to sufficiently annotate assembly
functions such that stack unwinders can unwind out of the local frame
without the requirement of a dedicated framepointer register; i.e.,
-fomit-frame-pointer. This is necessary for C++ exception handling or
collecting backtraces.)
For the curious, a more thorough description of the metadata and some
examples may be found at [1] and documentation at [2]. You can also look at
'cc -S -o - foo.c | less' and search for '.cfi_' to see the CFI directives
generated by your C compiler.
[1]: https://www.imperialviolet.org/2017/01/18/cfi.html
[2]: https://sourceware.org/binutils/docs/as/CFI-directives.html
Reviewed by: emaste, kib (with reservations)
Differential Revision: https://reviews.freebsd.org/D22122
Instead of superpages use. The current code employs superpage-wide locking
regardless and the better locking granularity is welcome with NUMA enabled
even when superpage support is not used.
Requested by: alc
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21982
After removing wmb(), vm_set_rendezvous_func() became super trivial, so
there was no point in keeping it.
The wmb (sfence on amd64, lock nop on i386) was not needed. This can be
explained from several points of view.
First, wmb() is used for store-store ordering (although, the primitive
is undocumented). There was no obvious subsequent store that needed the
barrier.
Second, x86 has a memory model with strong ordering including total
store order. An explicit store barrier may be needed only when working
with special memory (device, special caching mode) or using special
instructions (non-temporal stores). That was not the case for this
code.
Third, I believe that there is a misconception that sfence "flushes" the
store buffer in a sense that it speeds up the propagation of stores from
the store buffer to the global visibility. I think that such
propagation always happens as fast as possible. sfence only makes
subsequent stores wait for that propagation to complete. So, sfence is
only useful for ordering of stores and only in the situations described
above.
Reviewed by: jhb
MFC after: 23 days
Differential Revision: https://reviews.freebsd.org/D21978
- We load the kernel at 0x200000. Memory below that address need not
be executable, so do not map it as such.
- Remove references to .ldata and related sections in the kernel linker
script. They come from ld.bfd's default linker script, but are not
used, and we now use ld.lld to link the amd64 kernel. lld does not
contain a default linker script.
- Pad the .bss to a 2MB as we do between .text and .data. This
forces the loader to load additional files starting in the following
2MB page, preserving the use of superpage mappings for kernel data.
- Map memory above the kernel image with NX. The kernel linker now
upgrades protections as needed, and other preloaded file types
(e.g., entropy, microcode) need not be mapped with execute permissions
in the first place.
Reviewed by: kib
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21859
Move futex_mtx to linux_common.ko for amd64 and aarch64 along
with respective list/mutex init/destroy.
PR: 240989
Reported by: Alex S <iwtcex@gmail.com>
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.
There are three pieces in the complete NetGDB system.
First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.
Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).
Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.
The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU). There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with: emaste, markj
Relnotes: sure
Differential Revision: https://reviews.freebsd.org/D21568
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).
It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).
The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.
Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.
The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.
No other functional change intended.
Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421
This updates the protection attributes of subranges of the kernel map.
Unlike pmap_protect(), which is typically used for user mappings,
pmap_change_prot() does not perform lazy upgrades of protections.
pmap_change_prot() also updates the aliasing range of the direct map.
Reviewed by: kib
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21758
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.
The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823
callers hold it.
This simplifies pmap code and removes a dependency on the object lock.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596
Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
busy acquires while held.
This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
There are provisions to do it already with pv_dummy, but new locking code
did not account for it. Previous one did not have the problem because
it hashed the address into the lock array.
While here annotate common vars with __read_mostly and __exclusive_cache_line.
Reported by: Thomas Laus
Tesetd by: jkim, Thomas Laus
Fixes: r353149 ("amd64 pmap: implement per-superpage locks")
Sponsored by: The FreeBSD Foundation
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC used Intel meta data so graid(8) works with it. However, graid(8)
supports RAID 0,1,10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have life issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
ABI already guarantees the direction is forward. Note this does not take care
of i386-specific cld's.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21906
This matches the state prior to r353149 and fixes crashes with DRM
modules.
Reported and tested by: cy, garga, Krasznai Andras
Fixes: r353149 ("amd64 pmap: implement per-superpage locks")
Sponsored by: The FreeBSD Foundation
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
The current 256-lock sized array is a problem in the following ways:
- it's way too small
- there are 2 locks per cacheline
- it is not NUMA-aware
Solve these issues by introducing per-superpage locks backed by pages
allocated from respective domains.
This significantly reduces contention e.g. during poudriere -j 104.
See the review for results.
Reviewed by: kib
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21833
Four drivers (hpt27xx, hptmv, hptnr, hptrr, hpt27xx) include precompiled
binary objects; have users load them as modules if they are needed.
Additional work (i.e., integrating devmatch) required before MFC.
Reviewed by: markj
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21865