mapping to the old read-only page with a mapping to the new read-write page.
To destroy the old mapping, pmap_enter() must destroy its page table and PV
entries and invalidate its TLB entry. This change simply invalidates that
TLB entry a little earlier, specifically, on amd64 and arm64, before the PV
list lock is held.
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23027
Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427
syscall is to query the CPU number and the NUMA domain the calling
thread is currently running on. The third argument is ignored.
It doesn't do anything regarding scheduling - it's literally
just a way to query the current state, without any guarantees
you won't get rescheduled an opcode later.
This unbreaks Java from CentOS 8
(java-11-openjdk-11.0.5.10-0.el8_0.x86_64).
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22972
copy_file_range(2) is implemented natively since r350315, make it available
for Linux binaries too.
Reviewed by: kib (mentor), trasz (previous version)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D22959
Due to clang and LLD's tendency to use a PLT for builtins, and as they
don't have full support for EABI, we sometimes have to deal with a PLT in
.ko files in a clang-built kernel.
As such, augment the in-kernel linker to support jump table processing.
As there is no particular reason to support lazy binding in kernel modules,
only implement Secure-PLT immediate binding.
As part of these changes, add elf_cpu_parse_dynamic() to the MD API of the
in-kernel linker (except on platforms that use raw object files.)
The new function will allow MD code to act on MD tags in _DYNAMIC.
Use this new function in the PowerPC MD code to ensure BSS-PLT modules using
PLT will be rejected during insertion, and to poison the runtime resolver to
ensure we get a clear panic reason if a call is made to the resolver.
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D22608
fix an assert violation introduced in r355784. Without this spinlock_exit()
may see owepreempt and switch before reducing the spinlock count. amd64
had been optimized to do a single critical enter/exit regardless of the
number of spinlocks which avoided the problem and this optimization had
not been applied elsewhere.
Reported by: emaste
Suggested by: rlibby
Discussed with: jhb, rlibby
Tested by: manu (arm64)
than "/compat/linux". Useful when you have several compat directories
with different Linux versions and you don't want to clash with files
installed by linux-c7 packages.
Reviewed by: bcr (manpages)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22574
over the usual fsync(2).
This silences some warnings when running "apt-get upgrade".
Reviewed by: brooks, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22371
s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too. The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.
Don't supply a NAND (not (A and B)) operation at this time.
Discussed with: jeff
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22791
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
Partially revert r354741 and r354754 and go back to allocating a
fixed-size chunk of stack space for the auxiliary vector. Keep
sv_copyout_auxargs but change it to accept the address at the end of
the environment vector as an input stack address and no longer
allocate room on the stack. It is now called at the end of
copyout_strings after the argv and environment vectors have been
copied out.
This should fix a regression in r354754 that broke the stack alignment
for newer Linux amd64 binaries (and probably broke Linux arm64 as
well).
Reviewed by: kib
Tested on: amd64 (native, linux64 (only linux-base-c7), and i386)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22695
Use the power of variable to avoid spelling out source and generated
files too many times. The previous Makefiles were hard to read, hard to
edit, and badly formatted.
Reviewed by: kevans, emaste
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22714
o Remove All Rights Reserved from my notices
o imp@FreeBSD.org everywhere
o regularize punctiation, eliminate date ranges
o Make sure that it's clear that I don't claim All Rights reserved by listing
All Rights Reserved on same line as other copyright holders (but not
me). Other such holders are also listed last where it's clear.
- Use ustringp for the location of the argv and environment strings
and allow destp to travel further down the stack for the stackgap
and auxv regions.
- Update the Linux copyout_strings variants to move destp down the
stack as was done for the native ABIs in r263349.
- Stop allocating a space for a stack gap in the Linux ABIs. This
used to hold translated system call arguments, but hasn't been used
since r159992.
Reviewed by: kib
Tested on: md64 (amd64, i386, linux64), i386 (i386, linux)
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22501
flua is bootstrapped as part of the build for those on older
versions/revisions that don't yet have flua installed. Once upgraded past
r354833, "make sysent" will again naturally work as expected.
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D21894
The purpose of this option is to make it easier to track down memory
corruption bugs by reducing the number of malloc(9) types that might
have recently been associated with a given chunk of memory. However, it
increases fragmentation and is disabled in release kernels.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Change the FreeBSD ELF ABIs to use this new hook to copyout ELF auxv
instead of doing it in the sv_fixup hook. In particular, this new
hook allows the stack space to be allocated at the same time the auxv
values are copied out to userland. This allows us to avoid wasting
space for unused auxv entries as well as not having to recalculate
where the auxv vector is by walking back up over the argv and
environment vectors.
Reviewed by: brooks, emaste
Tested on: amd64 (amd64 and i386 binaries), i386, mips, mips64
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D22355
This driver allows to usage of the paravirt SCSI controller
in VMware products like ESXi. The pvscsi driver provides a
substantial performance improvement in block devices versus
the emulated mpt and mps SCSI/SAS controllers.
Error handling in this driver has not been extensively tested
yet.
Submitted by: vbhakta@vmware.com
Relnotes: yes
Sponsored by: VMware, Panzura
Differential Revision: D18613
Pointer arguments should be of the form "<type> *..." and not "<type>* ...".
No functional change.
Reviewed by: kevans
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22373
Save the address of the trap frame in %ebp on kernel entry. This
automatically provides it in struct i386_frame.f_frame to unwinder.
While there, more accurately handle the terminating frames,
Reviewed by: avg, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D22321
This change removes TRAP_INTERRUPT and TRAP_TIMERINT frame types.
Their names are a bit confusing: trap + interrupt, what is that?
The TRAP_TIMERINT name is too specific -- can it only be used for timer
"trap-interrupts"? What is so special about them?
My understanding of the code is that INTERRUPT, TRAP_INTERRUPT and
TRAP_TIMERINT differ only in how an offset from callee's frame pointer to a
trap frame on the stack is calculated. And that depends on a number of
arguments that a special handler passes to a callee (a function with a
normal C calling convention).
So, this change makes that logic explicit and collapses all interrupt frame
types into the INTERRUPT type.
Reviewed by: markj
Discussed with: kib, jhb
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D22303
a tmpfs to be mounted there, and because they like to verify it's
actually a mountpoint, a symlink won't do.
Reviewed by: dchagin (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20333
Move futex_mtx to linux_common.ko for amd64 and aarch64 along
with respective list/mutex init/destroy.
PR: 240989
Reported by: Alex S <iwtcex@gmail.com>
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.
There are three pieces in the complete NetGDB system.
First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.
Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).
Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.
The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU). There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with: emaste, markj
Relnotes: sure
Differential Revision: https://reviews.freebsd.org/D21568
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).
It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).
The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.
Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.
The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.
No other functional change intended.
Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.
The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823
callers hold it.
This simplifies pmap code and removes a dependency on the object lock.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596
busy acquires while held.
This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
At the moment i386 does not provide 64-bit atomic operations in
userland. Exposing some atomic_*_64 defines can cause unnecessary
confusion.
Discussed with: kib
MFC after: 2 weeks
Four drivers (hpt27xx, hptmv, hptnr, hptrr, hpt27xx) include precompiled
binary objects; have users load them as modules if they are needed.
Additional work (i.e., integrating devmatch) required before MFC.
Reviewed by: markj
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21865
Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap(). MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.
This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.
Change the delivered signal for accesses to mapped area after the
backing object was truncated. According to POSIX description for
mmap(2):
The system shall always zero-fill any partial page at the end of an
object. Further, the system shall never write out any modified
portions of the last page of an object which are beyond its
end. References within the address range starting at pa and
continuing for len bytes to whole pages following the end of an
object shall result in delivery of a SIGBUS signal.
An implementation may generate SIGBUS signals when a reference
would cause an error in the mapped object, such as out-of-space
condition.
Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.
For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.
vm_fault_hold() is renamed to vm_fault(). Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos. Removed
unneeded truncations of the fault addresses reported by hardware.
PR: 211924
Reviewed by: alc
Discussed with: jilles, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21566
Convert all remaining references to that field to "ref_count" and update
comments accordingly. No functional change intended.
Reviewed by: alc, kib
Sponsored by: Intel, Netflix
Differential Revision: https://reviews.freebsd.org/D21768
by defining pg_nx as zero for non-PAE and correspondingly simplifying
some expressions.
Suggested and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21757
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639