It turns out that this value is not used within the system call code
under normal conditions, except when using tracing tools like ktrace.
If we forget to set this value, it is set to random garbage. This may
cause ktrace to hang indefinitely, making it impossible to kill.
Reported by: Michael Plass
PR: 210800
MFC before: 11.0-RELEASE
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.
There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.
PR: kern/210106
Reviewed by: jhb
Approved by: re (glebius)
threads, to make it less confusing and using modern kernel terms.
Rename the functions to reflect current use of the functions, instead
of the historic KSE conventions:
cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
cpu_set_upcall -> cpu_copy_thread (for forks)
cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation)
Reviewed by: jhb (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Approved by: re (hrs)
Differential revision: https://reviews.freebsd.org/D6731
does not cover the dynamically registered ficititious ranges, and
fictitious pages mappings are not promoted. Offer a dummy struct
md_page to fetch constant superpage pv list generation to satisfy
logic. Also, by initializing the pv_dummy pv_list to empty, we can
remove several explicit PG_FICTITIOUS tests.
Reported and tested by: Michael Butler <imb@protected-networks.net>
(previous version)
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D6728
Approved by: re (hrs)
Do not try to change attributes for DMAP when working on a mapping
which is not covered by the DMAP. This was reported on real system
where a BAR of a device (NTB) was mapped outside the PCI window.
Reported and tested by: mav
Reviewed by: jhb, mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D6668
The only current purpose of the pvh lock was explained there
On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote:
> Let me lay out one example for you in detail. Suppose that we have
> three processors and two of these processors are actively using the same
> pmap. Now, one of the two processors sharing the pmap performs a
> pmap_remove(). Suppose that one of the removed mappings is to a
> physical page P. Moreover, suppose that the other processor sharing
> that pmap has this mapping cached with write access in its TLB. Here's
> where the trouble might begin. As you might expect, the processor
> performing the pmap_remove() will acquire the fine-grained lock on the
> PV list for page P before destroying the mapping to page P. Moreover,
> this processor will ensure that the vm_page's dirty field is updated
> before releasing that PV list lock. However, the TLB shootdown for this
> mapping may not be initiated until after the PV list lock is released.
> The processor performing the pmap_remove() is not problematic, because
> the code being executed by that processor won't presume that the mapping
> is destroyed until the TLB shootdown has completed and pmap_remove() has
> returned. However, the other processor sharing the pmap could be
> problematic. Specifically, suppose that the third processor is
> executing the page daemon and concurrently trying to reclaim page P.
> This processor performs a pmap_remove_all() on page P in preparation for
> reclaiming the page. At this instant, the PV list for page P may
> already be empty but our second processor still has a stale TLB entry
> mapping page P. So, changes might still occur to the page after the
> page daemon believes that all mappings have been destroyed. (If the PV
> entry had still existed, then the pmap lock would have ensured that the
> TLB shootdown completed before the pmap_remove_all() finished.) Note,
> however, the page daemon will know that the page is dirty. It can't
> possibly mistake a dirty page for a clean one. However, without the
> current pvh global locking, I don't think anything is stopping the page
> daemon from starting the laundering process before the TLB shootdown has
> completed.
>
> I believe that a similar example could be constructed with a clean page
> P' and a stale read-only TLB entry. In this case, the page P' could be
> "cached" in the cache/free queues and recycled before the stale TLB
> entry is flushed.
TLBs for addresses with updated PTEs are always flushed before pmap
lock is unlocked. On the other hand, amd64 pmap code does not always
flushes TLBs before PV list locks are unlocked, if previously PTEs
were cleared and PV entries removed.
To handle the situations where a thread might notice empty PV list but
third thread still having access to the page due to TLB invalidation
not finished yet, introduce delayed invalidation. Comparing with the
pvh_global_lock, DI does not block entered thread when
pmap_remove_all() or pmap_remove_write() (callers of
pmap_delayed_invl_wait()) are executed in parallel. But _invl_wait()
callers are blocked until all previously noted DI blocks are leaved,
thus ensuring that neccessary TLB invalidations were performed before
returning from pmap_remove_all() or pmap_remove_write().
See comments for detailed description of the mechanism, and also for
the explanations why several pmap methods, most important
pmap_enter(), do not need DI protection.
Reviewed by: alc, jhb (turnstile KPI usage)
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5747
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
PCI-express HotPlug support is implemented via bits in the slot
registers of the PCI-express capability of the downstream port along
with an interrupt that triggers when bits in the slot status register
change.
This is implemented for FreeBSD by adding HotPlug support to the
PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges
representing downstream ports on HotPlug slots. The PCI-PCI bridge
driver registers an interrupt handler to receive HotPlug events. It
also uses the slot registers to determine the current HotPlug state
and drive an internal HotPlug state machine. For simplicty of
implementation, the PCI-PCI bridge device detaches and deletes the
child PCI device when a card is removed from a slot and creates and
attaches a PCI child device when a card is inserted into the slot.
The PCI-PCI bridge driver provides a bus_child_present which claims
that child devices are present on HotPlug-capable slots only when a
card is inserted. Rather than requiring a timeout in the RC for
config accesses to not-present children, the pcib_read/write_config
methods fail all requests when a card is not present (or not yet
ready).
These changes include support for various optional HotPlug
capabilities such as a power controller, mechanical latch,
electro-mechanical interlock, indicators, and an attention button.
It also includes support for devices which require waiting for
command completion events before initiating a subsequent HotPlug
command. However, it has only been tested on ExpressCard systems
which support surprise removal and have none of these optional
capabilities.
PCI-express HotPlug support is conditional on the PCI_HP option
which is enabled by default on arm64, x86, and powerpc.
Reviewed by: adrian, imp, vangyzen (older versions)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D6136
call pmap_invalidate_page() even though they are not destroying a leaf-
level page table entry.
Eliminate some bogus white-space characters in a comment.
Reviewed by: kib
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Reviewed by: wblock (manpage)
Differential Revision: https://reviews.freebsd.org/D5519
Summary:
The Initial Local APIC ID is returned by CPUID function 1 (in EBX).
On AMD Family 10h systems the way that ID is built is controlled by
an MSR bit (InitApicIdCpuIdLo). BKDG instructs BIOS to set it in a
certain way, but a BIOS can be buggy. In that case the ID can confuse
tools that use it, e.g. hwloc.
For example, on a system that I own real Local APIC IDs are configured
as 0, 1, 2, 3, but IDs reported via CPUID.1 are 0, 0x40, 0x80, 0xc0.
See: https://github.com/open-mpi/hwloc/issues/183
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6060
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
(And 4Kn minidump support, but only for amd64.)
Make sure all I/O to the dump device is of the native sector size. To
that end, we keep a native sector sized buffer associated with dump
devices (di->blockbuf) and use it to pad smaller objects as needed (e.g.
kerneldumpheader).
Add dump_write_pad() as a convenience API to dump smaller objects with
zero padding. (Rather than pull in NPM leftpad, we wrote our own.)
Savecore(1) has been updated to deal with these dumps. The format for
512-byte sector dumps should remain backwards compatible.
Minidumps for other architectures are left as an exercise for the
reader.
PR: 194279
Submitted by: ambrisko@
Reviewed by: cem (earlier version), rpokala
Tested by: rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5848
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: jhb, kib, sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5910
doreti provides the common code path for returning from interrupt
andlers on x86. Exposing doreti as a global symbol allows kernel
modules to include low-level interrupt handlers instead of requiring
all low-level handlers to be statically compiled into the kernel.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: kib
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors. We re-enable the extension, so that we can properly discover
core and cache topology. Linux seems to do the same.
Reported by: Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by: jhb, kib
Tested by: Johannes Dieterich <dieterich.joh@gmail.com>
(earlier version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D5883
DTrace-related exceptions in userland code are handled elsewhere.
One practical problem was a crash in dtrace_invop_start() when saved
%rsp pointed to a virtual address that was not backed.
i386 code already ignored userland exceptions.
Reviewed by: markj, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D5906
We're currently seeing how hard it would be to run CloudABI binaries on
operating systems cannot be modified easily (Windows, Mac OS X). The
idea is that we want to just run them without any sandboxing. Now
that CloudABI executables are PIE, this is already a bit easier, but TLS
is still problematic:
- CloudABI executables want to write to the %fs, which typically
requires extra system calls by the emulator every time it needs to
switch between CloudABI's and its own TLS.
- If CloudABI executables overwrite the %fs base unconditionally, it
also becomes harder for the emulator to store a backup of the old
value of %fs. To solve this, let's no longer overwrite %fs, but just
%fs:0.
As CloudABI's C library does not use a TCB, this space can now be used
by an emulator to keep track of its internal state. The executable can
now safely overwrite %fs:0, as long as it makes sure that the TCB is
copied over to the new TLS area.
Ensure that there is an initial TLS area set up when the process starts,
only containing a bogus TCB. We don't really care about its contents on
FreeBSD.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D5836
kern.features.linux: 1 meaning linux 32 bits binaries are supported
kern.features.linux64: 1 meaning linux 64 bits binaries are supported
The goal here is to help 3rd party applications (including ports) to determine
if the host do support linux emulation
Reviewed by: dchagin
MFC after: 1 week
Relnotes: yes
Differential Revision: D5830
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
knows at which address it got loaded and can apply relocations.
Simplify and unify placeholder type definitions.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5771
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code). This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5710
during argument validity verification, unbound zero'ing of the process LDT
and adjacent memory can be initiated from usermode.
Submitted by: CORE Security
Patch by: kib
Security: SA-16:15
non-multiple of 64 bytes. Thereafter, the user state save area is
misaligned, which triggers assertion in the debugging kernels, or
segmentation violation on accesses for non-debugging configs.
Force the desired alignment of the user save area as the fix
(workaround is to disable bit 9 in the hw.xsave_mask loader tunable).
This correction is required for booting on the upcoming Intel' Purley
platform.
Reported and tested by: "Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>,
jimharris
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
- Advertise the word size for CloudABI ABIs via the SV_LP64 flag. All of
the other ABIs include either SV_ILP32 or SV_LP64.
- Fix kdump to not assume a 32-bit ABI if the ABI flags field is non-zero
but SV_LP64 isn't set. Instead, only assume a 32-bit ABI if SV_ILP32 is
set and fallback to the unknown value of "00" if neither SV_LP64 nor
SV_ILP32 is set.
Reviewed by: kib, ed
Differential Revision: https://reviews.freebsd.org/D5560
need to include it explicitly when <vm/vm_param.h> is already included.
Suggested by: alc
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D5379
POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD. NetBSD and OpenBSD both changed their
fields to void * back in 1998. No new build failures were reported
via an exp-run.
PR: 206503 (exp-run)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5092
AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.
Submitted by: imp, dchagin
Security: FreeBSD-SA-16:10.linux
Security: CVE-2016-1883
registers. This brings amd64 on par with i386, providing consistent
initial FPU state.
Note that we do not clear any extended state, at least because kernel
does not understand extended state structure and consequences of zero
overwrite after fninit()/fpusave().
Submitted by: joss.upton@yahoo.com
PR: 206370
MFC after: 2 weeks
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.
In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.
So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.
Submitted by: mjg
Security: SA-16:03.linux
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.
Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly. Most startup code wasn't
doing so. Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.
Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.
The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values. Now if the size is zero, the provided buffer
full of existing env data is installed. A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.
Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp. A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data. Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D4670
This variable was added to sys/x86/include/x86_var.h recently.
This unbreaks building kernel source that #includes both md_var.h and x86_var.h
with gcc 4.2.1 on amd64
Differential Revision: https://reviews.freebsd.org/D4686
Reviewed by: kib
X-MFC with: r291949
Sponsored by: EMC / Isilon Storage Division
new headers x86/include x86_var.h and x86_smp.h.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4358
Typical TLBs have 40-512 entries available. At some point, iterating
every single page in a requested invalidation range and issuing invlpg
on it is more expensive than flushing the TLB and allowing it to reload
on demand.
Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
number 4096 entries as a hueristic at which point we flush TLB rather
than invalidating every single potential page.
Reviewed by: alc
Feedback from: jhb, kib
MFC notes: Depends on r291688
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4280
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*]. This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU. Note that current code does not issue total
invalidation requests for the kernel_pmap.
Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386. Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.
To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical. Merge the code under x86/.
Noted by: jhb [*]
Reviewed by: cem, jhb, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4346
sysent.
sv_prepsyscall is unused.
sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain. It is only utilized on i386 for iBCS2
binaries. The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2. The same note is true for any other potential user if
sv_sigtbl. In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.
Sponsored by: The FreeBSD Foundation
sysentvec. This allows the timekeep data to be shared between similar
ABIs which cannot share sysentvec.
Make the timekeep_push_vdso() tick callback to the timekeep structures
instead of sysentvecs. If several sysentvec share the vdso_sv_tk
structure, we would update the userspace data several times on each
tick, without the change.
Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when
sysentvec is marked with the new SV_TIMEKEEP flag. This saves
allocation and update of unneeded vdso_sv_tk for ABIs which do not
provide userspace gettimeofday yet, which are PowerPCs arches right
now.
Make vdso_sv_tk allocator public, namely split out and export
alloc_sv_tk() and alloc_sv_tk_compat32(). ABIs which share timekeep
data now can allocate it manually and share as appropriate.
Requested by: nwhitehorn
Tested by: nwhitehorn, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
certain kernel structures for use by debuggers. This mostly aids
in examining cores from a kernel without debug symbols as a debugger
can infer these values if debug symbols are available.
One set of variables describes the layout of 'struct linker_file' to
walk the list of loaded kernel modules.
A second set of variables describes the layout of 'struct proc' and
'struct thread' to walk the list of processes in the kernel and the
threads in each process.
The 'pcb_size' variable is used to index into the stoppcbs[] array.
The 'vm_maxuser_address' is used to distinguish kernel virtual addresses
from user addresses. This doesn't have to be perfect, and
'vm_maxuser_address' is a cheap and simple way to differentiate kernel
pointers from simple values like TIDs and PIDs.
While here, annotate the fields in struct pcb used by kgdb on amd64
and i386 to note that their ABI should be preserved. Annotations for
other platforms will be added in the future.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3773
pmap_change_attr must change the memory type of both the requested KVA
and the corresponding DMAP mappings (if such mappings exist), to satisfy
an Intel requirement that two or more mappings to the same physical
pages must have the same memory type.
However, not all kernel mapped pages have corresponding DMAP mappings --
for example, 64-bit BARs. Skip fixing up the DMAP for out-of-bounds
addresses.
Submitted by: Steve Wahl <steve_wahl@dell.com>
Reviewed by: alc, jhb
Sponsored by: Dell Compellent
Differential Revision: https://reviews.freebsd.org/D4030
ordered with the MFENCE instruction. Similar weak guarantees are also
specified by the AMD APM vol. 3 rev. 3.22. x86 pmap methods
pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced
CLFLUSH loop with MFENCE both before and after the loop.
In the revision 56 of SDM, Intel stated that all existing
implementations of CLFLUSH are strict, CLFLUSH instructions execution
is ordered WRT other CLFLUSH and writes. Also, the strict behaviour
is made architectural.
A new instruction CLFLUSHOPT (which was documented for some time in
the Instruction Set Extensions Programming Reference) provides the
weak behaviour which was previously attributed to CLFLUSH.
Use CLFLUSHOPT when available. When CLFLUSH is used on Intel CPUs, do
not execute MFENCE before and after the flushing loop.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko. While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko.
module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
For i386 it builds the existing systrace_linux.ko. For amd64 it
builds a systrace_linux.ko for 64-bit binaries.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D3954
linux: fix handling of out-of-bounds syscall attempts
Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.
In order to make it easier to support CloudABI on ARM64, move out all of
the bits from the AMD64 cloudabi_sysvec.c into a new file
cloudabi_module.c that would otherwise remain identical. This reduces
the AMD64 specific code to just ~160 lines.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3974
amd64 and i386 platform code contain very similar xen/xen-os.h
The only differences are:
- Functions/variables/types which were unused in i386/xen/xen-os.h:
* xen_xchg
* __xchg_dummy
* __xg
* __xchg
* atomic_t
* atomic_inc
* rdtscll
The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.
The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3880
Sponsored by: Citrix Systems R&D
Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.
Reported by: Pawel Biernacki <pawel.biernacki gmail.com>
The current Xen console driver is crashing very quickly when using it on
an ARM guest. This is because the console lock is recursive and it may
lead to recursion on the tty lock and/or corrupt the ring pointer.
Furthermore, the console lock is not always taken where it should be and has
to be released too early because of the way the console has been designed.
Over the years, code has been modified to support various new features but
the driver has not been reworked.
This new driver has been rewritten with the idea of only having a small set
of specific function to write either via the shared ring or the hypercall
interface.
Note that HVM support has been left aside for now because it requires
additional features which are not yet supported. A follow-up patch will be
sent with HVM guest support.
List of items that may be good to have but not mandatory:
- Avoid to flush for each character written when using the tty
- Support multiple consoles
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3698
Sponsored by: Citrix Systems R&D
Pull the latest headers for Xen which allow us to add support for ARM and
use new features in FreeBSD.
This is a verbatim copy of the xen/include/public so every headers which
don't exits anymore in the Xen repositories have been dropped.
Note the interface version hasn't been bumped, it will be done in a
follow-up. Although, it requires fix in the code to get it compiled:
- sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so
drop it.
- {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on
xen/interface/event_channel.h, so include it.
- {amd64,i386}/{amd64,i386}/support.S: It's not neccessary to include
machine/intr_machdep.h. This is also fixing build compilation with the
new headers.
- dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segmenthas
been dropped. So directly use struct blkif_request_segment
Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if
__XEN_INTERFACE_VERSION__ is not set. This is allow us to catch any file
where xen/xen-os.h is not correctly included.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3805
Sponsored by: Citrix Systems R&D
belong to a vm object, they can't be paged out. Since they can't be paged
out, they are never enqueued in a paging queue. Nonetheless, passing
PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages
are being enqueued in the inactive queue. As of r288122, we can avoid
this false impression by passing PQ_NONE.
Submitted by: kmacy (an earlier version)
Differential Revision: https://reviews.freebsd.org/D1674
linkers no longer raise an error when undefined weak symbols are
found, but relocate as if the symbol value was 0. Note that we do not
repeat the mistake of userspace dynamic linker of making the symbol
lookup prefer non-weak symbol definition over the weak one, if both
are available. In fact, kernel linker uses the first definition
found, and ignores duplicates.
Signature of the elf_lookup() and elf_obj_lookup() functions changed
to split result/error code and the symbol address returned.
Otherwise, it is impossible to return zero address as the symbol
value, to MD relocation code. This explains the mechanical changes in
elf_machdep.c sources.
The powerpc64 R_PPC_JMP_SLOT handler did not checked error from the
lookup() call, the patch leaves the code as is (untested).
Reported by: glebius
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
running thread.
It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.
This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.
Reviewed by: bdrewery, jhb, kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3256
The only operation which is prevented by the hold is the kernel stack
swapout for the faulted thread, which should be fine to allow.
Remove useless checks for NULL curproc or curproc->p_vmspace from the
trap_pfault() wrappers on x86 and powerpc.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
I/OAT is also referred to as Crystal Beach DMA and is a Platform Storage
Extension (PSE) on some Intel server platforms.
This driver currently supports DMA descriptors only and is part of a
larger effort to upstream an interconnect between multiple systems using
the Non-Transparent Bridge (NTB) PSE.
For now, this driver is only built on AMD64 platforms. It may be ported
to work on i386 later, if that is desired. The hardware is exclusive to
x86.
Further documentation on ioat(4), including API documentation and usage,
can be found in the new manual page.
Bring in a test tool, ioatcontrol(8), in tools/tools/ioat. The test
tool is not hooked up to the build and is not intended for end users.
Submitted by: jimharris, Carl Delsey <carl.r.delsey@intel.com>
Reviewed by: jimharris (reviewed my changes)
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: Intel
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3456
Add a check to preload_search_info to make sure mod is set. Most of the
callers of preload_search_info don't check that the mod parameter is
set, which can cause page faults. While at it, remove some now unnecessary
checks before calling preload_search_info.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D3440
is a little tight in and by itself, but severily insufficient
when one needs to map a large frame buffer as part of console
initialization. 64MB slop should be enough for a while. As an
example: a 15" MacBook Pro with retina display needs ~28MB of
KVA for the frame buffer.
PR: 193745
map. Handle busdma bouncing and ata PIO accesses by using global
frame used by the current CPU locally for the duration of
pmap_quick_enter/remove_page(). A spin mutex protects the concurent
frame use and prevents thread migration.
Noted by: royger
Reviewed by: alc, jah, royger (previous version)
Sponsored by: The FreeBSD Foundation
frame buffers and memory mapped UARTs.
1. Delay calling cninit() until after pmap_bootstrap(). This makes
sure we have PMAP initialized enough to add translations. Keep
kdb_init() after cninit() so that we have console when we need
to break into the debugger on boot.
2. Unfortunately, the ATPIC code had be moved as well so as to
avoid a spurious trap #30. The reason for which is not known
at this time.
3. In pmap_mapdev_attr(), when we need to map a device prior to the
VM system being initialized, use virtual_avail as the KVA to map
the device at. In particular, avoid using the direct map on amd64
because we can't demote by virtue of not being able to allocate
yet. Keep track of the translation.
Re-use the translation after the VM has been initialized to not
waste KVA and to satisfy the assumption in uart(4) that the handle
returned for the low-level console is the same as later returned
when the device is probed and attached.
4. In pmap_unmapdev() remove the mapping from the table when called
pre-init. Otherwise keep the mapping. During bus probe and attach
device resources are mapped and unmapped multiple times, which
would have us destroy the mapping used by the low-level console.
5. In pmap_init(), set pmap_initialized to signal that we're not
pre-init anymore. On amd64, bring the direct map in sync with the
translations created at that time.
6. Implement bus_space_map() and bus_space_unmap() for real: when
the tag corresponds to memory space, call the corresponding
pmap_mapdev() and pmap_unmapdev() functions to construct and
actual handle.
7. In efifb.c and vt_vga.c, remove the crutches and hacks and simply
call pmap_mapdev_attr() or bus_space_map() as desired.
Notes:
1. uart(4) already used bus_space_map() during low-level console
setup but since serial ports have traditionally been I/O port
based, the lack of a proper implementation for said function
was not a problem. It has always supported memory mapped UARTs
for low-level consoles by setting hw.uart.console accordingly.
2. The use of the direct map on amd64 without setting caching
attributes has been a bigger problem than previously thought.
This change has the fortunate (and unexpected) side-effect of
fixing various EFI frame buffer problems (though not all).
PR: 191564, 194952
Special thanks to:
1. XipLink, Inc -- generously donated an Intel Bay Trail E3800
based eval board (ADLE3800PC).
2. The FreeBSD Foundation, in particular emaste@ -- for UEFI
support in general and testing.
3. Everyone who tested the proposed for PR 191564.
4. jhb@ and kib@ for being a soundboard and applying a clue bat
if so needed.
data is synchronized by store/load to the variable. The
lapic_write_icr() function ensures that store buffers are flushed
before IPI command is issued.
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
the SMP structures, synchronized with the load by release store in
release_aps().
The change is formal, x86 strong memory model implicitely provided
the guarantees.
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.
The tunable was tested on x86 only. From the visual inspection, it
seems that it might work on arm and powerpc. The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seems to be already
incorrect for the threads with non-default kstack size. I only
changed the macros to use variable instead of constant, since I cannot
test.
On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.
Sponsored by: The FreeBSD Foundation
MFC after: 2 week
use vtophys() directly instead of vtomach() and retire the no-longer-used
headers <machine/xenfunc.h> and <machine/xenvar.h>.
Reported by: bde (stale bits in <machine/xenfunc.h>)
Reviewed by: royger (earlier version)
Differential Revision: https://reviews.freebsd.org/D3266
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)
These will create and destroy a temporary, CPU-local KVA mapping of a specified page.
Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread
Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page
The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.
NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.
Reviewed by: kib
Approved by: kib (mentor)
Differential Revision: http://reviews.freebsd.org/D3013
reported, on APs. We already did this on BSP.
Otherwise, the userspace software which depends on the features
reported by the high CPUID levels is misbehaving. In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID. Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.
Reported and tested by: Andre Meiser <ortadur@web.de>
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Certain system calls have quirks applied to make them work as if called
on an older version of FreeBSD. As CloudABI executables don't have the
FreeBSD OS release number in the ELF header, this value is set to zero,
making the system calls fall back to typically historic, non-standard
behaviour.
Reviewed by: kib
ordering semantic of x86 CPUs makes only the compiler barrier
neccessary to give the acquire behaviour.
Existing implementation ensured sequentially consistent semantic for
load_acq, making much stronger guarantee than required by standard's
definition of the load acquire. Consumers which depend on the barrier
are believed to be identified and already fixed to use proper
operations.
Noted by: alc (long time ago)
Reviewed by: alc, bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
- Fix segment registers to only display the low 16 bits.
- Remove unused handlers and entries for the debug registers.
- Display xcr0 (if valid) in 'show sysregs'.
- Add '0x' prefix to MSR values to match other values in 'show sysregs'.
- MFamd64: Display various MSRs in 'show sysregs'.
- Add a 'show dbregs' to display the value of debug registers.
- Dynamically size the column width for register values to properly
align columns on 64-bit platforms.
- Display %gs for i386 in 'show registers'.
Differential Revision: https://reviews.freebsd.org/D2784
Reviewed by: kib, markj
MFC after: 2 weeks
The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down. This
code was accidentally unused however, since db_backtrace() was never called
with a non-NULL trap frame. This change fixes that.
Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.
Reviewed by: jhb
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2881
If we can't find a symbol corresponding to the faulting instruction, assume
that the previously-executed function is a call and attempt to find the
calling function using the return address on the stack. Otherwise we end
up associating the last stack frame with the current call, which is
incorrect and causes the unwinder to skip printing of the calling function,
resulting in a confusing backtrace.
Reviewed by: jhb
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2859
The amd64 port copied some code from i386 to fetch function arguments and
display them in backtraces. However, it was commented out and can't easily
be implemented since the function arguments are passed in
registers rather than on the stack in amd64. Remove it in preparation for
some bug fixes in this area.
Reviewed by: jhb
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D2857
Summary:
Remove the stub system call that was put in place during the system call
import and replace it by a target-dependent version stored in sys/amd64.
Initialize the thread in a way similar to cpu_set_upcall_kse(). We
provide the entry point with two arguments: the thread ID and the
argument pointer.
Test Plan:
Thread creation still seems to work, both for FreeBSD and CloudABI
binaries.
Reviewers: dchagin, mjg, kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3110
Just like FreeBSD+Capsicum, CloudABI uses process descriptors. Return
the file descriptor number to the parent process.
To the child process we both return a special value for the file
descriptor number (CLOUDABI_PROCESS_CHILD). We also return the thread ID
of the new thread in the copied process, so the threading library can
reinitialize itself.
Obtained from: https://github.com/NuxiNL/freebsd
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D2993
belongs to the kernel stack address range for the thread. Right now,
code checks that new frame is not farther then KSTACK_PAGES pages from
the current frame, which allows the address to point past the top of
the stack.
Reviewed by: andrew, emaste, markj
Differential revision: https://reviews.freebsd.org/D3108
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Summary:
For CloudABI we need to put two things on the stack of new processes:
the argument data (a binary blob; not strings) and a startup data
structure. The startup data structure contains interesting things such
as a pointer to the ELF program header, the thread ID of the initial
thread, a stack smashing protection canary, and a pointer to the
argument data.
Fetching system call arguments and setting the return value is similar
to FreeBSD. The only differences are that system call 0 does not exist
and that we call into cloudabi_convert_errno() to convert the error
code. We also need this function in a couple of other places, so we'd
better reuse it here.
Reviewers: dchagin, kib
Reviewed By: kib
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D3098
provide a semantic defined by the C11 fences with corresponding
memory_order.
atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.
atomic_thread_fence_rel() is r, w | w.
atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation. Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.
atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.
Reviewed by: alc
Discussed with: bde
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Some external tools just do a 'ls /dev/vmm' to figure out the bhyve virtual
machines on the host. These tools break if the devmem device nodes also
appear in /dev/vmm.
Requested by: grehan
macros on amd64 and i386. Move the definition to machine/param.h.
kgdb defines INKERNEL() too, the conflict is resolved by renaming kgdb
version to PINKERNEL().
On i386, correct the lowest kernel address. After the shared page was
introduced, USRSTACK no longer points to the last user address + 1 [*]
Submitted by: Oliver Pinter [*]
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
process beyond the end of the process address space. Such setting is
not dangerous to the kernel integrity, but it causes confusing
application misbehaviour.
Sponsored by: The FreeBSD Foundation
MFC after: 12 days
obtain the thread %fs and %gs bases. Add x86 PT_SETFSBASE and
PT_SETGSBASE requests to set the bases from debuggers. The set
requests, similarly to the sysarch({I386,AMD64}_SET_FSBASE),
override the corresponding segment registers.
The main purpose of the operations is to retrieve and modify the tcb
address for debuggee.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
atomic_load_acq(9), on it source, for x86.
Right now, atomic_load_acq() on x86 is sequentially consistent with
other atomics, code ensures this by doing store/load barrier by
performing locked nop on the source. Provide separate primitive
__storeload_barrier(), which is implemented as the locked nop done on
a cpu-private variable, and put __storeload_barrier() before load, to
keep seq_cst semantic but avoid introducing false dependency on the
no-modification of the source for its later use.
Note that seq_cst property of x86 atomic_load_acq() is not documented
and not carried by atomics implementations on other architectures,
although some kernel code relies on the behaviour. This commit does
not intend to change this.
Reviewed by: alc
Discussed with: bde
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The current linker script generates program headers with VMA == LMA:
Entry point 0xffffffff802e7000
There are 6 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0xffffffff80200040 0xffffffff80200040
0x0000000000000150 0x0000000000000150 R E 8
INTERP 0x0000000000000190 0xffffffff80200190 0xffffffff80200190
0x000000000000000d 0x000000000000000d R 1
[Requesting program interpreter: /red/herring]
LOAD 0x0000000000000000 0xffffffff80200000 0xffffffff80200000
0x00000000010559b0 0x00000000010559b0 R E 200000
LOAD 0x0000000001056000 0xffffffff81456000 0xffffffff81456000
0x0000000000132638 0x000000000052ecf8 RW 200000
DYNAMIC 0x0000000001056000 0xffffffff81456000 0xffffffff81456000
0x00000000000000d0 0x00000000000000d0 RW 8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RWE 8
This is fine for the FreeBSD loader, because it completely ignores p_paddr
and instead uses p_vaddr with a hardcoded offset. Other loaders however
acknowledge p_paddr (like the Xen ELF loader), in which case they will try
to load the kernel at the wrong place. Fix this by adding an AT keyword to
the first section specifying the physical address, other sections will
follow suit, so it ends up looking like:
Entry point 0xffffffff802e7000
There are 6 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0xffffffff80200040 0x0000000000200040
0x0000000000000150 0x0000000000000150 R E 8
INTERP 0x0000000000000190 0xffffffff80200190 0x0000000000200190
0x000000000000000d 0x000000000000000d R 1
[Requesting program interpreter: /red/herring]
LOAD 0x0000000000000000 0xffffffff80200000 0x0000000000200000
0x00000000010559b0 0x00000000010559b0 R E 200000
LOAD 0x0000000001056000 0xffffffff81456000 0x0000000001456000
0x0000000000132638 0x000000000052ecf8 RW 200000
DYNAMIC 0x0000000001056000 0xffffffff81456000 0x0000000001456000
0x00000000000000d0 0x00000000000000d0 RW 8
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RWE 8
Tested on bare metal using the native FreeBSD loader and grub2 from TRUEOS.
Sponsored by: Citrix Systems R&D
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D2783
Previously this was done by the caller of 'svm_launch()' after it returned.
This works fine as long as no code is executed in the interim that depends
on pcpu data.
The dtrace probe 'fbt:vmm:svm_launch:return' broke this assumption because
it calls 'dtrace_probe()' which in turn relies on pcpu data.
Reported by: avg
MFC after: 1 week
devmem is used to represent MMIO devices like the boot ROM or a VESA framebuffer
where doing a trap-and-emulate for every access is impractical. devmem is a
hybrid of system memory (sysmem) and emulated device models.
devmem is mapped in the guest address space via nested page tables similar
to sysmem. However the address range where devmem is mapped may be changed
by the guest at runtime (e.g. by reprogramming a PCI BAR). Also devmem is
usually mapped RO or RW as compared to RWX mappings for sysmem.
Each devmem segment is named (e.g. "bootrom") and this name is used to
create a device node for the devmem segment (e.g. /dev/vmm/testvm.bootrom).
The device node supports mmap(2) and this decouples the host mapping of
devmem from its mapping in the guest address space (which can change).
Reviewed by: tychon
Discussed with: grehan
Differential Revision: https://reviews.freebsd.org/D2762
MFC after: 4 weeks
While here, also report %eflags from the i386 trapframe.
Differential Revision: https://reviews.freebsd.org/D2743
Reviewed by: kib
Obtained from: 1 month
This will require for AArch64 as we dont have modules yet.
Sponsored by: HEIF5
Sponsored by: ARM Ltd.
Differential Revision: https://reviews.freebsd.org/D1997
Use the same scheme implemented to manage credentials.
Code needing to look at process's credentials (as opposed to thred's) is
provided with *_proc variants of relevant functions.
Places which possibly had to take the proc lock anyway still use the proc
pointer to access limits.
Thread credentials are maintained as follows: each thread has a pointer to
creds and a reference on them. The pointer is compared with proc's creds on
userspace<->kernel boundary and updated if needed.
This patch introduces a counter which can be compared instead, so that more
structures can use this scheme without adding more comparisons on the boundary.
execution control and writing the difference between the host TSC and
the guest TSC into the TSC offset in the VMCS upon encountering a
write.
Reviewed by: neel
rev. 55. The modern CPUs cache and TLB descriptions looked quite
questionable without the update, e.g. Haswell i7 4770S reported:
Data TLB: 4 KB pages, 4-way set associative, 64 entries
L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
After the update, the report is:
Data TLB: 1 GByte pages, 4-way set associative, 4 entries
Data TLB: 4 KB pages, 4-way set associative, 64 entries
Instruction TLB: 2M/4M pages, fully associative, 8 entries
Instruction TLB: 4KByte pages, 8-way set associative, 64 entries
64-Byte prefetching
Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries
L2 cache: 256 kbytes, 8-way associative, 64 bytes/line
Some tags were apparently removed from the table 3-21, Vol. 2A. Keep
them around, but add a comment stating the removal.
Update the format line for cpu_stdext_feature according to the bits
from the SDM rev.55. It appears that Haswells do not store %cs and
%ds values in the FPU save area.
Store content of the %ecx register from the CPUID leaf 0x7
subleaf 0 as cpu_stdext_feature2 and print defined bits from it,
again acording to SDM rev. 55.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
after decoding the instruction matches the one provided by hardware.
Prior to r283293 'vie->num_valid' used to contain the actual length of
the instruction whereas now it contains the maximum instruction length
possible. This introduced a bug when calculating a RIP-relative base address.
Fix this by using 'vie->num_processed' rather than 'vie->num_valid' as the
length of the emulated instruction.
Reported and tested by: tychon
MFC after: 1 week
Since td_name is an array member of struct thread, it can never be NULL,
so the check can be removed. In addition, curproc can never be NULL,
so remove the if statement, and splice the two printfs() together.
While here, remove the u_long cast, and use the correct printf format
specifier curproc->p_pid.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D2695
buildkernel run.
Some of them were write-only under some kernel options, e.g. variables
keeping values only used by CTR() macros. It costs nothing to the
code readability and correctness to eliminate the warnings in those
cases too by removing the local cached values used only for
single-access.
Review: https://reviews.freebsd.org/D2665
Reviewed by: rodrigc
Looked at by: bjk
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
"sleeping" state. This is done by forcing the vcpu to transition to "idle"
by returning to userspace with an exit code of VM_EXITCODE_REQIDLE.
MFC after: 2 weeks
Native ABI do not need signal conversion, only emulators may want this. Usually
emulators implements its own sv_sendsig method. For now only ibcs2 emulator does
not have own sv_sendsig implementation and depends on native sendsig() method.
So, remove any extra attempts to convert signal numbers from native sendsig()
methods except from i386 where ibsc2 is living.
1. Linux sigset always 64 bit on all platforms. In order to move Linux
sigset code to the linux_common module define it as 64 bit int. Move
Linux sigset manipulation routines to the MI path.
2. Move Linux signal number definitions to the MI path. In general, they
are the same on all platforms except for a few signals.
3. Map Linux RT signals to the FreeBSD RT signals and hide signal conversion
tables to avoid conversion errors.
4. Emulate Linux SIGPWR signal via FreeBSD SIGRTMIN signal which is outside
of allowed on Linux signal numbers.
PR: 197216
argument is not a null pointer, and the ss_flags member pointed to by ss
contains flags other than SS_DISABLE. However, in fact, Linux also
allows SS_ONSTACK flag which is simply ignored.
For buggy apps (at least mono) ignore other than SS_DISABLE
flags as a Linux do.
While here move MI part of sigaltstack code to the appropriate place.
Reported by: abi at abinet dot ru
around kqueue() to implement epoll subset of functionality.
The kqueue user data are 32bit on i386 which is not enough for
epoll user data, so we keep user data in the proc emuldata.
Initial patch developed by rdivacky@ in 2007, then extended
by Yuri Victorovich @ r255672 and finished by me
in collaboration with mjg@ and jillies@.
Differential Revision: https://reviews.freebsd.org/D1092