Not sure why the platform hypercall was disabled on i386, just enable it in
order to fix compilation of the PV timer on i386.
Sponsored by: Citrix Systems R&D
driver is (or behaves identically to) /dev/mem. Remove the D_MEM flag
from random drivers.
Note that currently the D_MEM flag does not affect any behaviour, but
this going to change in the next commit.
Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-Differential revision: https://reviews.freebsd.org/D6149
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.
This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
Submitted by: Jun Su <junsu microsoft com>
Reviewed by: jhb, kib, sephe
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5910
doreti provides the common code path for returning from interrupt
andlers on x86. Exposing doreti as a global symbol allows kernel
modules to include low-level interrupt handlers instead of requiring
all low-level handlers to be statically compiled into the kernel.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: kib
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors. We re-enable the extension, so that we can properly discover
core and cache topology. Linux seems to do the same.
Reported by: Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by: jhb, kib
Tested by: Johannes Dieterich <dieterich.joh@gmail.com>
(earlier version)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D5883
kern.features.linux: 1 meaning linux 32 bits binaries are supported
kern.features.linux64: 1 meaning linux 64 bits binaries are supported
The goal here is to help 3rd party applications (including ports) to determine
if the host do support linux emulation
Reviewed by: dchagin
MFC after: 1 week
Relnotes: yes
Differential Revision: D5830
Simplify and unify placeholder type definitions.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5771
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code). This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D5710
non-multiple of 64 bytes. Thereafter, the user state save area is
misaligned, which triggers assertion in the debugging kernels, or
segmentation violation on accesses for non-debugging configs.
Force the desired alignment of the user save area as the fix
(workaround is to disable bit 9 in the hw.xsave_mask loader tunable).
This correction is required for booting on the upcoming Intel' Purley
platform.
Reported and tested by: "Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>,
jimharris
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Most calls to bus_alloc_resource() use "anywhere" as the range, with a given
count. Migrate these to use the new bus_alloc_resource_anywhere() API.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D5370
POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD. NetBSD and OpenBSD both changed their
fields to void * back in 1998. No new build failures were reported
via an exp-run.
PR: 206503 (exp-run)
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5092
AT_SECURE auxv entry has been added to the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changes its credentials during the execution.
Being exported AT_SECURE used by glibc issetugid() call.
Submitted by: imp, dchagin
Security: FreeBSD-SA-16:10.linux
Security: CVE-2016-1883
The set_robust_list system call request the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in pid argument.
The list head should be stored in the location pointed to by head
argument.
In contrast, our implemenattion of get_robust_list system call copies
the known portion of memory pointed by recorded in set_robust_list
system call pointer to the head of the robust list to the location
pointed by head argument.
So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.
Submitted by: mjg
Security: SA-16:03.linux
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.
Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly. Most startup code wasn't
doing so. Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.
Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.
The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values. Now if the size is zero, the provided buffer
full of existing env data is installed. A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.
Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp. A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data. Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D4670
new headers x86/include x86_var.h and x86_smp.h.
Reviewed by: emaste, jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4358
The pcb is saved at the top of the kernel stack on x86 platforms.
The initial kenrel stack pointer is set in the TSS so that the trapframe
from user -> kernel transitions begins directly below the pcb and grows
down.
The XSAVE changes moved the FPU save area out of the pcb and into a
variable-sized area after the pcb. This required updating the expressions
to calculate the initial stack pointer from 'stacktop - sizeof(pcb)' to
'stacktop - sizeof(pcb) + FPU save area size'.
The i386_set_ioperm() system call allows user applications to access
individual I/O ports via the I/O port permission bitmap in the TSS.
On FreeBSD this requires allocating a custom per-process TSS instead of
using the shared per-CPU TSS.
The expression to initialize the initial kernel stack pointer in the
per-process TSS created for i386_set_ioperm() was not properly updated
after the XSAVE changes. Processes that used i386_set_ioperm() would
trash the trapframe during subsequent context switches resulting in
panics from memory corruption.
This changes fixes the kernel stack pointer calculation for the per-process
TSS.
Reviewed by: kib, n_hibma
Reported by: n_hibma
MFC after: 1 week
Typical TLBs have 40-512 entries available. At some point, iterating
every single page in a requested invalidation range and issuing invlpg
on it is more expensive than flushing the TLB and allowing it to reload
on demand.
Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
number 4096 entries as a hueristic at which point we flush TLB rather
than invalidating every single potential page.
Reviewed by: alc
Feedback from: jhb, kib
MFC notes: Depends on r291688
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4280
ptes mapping the kernel on CPUs where global TLB entries are
supported, revert to flushing only non-global entries, i.e. to the
pre-r291688 state. There is no need to flush global TLB entries,
since only global entries created during the previous iterations of
the loop could exist at this moment.
Submitted by: alc
Differential revision: https://reviews.freebsd.org/D4368
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*]. This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU. Note that current code does not issue total
invalidation requests for the kernel_pmap.
Rename amd64 function invltlb_globpcid() to invltlb_glob(), it is not
specific for PCID for quite some time, and implement the same
functionality for i386. Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.
To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical. Merge the code under x86/.
Noted by: jhb [*]
Reviewed by: cem, jhb, pho
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D4346
sysent.
sv_prepsyscall is unused.
sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain. It is only utilized on i386 for iBCS2
binaries. The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2. The same note is true for any other potential user if
sv_sigtbl. In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.
Sponsored by: The FreeBSD Foundation
sysentvec. This allows the timekeep data to be shared between similar
ABIs which cannot share sysentvec.
Make the timekeep_push_vdso() tick callback to the timekeep structures
instead of sysentvecs. If several sysentvec share the vdso_sv_tk
structure, we would update the userspace data several times on each
tick, without the change.
Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when
sysentvec is marked with the new SV_TIMEKEEP flag. This saves
allocation and update of unneeded vdso_sv_tk for ABIs which do not
provide userspace gettimeofday yet, which are PowerPCs arches right
now.
Make vdso_sv_tk allocator public, namely split out and export
alloc_sv_tk() and alloc_sv_tk_compat32(). ABIs which share timekeep
data now can allocate it manually and share as appropriate.
Requested by: nwhitehorn
Tested by: nwhitehorn, pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
certain kernel structures for use by debuggers. This mostly aids
in examining cores from a kernel without debug symbols as a debugger
can infer these values if debug symbols are available.
One set of variables describes the layout of 'struct linker_file' to
walk the list of loaded kernel modules.
A second set of variables describes the layout of 'struct proc' and
'struct thread' to walk the list of processes in the kernel and the
threads in each process.
The 'pcb_size' variable is used to index into the stoppcbs[] array.
The 'vm_maxuser_address' is used to distinguish kernel virtual addresses
from user addresses. This doesn't have to be perfect, and
'vm_maxuser_address' is a cheap and simple way to differentiate kernel
pointers from simple values like TIDs and PIDs.
While here, annotate the fields in struct pcb used by kgdb on amd64
and i386 to note that their ABI should be preserved. Annotations for
other platforms will be added in the future.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3773
ordered with the MFENCE instruction. Similar weak guarantees are also
specified by the AMD APM vol. 3 rev. 3.22. x86 pmap methods
pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced
CLFLUSH loop with MFENCE both before and after the loop.
In the revision 56 of SDM, Intel stated that all existing
implementations of CLFLUSH are strict, CLFLUSH instructions execution
is ordered WRT other CLFLUSH and writes. Also, the strict behaviour
is made architectural.
A new instruction CLFLUSHOPT (which was documented for some time in
the Instruction Set Extensions Programming Reference) provides the
weak behaviour which was previously attributed to CLFLUSH.
Use CLFLUSHOPT when available. When CLFLUSH is used on Intel CPUs, do
not execute MFENCE before and after the flushing loop.
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
amd64 and i386 platform code contain very similar xen/xen-os.h
The only differences are:
- Functions/variables/types which were unused in i386/xen/xen-os.h:
* xen_xchg
* __xchg_dummy
* __xg
* __xchg
* atomic_t
* atomic_inc
* rdtscll
The functions/variables/types unused in xen-os.h can be dropped and there
is no more differences betwen amd64 and i386.
The new header is placed in x86/include/xen and each platform will have
dummy headers include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.
Submitted by: Julien Grall <julien.grall@citrix.com>
Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D3880
Sponsored by: Citrix Systems R&D