13681 Commits

Author SHA1 Message Date
kib
12d583b35c In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 16:38:54 +00:00
kib
a0b8819d1d In x86 pmap_extract_and_hold(), there is no need to recalculate the
physical address, which is readily available after sucessfull
vm_page_pa_tryrelock().

Noted and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-05 16:27:34 +00:00
kib
7c8694b186 In x86 pmap_extract_and_hold()s, handle the case of PHYS_TO_VM_PAGE()
returning NULL.

vm_fault_quick_hold_pages() can be legitimately called on userspace
mappings backed by fictitious pages created by unmanaged device and sg
pagers.

Note that other architectures pmap_extract_and_hold() might need
similar fix, but I postponed the examination.

Reported by:	bde
Discussed with:	alc
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D16085
2018-07-04 21:21:59 +00:00
mmacy
40fc34fd82 inline atomics and allow tied modules to inline locks
- inline atomics in modules on i386 and amd64 (they were always
  inline on other arches)
- allow modules to opt in to inlining locks by specifying
  MODULE_TIED=1 in the makefile

Reviewed by: kib
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16079
2018-07-02 19:48:38 +00:00
chuck
360d52c628 Fix the Linux kernel version number calculation
The Linux compatibility code was converting the version number (e.g.
2.6.32) in two different ways and then comparing the results.

The linux_map_osrel() function converted MAJOR.MINOR.PATCH similar to
what FreeBSD does natively. I.e. where major=v0, minor=v1, and patch=v2
    v = v0 * 1000000 + v1 * 1000 + v2;

The LINUX_KERNVER() macro, on the other hand, converted the value with
bit shifts. I.e. where major=a, minor=b, and patch=c
    v = (((a) << 16) + ((b) << 8) + (c))

The Linux kernel uses the later format via the KERNEL_VERSION() macro in
include/generated/uapi/linux/version.h

Fix is to use the LINUX_KERNVER() macro in linux_map_osrel() as well as
in the .trans_osrel functions.

PR: 229209
Reviewed by: emaste, cem, imp (mentor)
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D15952
2018-06-22 00:02:03 +00:00
emaste
bc07d95d93 linuxulator: do not include legacy syscalls on arm64
Existing linuxulator platforms (i386, amd64) support legacy syscalls,
such as non-*at ones like open, but arm64 and other new platforms do
not.

Wrap these in #ifdef LINUX_LEGACY_SYSCALLS, #defined in the MD linux.h
files.  We may need finer grained control in the future but this is
sufficient for now.

Reviewed by:	andrew
Sponsored by:	Turing Robotic Industries
Differential Revision:	https://reviews.freebsd.org/D15237
2018-06-15 14:41:51 +00:00
brooks
8e419faaf8 Regen after 335177 (rename sys_obreak to sys_break). 2018-06-14 21:29:31 +00:00
brooks
ad6bae500f Name the implementation of brk and sbrk sys_break().
The break() system call was renamed (several times) starting in v3
AT&T UNIX when C was invented and break was a language keyword. The
last vestage of a need for it to be called something else (eg obreak)
was removed in r225617 which consistantly prefixed all syscall
implementations.

Reviewed by:	emaste, kib (older version)
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15638
2018-06-14 21:27:25 +00:00
kib
a3a3e0b0ab Reorganize code flow in fpudna()/npxdna() to highlight the critical
section scope.  Sprinkle __predict_false() for conditions known to
never occur or occur only on rare platforms.

Sponsored by:	The FreeBSD Foundation
2018-06-14 11:09:51 +00:00
kib
18574304a4 Remove printf() in #NM handler.
Give up and remove the almost useless informational message reporting
that device not available exception occured while our state tracking
indicates the current CPU has FPU context loaded for the current
thread.

It seems that this is recurring bug with some VM monitors.

Sponsored by:	The FreeBSD Foundation
2018-06-14 10:33:26 +00:00
kib
f2aa526948 Enable eager FPU context switch by default on i386 too, based on
amd64 r335072.

Security:	CVE-2018-3665
Sponsored by:	The FreeBSD Foundation
2018-06-13 21:10:23 +00:00
rlibby
6e36dd43ee i386: copyin/copyout error is EFAULT
Discussed with:	kib
MFC with:	r332489
Sponsored by:	Dell EMC Isilon
2018-06-13 19:57:03 +00:00
jtl
8222f5cb7c Make UMA and malloc(9) return non-executable memory in most cases.
Most kernel memory that is allocated after boot does not need to be
executable.  There are a few exceptions.  For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9).  The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.

(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory.  This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory.  This change makes the behavior consistent.)

This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified.  After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.

Allocations that do need executable memory have various choices.  They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.

Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.

PR:		228927
Reviewed by:	alc, kib, markj, jhb (previous version)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D15691
2018-06-13 17:04:41 +00:00
kib
b7403f6b1b All exceptions IDT descriptors must use interrupt gates on 4/4 kernel.
Fix it for #MF.

Noted by:	rlibby
Sponsored by:	The FreeBSD Foundation
2018-06-12 10:43:20 +00:00
kib
5021be544c Fix typo.
Sponsored by:	The FreeBSD Foundation
2018-06-12 10:41:26 +00:00
bde
1096fe355c Fix panics in potentially all x86bios calls on i386 since r332489.
A call to npxsave() in the exception trampolines was not relocated.
This call to a garbage address usually paniced when made, but it is only
made when the thread has used an FPU recently, and this is not the usual
case.

PR:		228755
Reviewed by:	kib
2018-06-10 14:21:01 +00:00
markj
e402a8384b Tell the compiler that rdtscp clobbers %ecx. 2018-06-09 18:31:19 +00:00
mmacy
33d22ed3f8 hwpmc: simplify calling convention for hwpmc interrupt handling
pmc_process_interrupt takes 5 arguments when only 3 are needed.
cpu is always available in curcpu and inuserspace can always be
derived from the passed trapframe.

While facially a reasonable cleanup this change was motivated
by the need to workaround a compiler bug.

core2_intr(cpu, tf) ->
  pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
    pmc_add_sample(cpu, ring, pm, tf, inuserspace)

In the process of optimizing the tail call the tf pointer was getting
clobbered:

(kgdb) up
    at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
4709                                pmc_save_kernel_callchain(ps->ps_pc,
(kgdb) up
1205                    error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,

resulting in a crash in pmc_save_kernel_callchain.
2018-06-08 04:58:03 +00:00
mmacy
a2a3facbc2 cpufunc: add rdtscp for x86 2018-06-07 00:54:11 +00:00
mmacy
3b8eb9c59a hwpmc: ABI fixes
- increase pmc cpuid field from 8 to 12 bits
- add cpuid version string to initialize entry in the log
  so that filter can identify which counter index an
  event name maps to
- GC unused config flags
- make fixed counter assignment more robust as well as the
  changes needed to be properly identified for filter
2018-06-04 02:05:48 +00:00
bde
48738e7c73 Oops, the last minute reduction in the clobber list for i386
MCOUNT_OVERHEAD() in r334522 was too agressive.  Only mcount exit
preserves %eax and %edx.
2018-06-02 09:59:27 +00:00
bde
1cd9b8b732 Finish COMPAT_AOUT support for amd64. It wasn't in any amd64 or MI
file in /sys/conf, so was unavailable in configurations that don't use
modules, and was not testable or notable in NOTES.  Its normal
configuration (not using a module) is still silently deprecated in
aout(4) by not mentioning it there.

Update i386 NOTES for COMPAT_AOUT.  It is not i386-only, or even very MD.
Sort its entry better.

Finish gzip configuration (but not support) for amd64.  gzip is really
gzipped aout.  It is currently broken even for i386 (a call to vm fails).
amd64 has always attempted to configure and test it, but it depends on
COMPAT_AOUT (as noted).  The bug that it depends on unconfigured files
was not detected since it is configured as a device.  All other optional
image activators are configured properly using an option.
2018-06-02 06:40:15 +00:00
bde
0409ad53fa Fix high resolution kernel profiling just enough to not crash at boot
time, especially for SMP.  If configured, it turns itself on at boot
time for calibration, so is fragile even if never otherwise used.

Both types of kernel profiling were supposed to use a global spinlock
in the SMP case.  If hi-res profiling is configured (but not necessarily
used), this was supposed to be optimized by only using it when
necessary, and slightly more efficiently, in asm.  But it was not done
at all for mcount entry where it is necessary.  This caused crashes
in the SMP case when either type of profiling was enabled.  For mcount
exit, it only caused wrong times.  The times were wrongest with an
i8254 timer since using that requires exclusive access to the hardware.
The i8254 timer was too slow to use here 20 years ago and is much less
usable now, but it is the default for the SMP case since TSCs weren't
invariant when SMP was new.  Do the locking in all hi-res SMP cases for
simplicity.

Calibration uses special asms, and the clobber lists in these were sort
of inverted.  They contained the arg and return registers which are not
clobbered, but on amd64 they didn't contain the residue of the call-used
registers which may be clobbered (%r10 and %r11).  This usually caused
hangs at boot time.  This usually affected even the UP case.
2018-06-02 05:48:44 +00:00
bde
d4e99d50f0 Fix recent breakages of kernel profiling, mostly on i386 (high resolution
kernel profiling remains broken).

memmove() was broken using ALTENTRY().  ALTENTRY() is only different from
ENTRY() in the profiling case, and its use in that case was sort of
backwards.  The backwardness magically turned memmove() into memcpy()
instead of completely breaking it.  Only the high resolution parts of
profiling itself were broken.  Use ordinary ENTRY() for memmove().
Turn bcopy() into a tail call to memmove() to reduce complications.
This gives slightly different pessimizations and profiling lossage.
The pessimizations are minimized by not using a frame pointer() for
bcopy().

Calls to profiling functions from exception trampolines were not
relocated.  This caused crashes on the first exception.  Fix this using
function pointers.

Addresses of exception handlers in trampolines were not relocated.  This
caused unknown offsets in the profiling data.  Relocate by abusing
setidt_disp as for pmc although this is slower than necessary and
requires namespace pollution.  pmc seems to be missing some relocations.
Stack traces and lots of other things in debuggers need similar relocations.

Most user addresses were misclassified as unknown kernel addresses and
then ignored.  Treat all unknown addresses as user. Now only user
addresses in the kernel text range are significantly misclassified (as
known kernel addresses).

The ibrs functions didn't preserve enough registers.  This is the only
recent breakage on amd64.  Although these functions are written in
asm, in the profiling case they call profiling functions which are
mostly for the C ABI, so they only have to save call-used registers.
They also have to save arg and return registers in some cases and
actually save them in all cases to reduce complications.  They end up
saving all registers except %ecx on i386 and %r10 and %r11 on amd64.
Saving these is only needed for 1 caller on each of amd64 and i386.
Save them there.  This is slightly simpler.

Remove saving %ecx in handle_ibrs_exit on i386.  Both handle_ibrs_entry
and handle_ibrs_exit use %ecx, but only the latter needed to or did
save it.  But saving it there doesn't work for the profiling case.

amd64 has more automatic saving of the most common scratch registers
%rax, %rcx and %rdx (its complications for %r10 are from unusual use
of %r10 by SYSCALL).  Thus profiling of handle_ibrs_exit_rs() was not
broken, and I didn't simplify the saving by moving the saving of these
registers from it to the caller.
2018-06-02 04:25:09 +00:00
mmacy
2f6bd2cd39 hwpmc: remove unused pre-table driven bits for intel
Intel now provides comprehensive tables for all performance counters
and the various valid configuration permutations as text .json files.
Libpmc has been converted to use these and hwpmc_core has been greatly
simplified by moving to passthrough of the table values.

The one gotcha is that said tables don't support pentium pro and and pentium
IV. There's very few users of hwpmc on _amd64_ kernels on new hardware. It is
unlikely that anyone is doing low level optimization on 15 year old Intel
hardware. Nonetheless, if someone feels strongly enough to populate the
corresponding tables for p4 and ppro I will reinstate the files in to the
build.

Code for the K8 counters and !x86 architectures remains unchanged.
2018-05-31 22:41:07 +00:00
dim
a34aa33cd6 Resolve conflicts between macros in fenv.h and ieeefp.h
This is a follow-up to r321483, which disabled -Wmacro-redefined for
some lib/msun tests.

If an application included both fenv.h and ieeefp.h, several macros such
as __fldcw(), __fldenv() were defined in both headers, with slightly
different arguments, leading to conflicts.

Fix this by putting all the common macros in the machine-specific
versions of ieeefp.h.  Where needed, update the arguments in places
where the macros are invoked.

This also slightly reduces the differences between the amd64 and i386
versions of ieeefp.h.

Reviewed by:	kib
MFC after:	1 week
Differential Revision: https://reviews.freebsd.org/D15633
2018-05-31 20:22:47 +00:00
kib
b412910664 Use pmap_pte_ufast() instead of pmap_pte() in pmap_extract(),
pmap_is_prefaultable() and pmap_incore(), pushing the number of
shootdown IPIs back to the 3/1 kernel.

Benchmarked by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2018-05-30 20:47:20 +00:00
kib
398a2a262a Extract code for fast mapping of pte from pmap_extract_and_hold()
into the helper function pmap_pte_ufast().

Benchmarked by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2018-05-30 20:43:48 +00:00
kib
5eacb3fdb9 Restore pmap_copy() for 4/4 i386 pmap.
Create yet another temporal pte mapping routine pmap_pte_quick3(),
which is the copy of the pmap_pte_quick() and relies on the
pvh_global_lock to protect the frame.  It accounts into the same
counters as pmap_pte_quick().  It is needed since pmap_copy() uses
pmap_pte_quick() already, and since a user pmap is no longer current
pmap.

pmap_copy() still provides the advantage for real-world workloads
involving lot of forks where processes do not exec immediately.

Benchmarked by:	bde
Sponsored by:	The FreeBSD Foundation
2018-05-30 20:39:22 +00:00
kib
7296417bee Do use pmap_pte_quick() in pmap_enter_quick_locked().
Benchmarked by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2018-05-30 20:26:47 +00:00
kib
5c6152bf04 Avoid unneccessary TLB shootdowns in pmap_unwire_ptp() for user pmaps,
which no longer create recursive page table mappings.

Benchmarked by:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2018-05-30 20:24:21 +00:00
brooks
db929686a9 Correct pointer subtraction in KASSERT().
The assertion would never fire without truly spectacular future
programming errors.

Reported by:	Coverity
CID:		1391370
Sponsored by:	DARPA, AFRL
2018-05-29 20:03:24 +00:00
hselasky
831d81975f Implement atomic_add_64() and atomic_subtract_64() for the i386 target.
While at it add missing _acq_ and _rel_ variants for 64-bit atomic
operations under i386.

Reviewed by:	kib @
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2018-05-29 11:59:02 +00:00
kib
b8cb7e293a Optimize i386 pmap_extract_and_hold().
In particular, stop using pmap_pte() to read non-promoted pte while
walking the page table.  pmap_pte() needs to shoot down the kernel
mapping globally which causes IPI broadcast.  Since
pmap_extract_and_hold() is used for slow copyin(9), it is very
significant hit for the 4/4 kernels.

Instead, create single purpose per-processor page frame and use it to
locally map page table page inside the critical section, to avoid
reuse of the frame by other thread if context switched.

Measurement demostrated very significant improvements in any load that
utilizes copyin/copyout.

Found and benchmarked by:	bde
Sponsored by:	The FreeBSD Foundation
2018-05-25 16:29:22 +00:00
kib
4778ea9ef7 Cleanup. Remove unused instruction and label.
Tested by:	bde
Sponsored by:	The FreeBSD Foundation
2018-05-25 16:24:20 +00:00
avg
546f863d51 re-synchronize TSC-s on SMP systems after resume, if necessary
The TSC-s are checked and synchronized only if they were good
originally.  That is, invariant, synchronized, etc.

This is necessary on an AMD-based system where after a wakeup from STR I
see that BSP clock differs from AP clocks by a count that roughly
corresponds to one second.  The APs are in sync with each other.  Not
sure if this is a hardware quirk or a firmware bug.

This is what I see after a resume with this change:
    SMP: passed TSC synchronization test after adjustment
    acpi_timer0: restoring timecounter, ACPI-fast -> TSC-low

Reviewed by:	kib
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D15551
2018-05-25 07:33:20 +00:00
imp
c98da3edad Make memmove and bcopy share code
Make memmove the primary interface, but have bcopy be an alternative
entry point that jumps into memmove. This will slightly pessimize
bcopy calls, but those are about to get much rarer. Return dst always,
but it will be ignored by bcopy callers. We can remove just the alt
entry point if we ever remove bcopy entirely.

Differential Revision: https://reviews.freebsd.org/D15374
2018-05-24 21:11:33 +00:00
brooks
a36ed4ef19 Avoid two suword() calls per auxarg entry.
Instead, construct an auxargs array and copy it out all at once.

Use an array of Elf_Auxinfo rather than pairs of Elf_Addr * to represent
the array. This is the correct type where pairs of words just happend
to work. To reduce the size of the diff, AUXARGS_ENTRY is altered to act
on this array rather than introducing a new macro.

Return errors on copyout() and suword() failures and handle them in the
caller.

Incidentally fixes AT_RANDOM and AT_EXECFN in 32-bit linux on amd64
which incorrectly used AUXARG_ENTRY instead of AUXARGS_ENTRY_32
(now removed due to the use of proper types).

Reviewed by:	kib
Comments from:	emaste, jhb
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D15485
2018-05-24 16:25:18 +00:00
kib
cf836f6647 x86: stop unconditionally clearing PSL_T on the trace trap.
We certainly should clear PSL_T when calling the SIGTRAP signal
handler, which is already done by all x86 sendsig(9) ABI code.  On the
other hand, there is no obvious reason why PSL_T needs to be cleared
when returning from the signal handler.  For instance, Linux allows
userspace to set PSL_T and keep tracing enabled for the desired
period.  There are userspace programs which would use PSL_T if we make
it possible, for instance sbcl.

Remember if PSL_T was set by PT_STEP or PT_SETSTEP by mean of TDB_STEP
flag, and only clear it when the flag is set.

Discussed with:	Ali Mashtizadeh
Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D15054
2018-05-23 21:39:29 +00:00
kib
666d7a085a Stop obliterating actual exception type for emulated traps from vm86.
Wording and reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15054
2018-05-23 21:26:41 +00:00
kib
90c2d6c602 Style.
Wording and reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D15054
2018-05-23 21:25:49 +00:00
kib
4bdf909413 Support IBRS for i386.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D15522
2018-05-23 16:31:46 +00:00
kib
b10c76de35 Use local unique labels inside most often used macros.
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-22 13:45:40 +00:00
kib
684e2d2368 Fix double-load of %cr3 and double-copy of the stack frame for the
kernel entry from userspace vm86.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-05-22 13:30:56 +00:00
mmacy
cb38945503 fix i386 builds after r334005 and r334009
r334005: add pc_ibpb_set as it is now referenced by common code
(although presumably not needed on i386 since it has been there
since the first spectre mitigation work on amd64)

r334009: there is no amd64 rflags -> i386 eflags
2018-05-22 05:09:33 +00:00
jhb
31acfe0f07 Cleanups related to debug exceptions on x86.
- Add constants for fields in DR6 and the reserved fields in DR7.  Use
  these constants instead of magic numbers in most places that use DR6
  and DR7.
- Refer to T_TRCTRAP as "debug exception" rather than a "trace trap"
  as it is not just for trace exceptions.
- Always read DR6 for debug exceptions and only clear TF in the flags
  register for user exceptions where DR6.BS is set.
- Clear DR6 before returning from a debug exception handler as
  recommended by the SDM dating all the way back to the 386.  This
  allows debuggers to determine the cause of each exception.  For
  kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value
  to other parts of the handler (namely, user_dbreg_trap()).  For user
  traps, wait until after trapsignal to clear DR6 so that userland
  debuggers can read DR6 via PT_GETDBREGS while the thread is stopped
  in trapsignal().

Reviewed by:	kib, rgrimes
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D15189
2018-05-22 00:45:00 +00:00
markj
5594f37668 Enable kernel dump features in GENERIC for most platforms.
This turns on support for kernel dump encryption and compression, and
netdump. arm and mips platforms are omitted for now, since they are more
constrained and don't benefit as much from these features.

Reviewed by:	cem, manu, rgrimes
Tested by:	manu (arm64)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D15465
2018-05-19 19:53:23 +00:00
emaste
f0cc1a044c Use NULL for SYSINIT's last arg, which is a pointer type
Sponsored by:	The FreeBSD Foundation
2018-05-18 17:58:09 +00:00
kib
088f960e02 Fix PMC_IN_TRAP_HANDLER() for i386 after the 4/4 split.
Sponsored by:	The FreeBSD Foundation
2018-05-13 20:10:02 +00:00
kib
99b8ab0632 Kernel entry from vm86 mode, where PCB_VM86CALL pcb flag is not set,
is executed on the right stack already.  No copy from the entry stack
to the kstack must be performed for vm86 bios call code to function.

To access the pcb flags on kernel entry, unconditionally switch to
kernel address space if vm86 mode is detected.

This fixes very early vm86 bios calls, typically done when boot is
performed by boot2 without loader, and kernel falls back to BIOS calls
to get SMAP.

Reported by:	bde
Sponsored by:	The FreeBSD Foundation
2018-05-12 11:06:59 +00:00