7472 Commits

Author SHA1 Message Date
Andriy Gapon
3387e8743e vmm/svm: iopm_bitmap and msr_bitmap must be contiguous in physical memory
To achieve that the whole svm_softc is allocated with contigmalloc now.
It would be more efficient to de-embed those arrays and allocate only them
with contigmalloc.

Previously, if malloc(9) used non-contiguous pages for the arrays, then
random bits in physical pages next to the first page would be used to
determine permissions for I/O port and MSR accesses.  That could result
in a guest dangerously modifying the host hardware configuration.

One example is that sometimes NMI watchdog driver in a Linux guest
would be able to configure a performance counter on a host system.
The counter would generate an interrupt and if hwpmc(4) driver is loaded
on the host, then the interrupt would be delivered as an NMI.

Discussed with:	jhb
Reviewed by:	grehan
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D8321
2016-10-25 10:34:14 +00:00
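Note on 3387e8743e: the need for contiguity follows from how a permission bitmap is indexed. The following is a minimal sketch, not the vmm code. It assumes one bit per I/O port and 4 KB pages, and the names IOPM_PAGE_SIZE, iopm_page_of_port() and iopm_port_intercepted() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size; illustrative only. */
#define IOPM_PAGE_SIZE 4096u

/* Index of the page, relative to the bitmap base, that holds the bit for 'port'. */
static inline size_t
iopm_page_of_port(uint16_t port)
{
    return ((size_t)(port / 8) / IOPM_PAGE_SIZE);
}

/* Returns true if the bit for 'port' is set, i.e. the access is intercepted. */
static inline bool
iopm_port_intercepted(const uint8_t *bitmap, uint16_t port)
{
    return (((bitmap[port / 8] >> (port % 8)) & 1u) != 0);
}
```

Under these assumptions ports 0x8000 and above are looked up in the second page. The hardware walks the bitmap by physical address, so if the second virtual page is not physically adjacent to the first, the bits consulted belong to unrelated memory.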
Konstantin Belousov
295f4b6cfe Follow-up to r307866:
- Make !KDB config buildable.
- Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi
  in one place, namely nmi_call_kdb().  This allows removing the do_panic
  argument from the functions, and removing the i386/amd64 duplication of
  the variable and sysctl definitions.  Note that now NMI causes
  panic(9) instead of trap_fatal() reporting and then panic(9),
  consistently for NMIs delivered while CPU operated in ring 0 and 3.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-24 20:47:46 +00:00
Konstantin Belousov
835c2787be Handle broadcast NMIs.
On several Intel chipsets, diagnostic NMIs sent from BMC or NMIs
reporting hardware errors are broadcast to all CPUs.

When kernel is configured to enter kdb on NMI, the outcome is
problematic, because each CPU tries to enter kdb.  All CPUs are
executing NMI handlers, which set the latches disabling the nested NMI
delivery; this means that stop_cpus_hard(), used by kdb_enter() to
stop other cpus by broadcasting IPI_STOP_HARD NMI, cannot work.  One
indication of this is the harmless but annoying diagnostic "timeout
stopping cpus".

A much more harmful behaviour is that, because all CPUs try to enter kdb,
and if ddb is used as debugger, all CPUs issue prompt on console and
race for the input, not to mention the simultaneous use of the ddb
shared state.

Try to fix this by introducing a pseudo-lock for simultaneous attempts
to handle NMIs.  If one core happens to enter NMI trap handler, other
cores see it and simulate reception of the IPI_STOP_HARD.  Moreover,
generic_stop_cpus() avoids sending IPI_STOP_HARD and avoids waiting
for the acknowledgement, relying on the nmi handler on other cores
suspending and then restarting the CPU.

Since it is impossible to detect at runtime whether some stray NMI is
broadcast or unicast, add a knob for administrator (really developer)
to configure debugging NMI handling mode.

The updated patch was debugged with the help from Andrey Gapon (avg)
and discussed with him.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D8249
2016-10-24 16:40:27 +00:00
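Note on 835c2787be: the pseudo-lock idea can be sketched with a single compare-and-swap on an owner variable. This is an illustration only, using C11 atomics rather than the kernel's primitives; nmi_owner, nmi_try_claim() and nmi_release() are hypothetical names and do not appear in the commit.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NMI_NO_OWNER (-1)

/* CPU id currently handling the NMI, or NMI_NO_OWNER. */
static atomic_int nmi_owner = NMI_NO_OWNER;

/* Returns true if 'cpu' won the race and should enter the debugger. */
static bool
nmi_try_claim(int cpu)
{
    int expected = NMI_NO_OWNER;

    return (atomic_compare_exchange_strong(&nmi_owner, &expected, cpu));
}

/* Called by the owner once it has finished handling the NMI. */
static void
nmi_release(int cpu)
{
    int expected = cpu;

    (void)atomic_compare_exchange_strong(&nmi_owner, &expected, NMI_NO_OWNER);
}
```

A CPU for which nmi_try_claim() returns false would behave as if it had received IPI_STOP_HARD, parking itself until the owner releases the lock, instead of also entering the debugger.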
Jung-uk Kim
69d410eeb1 Implement BPF_MOD and BPF_XOR instructions.
These two ALU instructions first appeared on Linux.  Then, libpcap adopted
and made them available since 1.6.2.  Now more platforms including NetBSD
have them in kernel.  So do we.
2016-10-21 06:55:07 +00:00
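Note on 69d410eeb1: the semantics of the two ALU operations can be sketched as below. This is not the kernel interpreter. The enum and function names are hypothetical, the real opcode encodings live in the BPF headers, and rejecting a zero divisor by returning false is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative operation tags; not the real BPF opcode values. */
enum sk_alu_op { SK_ALU_MOD, SK_ALU_XOR };

/*
 * Apply 'op' to the accumulator *a with operand 'x'.
 * Returns false, leaving *a unchanged, on modulo by zero.
 */
static bool
sk_alu(enum sk_alu_op op, uint32_t *a, uint32_t x)
{
    switch (op) {
    case SK_ALU_MOD:
        if (x == 0)
            return (false);
        *a %= x;
        return (true);
    case SK_ALU_XOR:
        *a ^= x;
        return (true);
    }
    return (false);
}
```

Like division, the modulo operation needs a guard against a zero operand, whereas XOR is defined for every input.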
Jung-uk Kim
730b3be34f Reduce code for conditional jumps. 2016-10-21 06:09:30 +00:00
Jung-uk Kim
99e3ae6839 Fix compiler warnings for user land. 2016-10-21 06:06:54 +00:00
Stephen J. Kiernan
3239d65238 Add sysctl to make amd64 minidump retry count tunable at runtime.
PR:		213462
Submitted by:	RaviPrakash Darbha <rdarbha@juniper.net>
Reviewed by:	cemi, markj
Approved by:	sjg (mentor)
Obtained from:	Juniper Networks
Differential Revision:	https://reviews.freebsd.org/D8254
2016-10-17 22:57:41 +00:00
Konstantin Belousov
e4b9ff3a9e Do not try to create /dev/efi device node before devfs is initialized.
Split efirt.ko initialization into early stage where runtime services
KPI environment is created, to be used e.g. for RTC, and the later
devfs node creation stage, per module.

Switch the efi device to use make_dev_s(9) instead of make_dev(9).  At
least, this gracefully handles the duplicated device name issue.

Remove ARGSUSED comment from efidev_ioctl(), all unused arguments are
annotated with __unused attribute.

Reported by:	ambrisko, O. Hartmann <ohartman@zedat.fu-berlin.de>
Reviewed by:	imp
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-10-16 06:07:43 +00:00
John Baldwin
31dc1e9681 Drop support for using mmap() with /dev/kmem.
Using the device pager with /dev/kmem is not stable since KVA mappings
are transient, but the device pager caches the PA associated with a
given offset forever.  Interestingly, mips' implementation of
memmap() already refused requests for /dev/kmem.

Note that kvm_read/kvm_write do not use mmap, but use read and write on
/dev/kmem, so this should not affect libkvm users.

Reviewed by:	kib
MFC after:	2 months
2016-10-14 20:01:07 +00:00
Jonathan T. Looney
bd79708dbf In the TCP stack, the hhook(9) framework provides hooks for kernel modules
to add actions that run when a TCP frame is sent or received on a TCP
session in the ESTABLISHED state. In the base tree, this functionality is
only used for the h_ertt module, which is used by the cc_cdg, cc_chd, cc_hd,
and cc_vegas congestion control modules.

Presently, we incur overhead to check for hooks each time a TCP frame is
sent or received on an ESTABLISHED TCP session.

This change adds a new compile-time option (TCP_HHOOK) to determine whether
to include the hhook(9) framework for TCP. To retain backwards
compatibility, I added the TCP_HHOOK option to every configuration file that
already defined "options INET". (Therefore, this patch introduces no
functional change. In order to see a functional difference, you need to
compile a custom kernel without the TCP_HHOOK option.) This change will
allow users to easily exclude this functionality from their kernel, should
they wish to do so.

Note that any users who use a custom kernel configuration and use one of the
congestion control modules listed above will need to add the TCP_HHOOK
option to their kernel configuration.

Reviewed by:	rrs, lstewart, hiren (previous version), sjg (makefiles only)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D8185
2016-10-12 02:16:42 +00:00
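Note on bd79708dbf: the effect of a compile-time option of this kind can be sketched as follows. The structure and function names are hypothetical, and the #define stands in for an "options TCP_HHOOK" line in a kernel configuration file; the real TCP code paths are considerably more involved.

```c
#include <stddef.h>

/* Stand-in for "options TCP_HHOOK" in a kernel configuration file. */
#define TCP_HHOOK 1

typedef void (*sk_hook_fn)(void *ctx);

struct sk_conn {
    sk_hook_fn hook;      /* hypothetical per-connection hook */
    void      *hook_ctx;  /* argument passed to the hook */
    unsigned   segs_out;  /* segments sent so far */
};

/* Example hook: counts invocations through the context pointer. */
static void
sk_count_hook(void *ctx)
{
    (*(unsigned *)ctx)++;
}

static void
sk_segment_sent(struct sk_conn *c)
{
    c->segs_out++;
#ifdef TCP_HHOOK
    /* This per-segment check is the overhead the option lets you remove. */
    if (c->hook != NULL)
        c->hook(c->hook_ctx);
#endif
}
```

With the #define removed, the hook check is compiled out entirely, which is why modules that rely on the hooks then need the option added back.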
Warner Losh
f79d484dff Create /dev/efidev to provide an ioctl interface to
userland.  It supports userland interfaces to UEFI Runtime Services. This is
intended to be the MI portion of EFI Runtime Services support.

Differential Revision: https://reviews.freebsd.org/D8128
Reviewed by: kib@, wblock@, Ganael Laplanche
2016-10-11 22:24:30 +00:00
Konstantin Belousov
83c001d3c2 Re-apply r306516 (by cem):
Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags

Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.

On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-10-04 17:01:24 +00:00
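Note on 83c001d3c2: the per-CPU completion flag scheme can be sketched as below. This uses C11 atomics and a fixed CPU count for illustration; SK_MAXCPU and the sk_shootdown_*() names are hypothetical, and the sketch leaves out the IPI delivery and the actual invalidation.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SK_MAXCPU 8

/* One flag per CPU; a real kernel would keep each on its own cache line. */
static atomic_bool sk_inval_pending[SK_MAXCPU];

/* Initiator: mark every target CPU in 'mask' as having work pending. */
static void
sk_shootdown_start(unsigned mask)
{
    for (int cpu = 0; cpu < SK_MAXCPU; cpu++)
        if (mask & (1u << cpu))
            atomic_store_explicit(&sk_inval_pending[cpu], true,
                memory_order_release);
}

/* Target: called after the local invalidation is done. */
static void
sk_shootdown_ack(int cpu)
{
    atomic_store_explicit(&sk_inval_pending[cpu], false,
        memory_order_release);
}

/* Initiator: true once every target in 'mask' has acknowledged. */
static bool
sk_shootdown_done(unsigned mask)
{
    for (int cpu = 0; cpu < SK_MAXCPU; cpu++)
        if ((mask & (1u << cpu)) &&
            atomic_load_explicit(&sk_inval_pending[cpu],
            memory_order_acquire))
            return (false);
    return (true);
}
```

Each target writes only its own flag, so targets do not contend with one another on a single atomically updated counter; the initiator pays for this by polling one flag per target.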
Conrad Meyer
31f575777c Revert r306516 for now, it is incomplete on i386
Noted by:	kib
2016-09-30 18:58:50 +00:00
Conrad Meyer
2965d505f6 Reduce the cost of TLB invalidation on x86 by using per-CPU completion flags
Reduce contention during TLB invalidation operations by using a per-CPU
completion flag, rather than a single atomically-updated variable.

On a Westmere system (2 sockets x 4 cores x 1 threads), dtrace measurements
show that smp_tlb_shootdown is about 50% faster with this patch; observations
with VTune show that the percentage of time spent in invlrng_single_page on an
interrupt (actually doing invalidation, rather than synchronization) increases
from 31% with the old mechanism to 71% with the new one.  (Running a basic file
server workload.)

Submitted by:	Anton Rang <rang at acm.org>
Reviewed by:	cem (earlier version), kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D8041
2016-09-30 18:12:16 +00:00
Hans Petter Selasky
97549c34ec Move the ConnectX-3 and ConnectX-2 driver from sys/ofed into sys/dev/mlx4
like other PCI network drivers. The sys/ofed directory is now mainly
reserved for generic infiniband code, with exception of the mthca driver.

- Add new manual page, mlx4en(4), describing how to configure and load
mlx4en.

- All relevant driver C-files are now prefixed mlx4, mlx4_en and
mlx4_ib respectively to avoid object filename collisions when compiling
the kernel. This also fixes an issue with proper dependency file
generation for the C-files in question.

- Device mlxen is now device mlx4en and depends on device mlx4, see
mlx4en(4). Only the network device name remains unchanged.

- The mlx4 and mlx4en modules are now built by default on i386 and
amd64 targets. Only building the mlx4ib module depends on
WITH_OFED=YES .

Sponsored by:	Mellanox Technologies
2016-09-30 08:23:06 +00:00
Konstantin Belousov
9fc97c9010 Handle TLB shootdown IPI during the EFI runtime calls, on SandyBridge
and IvyBridge machines, which support PCID but do not have INVPCID
instruction.

MFC after:	1 week
2016-09-26 17:25:25 +00:00
Konstantin Belousov
20692187ea For machines which support PCID but not have INVPCID instruction,
i.e. SandyBridge and IvyBridge, correct a race between pmap_activate()
and invltlb_pcid_handler().

Reported by and tested by:	Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after:	1 week
2016-09-26 17:22:44 +00:00
Bruce Evans
f5435b8bbe Fix vm86 initialization, part 3 of 2 and a half. (Actually, just fix
early printfs and debugging of vm86 initialization and some other early
initialization in some cases.)  Add an option debug.late_console (with
default 1=off) to move console and kdb initialization back where it was.
Do the same for amd64 although there is no vm86 there.

On my test system, debug.late_console=0 works for the syscons, sio and
uart console drivers on amd64 and i386, and for vt on i386 but not on
amd64.

The early printfs fixed by debug.late_console=0 are:
- on i386, the message about lost memory above 4G
- with -v in otherwise normal use, about 20 printfs for SMAP
- other debugging messages for memory sizing.  Mostly under -v and
  not printed in normal use.

Document in a comment how much earlier the initialization and early
printf()s can be.  That is very early for the console.  Not much more
than curthread is needed.  kdb use obviously needs to be not so early,
since it needs IDT initialization and that is done relatively late
for convenience and historical reasons.
2016-09-25 14:56:24 +00:00
Warner Losh
2faa9f8c8e Change the efi_get_table interface to a void ** so we can return the
pointer by dereferencing the pointer.

Reviewed by: kib@
MFC After: 2 weeks
Sponsored by: Netflix, Inc
2016-09-22 19:04:51 +00:00
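Note on 2faa9f8c8e: returning a pointer through a void ** out-parameter looks roughly like this. The table structure, the integer ids and sk_get_table() are hypothetical stand-ins; the real efi_get_table() signature is not reproduced here.

```c
#include <stddef.h>

struct sk_table {
    unsigned id;    /* hypothetical table identifier */
    void    *addr;  /* address of the table */
};

/* Returns 0 and stores the table address in *ptr, or -1 if 'id' is unknown. */
static int
sk_get_table(const struct sk_table *tbls, size_t n, unsigned id, void **ptr)
{
    for (size_t i = 0; i < n; i++) {
        if (tbls[i].id == id) {
            *ptr = tbls[i].addr;
            return (0);
        }
    }
    return (-1);
}
```

The caller passes the address of its own pointer variable, which leaves the return value free to carry an error code.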
Mark Johnston
bdaf6d6913 Regenerate syscall provider argument strings. 2016-09-22 04:50:03 +00:00
Konstantin Belousov
bc3ad3a179 Add kernel interfaces to call EFI Runtime Services.
Runtime services require special execution environment for the call.
Besides that, OS must inform firmware about runtime virtual memory map
which will be active during the calls, with the SetVirtualAddressMap()
runtime call, done while the 1:1 mapping is still used.  There are two
complications.  First, the SetVirtualAddressMap() effectively must be done from
the loader, which needs to know the kernel address map in advance.  Moreover,
despite not being explicitly mentioned in the specification, both 1:1 and
the map passed to SetVirtualAddressMap() must be active during the
SetVirtualAddressMap() call.  Second, there are buggy BIOSes which
require both mappings active during runtime calls as well, most likely
because they fail to identify all relocations to perform.

On amd64, we can get rid of both problems by providing 1:1 mapping for
the duration of runtime calls, by temporarily remapping user addresses.
As a result, we avoid the need for the loader to know about the future kernel
address map, and avoid bugs in BIOSes.  Typically the BIOS only maps
something in the low 4G.  If not for the runtime bugs, we would take advantage of
the DMAP, as previous versions of this patch did.

Similar but more complicated trick can be used even for i386 and 32bit
runtime, if and when the EFI boot on i386 is supported.  We would need
a trampoline page, since potentially whole 4G of VA would be switched
on calls, instead of only userspace portion on amd64.

Context switches are disabled for the duration of the call, FPU access
is granted, and interrupts are not disabled.  The latter is possible
because kernel is mapped during calls.

To test, the sysctl mib debug.efi_time is provided, setting it to 1
makes one call to EFI get_time() runtime service, on success the efitm
structure is printed to the control terminal.  Load efirt.ko, or add
EFIRT option to the kernel config, to enable code.

Discussed with:	emaste, imp
Tested by:	emaste (mac, qemu)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-09-21 11:31:58 +00:00
Konstantin Belousov
bd0892ffd4 Rename efi_systbl to efi_systbl_phys, the variable contains the
physical address of the EFI System Table.  Add _KERNEL guard around
its declaration in sys/efi.h.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-09-21 10:55:28 +00:00
Konstantin Belousov
559a7b209a Add a way for the architecture to specify the calling ABI for methods
in the EFI Runtime Services Table.  On amd64, the calling conventions
are MS.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-09-21 10:35:44 +00:00
Konstantin Belousov
c1538a1365 Add amd64 functions to load/store GDT register, store IDT and TR registers.
Note that lgdt() name is already used for function which, besides
loading GDT, also reloads segment descriptors cache, thus new function
is named bare_lgdt().

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-09-21 10:10:36 +00:00
Konstantin Belousov
195a6bb9e6 Export the pmap_cache_bits() and pmap_pinit_pml4() functions from the
amd64 pmap.

The new pmap_pinit_pml4() function initializes the level 4 page table
with entries for the kernel mappings.  Both functions are needed for
upcoming EFI Runtime Services support.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-09-21 10:05:51 +00:00
Konstantin Belousov
65afaac0c3 Move pmap_p*e_index() inline functions from pmap.c to pmap.h.
They are already used in minidump code.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-09-20 09:38:07 +00:00
Ed Maste
df4336ddfa Catch up to sys/capability.h rename to sys/capsicum.h in r263232
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-09-19 18:44:43 +00:00
Konstantin Belousov
944e0bab86 Consolidate four efi_next_descriptor() definitions.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-09-18 17:38:02 +00:00
Konstantin Belousov
bd773b6abb Remove trailing space.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-09-18 17:33:49 +00:00
Bruce Evans
5904b5a6f2 Fix decoding of tf_rsp on amd64, and move TF_HAS_STACKREGS() to the
i386-only section, and fix a comment about the amd64 kernel trapframe
not having stackregs.

tf_rsp doesn't need decoding on amd64, but had an old clone of i386
code to do this in 1 place, and since the amd64 kernel trapframe does
have stackregs, the result was an off-by-16 error for %rsp in an error
message.
2016-09-16 07:09:35 +00:00
Bruce Evans
c2d4aad4e0 (1) Ifdef the new dr6 variable for KDB.
While here, avoid using the old variable 'code' and remove it
in trap().  ('code' was meant for holding things like %dr6,
but is too small to hold %dr6 on amd64 and was reduced to an
obfuscation of tf_err, with early truncation on amd64.)

Submitted by:	Michael Butler (imb@...)
2016-09-16 04:58:37 +00:00
Bruce Evans
9680db5e94 Decode some REX prefixes in inst_call(). This makes the 'next' and
'until' commands work in more cases.
2016-09-15 18:30:53 +00:00
Bruce Evans
bd20334ca0 Abort single stepping in ddb if the trap is not for single-stepping.
This is not very easy to do, since ddb didn't know when traps are
for single-stepping.  It more or less assumed that traps are either
breakpoints or single-step, but even for x86 this became inadequate
with the release of the i386 in ~1986, and FreeBSD passes it other
trap types for NMIs and panics.

On x86, teach ddb when a trap is for single stepping using the %dr6
register.  Unknown traps are now treated almost the same as breakpoints
instead of as the same as single-steps.  Previously, the classification
of breakpoints was almost correct and everything else was unknown so
had to be treated as a single-step.  Now the classification of single-
steps is precise, the classification of breakpoints is almost correct
(as before) and everything else is unknown and treated like a
breakpoint.

This fixes:
- breakpoints not set by ddb, including the main one in kdb_enter(),
  were treated as single-steps and not stopped on when stepping
  (except for the usual, simple case of a step with residual count 1).
  As special cases, kdb_enter() didn't stop for fatal traps or panics
- similarly for "hardware breakpoints".

Use a new MD macro IS_SSTEP_TRAP(type, code) to classify
single-steps.  This is excessively complicated for bug-for-bug and
backwards compatibility.  Design errors apparently started in Mach
in ~1990 or perhaps in the FreeBSD interface in ~1993.  Common trap
types like single steps should have a unique MI code (like the TRAP*
codes for user SIGTRAP) so that debuggers don't need macros like
IS_SSTEP_TRAP() to decode them.  But 'type' is actually an ambiguous
MD trap number, and code was always 0 (now it is (int)%dr6 on x86).
So it was impossible to determine the trap type from the args.
Global variables had to be used.

There is already a classification macro db_pc_is_single_step(), but
this just gets in the way.  It is only used to recover from bugs in
IS_BREAKPOINT_TRAP().  On some arches, IS_BREAKPOINT_TRAP() just
duplicates the ambiguity in 'type' and misclassifies single-steps as
breakpoints.  It defaults to 'false', which is the opposite of what is
needed for bug-for-bug compatibility.

When this is cleaned up, MI classification bits should be passed in
'code'.  This could be done now for positive-logic bits, since 'code'
was always 0, but some negative logic is needed for compatibility so
a simple MI classification is not usable yet.

After reading %dr6, clear the single-step bit in it so that the type
of the next debugger trap can be decoded.  This is a little
ddb-specific.  ddb doesn't understand the need to clear this bit and
doing it before calling kdb is easiest.  gdb would need to reverse
this to support hardware breakpoints, but it just doesn't support
them now since gdbstub doesn't support %dr*.

Fix a bug involving %dr6: when emulating a single-step trap for vm86,
set the bit for it in %dr6.  Userland debuggers need this.  ddb now
needs this for vm86 bios calls.  The bit gets copied to 'code' then
cleared again.

Fix related style bugs:
- when clearing bits for hardware breakpoints in %dr6, spell the mask
  as ~0xf on both amd64 and i386 to get the correct number of bits
  using sign extension and not need a comment about using the wrong
  mask on amd64 (amd64 traps for invalid results but clearing the
  reserved top bits didn't trap since they are 0).
- rewrite my old wrong comments about using %dr6 for ddb watchpoints.
2016-09-15 17:24:23 +00:00
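Note on bd20334ca0: the two %dr6 manipulations described above can be sketched on a plain integer. This is not the ddb code. The sketch assumes the x86 single-step status bit is bit 14 of %dr6, and the SK_ and sk_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_DR6_BS 0x4000    /* single-step status bit (bit 14) */

/* Returns true for a single-step trap and clears the bit for the next trap. */
static bool
sk_is_sstep(uint64_t *dr6)
{
    bool sstep = (*dr6 & SK_DR6_BS) != 0;

    *dr6 &= ~(uint64_t)SK_DR6_BS;
    return (sstep);
}

/* Clear the four hardware-breakpoint status bits B0..B3. */
static void
sk_clear_hwbp(uint64_t *dr6)
{
    /* ~0xf is the int -16; converted to 64 bits it is 0xfffffffffffffff0. */
    *dr6 &= ~0xf;
}
```

Writing the mask as ~0xf works for both 32-bit and 64-bit registers because the negative int is sign-extended to the width of the left operand, which is the point made in the style fix above.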
John Baldwin
38605d7312 Remove 'cpu' and 'cpu_class' on amd64.
The 'cpu' and 'cpu_class' variables were always set to the same value
on amd64 and are legacy holdovers from i386.  Remove them entirely on
amd64.

Reviewed by:	imp, kib (older version)
Differential Revision:	https://reviews.freebsd.org/D7888
2016-09-15 17:05:54 +00:00
Bjoern A. Zeeb
fce7ff68f3 Try to fix LINT builds after r305807. Seems to be a simple s&r error
I missed while reading through the 1st time as well.
2016-09-14 16:08:23 +00:00
Bruce Evans
701ac88055 Use the MI macro TRAPF_USERMODE() instead of open-coded checks for
SEL_UPL and sometimes PSL_VM.  This is just a style change on amd64,
but on i386 it fixes 1 unimportant place where the PSL_VM check was
missing and starts fixing 1 important place where the PSL_VM check
had a logic error.

Fix logic errors in treating vm86 bioscall mode as kernel mode.  The
main place checked all the necessary flags, but put the necessary
parentheses for the PSL_VM and PCB_VM86CALL checks in the wrong
place.  The broken case is only reached if a vm86 bioscall uses a
%cs which is nonzero mod 4, but that is unusual -- most bios calls
start with %cs = 0xc000 or 0xf000 and rarely change it.  Another
place was missing the check for PCB_VM86CALL, but was only reachable
if there are bugs virtualizing PSL_I.

Add a macro TF_HAS_STACKREGS() and use this instead of converting
open-coded checks of SEL_UPL, etc. to TRAPF_USERMODE() when we only
care about whether the frame has stack registers.  This fixes 3
places in my recent fix for register variables in vm86 mode where I
messed up the PSL_VM check and cleans up other places.
2016-09-14 12:57:40 +00:00
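Note on 701ac88055: the kind of check that TRAPF_USERMODE() centralizes can be sketched as below. This is a simplified illustration on a hypothetical two-field trapframe; it is not the macro's actual definition and it omits the PCB_VM86CALL handling that the commit also fixes. The constant values are the x86 selector privilege mask, ring 3, and the EFLAGS VM bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_SEL_RPL_MASK 0x3         /* privilege bits of a selector */
#define SK_SEL_UPL      3           /* user privilege level */
#define SK_PSL_VM       0x00020000  /* EFLAGS.VM: virtual 8086 mode */

struct sk_trapframe {
    uint32_t tf_cs;
    uint32_t tf_eflags;
};

/* User mode if the selector says ring 3 or the CPU was in vm86 mode. */
static bool
sk_usermode(const struct sk_trapframe *tf)
{
    return ((tf->tf_cs & SK_SEL_RPL_MASK) == SK_SEL_UPL ||
        (tf->tf_eflags & SK_PSL_VM) != 0);
}
```

In vm86 mode %cs holds a real-mode segment such as 0xc000, whose low two bits say nothing about privilege, so a check of the selector alone misclassifies the frame. Keeping both tests inside one helper also removes the chance of misplacing parentheses at each call site.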
Konstantin Belousov
cf1c47763f Add FPU_KERN_NOCTX flag to the fpu_kern_enter() function on amd64.
The flag specifies that the block which uses FPU must be executed in
critical section, i.e. take no context switches, and does not need an
FPU save area during the execution.

It is intended to be applied around fast and short code paths where
save area allocation is impossible or undesirable, due to context or
due to the relative cost of calculation vs. allocation.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-09-11 09:14:07 +00:00
Alan Cox
8cb0c1029d Various changes to pmap_ts_referenced()
Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is.  Eliminate the archaic
"XXX" comment about its value.  I don't believe that its exact value, e.g.,
5 versus 6, matters.

Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.

On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D7836
2016-09-10 16:49:25 +00:00
Andriy Gapon
f13826052b work around AMD erratum 793 for family 16h, models 00h-0Fh 2016-09-07 14:24:29 +00:00
John Baldwin
da0fc9250c Reset PCI pass through devices via PCI-e FLR during VM start and end.
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register.  This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.

Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.

Reviewed by:	imp, wblock (earlier version)
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7751
2016-09-06 21:15:35 +00:00
John Baldwin
64414cc00f Update the I/O MMU in bhyve when PCI devices are added and removed.
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.

This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7667
2016-09-06 20:17:54 +00:00
John Baldwin
db4b3cdad8 Remove remnants of PERFMON and I586_PMC_GUPROF from amd64.
These options were never fully ported over from i386.
2016-09-06 19:25:32 +00:00
John Baldwin
5fb03c3780 Leave ppt devices in the host domain when they are not attached to a VM.
This allows a pass through device to be reset to a normal device driver
on the host and reused on the host.  ppt devices are now always active in
some I/O MMU domain when the I/O MMU is active, either the host domain
or the domain of a VM they are attached to.

Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7666
2016-09-06 18:53:17 +00:00
Mark Johnston
dbbaf04f1e Remove support for idle page zeroing.
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by:	alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D7714
2016-09-03 20:38:13 +00:00
Alan Cox
53aadae680 As an optimization to the machine-independent layer, change the machine-
dependent pmap_ts_referenced() so that it updates the page's dirty field
if a modified bit is found while counting reference bits.  This
opportunistic update can be performed at low cost and can eliminate the
need for some future calls to pmap_is_modified() by the machine-
independent layer.

Reviewed by:	kib, markj
MFC after:	3 weeks
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7722
2016-09-01 15:57:44 +00:00
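Note on 53aadae680: the opportunistic update can be sketched as a loop over page table entries. This is a simplification, not the pmap code. It assumes the x86 accessed and modified bit positions, treats the mappings as a flat array, and uses hypothetical names throughout, including the cap SK_TS_MAX.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PG_A   0x020  /* x86 PTE accessed bit */
#define SK_PG_M   0x040  /* x86 PTE modified (dirty) bit */
#define SK_TS_MAX 5      /* hypothetical cap on counted references */

struct sk_page {
    bool dirty;
};

/* Count and clear reference bits; note a modified bit if one is seen. */
static int
sk_ts_referenced(struct sk_page *pg, uint64_t *ptes, size_t n)
{
    int refs = 0;

    for (size_t i = 0; i < n && refs < SK_TS_MAX; i++) {
        /* The PTE is being examined anyway, so checking PG_M is cheap. */
        if (ptes[i] & SK_PG_M)
            pg->dirty = true;
        if (ptes[i] & SK_PG_A) {
            ptes[i] &= ~(uint64_t)SK_PG_A;
            refs++;
        }
    }
    return (refs);
}
```

Once the page's dirty field is set this way, a later query about whether the page is modified can be answered from the field without walking the mappings again.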
Bruce Evans
ef209971e9 Shorten banal comments about zeroing and copying pages. Don't give
implementation details that last echoed the code 15-20 years ago.
But add a detail about pagezero() on i386.  Switch from Mach style
to BSD style.
2016-08-29 14:38:31 +00:00
Bruce Evans
1a5735873e On amd64, declare sse2_pagezero() and start using it again, but only
for zeroing pages in idle where nontemporal writes are clearly best.
This is almost a no-op since zeroing in idle does nothing good
and is off by default.  Fix END() statement forgotten in previous
commit.

Align the loop in sse2_pagezero().  Since it writes to main memory,
the loop doesn't have to be very carefully written to keep up.
Unrolling it was considered useless or harmful and was not done on
i386, but that was too careless.

Timing for i386: the loop was not unrolled at all, and moved only 4
bytes/iteration.  So on a 2GHz CPU, it needed to run at 2 cycles/
iteration to keep up with a memory speed of just 4GB/sec.  But when
it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/
iteration so it gave a maximum speed of 2.67GB/sec and couldn't even
keep up with PC3200 memory.  Fix the alignment so that it keeps up with
4GB/sec memory, and unroll once to get nearer to 8GB/sec.  Further
unrolling might be useless or harmful since it would prevent the loop
fitting in 16-bytes.  My test system with an old CPU and old DDR1 only
needed 5+ GB/sec.  My test system with a new CPU and DDR3 doesn't need
any changes to keep up ~16GB/sec.

Timing for amd64: with 8-byte accesses and newer faster CPUs it is
easy to reach 16GB/sec but not so easy to go much faster.  The
alignment doesn't matter much if the CPU is not very old.  The loop
was already unrolled 4 times, but needs 32 bytes and uses a fancy
method that doesn't work for 2-way unrolling in 16 bytes.  Just
align it to 32-bytes.
2016-08-29 13:07:21 +00:00
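Note on 1a5735873e: the i386 timing figures follow from one formula, bandwidth = clock rate x bytes per iteration / cycles per iteration. A small helper makes the arithmetic explicit; the function name is hypothetical.

```c
/* Bytes per second written by a loop storing 'bytes_per_iter' each iteration. */
static double
sk_loop_bandwidth(double hz, double bytes_per_iter, double cycles_per_iter)
{
    return (hz * bytes_per_iter / cycles_per_iter);
}
```

At 2 GHz with 4 bytes per iteration, 2 cycles per iteration gives 4e9 bytes/sec and 3 cycles gives about 2.67e9 bytes/sec, matching the 4GB/sec and 2.67GB/sec figures in the message.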
Bruce Evans
537a47a1ba Restore the nontemporal pagezero() under the name sse2_pagezero() (the
same name as for i386).  It is not reconnected yet.

Which method is better is too machine-dependent and system-dependent
to replace the old method unconditionally.
2016-08-29 06:07:43 +00:00
John Baldwin
ffe1b10d95 Enable I/O MMU when PCI pass through is first used.
Rather than enabling the I/O MMU when the vmm module is loaded,
defer initialization until the first attempt to pass a PCI device
through to a guest.  If the I/O MMU fails to initialize or is not
present, then fail the attempt to pass a PCI device through to a
guest.

The hw.vmm.force_iommu tunable has been removed since the I/O MMU is
no longer enabled during boot.  However, the I/O MMU support can be
disabled by setting the hw.vmm.iommu.enable tunable to 0 to prevent
use of the I/O MMU on any systems where it is buggy.

Reviewed by:	grehan
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D7448
2016-08-26 20:15:22 +00:00
Ed Schouten
22f2f875ad Make execution of 32-bit CloudABI executables work on amd64.
A nice thing about requiring a vDSO is that it makes it incredibly easy
to provide full support for running 32-bit processes on 64-bit systems.
Instead of letting the kernel be responsible for composing/decomposing
64-bit arguments across multiple registers/stack slots, all of this can
now be done in the vDSO. This means that there is no need to provide
duplicate copies of certain system calls, like the sys_lseek() and
freebsd32_lseek() we have for COMPAT_FREEBSD32.

This change imports a new vDSO from the CloudABI repository that has
automatically generated code in it that copies system call arguments
into a buffer, padding them to eight bytes and zero-extending any
pointers/size_t arguments. After returning from the kernel, it does the
inverse: extracting return values, in the process truncating
pointers/size_t values to 32 bits.

Obtained from:	https://github.com/NuxiNL/cloudabi
2016-08-24 10:51:33 +00:00
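Note on 22f2f875ad: the argument and return value conversions described above amount to zero-extension on the way in and truncation on the way out. The sketch below shows only that arithmetic; it is not the generated vDSO code, and the function names are hypothetical.

```c
#include <stdint.h>

/* Widen a 32-bit pointer or size_t argument into an eight-byte slot. */
static uint64_t
sk_arg_from_u32(uint32_t v)
{
    return ((uint64_t)v);   /* zero-extension: upper 32 bits are 0 */
}

/* Narrow a returned pointer or size_t back to the 32-bit process's width. */
static uint32_t
sk_ret_to_u32(uint64_t v)
{
    return ((uint32_t)v);   /* truncation: upper 32 bits are dropped */
}
```

Zero-extension matters for addresses with the top bit set: 0xfffff000 must stay 0x00000000fffff000, whereas sign-extension would turn it into a different 64-bit address.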
Ed Schouten
48734c99d3 Convert pointers obtained from the threadattr_t structure with TO_PTR().
In all of these source files, the userspace pointer size corresponds
with the kernelspace pointer size, meaning that casting directly works.
As I'm planning on making 32-bit execution on 64-bit systems work as
well, use TO_PTR() here as well, so that the changes between source
files remain minimal.
2016-08-24 10:13:18 +00:00
John Baldwin
a47632d45b Fix build for !SMP kernels after the Xen MSIX workaround.
Move msix_disable_migration under #ifdef SMP since it doesn't make sense
for !SMP kernels.

PR:		212014
Reported by:	Glyn Grinstead <glyn@grinstead.org>
MFC after:	3 days
2016-08-22 21:23:17 +00:00
John Baldwin
c1c9764296 Remove the si(4) driver and sicontrol(8) for Specialix serial cards.
The si(4) driver supported multiport serial adapters for ISA, EISA, and
PCI buses.  This driver does not use bus_space, instead it depends on
direct use of the pointer returned by rman_get_virtual().  It is also
still locked by Giant and calls for patch testing to convert it to use
bus_space were unanswered.

Relnotes:	yes
2016-08-19 21:14:27 +00:00
Konstantin Belousov
b44d5b4a49 The pmap_delayed_invl_wait() function blocks on turnstile, it does not
spin, in the committed version.  Remove stray '*' in the text.

Sponsored by:	The FreeBSD Foundation.
MFC after:	3 days
2016-08-11 12:37:11 +00:00
Ed Schouten
13b4b4df98 Provide the CloudABI vDSO to its executables.
CloudABI executables already provide support for passing in vDSOs. This
functionality is used by the emulator for OS X to inject system call
handlers. On FreeBSD, we could use it to optimize calls to
gettimeofday(), etc.

Though I don't have any plans to optimize any system calls right now,
let's go ahead and already pass in a vDSO. This will allow us to
simplify the executables, as the traditional "syscall" shims can be
removed entirely. It also means that we gain more flexibility with
regards to adding and removing system calls.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D7438
2016-08-10 21:02:41 +00:00
Konstantin Belousov
e42f8233fc Unconditionally perform checks that FPU region was entered, when #NM
exception is caught in kernel mode.  There are third-party modules
which trigger the issue, and since the problem causes usermode state
corruption at least, panic in production kernels as well.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-10 13:44:03 +00:00
John Baldwin
ad1d96ca7a Don't permit mappings of invalid physical addresses on amd64 via /dev/mem.
Discussed with:	kib
2016-08-04 17:55:23 +00:00
John Baldwin
2de70600fa Correct assertion on vcpuid argument to vm_gpa_hold().
PR:		208168
Submitted by:	Dave Cameron <daverabbitz@ihug.co.nz>
Reviewed by:	grehan
MFC after:	1 month
2016-08-03 15:20:10 +00:00
Konstantin Belousov
fa03524a9f Merge i386 and amd64 variants of mp_watchdog.c into x86/, there is no
difference between files.
For pc98, put x86/mp_x86.c into the same place as used by i386 file list.
Fix typo in comment.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-08-03 13:51:53 +00:00
Mateusz Guzik
b7ff4d59de amd64: implement pagezero using rep stos
The current implementation uses non-temporal writes. This turns out to
be detrimental to performance if the page is used shortly after, which
is the typical case with page faults.

Switch to rep stos.
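A userland sketch of zeroing a page with rep stos, not the committed
assembly. SK_PAGE_SIZE and sk_pagezero() are hypothetical names, the page
size is assumed to be 4096 bytes, and a memset() fallback is used on
compilers or architectures without GNU-style x86-64 inline assembly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SK_PAGE_SIZE	4096	/* assumed page size for the sketch */

static void
sk_pagezero(void *page)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	void *dst = page;
	size_t count = SK_PAGE_SIZE / 8;	/* number of 8-byte stores */

	/* rep stosq stores %rax to (%rdi), %rcx times, advancing %rdi. */
	__asm__ volatile("rep stosq"
	    : "+D" (dst), "+c" (count)
	    : "a" ((uint64_t)0)
	    : "memory");
#else
	memset(page, 0, SK_PAGE_SIZE);
#endif
}
```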

Reviewed by:	kib
MFC after:	1 week
2016-07-31 11:34:08 +00:00
Brooks Davis
40018b91dd Don't create pointless backups of generated files in "make sysent".
Any sensible workflow will include a revision control system from which
to restore the old files if required.  In normal usage, developers just
have to clean up the mess.

Reviewed by:	jhb
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D7353
2016-07-28 21:29:04 +00:00
Alexander Motin
fb112f72a8 Add more UEFI/e820 memory types from latest specifications.
This is only cosmetics.

MFC after:	2 weeks
2016-07-24 09:15:11 +00:00
Alexander Motin
4cefe96c6d Increase number of I/O APIC pins from 24 to 32 to give PCI up to 16 IRQs.
Move HPET to the top of the supported 0-31 range.

Proposed by:	jhb@, grehan@
2016-07-14 14:35:25 +00:00
Andriy Gapon
7a946127ef remove a stray change from r302834
MFC after:	3 weeks
X-MFC with:	r302834
2016-07-14 11:13:26 +00:00
Andriy Gapon
a2d87b79cf fix-up for configuration of AMD Family 10h processors borrowed from Linux
http://lxr.free-electrons.com/source/arch/x86/kernel/cpu/amd.c#L643
BIOS may configure Family 10h processors to convert WC+ cache type
to CD.  That can hurt performance of guest VMs using nested paging.

Reviewed by:	kib
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D6059
2016-07-14 11:03:05 +00:00
Eric Badger
fdb6320d45 Add explicit detection of KVM hypervisor
Set vm_guest to a new enum value (VM_GUEST_KVM) when kvm is detected and use
vm_guest in conditionals testing for KVM.

Also, fix a conditional checking if we're running in a VM which caught only
the generic VM case, but not more specific VMs (KVM, VMWare, etc.).  (Spotted
by: vangyzen).
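The conditional fix can be illustrated with a small sketch. The enum below
is a stand-in modelled on the description above, not a copy of the kernel's
definition, and both function names are hypothetical.

```c
/* Stand-in for the kernel's guest type enum (names are assumptions). */
enum sk_vm_guest {
	SK_GUEST_NO = 0,	/* bare metal */
	SK_GUEST_VM,		/* generic, unidentified hypervisor */
	SK_GUEST_XEN,
	SK_GUEST_HV,
	SK_GUEST_VMWARE,
	SK_GUEST_KVM
};

/* Too narrow: only the generic case counts as "in a VM". */
static int
sk_in_vm_generic_only(enum sk_vm_guest g)
{
	return (g == SK_GUEST_VM);
}

/* Broader test: anything other than bare metal is a VM. */
static int
sk_in_vm(enum sk_vm_guest g)
{
	return (g != SK_GUEST_NO);
}
```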

Differential revision:	https://reviews.freebsd.org/D7172
Sponsored by:	Dell Inc.
Approved by:	kib (mentor), vangyzen (mentor)
Reviewed by:	alc
MFC after:	4 weeks
2016-07-13 19:19:18 +00:00
Roger Pau Monné
302244700f xen: automatically disable MSI-X interrupt migration
If the hypervisor version is smaller than 4.6.0. Xen commits 74fd00 and
70a3cb are required on the hypervisor side for this to be fixed, and those
are only included in 4.6.0, so stay on the safe side and disable MSI-X
interrupt migration on anything older than 4.6.0.

It should not cause major performance degradation unless a lot of MSI-X
interrupts are allocated.
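A sketch of the version gate described above, assuming the hypervisor
version is already available as separate major and minor numbers. The
function name is hypothetical and how the version is obtained is left out.

```c
/*
 * Return non-zero when MSI-X interrupt migration should be disabled,
 * that is, when the hypervisor is older than 4.6.
 */
static int
sk_xen_disable_msix_migration(unsigned int major, unsigned int minor)
{
	return (major < 4 || (major == 4 && minor < 6));
}
```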

Sponsored by:		Citrix Systems R&D
MFC after:		3 days
Reviewed by:		jhb
Differential revision:	https://reviews.freebsd.org/D7148
2016-07-12 08:43:09 +00:00
Dmitry Chagin
97d06da692 Fix a copy/paste bug introduced during X86_64 Linuxulator work.
FreeBSD supports the NX bit on X86_64 processors out of the box; for i386 emulation
use the READ_IMPLIES_EXEC flag, introduced in r302515.

While here, move the common part of the mmap() and mprotect() code to the files in compat/linux
to reduce code duplication between the Linuxulators.

Reported by:    Johannes Jost Meixner, Shawn Webb

MFC after:	1 week
X-MFC with:	r302515, r302516
2016-07-10 08:22:04 +00:00
Dmitry Chagin
ab231b83ea Regen for r302215 (Linux personality). 2016-07-10 08:17:16 +00:00
Dmitry Chagin
23e8912c60 Implement Linux personality() system call mainly due to READ_IMPLIES_EXEC flag.
In Linux if this flag is set, PROT_READ implies PROT_EXEC for mmap().
Linux/i386 set this flag automatically if the binary requires executable stack.

READ_IMPLIES_EXEC flag will be used in the next Linux mmap() commit.
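A sketch of the rule stated above: with READ_IMPLIES_EXEC set in the
personality, a request for PROT_READ also gets PROT_EXEC. The constants are
defined locally with the values Linux uses, and sk_adjust_prot() is a
hypothetical name, not the function added by the follow-up commit.

```c
#define	SK_PROT_READ		0x1
#define	SK_PROT_WRITE		0x2
#define	SK_PROT_EXEC		0x4
#define	SK_READ_IMPLIES_EXEC	0x0400000	/* Linux personality flag */

static int
sk_adjust_prot(int prot, unsigned int persona)
{
	/* Only readable mappings are upgraded to executable. */
	if ((persona & SK_READ_IMPLIES_EXEC) != 0 &&
	    (prot & SK_PROT_READ) != 0)
		prot |= SK_PROT_EXEC;
	return (prot);
}
```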
2016-07-10 08:15:50 +00:00
Ed Schouten
d96aeddf2f Don't forget to set sa->narg for CloudABI system calls.
It turns out that this value is not used within the system call code
under normal conditions, except when using tracing tools like ktrace.
If we forget to set this value, it is set to random garbage. This may
cause ktrace to hang indefinitely, making it impossible to kill.

Reported by: Michael Plass
PR: 210800
MFC before: 11.0-RELEASE
2016-07-08 20:09:21 +00:00
Nathan Whitehorn
96c85efb4b Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.
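The difference between the CPU count and the highest CPU ID can be sketched
in userland with a bitmask of present CPUs. All names below are
hypothetical; SK_CPU_FOREACH only imitates the shape of the kernel's
CPU_FOREACH() by skipping absent IDs.

```c
#include <stdint.h>

/* Number of CPUs present (the analogue of mp_ncpus). */
static unsigned int
sk_ncpus(uint64_t present)
{
	unsigned int i, n = 0;

	for (i = 0; i < 64; i++)
		if (((present >> i) & 1) != 0)
			n++;
	return (n);
}

/* Highest CPU ID present (the analogue of mp_maxid). */
static unsigned int
sk_maxid(uint64_t present)
{
	unsigned int i, id = 0;

	for (i = 0; i < 64; i++)
		if (((present >> i) & 1) != 0)
			id = i;
	return (id);
}

/* Visit only the IDs that exist, even when the ID space has holes. */
#define	SK_CPU_FOREACH(i, present)				\
	for ((i) = 0; (i) < 64; (i)++)				\
		if ((((present) >> (i)) & 1) != 0)
```

With IDs 0, 2, 4 and 6 present (mask 0x55), sk_ncpus() is 4 while
sk_maxid() is 6, so a per-CPU array needs sk_maxid() + 1 slots and a loop
bounded by sk_ncpus() would both miss CPUs and touch absent ones.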

PR:		kern/210106
Reviewed by:	jhb
Approved by:	re (glebius)
2016-07-06 14:09:49 +00:00
Konstantin Belousov
5c2cf81845 Update comments for the MD functions managing contexts for new
threads, to make it less confusing and using modern kernel terms.

Rename the functions to reflect current use of the functions, instead
of the historic KSE conventions:
  cpu_set_fork_handler -> cpu_fork_kthread_handler (for kthreads)
  cpu_set_upcall -> cpu_copy_thread (for forks)
  cpu_set_upcall_kse -> cpu_set_upcall (for new threads creation)

Reviewed by:	jhb (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (hrs)
Differential revision:	https://reviews.freebsd.org/D6731
2016-06-16 12:05:44 +00:00
Konstantin Belousov
b9c8a54dc3 Do not access pv_table array for fictitious pages, since the array
does not cover the dynamically registered fictitious ranges, and
fictitious pages mappings are not promoted.  Offer a dummy struct
md_page to fetch constant superpage pv list generation to satisfy
logic.  Also, by initializing the pv_dummy pv_list to empty, we can
remove several explicit PG_FICTITIOUS tests.

Reported and tested by:	Michael Butler <imb@protected-networks.net>
	(previous version)
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D6728
Approved by:	re (hrs)
2016-06-13 03:45:08 +00:00
Konstantin Belousov
fc0924b9a4 Avoid spurious EINVAL in amd64 pmap_change_attr().
Do not try to change attributes for DMAP when working on a mapping
which is not covered by the DMAP. This was reported on a real system
where a BAR of a device (NTB) was mapped outside the PCI window.

Reported and tested by:	mav
Reviewed by:	jhb, mav
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D6668
2016-06-05 17:11:23 +00:00
Konstantin Belousov
4230ac2fa5 In pmap_advise(), avoid leaking DI start for EPT pmaps which need A/D
emulation.  Assert that syscalls do not leak DI.

Reported by:	gjb
Sponsored by:	The FreeBSD Foundation
2016-05-27 18:45:11 +00:00
Jung-uk Kim
1eb6b86a44 Both Clang and GCC cannot generate efficient reserve_pv_entries().
http://docs.freebsd.org/cgi/mid.cgi?552BFEB2.8040407

Re-implement it entirely in inline assembly not to let compilers do silly
spilling to memory.  For non-POPCNT case, use newly added bit_count(3).

Reported by:	alc
Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D6541
2016-05-25 23:06:52 +00:00
Jung-uk Kim
22d9c132f6 Document POPCNT erratum for 6th Generation Intel Core processors. 2016-05-23 23:00:47 +00:00
Dmitry Chagin
5437e1d103 Add macro to convert errno and use it when appropriate.
MFC after:	1 week
2016-05-22 12:46:34 +00:00
Dmitry Chagin
f26a190f65 Regen after r300359 (struct l_sched_param removal).
MFC after:	1 week
2016-05-21 08:03:13 +00:00
Dmitry Chagin
8cc96fb43a Correct an argument param of linux_sched_* system calls, as struct l_sched_param
is not defined due to its nature.

MFC after:	1 week
2016-05-21 08:01:14 +00:00
Konstantin Belousov
0bfad8e4a3 Check for overflow and return EINVAL if detected. Backport this and
r300305 to i386.
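A generic sketch of an overflow-aware range check that returns EINVAL. It
is not the committed code; the function name and the start/num/limit
parameters are assumptions made for illustration.

```c
#include <errno.h>

/*
 * Validate that [start, start + num) fits below limit.  Unsigned
 * arithmetic wraps on overflow, so a wrapped sum is detected by
 * comparing it against start.
 */
static int
sk_check_range(unsigned int start, unsigned int num, unsigned int limit)
{
	unsigned int end = start + num;

	if (end < start || end > limit)
		return (EINVAL);
	return (0);
}
```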

PR:	209661
Reported and reviewed by:	cturt
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-05-20 19:50:32 +00:00
Konstantin Belousov
ae76e30131 Use unsigned type for the loop index to make overflow checks effective.
PR:	209661
Reported by:	cturt
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-05-20 15:32:48 +00:00
Eitan Adler
cef367e6a1 Don't repeat the the word 'the'
(one manual change to fix grammar)

Confirmed With: db
Approved by: secteam (not really, but this is a comment typo fix)
2016-05-17 12:52:31 +00:00
Sepherosa Ziehau
dfdc9a05c6 atomic: Add testandclear on i386/amd64
Reviewed by:	kib
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6381
2016-05-16 07:19:33 +00:00
Konstantin Belousov
56e61f57b0 Eliminate pvh_global_lock from the amd64 pmap.
The only current purpose of the pvh lock was explained there
On Wed, Jan 09, 2013 at 11:46:13PM -0600, Alan Cox wrote:
> Let me lay out one example for you in detail.  Suppose that we have
> three processors and two of these processors are actively using the same
> pmap.  Now, one of the two processors sharing the pmap performs a
> pmap_remove().  Suppose that one of the removed mappings is to a
> physical page P.  Moreover, suppose that the other processor sharing
> that pmap has this mapping cached with write access in its TLB.  Here's
> where the trouble might begin.  As you might expect, the processor
> performing the pmap_remove() will acquire the fine-grained lock on the
> PV list for page P before destroying the mapping to page P.  Moreover,
> this processor will ensure that the vm_page's dirty field is updated
> before releasing that PV list lock.  However, the TLB shootdown for this
> mapping may not be initiated until after the PV list lock is released.
> The processor performing the pmap_remove() is not problematic, because
> the code being executed by that processor won't presume that the mapping
> is destroyed until the TLB shootdown has completed and pmap_remove() has
> returned.  However, the other processor sharing the pmap could be
> problematic.  Specifically, suppose that the third processor is
> executing the page daemon and concurrently trying to reclaim page P.
> This processor performs a pmap_remove_all() on page P in preparation for
> reclaiming the page.  At this instant, the PV list for page P may
> already be empty but our second processor still has a stale TLB entry
> mapping page P.  So, changes might still occur to the page after the
> page daemon believes that all mappings have been destroyed.  (If the PV
> entry had still existed, then the pmap lock would have ensured that the
> TLB shootdown completed before the pmap_remove_all() finished.)  Note,
> however, the page daemon will know that the page is dirty.  It can't
> possibly mistake a dirty page for a clean one.  However, without the
> current pvh global locking, I don't think anything is stopping the page
> daemon from starting the laundering process before the TLB shootdown has
> completed.
>
> I believe that a similar example could be constructed with a clean page
> P' and a stale read-only TLB entry.  In this case, the page P' could be
> "cached" in the cache/free queues and recycled before the stale TLB
> entry is flushed.

TLBs for addresses with updated PTEs are always flushed before pmap
lock is unlocked.  On the other hand, amd64 pmap code does not always
flush TLBs before PV list locks are unlocked, if previously PTEs
were cleared and PV entries removed.

To handle the situations where a thread might notice an empty PV list
while a third thread still has access to the page because TLB
invalidation has not finished yet, introduce delayed invalidation (DI).
Compared with the pvh_global_lock, DI does not block a thread that has
entered a DI block when pmap_remove_all() or pmap_remove_write()
(callers of pmap_delayed_invl_wait()) are executed in parallel.  But
_invl_wait() callers are blocked until all previously noted DI blocks
are left, thus ensuring that necessary TLB invalidations were performed
before returning from pmap_remove_all() or pmap_remove_write().

See comments for detailed description of the mechanism, and also for
the explanations why several pmap methods, most important
pmap_enter(), do not need DI protection.

Reviewed by:	alc, jhb (turnstile KPI usage)
Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5747
2016-05-14 23:35:11 +00:00
Alan Cox
d3ffaee8e6 Eliminate an unused #include. For a brief period of time, _unrhdr.h was
used to implement PCID support on amd64.

Reviewed by:	kib
2016-05-13 20:14:41 +00:00
Konstantin Belousov
aa3ec63e02 Add locking annotations to amd64 struct md_page members.
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2016-05-10 09:58:51 +00:00
John Baldwin
8d791e5af1 Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two values are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from the
x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also AND
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
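The selection rules above can be modelled in userland with 64-bit CPU
masks. This is only a model of the described behaviour: sk_get_cpus(), its
parameters, and the enum are hypothetical and are not the bus_get_cpus()
KPI.

```c
#include <errno.h>
#include <stdint.h>

enum sk_cpu_sets { SK_LOCAL_CPUS, SK_INTR_CPUS };

static int
sk_get_cpus(int numa_enabled, uint64_t domain_cpus, uint64_t intr_cpus,
    enum sk_cpu_sets op, uint64_t *out)
{
	switch (op) {
	case SK_LOCAL_CPUS:
		/* Without NUMA information there is no local set. */
		if (!numa_enabled)
			return (EINVAL);
		*out = domain_cpus;
		return (0);
	case SK_INTR_CPUS:
		/* Always valid: the global set, narrowed to the domain. */
		*out = numa_enabled ? (intr_cpus & domain_cpus) : intr_cpus;
		return (0);
	}
	return (EINVAL);
}
```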

Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
2016-05-09 20:50:21 +00:00
Dmitry Chagin
8521d01a71 Add the .eh_frame section with CFI & FDE records, forgotten in r283424, to allow
stack unwinding through signal handler.

Reported by:	Dmitry Sivachenko
MFC after:	2 weeks
2016-05-09 07:38:47 +00:00
John Baldwin
82cb5c3b5b Native PCI-express HotPlug support.
PCI-express HotPlug support is implemented via bits in the slot
registers of the PCI-express capability of the downstream port along
with an interrupt that triggers when bits in the slot status register
change.

This is implemented for FreeBSD by adding HotPlug support to the
PCI-PCI bridge driver which attaches to the virtual PCI-PCI bridges
representing downstream ports on HotPlug slots. The PCI-PCI bridge
driver registers an interrupt handler to receive HotPlug events. It
also uses the slot registers to determine the current HotPlug state
and drive an internal HotPlug state machine. For simplicity of
implementation, the PCI-PCI bridge device detaches and deletes the
child PCI device when a card is removed from a slot and creates and
attaches a PCI child device when a card is inserted into the slot.

The PCI-PCI bridge driver provides a bus_child_present which claims
that child devices are present on HotPlug-capable slots only when a
card is inserted. Rather than requiring a timeout in the RC for
config accesses to not-present children, the pcib_read/write_config
methods fail all requests when a card is not present (or not yet
ready).

These changes include support for various optional HotPlug
capabilities such as a power controller, mechanical latch,
electro-mechanical interlock, indicators, and an attention button.
It also includes support for devices which require waiting for
command completion events before initiating a subsequent HotPlug
command. However, it has only been tested on ExpressCard systems
which support surprise removal and have none of these optional
capabilities.

PCI-express HotPlug support is conditional on the PCI_HP option
which is enabled by default on arm64, x86, and powerpc.

Reviewed by:	adrian, imp, vangyzen (older versions)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D6136
2016-05-05 22:26:23 +00:00
Alan Cox
cdbf6d8a05 Explain why pmap_copy(), pmap_enter_pde(), and pmap_enter_quick_locked()
call pmap_invalidate_page() even though they are not destroying a leaf-
level page table entry.

Eliminate some bogus white-space characters in a comment.

Reviewed by:	kib
2016-05-04 17:54:13 +00:00
Pedro F. Giffuni
edafb5a327 sys/amd64: Small spelling fixes.
No functional change.
2016-05-03 22:13:04 +00:00
Pedro F. Giffuni
500eb14ae8 vmm(4): Small spelling fixes.
Reviewed by:	grehan
2016-05-03 22:07:18 +00:00
John Baldwin
8a08b7d36b Revert bus_get_cpus() for now.
I really thought I had run this through the tinderbox before committing,
but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.
2016-05-03 01:17:40 +00:00
John Baldwin
bc153c692f Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two values are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from the
x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also AND
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.

Reviewed by:	wblock (manpage)
Differential Revision:	https://reviews.freebsd.org/D5519
2016-05-02 18:00:38 +00:00
John Baldwin
e131ba36e8 Move 'device pci' for the PCI bus driver to the MI NOTES file.
The PCI bus was already listed in all of the MD NOTES files and the
driver should at least compile on all platforms.
2016-04-29 23:53:55 +00:00
Andriy Gapon
f9ac50ac45 fix missing variable in r298736
Pointyhat to:	avg
Reported by:	Ivan Klymenko <fidaj@ukr.net>
MFC after:	2 weeks
X-MFC with:	r298736
2016-04-28 09:40:24 +00:00
Andriy Gapon
e5e4452078 ensure that initial local apic id is sane on AMD 10h systems
Summary:
The Initial Local APIC ID is returned by CPUID function 1 (in EBX).
On AMD Family 10h systems the way that ID is built is controlled by
an MSR bit (InitApicIdCpuIdLo).  BKDG instructs BIOS to set it in a
certain way, but a BIOS can be buggy.  In that case the ID can confuse
tools that use it, e.g. hwloc.
For example, on a system that I own real Local APIC IDs are configured
as 0, 1, 2, 3, but IDs reported via CPUID.1 are 0, 0x40, 0x80, 0xc0.
See: https://github.com/open-mpi/hwloc/issues/183
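For reference, the Initial Local APIC ID occupies bits 31:24 of EBX as
returned by CPUID function 1. The helper below is a sketch with a
hypothetical name; it only decodes a value that the caller already read.

```c
#include <stdint.h>

/* Extract CPUID.1:EBX[31:24], the Initial Local APIC ID. */
static unsigned int
sk_initial_apic_id(uint32_t ebx)
{
	return ((ebx >> 24) & 0xff);
}
```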

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D6060
2016-04-28 08:29:57 +00:00
Conrad Meyer
0e3f9e5bdd AMD64 pmap: Use howmany() macro
Use param.h howmany() instead of hand-rolled version.
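For illustration, howmany(x, y) computes how many y-sized units are needed
to hold x, rounding up. The macro is reproduced here under the local name
sk_howmany so the sketch compiles without <sys/param.h>.

```c
/* Ceiling division: number of y-sized chunks needed to cover x. */
#define	sk_howmany(x, y)	(((x) + ((y) - 1)) / (y))
```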

Sponsored by:	EMC / Isilon Storage Division
2016-04-24 21:35:01 +00:00
Pedro F. Giffuni
b66bb393f2 Cleanup redundant parenthesis from existing howmany()/roundup() macro uses. 2016-04-22 16:57:42 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
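A sketch of what the power-of-two rounding macros compute, written under
local sk_ names so it compiles without <sys/param.h>. Both assume that y
is a power of two.

```c
/* Round x down/up to a multiple of y; y must be a power of two. */
#define	sk_rounddown2(x, y)	((x) & ~((y) - 1))
#define	sk_roundup2(x, y)	(((x) + ((y) - 1)) & ~((y) - 1))
```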
2016-04-21 19:57:40 +00:00
Pedro F. Giffuni
ea24b0561f X86: use our nitems() macro when it is available through param.h.
No functional change, only trivial cases are done in this sweep.

Discussed in:	freebsd-current
2016-04-19 23:41:46 +00:00
Conrad Meyer
5dc5dab6eb Add 4Kn kernel dump support
(And 4Kn minidump support, but only for amd64.)

Make sure all I/O to the dump device is of the native sector size.  To
that end, we keep a native sector sized buffer associated with dump
devices (di->blockbuf) and use it to pad smaller objects as needed (e.g.
kerneldumpheader).

Add dump_write_pad() as a convenience API to dump smaller objects with
zero padding.  (Rather than pull in NPM leftpad, we wrote our own.)
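The padding idea can be sketched as follows. This is not dump_write_pad()
itself: sk_pad_to_block() is a hypothetical helper that only fills a
sector-sized buffer and leaves the actual write to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a small object into a sector-sized buffer and zero the rest, so
 * that the subsequent write is exactly one native sector long.
 */
static int
sk_pad_to_block(const void *obj, size_t len, void *blockbuf,
    size_t blocksize)
{
	if (len > blocksize)
		return (EINVAL);
	memcpy(blockbuf, obj, len);
	memset((char *)blockbuf + len, 0, blocksize - len);
	return (0);
}
```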

Savecore(1) has been updated to deal with these dumps.  The format for
512-byte sector dumps should remain backwards compatible.

Minidumps for other architectures are left as an exercise for the
reader.

PR:		194279
Submitted by:	ambrisko@
Reviewed by:	cem (earlier version), rpokala
Tested by:	rpokala (4Kn/512 except 512 fulldump), cem (512 fulldump)
Relnotes:	yes
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D5848
2016-04-15 17:45:12 +00:00
Sepherosa Ziehau
0c29fe6db8 hyperv: Deprecate HYPERV option by moving Hyper-V IDT vector into vmbus
Submitted by:	Jun Su <junsu microsoft com>
Reviewed by:	jhb, kib, sephe
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5910
2016-04-15 02:20:18 +00:00
John Baldwin
4478441145 Expose doreti as a global symbol on amd64 and i386.
doreti provides the common code path for returning from interrupt
handlers on x86.  Exposing doreti as a global symbol allows kernel
modules to include low-level interrupt handlers instead of requiring
all low-level handlers to be statically compiled into the kernel.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	kib
2016-04-13 17:37:31 +00:00
John Baldwin
7ecf8cab6f Enable DEVICE_NUMA with up to 8 domains by default on amd64.
8 memory domains should handle a quad-socket board with dual-domain
processors.

Reviewed by:	kib
Relnotes:	maybe?
Differential Revision:	https://reviews.freebsd.org/D5893
2016-04-12 21:23:44 +00:00
Andriy Gapon
0d63fc3ed8 re-enable AMD Topology extension on certain models if disabled by BIOS
Some BIOSes disable AMD Topology extension on AMD Family 15h notebook
processors.  We re-enable the extension, so that we can properly discover
core and cache topology.  Linux seems to do the same.

Reported by:	Johannes Dieterich <dieterich.joh@gmail.com>
Reviewed by:	jhb, kib
Tested by:	Johannes Dieterich <dieterich.joh@gmail.com>
		(earlier version)
MFC after:	3 weeks
Differential Revision:	https://reviews.freebsd.org/D5883
2016-04-12 13:30:39 +00:00
Andriy Gapon
9054bcbce7 [amd64] dtrace_invop handler is to be called only for kernel exceptions
DTrace-related exceptions in userland code are handled elsewhere.
One practical problem was a crash in dtrace_invop_start() when saved
%rsp pointed to a virtual address that was not backed.

i386 code already ignored userland exceptions.

Reviewed by: markj, kib
MFC after:	2 weeks
Differential Revision: https://reviews.freebsd.org/D5906
2016-04-12 06:46:54 +00:00
Anish Gupta
441a3497f5 Allow guest writes to the AMD microcode update MSR (0xc0010020) without updating the actual hardware MSR. This allows a guest microcode update to go through, which was otherwise failing because wrmsr() was returning EINVAL.
Submitted by:	Yamagi Burmeister
Approved by:	grehan
MFC after:	2 weeks
2016-04-11 05:09:43 +00:00
Ed Schouten
ab83575070 Make CloudABI's way of doing TLS more friendly to userspace emulators.
We're currently seeing how hard it would be to run CloudABI binaries on
operating systems that cannot be modified easily (Windows, Mac OS X). The
idea is that we want to just run them without any sandboxing. Now
that CloudABI executables are PIE, this is already a bit easier, but TLS
is still problematic:

- CloudABI executables want to write to the %fs, which typically
  requires extra system calls by the emulator every time it needs to
  switch between CloudABI's and its own TLS.

- If CloudABI executables overwrite the %fs base unconditionally, it
  also becomes harder for the emulator to store a backup of the old
  value of %fs. To solve this, let's no longer overwrite %fs, but just
  %fs:0.

As CloudABI's C library does not use a TCB, this space can now be used
by an emulator to keep track of its internal state. The executable can
now safely overwrite %fs:0, as long as it makes sure that the TCB is
copied over to the new TLS area.

Ensure that there is an initial TLS area set up when the process starts,
only containing a bogus TCB. We don't really care about its contents on
FreeBSD.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D5836
2016-04-06 11:11:31 +00:00
Baptiste Daroussin
b6348be7b9 Add kern.features flags for linux and linux64 modules
kern.features.linux: 1 meaning linux 32 bits binaries are supported
kern.features.linux64: 1 meaning linux 64 bits binaries are supported

The goal here is to help 3rd party applications (including ports) to determine
if the host does support linux emulation

Reviewed by:	dchagin
MFC after:	1 week
Relnotes:	yes
Differential Revision:	D5830
2016-04-05 22:36:48 +00:00
John Baldwin
2b1e924b69 Move i386/i386/autoconf.c to sys/x86/x86 and use it on both amd64 and i386. 2016-04-03 23:03:54 +00:00
Ed Schouten
4a8b3b18cc Make Position Independent Executables work for CloudABI.
- Set BI_CAN_EXEC_DYN, so we can execute ET_DYN ELF files in addition to
  regular ET_EXECs.
- Provide an AT_BASE entry in the auxiliary vector, so the executable
  knows at which address it got loaded and can apply relocations.
2016-03-31 18:52:00 +00:00
Konstantin Belousov
0df87548b9 Type of the interrupt handlers on x86 cannot be expressed in C.
Simplify and unify placeholder type definitions.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5771
2016-03-29 19:56:48 +00:00
Dmitry Chagin
7c5982000d Revert r297310 as the SOL_XXX are equal to the IPPROTO_XX except SOL_SOCKET.
Pointed out by:	ae@
2016-03-27 10:09:10 +00:00
Dmitry Chagin
c826fcfe22 Convert Linux SOL_IPV6 level.
MFC after:	1 week
2016-03-27 08:12:01 +00:00
Alexander Motin
baa7dd65be Polish wbwd(4) driver and add more supported chips.
MFC after:	1 month
2016-03-24 20:52:35 +00:00
John Baldwin
7a2c1d8c60 Enable interrupts on the BSP once all PICs are initialized.
This moves the enabling of interrupts slightly earlier (the old location
was still before devices were enumerated and probed) and does it in the
interrupt code (rather than in the device configuration code).  This
also avoids tripping over an assertion on the first TLB shootdown with
earlier AP startup.

Reviewed by:	kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D5710
2016-03-24 00:24:07 +00:00
Dmitry Chagin
351cf753eb Regen for r297061 (fstatfs64 Linux syscall).
MFC after:	1 week
2016-03-20 13:23:01 +00:00
Dmitry Chagin
99546279d6 Implement fstatfs64 system call.
PR:		181012
Submitted by:	John Wehle
MFC after:	1 week
2016-03-20 13:21:20 +00:00
Gleb Smirnoff
e33c2e6b06 Due to invalid use of a signed intermediate value in the bounds checking
during argument validity verification, unbound zero'ing of the process LDT
and adjacent memory can be initiated from usermode.

Submitted by:	CORE Security
Patch by:	kib
Security:	SA-16:15
2016-03-16 22:33:12 +00:00
Konstantin Belousov
3ef966c4c0 The PKRU state size is 4 bytes, its support makes the XSAVE area size
non-multiple of 64 bytes.  Thereafter, the user state save area is
misaligned, which triggers assertion in the debugging kernels, or
segmentation violation on accesses for non-debugging configs.

Force the desired alignment of the user save area as the fix
(workaround is to disable bit 9 in the hw.xsave_mask loader tunable).
This correction is required for booting on the upcoming Intel Purley
platform.

Reported and tested by:	"Pieper, Jeffrey E" <jeffrey.e.pieper@intel.com>,
	jimharris
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2016-03-15 15:42:53 +00:00
John Baldwin
6fc8053f1a Fix reporting of the CloudABI ABI in kdump.
- Advertise the word size for CloudABI ABIs via the SV_LP64 flag.  All of
  the other ABIs include either SV_ILP32 or SV_LP64.
- Fix kdump to not assume a 32-bit ABI if the ABI flags field is non-zero
  but SV_LP64 isn't set.  Instead, only assume a 32-bit ABI if SV_ILP32 is
  set and fallback to the unknown value of "00" if neither SV_LP64 nor
  SV_ILP32 is set.
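The decision order described in the second item can be sketched as below.
The flag values are assumptions defined locally for the example, and
sk_abi_wordsize() is a hypothetical name rather than the kdump function.

```c
#define	SK_SV_ILP32	0x000100	/* assumed value: 32-bit ABI */
#define	SK_SV_LP64	0x000200	/* assumed value: 64-bit ABI */

/* Map ABI flags to the word size string, "00" meaning unknown. */
static const char *
sk_abi_wordsize(unsigned int flags)
{
	if ((flags & SK_SV_LP64) != 0)
		return ("64");
	if ((flags & SK_SV_ILP32) != 0)
		return ("32");
	return ("00");
}
```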

Reviewed by:	kib, ed
Differential Revision:	https://reviews.freebsd.org/D5560
2016-03-09 18:38:30 +00:00
Marcel Moolenaar
6bcf245ebc Bump VM_MAX_MEMSEGS from 2 to 3 to match the number of VM segment
identifiers present in vmmapi.h. In particular, it's now possible
to create a VM_FRAMEBUFFER segment.
2016-02-26 16:18:47 +00:00
Konstantin Belousov
abb8f08388 Return dst as the result from memcpy(9) on amd64.
PR:	207422
MFC after:	1 week
2016-02-24 11:58:15 +00:00
Svatopluk Kraus
b352b10400 As <machine/vm.h> is included from <vm/vm.h>, there is no need to
include it explicitly when <vm/vm.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5380
2016-02-22 09:10:23 +00:00
Svatopluk Kraus
35a0bc1260 As <machine/vmparam.h> is included from <vm/vm_param.h>, there is no
need to include it explicitly when <vm/vm_param.h> is already included.

Suggested by:	alc
Reviewed by:	alc
Differential Revision:	https://reviews.freebsd.org/D5379
2016-02-22 09:08:04 +00:00
Svatopluk Kraus
a1e1814d76 As <machine/pmap.h> is included from <vm/pmap.h>, there is no need to
include it explicitly when <vm/pmap.h> is already included.

Reviewed by:	alc, kib
Differential Revision:	https://reviews.freebsd.org/D5373
2016-02-22 09:02:20 +00:00
Gleb Smirnoff
b28cc462ad Include sys/_task.h into uma_int.h, so that taskqueue.h isn't a
requirement for uma_int.h.

Suggested by:	jhb
2016-02-09 20:22:35 +00:00
Gleb Smirnoff
e60b2fcbeb Redo r292484. Embed task(9) into zone, so that uz_maxaction is called
in a context that can sleep, allowing consumers of the KPI to run their
drain routines without any extra measures.

Discussed with:	jtl
2016-02-03 23:30:17 +00:00
John Baldwin
aa949be551 Convert ss_sp in stack_t and sigstack to void *.
POSIX requires these members to be of type void * rather than the
char * inherited from 4BSD.  NetBSD and OpenBSD both changed their
fields to void * back in 1998.  No new build failures were reported
via an exp-run.

PR:		206503 (exp-run)
Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D5092
2016-01-27 17:55:01 +00:00
Xin LI
669414e4fb Implement AT_SECURE properly.
The AT_SECURE auxv entry was added in the Linux 2.5 kernel to pass a
boolean flag indicating whether secure mode should be enabled. 1 means
that the program has changed its credentials during execution.
Once exported, AT_SECURE is used by the glibc issetugid() call.

Submitted by:	imp, dchagin
Security:	FreeBSD-SA-16:10.linux
Security:	CVE-2016-1883
2016-01-27 07:20:55 +00:00
Dmitry Chagin
9dba79fb66 Remove obsolete comment.
MFC after:	3 days
2016-01-23 08:08:06 +00:00
Dmitry Chagin
f138999141 Fix a typo.
MFC after:	3 days
2016-01-23 08:04:29 +00:00
Hans Petter Selasky
c1ecb7e114 Add missing atomic wrapper macro.
Reviewed by:	alfred @
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2016-01-21 18:22:50 +00:00
Konstantin Belousov
f132cd0547 Use ANSI definitions. Wrap long line.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-01-19 08:08:08 +00:00
Konstantin Belousov
b57e68141f Clear whole XMM register file instead of only XMM0. Also clear x87
registers.  This brings amd64 on par with i386, providing consistent
initial FPU state.

Note that we do not clear any extended state, at least because kernel
does not understand extended state structure and consequences of zero
overwrite after fninit()/fpusave().

Submitted by:	joss.upton@yahoo.com
PR:	206370
MFC after:	2 weeks
2016-01-19 08:04:02 +00:00
Gleb Smirnoff
de44d808ef Regen after r293907.
2016-01-14 10:15:21 +00:00
Gleb Smirnoff
037f750877 Change linux get_robust_list system call to match actual linux one.
The set_robust_list system call requests the kernel to record the head
of the list of robust futexes owned by the calling thread. The head
argument is the list head to record.
The get_robust_list system call should return the head of the robust
list of the thread whose thread id is specified in the pid argument.
The list head should be stored in the location pointed to by the head
argument.

In contrast, our implementation of the get_robust_list system call copies
the known portion of the memory pointed to by the pointer recorded by the
set_robust_list system call to the location pointed to by the head
argument.

So, it is possible for a local attacker to read portions of kernel
memory, which may result in a privilege escalation.

Submitted by:	mjg
Security:	SA-16:03.linux
2016-01-14 10:13:58 +00:00
Jung-uk Kim
4ec1c9bfac Remove dead code when the target processor has POPCNT instruction.
2016-01-13 19:19:50 +00:00
Dmitry Chagin
038c720553 Implement vsyscall hack. Prior to 2.13 glibc uses vsyscall
instead of vdso. An upcoming linux_base-c6 needs it.

Differential Revision:  https://reviews.freebsd.org/D1090

Reviewed by:	kib, trasz
MFC after:	1 week
2016-01-09 20:18:53 +00:00
Ed Maste
0e42ee5dd8 Move amd64 metadata.h to x86 and share with i386
MFC after:	1 week
2016-01-07 19:47:26 +00:00
Ian Lepore
69dcb7e771 Make the 'env' directive described in config(5) work on all architectures,
providing compiled-in static environment data that is used instead of any
data passed in from a boot loader.

Previously 'env' worked only on i386 and arm xscale systems, because it
required the MD startup code to examine the global envmode variable and
decide whether to use static_env or an environment obtained from the boot
loader, and set the global kern_envp accordingly.  Most startup code wasn't
doing so.  Making things even more complex, some mips startup code uses an
alternate scheme that involves calling init_static_kenv() to pass an empty
buffer and its size, then uses a series of kern_setenv() calls to populate
that buffer.

Now all MD startup code calls init_static_kenv(), and that routine provides
a single point where envmode is checked and the decision is made whether to
use the compiled-in static_kenv or the values provided by the MD code.

The routine also continues to serve its original purpose for mips; if a
non-zero buffer size is passed the routine installs the empty buffer ready
to accept kern_setenv() values.  Now if the size is zero, the provided buffer
full of existing env data is installed.  A NULL pointer can be passed if the
boot loader provides no env data; this allows the static env to be installed
if envmode is set to do so.

Most of the work here is a near-mechanical change to call the init function
instead of directly setting kern_envp.  A notable exception is in xen/pv.c;
that code was originally installing a buffer full of preformatted env data
along with its non-zero size (like mips code does), which would have allowed
kern_setenv() calls to wipe out the preformatted data.  Now it passes a zero
for the size so that the buffer of data it installs is treated as
non-writeable.
2016-01-02 02:53:48 +00:00
John Baldwin
9e8d8b4b0c Move shared variables from {amd64,i386}/initcpu.c to x86/identcpu.c.
While here, move the common bits of <machine/cputypes.h> to
<x86/cputypes.h> as well.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D4670
2015-12-23 21:41:42 +00:00
Enji Cooper
418629d81d Remove redundant ctx_switch_xsave declaration in sys/amd64/include/md_var.h
This variable was added to sys/x86/include/x86_var.h recently.

This unbreaks building kernel source that #includes both md_var.h and x86_var.h
with gcc 4.2.1 on amd64

Differential Revision: https://reviews.freebsd.org/D4686
Reviewed by: kib
X-MFC with: r291949
Sponsored by: EMC / Isilon Storage Division
2015-12-22 20:08:32 +00:00
Warner Losh
2fca0f2dd4 Save the physical address passed into the kernel of the UEFI system
table.
2015-12-19 19:01:43 +00:00
Konstantin Belousov
7c958a41fe Merge common parts of i386 and amd64 md_var.h and smp.h into
new headers x86/include x86_var.h and x86_smp.h.

Reviewed by:	emaste, jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4358
2015-12-07 17:41:20 +00:00
Konstantin Belousov
49e806677c Use ANSI C definition.
MFC after:	1 week
2015-12-07 17:24:55 +00:00
Conrad Meyer
10386b56ad pmap_invalidate_range: For very large ranges, flush the whole TLB
Typical TLBs have 40-512 entries available.  At some point, iterating
every single page in a requested invalidation range and issuing invlpg
on it is more expensive than flushing the TLB and allowing it to reload
on demand.

Broadwell CPUs have 1536 L2 TLB entries, so I've picked the arbitrary
number 4096 entries as a heuristic at which point we flush TLB rather
than invalidating every single potential page.
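
A minimal sketch of the heuristic, assuming 4 KB pages.  The macro and
function names are hypothetical and are not taken from the pmap code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096UL
/* Flush everything once a range covers this many pages (hypothetical name). */
#define SKETCH_INVLPG_THRESHOLD	(4096UL * SKETCH_PAGE_SIZE)

/*
 * Decide whether invalidating [sva, eva) page by page is likely to cost
 * more than flushing the whole TLB and letting it reload on demand.
 */
static bool
should_flush_whole_tlb(uintptr_t sva, uintptr_t eva)
{
	return (eva - sva >= SKETCH_INVLPG_THRESHOLD);
}
```

A caller would test the range first and fall through to the per-page
invlpg loop only when the function returns false.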

Reviewed by:	alc
Feedback from:	jhb, kib
MFC notes:	Depends on r291688
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D4280
2015-12-06 17:39:13 +00:00
Konstantin Belousov
27691a24ab For amd64 non-PCID machines, and for i386 machines with support for
the PG_G global pte flag, pmap_invalidate_all() fails to flush global
TLB entries [*].  This is because TLB shootdown handler for such
configs reloads CR3, and on i386 pmap_invalidate_all() does the same
for the initiating CPU.  Note that current code does not issue total
invalidation requests for the kernel_pmap.

Rename amd64 function invltlb_globpcid() to invltlb_glob(), since it has
not been specific to PCID for quite some time, and implement the same
functionality for i386.  Use the function instead of invltlb() in
shootdown handlers and in i386 pmap_invalidate_all(), but only for the
kernel pmap (which maps pages with the PG_G attribute set), which
takes care of PG_G TLB entries on flush.

To detect the affected pmap in i386 TLB shootdown handler, pmap should
be passed to the smp_masked_invltlb() function, which makes amd64 and
i386 TLB shootdown code almost identical.  Merge the code under x86/.

Noted by:	jhb [*]
Reviewed by:	cem, jhb, pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D4346
2015-12-03 11:14:14 +00:00
Konstantin Belousov
724f4b62b0 Remove sv_prepsyscall, sv_sigsize and sv_sigtbl members of the struct
sysent.

sv_prepsyscall is unused.

sv_sigsize and sv_sigtbl translate signal number from the FreeBSD
namespace into the ABI domain.  It is only utilized on i386 for iBCS2
binaries.  The issue with this approach is that signals for iBCS2 were
delivered with the FreeBSD signal frame layout, which does not follow
iBCS2.  The same note is true for any other potential user of
sv_sigtbl.  In other words, if ABI needs signal number translation, it
really needs custom sv_sendsig method instead.

Sponsored by:	The FreeBSD Foundation
2015-11-28 08:49:07 +00:00
Ed Maste
2e0002c18e Fix whitespace on addition of IPSEC option
2015-11-26 21:35:50 +00:00
Konstantin Belousov
5e27d79314 Split kernel timekeep ABI structure vdso_sv_tk out of the struct
sysentvec.  This allows the timekeep data to be shared between similar
ABIs which cannot share sysentvec.

Make the timekeep_push_vdso() tick callback to the timekeep structures
instead of sysentvecs.  If several sysentvec share the vdso_sv_tk
structure, we would update the userspace data several times on each
tick, without the change.

Only allocate vdso_sv_tk in the exec_sysvec_init() sysinit when
sysentvec is marked with the new SV_TIMEKEEP flag.  This saves
allocation and update of unneeded vdso_sv_tk for ABIs which do not
provide userspace gettimeofday yet, which are PowerPC arches right
now.

Make vdso_sv_tk allocator public, namely split out and export
alloc_sv_tk() and alloc_sv_tk_compat32().  ABIs which share timekeep
data now can allocate it manually and share as appropriate.

Requested by:	nwhitehorn
Tested by:	nwhitehorn, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-11-23 07:09:35 +00:00
Mark Johnston
7672ca059a Remove unneeded includes of opt_kdtrace.h.
As of r258541, KDTRACE_HOOKS is defined in opt_global.h, so opt_kdtrace.h
is not needed when defining SDT(9) probes.
2015-11-22 02:01:01 +00:00
John Baldwin
645743ea99 Export various helper variables describing the layout and size of
certain kernel structures for use by debuggers. This mostly aids
in examining cores from a kernel without debug symbols as a debugger
can infer these values if debug symbols are available.

One set of variables describes the layout of 'struct linker_file' to
walk the list of loaded kernel modules.

A second set of variables describes the layout of 'struct proc' and
'struct thread' to walk the list of processes in the kernel and the
threads in each process.

The 'pcb_size' variable is used to index into the stoppcbs[] array.

The 'vm_maxuser_address' is used to distinguish kernel virtual addresses
from user addresses. This doesn't have to be perfect, and
'vm_maxuser_address' is a cheap and simple way to differentiate kernel
pointers from simple values like TIDs and PIDs.

While here, annotate the fields in struct pcb used by kgdb on amd64
and i386 to note that their ABI should be preserved.  Annotations for
other platforms will be added in the future.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D3773
2015-11-12 22:00:59 +00:00
Conrad Meyer
0e5d2011ae pmap_change_attr: Only fixup DMAP for DMAPed ranges
pmap_change_attr must change the memory type of both the requested KVA
and the corresponding DMAP mappings (if such mappings exist), to satisfy
an Intel requirement that two or more mappings to the same physical
pages must have the same memory type.

However, not all kernel mapped pages have corresponding DMAP mappings --
for example, 64-bit BARs.  Skip fixing up the DMAP for out-of-bounds
addresses.
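
The bounds check can be sketched as below.  The function and parameter
names are hypothetical; in particular, dmap_limit stands in for whatever
upper bound the kernel tracks for the direct map.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when the physical range [pa_start, pa_end) lies entirely
 * below dmap_limit, i.e. when a direct-map alias exists for all of it
 * and its memory type must be updated as well.
 */
static bool
range_has_dmap_alias(uint64_t pa_start, uint64_t pa_end, uint64_t dmap_limit)
{
	return (pa_start < pa_end && pa_end <= dmap_limit);
}
```

A 64-bit BAR located above the direct-mapped range would fail this test,
so only the requested KVA mapping would be changed.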

Submitted by:	Steve Wahl <steve_wahl@dell.com>
Reviewed by:	alc, jhb
Sponsored by:	Dell Compellent
Differential Revision:	https://reviews.freebsd.org/D4030
2015-10-29 19:07:00 +00:00
John Baldwin
2219c44a1f Update for LINUX32 rename. The assembler didn't complain about undefined
symbols but just used 0 after the rename.
2015-10-29 15:20:47 +00:00
John Baldwin
6cea44a704 Fix build with DEBUG defined.
Reported by:	hselasky
2015-10-29 15:16:47 +00:00
Kirk McKusick
a57418a761 Bring the tags and links entries for amd64 up to date.
Based on how out of date it is, I doubt that anyone
other than me and my code-reading students still use it.
2015-10-27 22:59:24 +00:00
Konstantin Belousov
af95bbf5bf Intel SDM before revision 56 described the CLFLUSH instruction as only
ordered with the MFENCE instruction.  Similar weak guarantees are also
specified by the AMD APM vol. 3 rev. 3.22.  x86 pmap methods
pmap_invalidate_cache_range() and pmap_invalidate_cache_pages() braced
CLFLUSH loop with MFENCE both before and after the loop.

In the revision 56 of SDM, Intel stated that all existing
implementations of CLFLUSH are strict, CLFLUSH instructions execution
is ordered WRT other CLFLUSH and writes.  Also, the strict behaviour
is made architectural.

A new instruction CLFLUSHOPT (which was documented for some time in
the Instruction Set Extensions Programming Reference) provides the
weak behaviour which was previously attributed to CLFLUSH.

Use CLFLUSHOPT when available.  When CLFLUSH is used on Intel CPUs, do
not execute MFENCE before and after the flushing loop.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
2015-10-24 21:37:47 +00:00
Konstantin Belousov
3f8e071052 Add CLFLUSHOPT instruction wrappers.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-10-23 11:45:38 +00:00
John Baldwin
f2264f5048 Regen for linux32 rename and linux64 systrace.
2015-10-22 21:33:37 +00:00
John Baldwin
2f99bcce1e Rename remaining linux32 symbols such as linux_sysent[] and
linux_syscallnames[] from linux_* to linux32_* to avoid conflicts with
linux64.ko.  While here, add support for linux64 binaries to systrace.
- Update NOPROTO entries in amd64/linux/syscalls.master to match the
  main table to fix systrace build.
- Add a special case for union l_semun arguments to the systrace
  generation.
- The systrace_linux32 module now only builds the systrace_linux32.ko
  module on amd64.
- Add a new systrace_linux module that builds on both i386 and amd64.
  For i386 it builds the existing systrace_linux.ko.  For amd64 it
  builds a systrace_linux.ko for 64-bit binaries.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D3954
2015-10-22 21:28:20 +00:00
John Baldwin
5047105b71 Merge r289055 to amd64/linux32:
linux: fix handling of out-of-bounds syscall attempts

Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.
2015-10-22 21:23:58 +00:00
Ed Schouten
b78ef4bd86 Refactoring: move out generic bits from cloudabi64_sysvec.c.
In order to make it easier to support CloudABI on ARM64, move out all of
the bits from the AMD64 cloudabi_sysvec.c into a new file
cloudabi_module.c that would otherwise remain identical. This reduces
the AMD64 specific code to just ~160 lines.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3974
2015-10-22 09:07:53 +00:00
Roger Pau Monné
6a306bff7f x86/xen: Consolidate xen-os.h in a single place
amd64 and i386 platform code contain very similar xen/xen-os.h

The only differences are:
 - Functions/variables/types which were unused in i386/xen/xen-os.h:
    * xen_xchg
    * __xchg_dummy
    * __xg
    * __xchg
    * atomic_t
    * atomic_inc
    * rdtscll

The functions/variables/types unused in xen-os.h can be dropped and there
are no more differences between amd64 and i386.

The new header is placed in x86/include/xen and each platform will have
dummy headers that include x86/xen/*.h. This is to be able to include
machine/xen/*.h in the PV drivers.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3880
Sponsored by:		Citrix Systems R&D
2015-10-21 10:04:35 +00:00
Alexander Motin
4a3760bae6 Remove compatibility shims for legacy ATA device names.
We got new ATA stack in FreeBSD 8.x, switched to it at 9.x, completely
removed old stack at 10.x, so at 11.x it is time to remove compat shims.
2015-10-11 13:01:51 +00:00
Mateusz Guzik
3e15a670d2 linux: fix handling of out-of-bounds syscall attempts
Due to an off by one the code would read an entry past the table, as
opposed to the last entry which contains the nosys handler.
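
The class of bug can be sketched with a toy table.  This is an
illustration under assumed names, not the linux module's code: the last
slot holds the nosys handler, and any out-of-range number must map to
that slot rather than to the index equal to the table size.

```c
#include <stddef.h>

typedef int (*syscall_fn)(void);

static int sys_first(void)  { return (1); }
static int sys_second(void) { return (2); }
static int nosys(void)      { return (-1); }	/* "no such syscall" */

/* The final slot is reserved for the nosys handler. */
static const syscall_fn sysent_table[] = { sys_first, sys_second, nosys };
#define SYSENT_SIZE (sizeof(sysent_table) / sizeof(sysent_table[0]))

static int
dispatch(size_t code)
{
	/*
	 * Valid indices are 0 .. SYSENT_SIZE - 1.  Indexing with
	 * SYSENT_SIZE itself would read one entry past the table.
	 */
	if (code >= SYSENT_SIZE)
		code = SYSENT_SIZE - 1;
	return (sysent_table[code]());
}
```

With this table, dispatch(3) and any larger number reach nosys() instead
of reading memory beyond the array.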

Reported by:	Pawel Biernacki <pawel.biernacki gmail.com>
2015-10-08 21:08:35 +00:00
Roger Pau Monné
a231723cc0 xen/console: Introduce a new console driver for Xen guest
The current Xen console driver is crashing very quickly when using it on
an ARM guest. This is because the console lock is recursive and it may
lead to recursion on the tty lock and/or corrupt the ring pointer.

Furthermore, the console lock is not always taken where it should be and has
to be released too early because of the way the console has been designed.

Over the years, code has been modified to support various new features but
the driver has not been reworked.

This new driver has been rewritten with the idea of only having a small set
of specific functions to write either via the shared ring or the hypercall
interface.

Note that HVM support has been left aside for now because it requires
additional features which are not yet supported. A follow-up patch will be
sent with HVM guest support.

List of items that may be good to have but not mandatory:
 - Avoid flushing for each character written when using the tty
 - Support multiple consoles

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3698
Sponsored by:		Citrix Systems R&D
2015-10-08 16:39:43 +00:00
Roger Pau Monné
1a52c10530 Update Xen headers from 4.2 to 4.6
Pull the latest headers for Xen which allow us to add support for ARM and
use new features in FreeBSD.

This is a verbatim copy of xen/include/public, so every header which
doesn't exist anymore in the Xen repositories has been dropped.

Note the interface version hasn't been bumped; it will be done in a
follow-up. However, fixes in the code are required to get it compiled:

 - sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so
   drop it.

 - {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on
   xen/interface/event_channel.h, so include it.

 - {amd64,i386}/{amd64,i386}/support.S: It's not necessary to include
   machine/intr_machdep.h. This also fixes compilation with the
   new headers.

 - dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segment has
   been dropped. So directly use struct blkif_request_segment

Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if
__XEN_INTERFACE_VERSION__ is not set. This allows us to catch any file
where xen/xen-os.h is not correctly included.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3805
Sponsored by:		Citrix Systems R&D
2015-10-06 11:29:44 +00:00
Alan Cox
9f86aba61c Exploit r288122 to address a cosmetic issue. Since PV chunk pages don't
belong to a vm object, they can't be paged out.  Since they can't be paged
out, they are never enqueued in a paging queue.  Nonetheless, passing
PQ_INACTIVE to vm_page_unwire() creates the appearance that these pages
are being enqueued in the inactive queue.  As of r288122, we can avoid
this false impression by passing PQ_NONE.

Submitted by:	kmacy (an earlier version)
Differential Revision:	https://reviews.freebsd.org/D1674
2015-09-26 07:18:05 +00:00
Mateusz Guzik
c025b81442 amd64: plug redundant bootAP declaration
Reported by:	gcc5
2015-09-22 21:07:47 +00:00
Konstantin Belousov
cff8c6f2d1 Add support for weak symbols to the kernel linkers. It means that
linkers no longer raise an error when undefined weak symbols are
found, but relocate as if the symbol value was 0.  Note that we do not
repeat the mistake of userspace dynamic linker of making the symbol
lookup prefer non-weak symbol definition over the weak one, if both
are available.  In fact, kernel linker uses the first definition
found, and ignores duplicates.

Signature of the elf_lookup() and elf_obj_lookup() functions changed
to split result/error code and the symbol address returned.
Otherwise, it is impossible to return zero address as the symbol
value, to MD relocation code.  This explains the mechanical changes in
elf_machdep.c sources.
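
The reason for splitting the return value can be shown with a simplified
sketch.  The structure, the function name and the choice of ENOENT are
assumptions made for illustration and do not reproduce the kernel
linker's interface.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record for illustration only. */
struct sketch_sym {
	uintptr_t addr;
	bool defined;
	bool weak;
};

/*
 * Return 0 and store the address in *res on success, or an errno value
 * on failure.  An undefined weak symbol succeeds with address 0, which a
 * single "address, or 0 on error" return value could not express.
 */
static int
sketch_lookup(const struct sketch_sym *sym, uintptr_t *res)
{
	if (sym->defined) {
		*res = sym->addr;
		return (0);
	}
	if (sym->weak) {
		*res = 0;
		return (0);
	}
	return (ENOENT);
}
```

Relocation code can then distinguish "resolved to address 0" from
"lookup failed" by checking the error code alone.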

The powerpc64 R_PPC_JMP_SLOT handler did not check the error from the
lookup() call; the patch leaves the code as is (untested).

Reported by:	glebius
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-09-20 01:27:59 +00:00
Mark Johnston
610141cebb Add stack_save_td_running(), a function to trace the kernel stack of a
running thread.

It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.

This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.

Reviewed by:	bdrewery, jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3256
2015-09-11 03:54:37 +00:00
Mark Johnston
4db79feb8f Merge stack(9) implementations for i386 and amd64 under x86/.
Reviewed by:	jhb, kib
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3255
2015-09-11 03:24:07 +00:00
Konstantin Belousov
1fa6712471 Do not hold the process around the vm_fault() call from the trap()s.
The only operation which is prevented by the hold is the kernel stack
swapout for the faulted thread, which should be fine to allow.

Remove useless checks for NULL curproc or curproc->p_vmspace from the
trap_pfault() wrappers on x86 and powerpc.

Reviewed by:	alc (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-09-10 17:46:48 +00:00
Mark Johnston
21be12e0ca Remove an unneeded instruction.
MFC after:	1 week
2015-08-28 00:17:21 +00:00
Conrad Meyer
e974f91c38 Import ioat(4) driver
I/OAT is also referred to as Crystal Beach DMA and is a Platform Storage
Extension (PSE) on some Intel server platforms.

This driver currently supports DMA descriptors only and is part of a
larger effort to upstream an interconnect between multiple systems using
the Non-Transparent Bridge (NTB) PSE.

For now, this driver is only built on AMD64 platforms.  It may be ported
to work on i386 later, if that is desired.  The hardware is exclusive to
x86.

Further documentation on ioat(4), including API documentation and usage,
can be found in the new manual page.

Bring in a test tool, ioatcontrol(8), in tools/tools/ioat.  The test
tool is not hooked up to the build and is not intended for end users.

Submitted by:	jimharris, Carl Delsey <carl.r.delsey@intel.com>
Reviewed by:	jimharris (reviewed my changes)
Approved by:	markj (mentor)
Relnotes:	yes
Sponsored by:	Intel
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3456
2015-08-24 19:32:03 +00:00
Roger Pau Monné
e8234cfef6 preload_search_info: make sure mod is set
Add a check to preload_search_info to make sure mod is set. Most of the
callers of preload_search_info don't check that the mod parameter is
set, which can cause page faults. While at it, remove some now unnecessary
checks before calling preload_search_info.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D3440
2015-08-21 15:57:57 +00:00
Baptiste Daroussin
d83272a486 Add a kern.features.cloudabi64 entry when the module is loaded to help
userland test whether cloudabi64 is supported or not

Reviewed by:	ed
Differential Revision:	https://reviews.freebsd.org/D3430
2015-08-19 15:18:32 +00:00
Marcel Moolenaar
4a99d3f571 Add 24 more page table pages we allocate on boot-up. 16MB slop
is a little tight in and by itself, but severely insufficient
when one needs to map a large frame buffer as part of console
initialization. 64MB slop should be enough for a while. As an
example: a 15" MacBook Pro with retina display needs ~28MB of
KVA for the frame buffer.
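
The numbers follow from the amd64 page table geometry: one page table
page holds 512 entries, each mapping a 4 KB page, so it covers 2 MB of
KVA.  The sketch below spells out that arithmetic; the macro and function
names are illustrative only.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096ULL	/* bytes mapped by one PTE */
#define SKETCH_PTES_PER_PAGE	512ULL	/* 8-byte entries in a 4 KB page */

/* Bytes of KVA covered by the given number of page table pages. */
static uint64_t
kva_mapped_by(uint64_t nptpages)
{
	return (nptpages * SKETCH_PTES_PER_PAGE * SKETCH_PAGE_SIZE);
}
```

Eight page table pages give the old 16 MB slop, and 8 + 24 = 32 pages give
the new 64 MB.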

PR:		193745
2015-08-18 01:53:41 +00:00
Konstantin Belousov
7a39d38dbd XEN/amd64 may initiate i/o over the pages not mapped by the direct
map.  Handle busdma bouncing and ata PIO accesses by using global
frame used by the current CPU locally for the duration of
pmap_quick_enter/remove_page().  A spin mutex protects the concurrent
frame use and prevents thread migration.

Noted by:	royger
Reviewed by:	alc, jah, royger (previous version)
Sponsored by:	The FreeBSD Foundation
2015-08-17 18:42:45 +00:00
Marcel Moolenaar
7ef5e8bc80 Better support memory mapped console devices, such as VGA and EFI
frame buffers and memory mapped UARTs.

1.  Delay calling cninit() until after pmap_bootstrap(). This makes
    sure we have PMAP initialized enough to add translations. Keep
    kdb_init() after cninit() so that we have console when we need
    to break into the debugger on boot.
2.  Unfortunately, the ATPIC code had to be moved as well so as to
    avoid a spurious trap #30. The reason for which is not known
    at this time.
3.  In pmap_mapdev_attr(), when we need to map a device prior to the
    VM system being initialized, use virtual_avail as the KVA to map
    the device at. In particular, avoid using the direct map on amd64
    because we can't demote by virtue of not being able to allocate
    yet. Keep track of the translation.
    Re-use the translation after the VM has been initialized to not
    waste KVA and to satisfy the assumption in uart(4) that the handle
    returned for the low-level console is the same as later returned
    when the device is probed and attached.
4.  In pmap_unmapdev() remove the mapping from the table when called
    pre-init. Otherwise keep the mapping. During bus probe and attach
    device resources are mapped and unmapped multiple times, which
    would have us destroy the mapping used by the low-level console.
5.  In pmap_init(), set pmap_initialized to signal that we're not
    pre-init anymore. On amd64, bring the direct map in sync with the
    translations created at that time.
6.  Implement bus_space_map() and bus_space_unmap() for real: when
    the tag corresponds to memory space, call the corresponding
    pmap_mapdev() and pmap_unmapdev() functions to construct an
    actual handle.
7.  In efifb.c and vt_vga.c, remove the crutches and hacks and simply
    call pmap_mapdev_attr() or bus_space_map() as desired.

Notes:
1.  uart(4) already used bus_space_map() during low-level console
    setup but since serial ports have traditionally been I/O port
    based, the lack of a proper implementation for said function
    was not a problem. It has always supported memory mapped UARTs
    for low-level consoles by setting hw.uart.console accordingly.
2.  The use of the direct map on amd64 without setting caching
    attributes has been a bigger problem than previously thought.
    This change has the fortunate (and unexpected) side-effect of
    fixing various EFI frame buffer problems (though not all).

PR: 191564, 194952

Special thanks to:
1.  XipLink, Inc -- generously donated an Intel Bay Trail E3800
    based eval board (ADLE3800PC).
2.  The FreeBSD Foundation, in particular emaste@ -- for UEFI
    support in general and testing.
3.  Everyone who tested the proposed patch for PR 191564.
4.  jhb@ and kib@ for being a soundboard and applying a clue bat
    if so needed.
2015-08-12 15:26:32 +00:00
Konstantin Belousov
0e190a486f Initialization of smp_tlb_wait does not require release semantic, no
data is synchronized by store/load to the variable.  The
lapic_write_icr() function ensures that store buffers are flushed
before IPI command is issued.

Discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-12 09:46:39 +00:00
Konstantin Belousov
c77d57c8b4 AP should load aps_ready with acquire semantic to see BSP updates to
the SMP structures, synchronized with the load by release store in
release_aps().

The change is formal; the x86 strong memory model implicitly provides
the guarantees.
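
The pairing can be sketched in portable C11.  The kernel uses its own
atomic(9) primitives such as atomic_load_acq_int() and
atomic_store_rel_int(); the sketch substitutes <stdatomic.h>, and the
variable and function names are stand-ins rather than the kernel's.

```c
#include <stdatomic.h>

static int smp_data;		/* stands in for the SMP structures */
static atomic_int ready_flag;	/* stands in for aps_ready */

/* BSP side: publish the data, then set the flag with release semantics. */
static void
bsp_release(int value)
{
	smp_data = value;
	atomic_store_explicit(&ready_flag, 1, memory_order_release);
}

/*
 * AP side: spin with acquire loads.  Once the flag is seen as set, the
 * earlier store to smp_data is guaranteed to be visible.
 */
static int
ap_wait(void)
{
	while (atomic_load_explicit(&ready_flag, memory_order_acquire) == 0)
		;
	return (smp_data);
}
```

On x86 both operations compile to plain loads and stores, which is why
the change is described as formal.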

Discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-12 09:43:12 +00:00
Konstantin Belousov
edc8222303 Make kstack_pages a tunable on arm, x86, and powerpc. On i386, the
initial thread stack is not adjusted by the tunable, the stack is
allocated too early to get access to the kernel environment. See
TD0_KSTACK_PAGES for the thread0 stack sizing on i386.

The tunable was tested on x86 only.  From the visual inspection, it
seems that it might work on arm and powerpc.  The arm
USPACE_SVC_STACK_TOP and powerpc USPACE macros seem to be already
incorrect for the threads with non-default kstack size.  I only
changed the macros to use variable instead of constant, since I cannot
test.

On arm64, mips and sparc64, some static data structures are sized by
KSTACK_PAGES, so the tunable is disabled.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-10 17:18:21 +00:00
John Baldwin
3c790178c5 Remove some more vestiges of the Xen PV domu support. Specifically,
use vtophys() directly instead of vtomach() and retire the no-longer-used
headers <machine/xenfunc.h> and <machine/xenvar.h>.

Reported by:	bde (stale bits in <machine/xenfunc.h>)
Reviewed by:	royger (earlier version)
Differential Revision:	https://reviews.freebsd.org/D3266
2015-08-06 17:07:21 +00:00
Ed Maste
fc8c856029 Rationalize BSD license on sys/*/include/in_cksum.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 19:05:12 +00:00
Jason A. Harmening
713841afb2 Add two new pmap functions:
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These will create and destroy a temporary, CPU-local KVA mapping of a specified page.

Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread

Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page

The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.

NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.
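
The intended calling pattern can be sketched in userspace as below. This is
only a model: the vm_page and vm_offset_t definitions, the quick_map_busy
flag, and bounce_copy_from_page() are hypothetical stand-ins, and the two
pmap functions are mocked so that the single-mapping restriction is visible.
The real kernel implementations are per-architecture and differ from this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Userspace stand-ins (assumptions) for the kernel types. */
typedef uintptr_t vm_offset_t;
struct vm_page { unsigned char data[PAGE_SIZE]; };
typedef struct vm_page *vm_page_t;

/* Models the single slot of mapping space the commit message guarantees. */
static int quick_map_busy;

/* Mock: "map" the page and hand back an address for it. */
vm_offset_t
pmap_quick_enter_page(vm_page_t m)
{
	assert(!quick_map_busy);	/* nested calls are not allowed */
	quick_map_busy = 1;
	return ((vm_offset_t)m->data);
}

/* Mock: tear the temporary mapping down again. */
void
pmap_quick_remove_page(vm_offset_t kva)
{
	(void)kva;
	quick_map_busy = 0;
}

/* Hypothetical bounce-copy helper: enter, copy, remove, nothing else. */
void
bounce_copy_from_page(vm_page_t m, size_t off, void *dst, size_t len)
{
	vm_offset_t kva;

	assert(off <= PAGE_SIZE && len <= PAGE_SIZE - off);
	kva = pmap_quick_enter_page(m);
	memcpy(dst, (const void *)(kva + off), len);
	pmap_quick_remove_page(kva);	/* drop the mapping before doing anything else */
}
```

The helper keeps the window between enter and remove as short as possible
and takes no locks inside it, which matches the restrictions listed above.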

Reviewed by:	kib
Approved by:	kib (mentor)
Differential Revision:	http://reviews.freebsd.org/D3013
2015-08-04 19:46:13 +00:00
Warner Losh
75333e6435 Add pmspcv device back to GENERIC. The issues with the device playing
grabby hands with other drivers' devices have been solved.

MFC After: 3 weeks
2015-08-03 13:49:46 +00:00
Ed Schouten
ee95773383 Let CloudABI use the SV_CAPSICUM flag.
CloudABI processes will now start up in capabilities mode.

Reviewed by:	kib
2015-08-03 13:42:52 +00:00
Konstantin Belousov
f94cc23475 Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID
reported, on APs.  We already did this on BSP.

Otherwise, userspace software which depends on the features
reported by the high CPUID levels misbehaves.  In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute the CPUID instruction.  Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.
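
As a rough sketch of the bit manipulation involved: Intel documents the
"Limit CPUID Maxval" control as bit 22 of IA32_MISC_ENABLE (MSR 0x1A0). The
macro and function names below are assumptions for illustration, not
necessarily the ones used in the tree, and the code only operates on a value
that has already been read; the kernel would do the read and write of the MSR
itself on each CPU.

```c
#include <stdint.h>

/* Assumed names; values follow Intel's documented MSR 0x1A0, bit 22. */
#define MSR_IA32_MISC_ENABLE	0x1a0
#define IA32_MISC_EN_LIMCPUID	(UINT64_C(1) << 22)

/* Returns non-zero if the MSR value limits the reported maximum CPUID leaf. */
int
misc_enable_limits_cpuid(uint64_t msr_val)
{
	return ((msr_val & IA32_MISC_EN_LIMCPUID) != 0);
}

/* Returns the MSR value with the limit bit cleared, other bits untouched. */
uint64_t
misc_enable_clear_limcpuid(uint64_t msr_val)
{
	return (msr_val & ~IA32_MISC_EN_LIMCPUID);
}
```

Applying the same clearing step on every AP as well as on the BSP is what
makes CPUID report the same maximum leaf regardless of which CPU a thread
runs on.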

Reported and tested by:	Andre Meiser <ortadur@web.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-03 12:14:42 +00:00
Ed Schouten
75c9f22394 Set p_osrel to __FreeBSD_version on process startup.
Certain system calls have quirks applied to make them work as if called
on an older version of FreeBSD. As CloudABI executables don't have the
FreeBSD OS release number in the ELF header, this value was left at zero,
making the system calls fall back to typically historic, non-standard
behaviour.

Reviewed by:	kib
2015-08-03 07:29:57 +00:00
Glen Barber
45e1c1a38d Pull pmspcv (pms(4)) from GENERIC. It has PCI ID conflicts
with ahd(4), mvs(4), and likely other drivers.

MFC after:	immediately
With hat:	re
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:23:48 +00:00
Konstantin Belousov
0b6476ec5b Improve comments.
Submitted by:	bde
MFC after:	2 weeks
2015-07-30 15:47:53 +00:00
Konstantin Belousov
1d1ec02c44 Remove full barrier from the amd64 atomic_load_acq_*(). The strong
ordering semantics of x86 CPUs make a compiler barrier sufficient to
give the acquire behaviour.

The existing implementation ensured sequentially consistent semantics
for load_acq, a much stronger guarantee than required by the standard's
definition of load acquire.  Consumers which depend on the barrier
are believed to be identified and already fixed to use proper
operations.
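
For readers unfamiliar with the acquire/release pairing, here is a minimal
sketch using C11 <stdatomic.h> as a stand-in for the kernel's
atomic_load_acq_*() and atomic_store_rel_*() primitives. The mailbox type and
function names are hypothetical. On x86 a compiler will typically emit an
ordinary load for the acquire operation, which is the point the commit makes.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mailbox: the payload is published by setting the flag. */
struct mailbox {
	int		payload;	/* plain data, written before the flag */
	atomic_bool	ready;		/* flag that guards the payload */
};

/* Producer side: write the payload, then release-store the flag. */
void
mailbox_publish(struct mailbox *mb, int value)
{
	mb->payload = value;
	atomic_store_explicit(&mb->ready, true, memory_order_release);
}

/*
 * Consumer side: acquire-load the flag.  If it is set, the payload write
 * that preceded the release store is guaranteed to be visible.
 */
bool
mailbox_try_read(struct mailbox *mb, int *out)
{
	if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
		return (false);
	*out = mb->payload;
	return (true);
}
```

Code that needs ordering between a store and a later load of a different
location gets nothing from acquire semantics alone and has to use a stronger
operation; that is the kind of consumer the message refers to.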

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-28 07:04:51 +00:00
Alan Cox
d8b56c8eab Add a comment discussing the appropriate use of the atomic_*() functions
with acquire and release semantics versus the *mb() functions on amd64
processors.

Reviewed by:	bde (an earlier version), kib
Sponsored by:	EMC / Isilon Storage Division
2015-07-24 19:43:18 +00:00
John Baldwin
9a2d6ab990 Various changes to the registers displayed in DDB for x86.
- Fix segment registers to only display the low 16 bits.
- Remove unused handlers and entries for the debug registers.
- Display xcr0 (if valid) in 'show sysregs'.
- Add '0x' prefix to MSR values to match other values in 'show sysregs'.
- MFamd64: Display various MSRs in 'show sysregs'.
- Add a 'show dbregs' to display the value of debug registers.
- Dynamically size the column width for register values to properly
  align columns on 64-bit platforms.
- Display %gs for i386 in 'show registers'.

Differential Revision:	https://reviews.freebsd.org/D2784
Reviewed by:	kib, markj
MFC after:	2 weeks
2015-07-22 01:09:02 +00:00
Mark Johnston
a5cbf8b9c0 Let the unwinder handle faults during function prologues or epilogues.
The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down.
However, this code was accidentally unused, since db_backtrace() was never
called with a non-NULL trap frame. This change fixes that.

Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.
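
The i386 rule described above can be modelled in userspace roughly as
follows. The structure and function names are hypothetical, only the tail of
the trap frame is modelled, and vm86 mode is ignored; this is a sketch of the
reasoning, not the code that was removed.

```c
#include <stdint.h>

/* Model of the hardware-pushed tail of an i386 trap frame (assumption). */
struct model_trapframe {
	uint32_t tf_eip;
	uint32_t tf_cs;
	uint32_t tf_eflags;
	uint32_t tf_esp;	/* only pushed when the exception switched CPL */
	uint32_t tf_ss;		/* likewise */
};

#define SEL_RPL_MASK	3	/* low two bits of a selector hold the privilege level */

/*
 * Recover the stack pointer at the time of the exception.  If the saved
 * %cs has a non-zero privilege level, the CPU switched stacks and pushed
 * SS:ESP, so tf_esp is valid.  Otherwise nothing was pushed there, and
 * the interrupted stack pointer is the address where tf_esp would sit.
 */
uintptr_t
model_get_esp(const struct model_trapframe *tf)
{
	if ((tf->tf_cs & SEL_RPL_MASK) != 0)
		return (tf->tf_esp);
	return ((uintptr_t)&tf->tf_esp);
}
```

On amd64 the second branch never applies because SS:RSP is always pushed, so
the saved value can be read from the frame directly.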

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2881
2015-07-21 23:22:23 +00:00