353 Commits

Author SHA1 Message Date
Li-Wen Hsu
db7ec3c3e6 Fix i386 build for r361275
kponsored by:	The FreeBSD Foundation
2020-05-20 13:51:27 +00:00
Wei Hu
a560f3ebd7 HyperV socket implementation for FreeBSD
This change adds Hyper-V socket feature in FreeBSD. New socket address
family AF_HYPERV and its kernel support are added.

Submitted by:	Wei Hu <weh@microsoft.com>
Reviewed by:	Dexuan Cui <decui@microsoft.com>
Relnotes:	yes
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D24061
2020-05-20 11:03:59 +00:00
Konstantin Belousov
4e012582d3 hyperv: Add Hygon Dhyana support.
Submitted by:	Pu Wen <puwen@hygon.cn>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23563
2020-02-13 19:12:07 +00:00
Wei Hu
ace5ce7e70 hyperv/vmbus: Update VMBus version 4.0 and 5.0 support.
Add VMBus protocol version 4.0. and 5.0 to support Windows 10 and newer HyperV hosts.

For VMBus 4.0 and newer HyperV, the netvsc gpadl teardown must be done after vmbus close.

Submitted by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft
2019-07-09 07:24:18 +00:00
Takanori Watanabe
5efca36fbd Distinguish _CID match and _HID match and make lower priority probe
when _CID match.

Reviewed by: jhb, imp
Differential Revision:https://reviews.freebsd.org/D16468
2018-10-26 00:05:46 +00:00
Alan Cox
49bfa624ac Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00
Alan Cox
83a90bffd8 Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by:	kib, markj
Discussed with:	jeff, re@
Differential Revision:	https://reviews.freebsd.org/D16825
2018-08-21 16:43:46 +00:00
Konstantin Belousov
b3a7db3b06 Use SMAP on amd64.
Ifuncs selectors dispatch copyin(9) family to the suitable variant, to
set rflags.AC around userspace access.  Rflags.AC bit is cleared in
all kernel entry points unconditionally even on machines not
supporting SMAP.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-07-29 20:47:00 +00:00
Dexuan Cui
96f105d11f hyperv: Fix boot-up after malloc() returns memory of NX by default now
FreeBSD VM can't boot up on Hyper-V after the recent malloc change in
r335068: Make UMA and malloc(9) return non-executable memory in most cases.

The hypercall page here must be executable.
Fix the boot-up issue by adding M_EXEC.

PR:		229167
Sponsored by:	Microsoft
2018-07-07 00:41:04 +00:00
Konstantin Belousov
d86c1f0dc1 i386 4/4G split.
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.

By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap.  The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.

There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold().  An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls.  The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline.  If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.

The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done.  The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging.  I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.

Tested by: pho
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D14633
2018-04-13 20:30:49 +00:00
Ed Maste
5a7ed65fff Correct comment typo in Hyper-V
PR:		226665
Submitted by:	Ryo ONODERA
MFC after:	3 days
2018-03-30 02:25:12 +00:00
Ed Maste
fc2a8776a2 Rename assym.s to assym.inc
assym is only to be included by other .s files, and should never
actually be assembled by itself.

Reviewed by:	imp, bdrewery (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D14180
2018-03-20 17:58:51 +00:00
Konstantin Belousov
bd50262f70 PTI for amd64.
The implementation of the Kernel Page Table Isolation (KPTI) for
amd64, first version. It provides a workaround for the 'meltdown'
vulnerability.  PTI is turned off by default for now, enable with the
loader tunable vm.pmap.pti=1.

The pmap page table is split into kernel-mode table and user-mode
table. Kernel-mode table is identical to the non-PTI table, while
usermode table is obtained from kernel table by leaving userspace
mappings intact, but only leaving the following parts of the kernel
mapped:

    kernel text (but not modules text)
    PCPU
    GDT/IDT/user LDT/task structures
    IST stacks for NMI and doublefault handlers.

Kernel switches to user page table before returning to usermode, and
restores full kernel page table on the entry. Initial kernel-mode
stack for PTI trampoline is allocated in PCPU, it is only 16
qwords.  Kernel entry trampoline switches page tables. then the
hardware trap frame is copied to the normal kstack, and execution
continues.

IST stacks are kept mapped and no trampoline is needed for
NMI/doublefault, but of course page table switch is performed.

On return to usermode, the trampoline is used again, iret frame is
copied to the trampoline stack, page tables are switched and iretq is
executed.  The case of iretq faulting due to the invalid usermode
context is tricky, since the frame for fault is appended to the
trampoline frame.  Besides copying the fault frame and original
(corrupted) frame to kstack, the fault frame must be patched to make
it look as if the fault occured on the kstack, see the comment in
doret_iret detection code in trap().

Currently kernel pages which are mapped during trampoline operation
are identical for all pmaps.  They are registered using
pmap_pti_add_kva().  Besides initial registrations done during boot,
LDT and non-common TSS segments are registered if user requested their
use.  In principle, they can be installed into kernel page table per
pmap with some work.  Similarly, PCPU can be hidden from userspace
mapping using trampoline PCPU page, but again I do not see much
benefits besides complexity.

PDPE pages for the kernel half of the user page tables are
pre-allocated during boot because we need to know pml4 entries which
are copied to the top-level paging structure page, in advance on a new
pmap creation.  I enforce this to avoid iterating over the all
existing pmaps if a new PDPE page is needed for PTI kernel mappings.
The iteration is a known problematic operation on i386.

The need to flush hidden kernel translations on the switch to user
mode make global tables (PG_G) meaningless and even harming, so PG_G
use is disabled for PTI case.  Our existing use of PCID is
incompatible with PTI and is automatically disabled if PTI is
enabled.  PCID can be forced on only for developer's benefit.

MCE is known to be broken, it requires IST stack to operate completely
correctly even for non-PTI case, and absolutely needs dedicated IST
stack because MCE delivery while trampoline did not switched from PTI
stack is fatal.  The fix is pending.

Reviewed by:	markj (partially)
Tested by:	pho (previous version)
Discussed with:	jeff, jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2018-01-17 11:44:21 +00:00
Sepherosa Ziehau
6bf331af5d hyperv/vmbus: Expose Hyper-V major version.
MFC after:	3 days
Sponsored by:	Microsoft
2017-10-10 08:23:19 +00:00
Sepherosa Ziehau
8dc07838db hyperv/vmbus: Add tunable to pin/unpin event tasks.
Event tasks are pinned to their respective CPU by default, in the same
fashion as they were.

Unpin the event tasks by setting hw.vmbus.pin_evttask to 0, if certain
CPUs serve special purpose.

MFC after:	3 days
Sponsored by:	Microsoft
2017-10-10 08:16:55 +00:00
Sepherosa Ziehau
93b4e111bb hyperv: Update copyright for the files changed in 2017
MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D11982
2017-08-14 06:00:50 +00:00
Sepherosa Ziehau
554e6778b6 hyperv/vmbus: Reorganize vmbus device tree
For GEN1 Hyper-V, vmbus is attached to pcib0, which contains the
resources for PCI passthrough and SR-IOV.  There is no
acpi_syscontainer0 on GEN1 Hyper-V.

For GEN2 Hyper-V, vmbus is attached to acpi_syscontainer0, which
contains the resources for PCI passthrough and SR-IOV.  There is
no pcib0 on GEN2 Hyper-V.

The ACPI VMBUS device now only holds its _CRS, which is empty as
of this commit; its existence is mainly for upward compatibility.

Device tree structure is suggested by jhb@.

Tested-by:	dexuan@
Collabrated-wth:	dexuan@
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D10565
2017-05-10 05:28:14 +00:00
Sepherosa Ziehau
9ba5e29c5c hyperv: Use kmem_malloc for hypercall memory due to NX bit change.
Reported by:	dexuan@
MFC after:	now
Sponsored by:	Microsoft
2017-04-19 02:39:48 +00:00
Sepherosa Ziehau
227bb849d3 hyperv: Add method to read 64bit Hyper-V specific time value.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D9057
2017-01-09 03:38:41 +00:00
Sepherosa Ziehau
69d2eb82d7 hyperv/vmbus: Nuke unnecessary critical sections.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8906
2016-12-28 03:07:58 +00:00
Sepherosa Ziehau
be53a2fa1b hyperv: Unbreak EARLY_AP_STARUP Hyper-V bootstrap by using intrhook
Properly working pause and friends are required.

MFC after:	3 days
Sponsored by:	Microsoft
2016-12-21 03:23:35 +00:00
Sepherosa Ziehau
fff5be0be3 hyperv: Implement userspace gettimeofday(2) with Hyper-V reference TSC
This 6 times gettimeofday performance, as measured by
tools/tools/syscall_timing

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8789
2016-12-19 07:40:45 +00:00
Sepherosa Ziehau
9622c93ae8 hyperv: Allow userland to ro-mmap reference TSC page
This paves way to implement VDSO for the enlightened time counter.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8768
2016-12-15 03:32:24 +00:00
Sepherosa Ziehau
de69dfbbbc hyperv: Implement "enlightened" time counter, which is rdtsc based.
Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8763
2016-12-14 03:20:57 +00:00
Sepherosa Ziehau
b99113a1c1 hyperv/vmbus: Add channel polling support.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8738
2016-12-12 05:04:55 +00:00
Sepherosa Ziehau
096d83feb9 hyperv/timesync: Support "sent TC" to improve accuracy.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8723
2016-12-08 05:37:39 +00:00
Sepherosa Ziehau
98cb13b6cc hyperv/vmbus: Utilize vmbus_chan_run_task()
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8686
2016-12-08 05:15:00 +00:00
Sepherosa Ziehau
6dbe58e249 hyperv/vmbus: Use pause if possible.
This makes booting on Hyper-V w/ small # of vCPUs work properly.

Reported by:	Hongxiong Xian <v-hoxian microsoft com>, Hongjiang Zhang <honzhan microsoft com>
MFC after:	1 week
Sponsored by:	Microsoft
2016-12-07 08:12:02 +00:00
Sepherosa Ziehau
ca567dee01 hypver/vmbus: Remove extra assertion.
It is asserted by vmbus_chan_gpadl_connect() now.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8660
2016-11-30 08:10:49 +00:00
Sepherosa Ziehau
b5c7e2415e hyperv/vmbus: Add DEVMETHOD to map cpu to event taskq.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8658
2016-11-30 07:45:05 +00:00
Sepherosa Ziehau
5e61edf60e hyperv/vmbus: Use poll/cancel APIs to wait for the CHOPEN response.
Since hypervisor does not respond CHOPEN to a revoked channel.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8636
2016-11-28 07:56:03 +00:00
Sepherosa Ziehau
2ee4e46fe6 hyperv/vmbus: Add exec cancel support for message Hypercall API.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8635
2016-11-28 07:44:50 +00:00
Sepherosa Ziehau
2fb45c54a1 hyperv/vmbus: Add result polling support for message Hypercall API.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8634
2016-11-28 07:36:51 +00:00
Sepherosa Ziehau
1b34e69534 hyperv/vmbus: Add result polling support for xact API.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8633
2016-11-28 07:27:08 +00:00
Sepherosa Ziehau
6555f01eec hyperv/vmbus: Stringent GPADL parameter assertion.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8632
2016-11-28 07:04:32 +00:00
Sepherosa Ziehau
fa643a5d0a hyperv/vmbus: Make sure that the allocated GPADL is not zero.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8631
2016-11-28 06:53:00 +00:00
Sepherosa Ziehau
a54152eaa5 hyperv/vmbus: Add supportive transaction wait function.
This function supports channel revocation properly.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8611
2016-11-28 05:07:48 +00:00
Sepherosa Ziehau
faaba341e5 hyperv/vmbus: Zero out GPADL if error happens.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8601
2016-11-28 04:53:36 +00:00
Sepherosa Ziehau
2641e75742 hyperv/vmbus: Add a simplified version of channel close.
So that the caller can know the channel close error and react accordingly.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8600
2016-11-25 09:13:10 +00:00
Sepherosa Ziehau
892b35bc36 hyperv/vmbus: Propagate close error.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8599
2016-11-25 08:57:52 +00:00
Sepherosa Ziehau
0743140a63 hyperv/vmbus: Always try disconnect/free bufring memory upon channel close
While I'm here, minor wording and style changes.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8598
2016-11-25 08:31:13 +00:00
Sepherosa Ziehau
f2aeeaff6f hyperv/vmbus: Don't free the bufring if its GPADL can't be disconnected.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8570
2016-11-25 07:41:42 +00:00
Sepherosa Ziehau
32ab625a61 hyperv/vmbus: Return EISCONN if the bufring GPADL can't be disconnected.
So that the callers of vmbus_chan_open_br() could handle the passed in
bufring memory properly.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8569
2016-11-25 07:24:11 +00:00
Sepherosa Ziehau
12fbfdab73 hyperv/vmbus: No stranded bufring GPADL is allowed.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8568
2016-11-25 07:03:45 +00:00
Sepherosa Ziehau
11629f1cd6 hyperv/vmbus: GPADL disconnect error on a revoked channel is benign.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8567
2016-11-25 06:48:53 +00:00
Sepherosa Ziehau
fd2b520f2f hyperv/vmbus: Don't close unopened channels.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8566
2016-11-25 06:12:18 +00:00
Sepherosa Ziehau
ba9238f89f hyperv/vmbus: Fix sysctl tree leakage, if channel open fails.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8565
2016-11-25 06:01:45 +00:00
Sepherosa Ziehau
6d147fc18b hyperv/vmbus: Minor style changes.
MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8564
2016-11-25 05:46:15 +00:00
Sepherosa Ziehau
eb812ea9ab hyperv/vmbus: Commit the GPADL id only after the connection succeeds.
Minor style change.

MFC after:	1 week
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D8563
2016-11-25 05:35:29 +00:00
Dexuan Cui
cdb316ee87 hyperv/vmbus,pcib: unbreak build in case NEW_PCIB is undefined
vmbus_pcib requires NEW_PCIB, but in case that's not defined, we at
least shouldn't break build.

Reviewed by:	sephe
Approved by:	sephe (mentor)
MFC after:	3 days
Sponsored by:	Microsoft
2016-11-25 04:35:40 +00:00