Commit Graph

807 Commits

Author SHA1 Message Date
Warner Losh
ddfc9c4c59 newbus: Move from bus_child_{pnpinfo,location}_src to bus_child_{pnpinfo,location} with sbuf
Now that the upper layers all go through a layer to tie into these
information functions that translates an sbuf into char * and len. The
current interface suffers issues of what to do in cases of truncation,
etc. Instead, migrate all these functions to using struct sbuf and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.

Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.

Document these new interfaces with man pages, and oversight from before.

Reviewed by:		jhb, bcr
Sponsored by:		Netflix
Differential Revision:	https://reviews.freebsd.org/D29937
2021-06-22 20:52:06 -06:00
Mark Johnston
f4bb1869dd Consistently use the SOLISTENING() macro
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly.  As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING().  No functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2021-06-14 17:32:27 -04:00
Warner Losh
b0e54e61b3 Change "compiled" to "assembled"
Assembly files are assembled, not compiled.

Submitted by:	github user gAlfonso-bit
Pull Request:	https://github.com/freebsd/freebsd-src/pull/474

Sponsored by:		Netflix
2021-06-11 13:58:51 -06:00
Mark Johnston
97993d1ebf hyperv: Fix vmbus after the i386 4/4 split
The vmbus ISR needs to live in a trampoline.  Dynamically allocating a
trampoline at driver initialization time poses some difficulties due to
the fact that the KENTER macro assumes that the offset relative to
tramp_idleptd is fixed at static link time.  Another problem is that
native_lapic_ipi_alloc() uses setidt(), which assumes a fixed trampoline
offset.

Rather than fight this, move the Hyper-V ISR to i386/exception.s.  Add a
new HYPERV kernel option to make this optional, and configure it by
default on i386.  This is sufficient to make use of vmbus(4) after the
4/4 split.  Note that vmbus cannot be loaded dynamically and both the
HYPERV option and device must be configured together.  I think this is
not too onerous a requirement, since vmbus(4) was previously
non-functional.

Reported by:	Harry Schmalzbauer <freebsd@omnilan.de>
Tested by:	Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by:	whu, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D30577
2021-06-08 09:40:30 -04:00
Konstantin Belousov
fe7d7ac408 hyperv: register intr handler as usermode-mapped if loaded as module
Normally raw interrupt handler is provided by the kernel text.  But
vmbus module registers its own handler that needs to be mapped into
userspace mapping on PTI kernels.

Reported and reviewed by:	whu
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D30310
2021-06-05 18:03:18 +03:00
Andriy Gapon
8afecefd57 storvsc: fix auto-sense reporting
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.

Reviewed by:	whu, imp
MFC after:	2 weeks
Sponsored by:	CyberSecure
Differential Revision:	https://reviews.freebsd.org/D30124
2021-05-07 10:17:57 +03:00
Mark Johnston
f161d294b9 Add missing sockaddr length and family validation to various protocols
Several protocol methods take a sockaddr as input.  In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur.  Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.

Reported by:	syzkaller+KASAN
Reviewed by:	tuexen, melifaro
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D29519
2021-05-03 13:35:19 -04:00
Vladimir Kondratyev
55eb41bb1f hv_kbd: Fix build with EVDEV_SUPPORT kernel option disabled.
Reported by:	olivier
MFC with:	e4643aa4c4
2021-04-23 01:13:25 +03:00
Vladimir Kondratyev
774cbf9b64 hv_kbd: Fix leaked $FreeBSD$ expansion
MFC with:	c2a159286c
2021-04-12 02:16:22 +03:00
Vladimir Kondratyev
e4643aa4c4 hv_kbd: Add support for K_XLATE and K_CODE modes for gen 2 VMs
That fixes disabled keyboard input after Xorg server has been stopped.

Reviewed by:	whu
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28171
2021-04-12 02:14:12 +03:00
Vladimir Kondratyev
c2a159286c hv_kbd: Add evdev protocol support for gen 2 VMs
Reviewed by:	whu
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D28170
2021-04-12 02:14:12 +03:00
Konstantin Belousov
aa3ea612be x86: remove gcov kernel support
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D29529
2021-04-02 15:41:51 +03:00
Wei Hu
805dbff6c3 Hyper-V: hn: Initialize the internal field of per packet info on tx path
The RSC support feature introduced a bit field "rm_internal" in
struct rndis_pktinfo with total size unchanged.

The guest does not use this field in the tx path. However we need to
initialize it to zero in case older hosts which are not aware of this
field.

Fixes:		a491581f ("Hyper-V: hn: Enable vSwitch RSC support")
MFC after:	2 weeks
Sponsored by:	Microsoft
2021-03-15 10:33:29 +00:00
Wei Hu
a491581f3f Hyper-V: hn: Enable vSwitch RSC support in hn netvsc driver
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffics are among different guests on same host.
This patch adds netvsc driver support for this feature.

The patch also updates NVS version to 6.1 as needed for RSC
enablement.

MFC after:	2 weeks
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D29075
2021-03-12 04:35:16 +00:00
Wei Hu
80f39bd95f Hyper-V: hn: Store host hash value in flowid
When rx packet contains hash value sent from host, store it in
the mbuf's flowid field so when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.

MFC after:	2 weeks
Sponsored by:	Microsoft
2021-03-05 04:46:54 +00:00
Bradley T. Hughes
c05b5848f1 hyperv/vmbus: avoid crash, panic if vbe fb info is missing
Do not assume that VBE framebuffer metadata can be used. Like with the
EFI fb metadata, it may be null, so we should take care not to
dereference the null vbefb pointer. This avoids a panic when booting
-CURRENT on a gen1 VM in Azure.

Approved by:	tsoome
Sponsored by:	Miles AS
Differential Revision:	https://reviews.freebsd.org/D27533
2020-12-10 13:11:52 +00:00
Toomas Soome
cb79418266 fix vmbus_fb_mmio_res after r368168
mixed efifb versus vbefb struct use did slip in by mistake.
2020-11-30 08:31:41 +00:00
Toomas Soome
a4a10b37d4 Add VT driver for VBE framebuffer device
Implement vt_vbefb to support Vesa Bios Extensions (VBE) framebuffer with VT.
vt_vbefb is built based on vt_efifb and is assuming similar data for
initialization, use MODINFOMD_VBE_FB to identify the structure vbe_fb
in kernel metadata.

struct vbe_fb, is populated by boot loader, and is passed to kernel via
metadata payload.

Differential Revision:	https://reviews.freebsd.org/D27373
2020-11-30 08:22:40 +00:00
Wei Hu
b3460f4452 Hyper-V: hn: Relinquish cpu in HN_LOCK to avoid deadlock
The try lock loop in HN_LOCK put the thread spinning on cpu if the lock
is not available. It is possible to cause deadlock if the thread holding
the lock is sleeping. Relinquish the cpu to work around this problem even
it doesn't completely solve the issue. The priority inversion could cause
the livelock no matter how less likely it could happen. A more complete
solution may be needed in the future.

Reported by:	Microsoft, Netapp
MFC after:	2 weeks
Sponsored by:	Microsoft
2020-10-15 11:44:28 +00:00
Wei Hu
75c2786c25 Hyper-V: pcib: Check revoke status during device attach
It is possible that the vmbus pcib channel is revoked during attach path.
The attach path could be waiting for response from host and this response will never
arrive since the channel has already been revoked from host point of view. Check
this situation during wait complete and return failed if this happens.

Reported by:	Netapp
MFC after:	2 weeks
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D26486
2020-10-15 05:57:20 +00:00
Wei Hu
2a0ce39d08 Hyper-V: storvsc: Enhance srb_status code handling.
In hv_storvsc_io_request() when coring, prevent changing of the send channel
from the base channel to another one. storvsc_poll always probes on the base
channel.

Based upon conversations with Microsoft, changed the handling of srb_status
codes. Most we should never get, others yes. All are treated as retry-able
except for two. We should not get these statuses, but if we ever do, the I/O
state is not known.

Submitted by:	Alexander Sideropoulos <Alexander.Sideropoulos@netapp.com>
Reviewed by:	trasz, allanjude, whu
MFC after:	1 week
Sponsored by:	Netapp Inc
Differential Revision:	https://reviews.freebsd.org/D25756
2020-08-31 09:05:45 +00:00
Wei Hu
c565776195 Prevent framebuffer mmio space from being allocated to other devices on HyperV.
On Gen2 VMs, Hyper-V provides mmio space for framebuffer.
This mmio address range is not useable for other PCI devices.
Currently only efifb driver is using this range without reserving
it from system.
Therefore, vmbus driver reserves it before any other PCI device
drivers start to request mmio addresses.

PR:		222996
Submitted by:	weh@microsoft.com
Reported by:	dmitry_kuleshov@ukr.net
Reviewed by:	decui@microsoft.com
Sponsored by:	Microsoft
2020-07-30 07:26:11 +00:00
Wei Hu
c97c20ace7 Socket AF_HYPERV should return failure when it is not running on HyperV
Reported by:	pho
Sponsored by:	Microsoft
2020-05-22 09:17:07 +00:00
Li-Wen Hsu
db7ec3c3e6 Fix i386 build for r361275
kponsored by:	The FreeBSD Foundation
2020-05-20 13:51:27 +00:00
Wei Hu
a560f3ebd7 HyperV socket implementation for FreeBSD
This change adds Hyper-V socket feature in FreeBSD. New socket address
family AF_HYPERV and its kernel support are added.

Submitted by:	Wei Hu <weh@microsoft.com>
Reviewed by:	Dexuan Cui <decui@microsoft.com>
Relnotes:	yes
Sponsored by:	Microsoft
Differential Revision:	https://reviews.freebsd.org/D24061
2020-05-20 11:03:59 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Gleb Smirnoff
e87c494015 Although most of the NIC drivers are epoch ready, due to peer pressure
switch over to opt-in instead of opt-out for epoch.

Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.

Now this will create recursive entrance in epoch in >90% network
drivers, but will guarantee safeness of the transition.

Mark several tested drivers as IFF_KNOWSEPOCH.

Reviewed by:		hselasky, jeff, bz, gallatin
Differential Revision:	https://reviews.freebsd.org/D23674
2020-02-24 21:07:30 +00:00
Konstantin Belousov
4e012582d3 hyperv: Add Hygon Dhyana support.
Submitted by:	Pu Wen <puwen@hygon.cn>
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23563
2020-02-13 19:12:07 +00:00
Gleb Smirnoff
0921628ddc Introduce flag IFF_NEEDSEPOCH that marks Ethernet interfaces that
supposedly may call into ether_input() without network epoch.

They all need to be reviewed before 13.0-RELEASE.  Some may need
be fixed.  The flag is not planned to be used in the kernel for
a long time.
2020-01-23 01:41:09 +00:00
Andriy Gapon
efdba95d62 storvsc: port a Linux patch, properly set residual data length on errors
This change is based on Linux commit 40630f462824ee.  csio.resid should
account for transfer_len only for success and SRB_STATUS_DATA_OVERRUN
condition.

I am not sure how exactly this change works, but I have a report from a
user that they see lots of checksum errors when running a pool scrub
concurrently with iozone -l 1 -s 100G.  After applying this patch the
problem cannot be reproduced.

Reviewed by:	nobody
Sponsored by:	CyberSecure
Differential Revision: https://reviews.freebsd.org/D22312
2020-01-14 13:20:16 +00:00
Hans Petter Selasky
a3b413af0a Fix spelling.
PR:		242891
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2019-12-30 09:22:52 +00:00
Kyle Evans
2899979df9 Revert r355806: kbd drivers: don't double register keyboard drivers
r356087 made it rather innocuous to double-register built-in keyboard
drivers; we now set a flag to indicate that it's been registered and only
act once on a registration anyways. There is no misleading here, as the
follow-up kbd_delete_driver will actually remove the driver as needed now
that the linker set isn't also consulted after kbdinit.
2019-12-26 17:09:36 +00:00
Kyle Evans
5a7a578be9 kbd drivers: don't double register keyboard drivers
Keyboard drivers are generally registered via linker set. In these cases,
they're also available as kmods which use KPI for registering/unregistering
keyboard drivers outside of the linker set.

For built-in modules, we still fire off MOD_LOAD and maybe even MOD_UNLOAD
if an error occurs, leading to registration via linker set and at MOD_LOAD
time.

This is a minor optimization at best, but it keeps the internal kbd driver
tidy as a future change will merge the linker set driver list into its
internal keyboard_drivers list via SYSINIT and simplify driver lookup by
removing the need to consult the linker set.
2019-12-16 16:41:24 +00:00
Kyle Evans
eefc662f0b kbd: provide default implementations of get_fkeystr/diag
Most keyboard drivers are using the genkbd implementations as it is;
formally use them for any that aren't set and make
genkbd_get_fkeystr/genkbd_diag private.
2019-12-16 02:44:56 +00:00
Kyle Evans
1c5c067aad keyboard switch definitions: standardize on c99 initializers
A future change will provide default implementations for some of these where
it makes sense and most of them are already using the genkbd
implementation (e.g. get_fkeystr, diag).
2019-12-16 02:05:44 +00:00
Kyle Evans
4434de9643 kbd drivers: use kbdd_* indirection for diag invocation
These invocations were directly calling enkbd_diag(), rather than
indirection back through kbdd_diag/kbdsw. While they're functionally
equivent, invoking kbdd_diag where feasible (i.e. not in a diag
implementation) makes it easier to visually identify locking needs in these
other drivers.
2019-12-16 01:37:03 +00:00
Andriy Gapon
97d8f008af hyperv/storvsc: stash a pointer to hv_storvsc_request in ccb
A SIM-private field is used for that.
The pointer can be useful when examining a state of a queued ccb.
E.g., a ccb on a da_softc.pending_ccbs.

MFC after:	2 weeks
2019-11-19 07:20:59 +00:00
Wei Hu
23a499203c hyperv/vmbus: Fix the wrong size in ndis_offload structure
Submitted by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft
2019-07-09 08:21:14 +00:00
Wei Hu
ace5ce7e70 hyperv/vmbus: Update VMBus version 4.0 and 5.0 support.
Add VMBus protocol version 4.0. and 5.0 to support Windows 10 and newer HyperV hosts.

For VMBus 4.0 and newer HyperV, the netvsc gpadl teardown must be done after vmbus close.

Submitted by:	whu
MFC after:	2 weeks
Sponsored by:	Microsoft
2019-07-09 07:24:18 +00:00
Takanori Watanabe
5efca36fbd Distinguish _CID match and _HID match and make lower priority probe
when _CID match.

Reviewed by: jhb, imp
Differential Revision:https://reviews.freebsd.org/D16468
2018-10-26 00:05:46 +00:00
Wei Hu
f0319886ca Do not trop UDP traffic when TXCSUM_IPV6 flag is on
PR:		231797
Submitted by:	whu
Reviewed by:	dexuan
Obtained from:	Kevin Morse
MFC after:	3 days
Sponsored by:	Microsoft
Differential Revision:	https://bugs.freebsd.org/bugzilla/attachment.cgi?id=198333&action=diff
2018-10-22 11:23:51 +00:00
Alan Cox
49bfa624ac Eliminate the arena parameter to kmem_free(). Implicitly this corrects an
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().

Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions.  Use this flag to determine which
arena the kmem virtual addresses are returned to.

Eliminate UMA_SLAB_KRWX.  The introduction of VPO_KMEM_EXEC makes it
redundant.

Update the nearby comment for UMA_SLAB_KERNEL.

Reviewed by:	kib, markj
Discussed with:	jeff
Approved by:	re (marius)
Differential Revision:	https://reviews.freebsd.org/D16845
2018-08-25 19:38:08 +00:00
Alan Cox
83a90bffd8 Eliminate kmem_malloc()'s unused arena parameter. (The arena parameter
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)

Reviewed by:	kib, markj
Discussed with:	jeff, re@
Differential Revision:	https://reviews.freebsd.org/D16825
2018-08-21 16:43:46 +00:00
Dimitry Andric
aaf1312351 Fix build of hyperv with base gcc on i386
Summary:
Base gcc fails to compile `sys/dev/hyperv/pcib/vmbus_pcib.c` for i386,
with the following -Werror warnings:

cc1: warnings being treated as errors
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'new_pcichild_device':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:567: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_on_channel_callback':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:940: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_protocol_negotiation':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1012: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_enter_d0':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1073: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_send_resources_allocated':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1125: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_map_msi':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1730: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

This is because on i386, several casts from `uint64_t` to a pointer
reduce the value from 64 bit to 32 bit.

For gcc, this can be fixed by an intermediate cast to uintptr_t. Note
that I am assuming the incoming values will always fit into 32 bit!

Differential Revision: https://reviews.freebsd.org/D15753
MFC after:	3 days
2018-08-04 14:57:23 +00:00
Konstantin Belousov
b3a7db3b06 Use SMAP on amd64.
Ifuncs selectors dispatch copyin(9) family to the suitable variant, to
set rflags.AC around userspace access.  Rflags.AC bit is cleared in
all kernel entry points unconditionally even on machines not
supporting SMAP.

Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D13838
2018-07-29 20:47:00 +00:00
Dexuan Cui
d76fb49fd8 hyperv/hn: Fix panic in hypervisor code upon device detach event
Submitted by:	hselasky
Reviewed by:	dexuan
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D16139
2018-07-17 21:05:08 +00:00
Dexuan Cui
96f105d11f hyperv: Fix boot-up after malloc() returns memory of NX by default now
FreeBSD VM can't boot up on Hyper-V after the recent malloc change in
r335068: Make UMA and malloc(9) return non-executable memory in most cases.

The hypercall page here must be executable.
Fix the boot-up issue by adding M_EXEC.

PR:		229167
Sponsored by:	Microsoft
2018-07-07 00:41:04 +00:00
Eric van Gyzen
7898a1f4c4 if_hn: fix use of uninitialized variable
omcast was used without being initialized in the non-multicast case.
The only effect was that the interface's multicast output counter could be
incorrect.

Reported by:	Coverity
CID:		1379662
MFC after:	3 days
Sponsored by:	Dell EMC
2018-05-26 14:14:56 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
4.98   0.00   4.42   0.00 4235592     33   83.80 4720653 2149771   1235 247.32
4.73   0.00   4.20   0.00 4025260     33   82.99 4724900 2139833   1204 247.32
4.72   0.00   4.20   0.00 4035252     33   82.14 4719162 2132023   1264 247.32
4.71   0.00   4.21   0.00 4073206     33   83.68 4744973 2123317   1347 247.32
4.72   0.00   4.21   0.00 4061118     33   80.82 4713615 2188091   1490 247.32
4.72   0.00   4.21   0.00 4051675     33   85.29 4727399 2109011   1205 247.32
4.73   0.00   4.21   0.00 4039056     33   84.65 4724735 2102603   1053 247.32

After the patch

InMpps OMpps  InGbs  OGbs err TCP Est %CPU syscalls csw     irq GBfree
5.43   0.00   4.20   0.00 3313143     33   84.96 5434214 1900162   2656 245.51
5.43   0.00   4.20   0.00 3308527     33   85.24 5439695 1809382   2521 245.51
5.42   0.00   4.19   0.00 3316778     33   87.54 5416028 1805835   2256 245.51
5.42   0.00   4.19   0.00 3317673     33   90.44 5426044 1763056   2332 245.51
5.42   0.00   4.19   0.00 3314839     33   88.11 5435732 1792218   2499 245.52
5.44   0.00   4.19   0.00 3293228     33   91.84 5426301 1668597   2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Konstantin Belousov
d86c1f0dc1 i386 4/4G split.
The change makes the user and kernel address spaces on i386
independent, giving each almost the full 4G of usable virtual addresses
except for one PDE at top used for trampoline and per-CPU trampoline
stacks, and system structures that must be always mapped, namely IDT,
GDT, common TSS and LDT, and process-private TSS and LDT if allocated.

By using 1:1 mapping for the kernel text and data, it appeared
possible to eliminate assembler part of the locore.S which bootstraps
initial page table and KPTmap.  The code is rewritten in C and moved
into the pmap_cold(). The comment in vmparam.h explains the KVA
layout.

There is no PCID mechanism available in protected mode, so each
kernel/user switch forth and back completely flushes the TLB, except
for the trampoline PTD region. The TLB invalidations for userspace
becomes trivial, because IPI handlers switch page tables. On the other
hand, context switches no longer need to reload %cr3.

copyout(9) was rewritten to use vm_fault_quick_hold().  An issue for
new copyout(9) is compatibility with wiring user buffers around sysctl
handlers. This explains two kind of locks for copyout ptes and
accounting of the vslock() calls.  The vm_fault_quick_hold() AKA slow
path, is only tried after the 'fast path' failed, which temporary
changes mapping to the userspace and copies the data to/from small
per-cpu buffer in the trampoline.  If a page fault occurs during the
copy, it is short-circuit by exception.s to not even reach C code.

The change was motivated by the need to implement the Meltdown
mitigation, but instead of KPTI the full split is done.  The i386
architecture already shows the sizing problems, in particular, it is
impossible to link clang and lld with debugging.  I expect that the
issues due to the virtual address space limits would only exaggerate
and the split gives more liveness to the platform.

Tested by: pho
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D14633
2018-04-13 20:30:49 +00:00