In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writers from interleaving I/O.
When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.
Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().
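The locking pattern described above can be modelled in userspace as follows. This is only a sketch under stated assumptions: struct mock_socket, mock_sock_io_send_lock(), and mock_send() are hypothetical stand-ins, a pthread mutex replaces the kernel sx lock, and the choice of ENOTCONN as the error value is an assumption rather than something stated in this message.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct socket. */
struct mock_socket {
	pthread_mutex_t snd_io_lock;	/* serializes senders only */
	bool listening;			/* models the listening-socket check */
};

/* Take the send I/O lock, failing if the socket is a listening socket. */
int
mock_sock_io_send_lock(struct mock_socket *so)
{
	int error;

	assert(so != NULL);
	error = pthread_mutex_lock(&so->snd_io_lock);
	if (error != 0)
		return (error);
	if (so->listening) {
		/* Assumed errno; a listening socket cannot do I/O. */
		pthread_mutex_unlock(&so->snd_io_lock);
		return (ENOTCONN);
	}
	return (0);
}

void
mock_sock_io_send_unlock(struct mock_socket *so)
{
	pthread_mutex_unlock(&so->snd_io_lock);
}

/* A caller that checks the lock result instead of ignoring it. */
int
mock_send(struct mock_socket *so, int *sent)
{
	int error;

	error = mock_sock_io_send_lock(so);
	if (error != 0)
		return (error);
	(*sent)++;			/* the "I/O" happens under the lock */
	mock_sock_io_send_unlock(so);
	return (0);
}
```

The point of the sketch is the caller: every path that takes the I/O lock has to be prepared for the lock operation itself to fail.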
Reviewed by: tuexen, gallatin
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31657
Add vDSO support for timekeeping devices that support the KVM/XEN
paravirtual clock API.
Also, expose, in the userspace-accessible '<machine/pvclock.h>',
definitions that will be needed by 'libc' to support
'VDSO_TH_ALGO_X86_PVCLK'.
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31418
The upper layers now all go through a layer that ties into these
information functions and translates an sbuf into a char * and length. The
current interface has no good answer for what to do in cases of truncation,
etc. Instead, migrate all these functions to use struct sbuf, and these
issues go away. The caller is also in charge of any memory allocation
and/or expansion that's needed during this process.
Create a bus_generic_child_{pnpinfo,location} and make it default. It
just returns success. This is for those busses that have no information
for these items. Migrate the now-empty routines to using this as
appropriate.
Document these new interfaces with man pages, fixing an oversight from before.
Reviewed by: jhb, bcr
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29937
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly. As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING(). No functional change intended.
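As an illustration of the difference between open-coded flag tests and a single predicate, here is a minimal sketch. The names mock_sock, MOCK_SO_ACCEPTCONN, and MOCK_SOLISTENING are hypothetical, and the flag value is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MOCK_SO_ACCEPTCONN	0x0002	/* assumed flag value */

/* Hypothetical stand-in for the kernel's struct socket. */
struct mock_sock {
	int so_options;
};

/* One predicate, so the definition of "listening" lives in one place. */
#define	MOCK_SOLISTENING(so)	(((so)->so_options & MOCK_SO_ACCEPTCONN) != 0)

/* Callers use the predicate rather than testing the flag directly. */
bool
mock_can_do_io(const struct mock_sock *so)
{
	assert(so != NULL);
	return (!MOCK_SOLISTENING(so));
}
```

If the way a listening socket is identified ever changes, only the macro needs to be updated.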
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The vmbus ISR needs to live in a trampoline. Dynamically allocating a
trampoline at driver initialization time poses some difficulties due to
the fact that the KENTER macro assumes that the offset relative to
tramp_idleptd is fixed at static link time. Another problem is that
native_lapic_ipi_alloc() uses setidt(), which assumes a fixed trampoline
offset.
Rather than fight this, move the Hyper-V ISR to i386/exception.s. Add a
new HYPERV kernel option to make this optional, and configure it by
default on i386. This is sufficient to make use of vmbus(4) after the
4/4 split. Note that vmbus cannot be loaded dynamically and both the
HYPERV option and device must be configured together. I think this is
not too onerous a requirement, since vmbus(4) was previously
non-functional.
Reported by: Harry Schmalzbauer <freebsd@omnilan.de>
Tested by: Harry Schmalzbauer <freebsd@omnilan.de>
Reviewed by: whu, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D30577
Normally the raw interrupt handler is provided by the kernel text, but
the vmbus module registers its own handler, which needs to be mapped into
the userspace mapping on PTI kernels.
Reported and reviewed by: whu
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D30310
I saw a situation where the driver set CAM_AUTOSNS_VALID on a failed ccb
even though SRB_STATUS_AUTOSENSE_VALID was not set in the status.
The actual sense data remained all zeros.
The problem seems to be that create_storvsc_request() always sets
hv_storvsc_request::sense_info_len, so checking for sense_info_len != 0
is not enough to determine if any auto-sense data is actually available.
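A minimal sketch of the stricter check follows. The struct and function names are hypothetical, and the value used for the auto-sense status bit is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	MOCK_SRB_STATUS_AUTOSENSE_VALID	0x80	/* assumed bit value */

/* Hypothetical, reduced view of a completed storage request. */
struct mock_storvsc_request {
	uint8_t srb_status;
	uint8_t sense_info_len;		/* always set when the request is built */
};

/*
 * Sense data is usable only if the host says so in the status.
 * A non-zero length alone proves nothing, because the length is
 * filled in before the request is even sent.
 */
bool
mock_sense_data_valid(const struct mock_storvsc_request *req)
{
	assert(req != NULL);
	return ((req->srb_status & MOCK_SRB_STATUS_AUTOSENSE_VALID) != 0 &&
	    req->sense_info_len != 0);
}
```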
Reviewed by: whu, imp
MFC after: 2 weeks
Sponsored by: CyberSecure
Differential Revision: https://reviews.freebsd.org/D30124
Several protocol methods take a sockaddr as input. In some cases the
sockaddr lengths were not being validated, or were validated after some
out-of-bounds accesses could occur. Add requisite checking to various
protocol entry points, and convert some existing checks to assertions
where appropriate.
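The kind of check being added can be sketched in portable userspace C. The helper name checked_port() and its error values are assumptions for illustration; the point is only that the length is validated before any field of the address is read.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Extract the port (network byte order) from an AF_INET address,
 * refusing to look at the address until its length has been checked.
 */
int
checked_port(const struct sockaddr *sa, socklen_t len, in_port_t *portp)
{
	const struct sockaddr_in *sin;

	assert(portp != NULL);
	/* Validate the length first; nothing is dereferenced before this. */
	if (sa == NULL || len < sizeof(struct sockaddr_in))
		return (EINVAL);
	if (sa->sa_family != AF_INET)
		return (EAFNOSUPPORT);
	sin = (const struct sockaddr_in *)sa;
	*portp = sin->sin_port;
	return (0);
}
```

Once an entry point has validated the length like this, deeper layers can reasonably assert it instead of re-checking.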
Reported by: syzkaller+KASAN
Reviewed by: tuexen, melifaro
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29519
This fixes keyboard input remaining disabled after the Xorg server has been stopped.
Reviewed by: whu
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D28171
The RSC support feature introduced a bit field "rm_internal" in
struct rndis_pktinfo with total size unchanged.
The guest does not use this field in the tx path. However, we need to
initialize it to zero for older hosts that are not aware of this
field.
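A rough sketch of the idea, using a hypothetical struct mock_pktinfo whose layout is assumed for illustration and is not copied from the driver: a bit is carved out of an existing 32-bit member, so the size stays the same, and the tx path clears it explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one bit carved out of an existing 32-bit member. */
struct mock_pktinfo {
	uint32_t size;
	uint32_t type:31;
	uint32_t internal:1;		/* new bit; older hosts expect 0 here */
	uint32_t pktinfooffset;
};

/* The new bit must not change the overall size (assumes a typical ABI). */
_Static_assert(sizeof(struct mock_pktinfo) == 12, "pktinfo size changed");

/* Build a tx pktinfo header with the internal bit explicitly cleared. */
void
mock_pktinfo_init(struct mock_pktinfo *pi, uint32_t type, uint32_t size)
{
	assert(pi != NULL);
	memset(pi, 0, sizeof(*pi));	/* no stale bits from a reused buffer */
	pi->size = size;
	pi->type = type;
	pi->internal = 0;		/* stated explicitly for clarity */
	pi->pktinfooffset = sizeof(*pi);
}
```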
Fixes: a491581f ("Hyper-V: hn: Enable vSwitch RSC support")
MFC after: 2 weeks
Sponsored by: Microsoft
Receive Segment Coalescing (RSC) in the vSwitch is a feature available in
Windows Server 2019 hosts and later. It reduces the per-packet processing
overhead by coalescing multiple TCP segments when possible. This happens
mostly when TCP traffic flows between different guests on the same host.
This patch adds netvsc driver support for this feature.
The patch also updates NVS version to 6.1 as needed for RSC
enablement.
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D29075
When an rx packet contains a hash value sent from the host, store it in
the mbuf's flowid field so that when the same mbuf is on the tx path,
the hash value can be used by the host to determine the outgoing
network queue.
MFC after: 2 weeks
Sponsored by: Microsoft
Do not assume that VBE framebuffer metadata can be used. Like with the
EFI fb metadata, it may be null, so we should take care not to
dereference the null vbefb pointer. This avoids a panic when booting
-CURRENT on a gen1 VM in Azure.
Approved by: tsoome
Sponsored by: Miles AS
Differential Revision: https://reviews.freebsd.org/D27533
Implement vt_vbefb to support the VESA BIOS Extensions (VBE) framebuffer with VT.
vt_vbefb is based on vt_efifb and assumes similar data for
initialization. Use MODINFOMD_VBE_FB to identify the struct vbe_fb
in the kernel metadata.
struct vbe_fb is populated by the boot loader and is passed to the kernel via
the metadata payload.
Differential Revision: https://reviews.freebsd.org/D27373
The try-lock loop in HN_LOCK keeps the thread spinning on the cpu if the lock
is not available. This can cause a deadlock if the thread holding
the lock is sleeping. Relinquish the cpu to work around this problem, even though
it doesn't completely solve the issue. Priority inversion could still cause
a livelock, however unlikely that is. A more complete
solution may be needed in the future.
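A userspace analogue of a try-lock loop that gives up the cpu between attempts might look like the following. It is a sketch only: struct try_lock and its helpers are hypothetical, a C11 atomic_flag stands in for the driver's lock, and sched_yield() stands in for whatever the kernel uses to relinquish the cpu.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock that only supports "try" and "release". */
struct try_lock {
	atomic_flag held;
};

/* Returns true if the lock was free and is now owned by the caller. */
bool
try_lock_acquire(struct try_lock *l)
{
	assert(l != NULL);
	return (!atomic_flag_test_and_set_explicit(&l->held,
	    memory_order_acquire));
}

void
try_lock_release(struct try_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Keep trying, but yield the cpu after each failed attempt so the
 * current holder gets a chance to run. Returns the number of yields.
 */
unsigned long
lock_with_yield(struct try_lock *l)
{
	unsigned long yields = 0;

	while (!try_lock_acquire(l)) {
		sched_yield();
		yields++;
	}
	return (yields);
}
```

As the message says, yielding is a workaround: a higher-priority waiter can still starve the holder, so this does not rule out livelock.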
Reported by: Microsoft, Netapp
MFC after: 2 weeks
Sponsored by: Microsoft
It is possible for the vmbus pcib channel to be revoked during the attach path.
The attach path could be waiting for a response from the host, and this response will never
arrive since the channel has already been revoked from the host's point of view. Check
for this situation while waiting for completion and return failure if it happens.
Reported by: Netapp
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D26486
In hv_storvsc_io_request() when coring, prevent changing of the send channel
from the base channel to another one. storvsc_poll always probes on the base
channel.
Based upon conversations with Microsoft, changed the handling of srb_status
codes. Most we should never get; others we can. All are treated as retryable
except for two. We should not get these statuses, but if we ever do, the I/O
state is not known.
Submitted by: Alexander Sideropoulos <Alexander.Sideropoulos@netapp.com>
Reviewed by: trasz, allanjude, whu
MFC after: 1 week
Sponsored by: Netapp Inc
Differential Revision: https://reviews.freebsd.org/D25756
On Gen2 VMs, Hyper-V provides mmio space for the framebuffer.
This mmio address range is not usable by other PCI devices.
Currently only the efifb driver uses this range, and it does so without
reserving it from the system.
Therefore, the vmbus driver reserves it before any other PCI device
drivers start to request mmio addresses.
PR: 222996
Submitted by: weh@microsoft.com
Reported by: dmitry_kuleshov@ukr.net
Reviewed by: decui@microsoft.com
Sponsored by: Microsoft
This change adds the Hyper-V socket feature to FreeBSD. A new socket address
family, AF_HYPERV, and its kernel support are added.
Submitted by: Wei Hu <weh@microsoft.com>
Reviewed by: Dexuan Cui <decui@microsoft.com>
Relnotes: yes
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D24061
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
switch over to opt-in instead of opt-out for epoch.
Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If a driver marks
itself with IFF_KNOWSEPOCH, then ether_input() will not enter the epoch
when processing its packets.
This will now cause recursive epoch entry in >90% of network
drivers, but it guarantees the safety of the transition.
Mark several tested drivers as IFF_KNOWSEPOCH.
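The opt-in behaviour can be modelled with a small sketch. Everything here is hypothetical: the mock_ names, the flag value, and tracking epoch depth per interface (a simplification made so the example stays self-contained).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	MOCK_IFF_KNOWSEPOCH	0x1	/* assumed flag value */

/* Hypothetical interface; depth is tracked here only for the model. */
struct mock_ifnet {
	unsigned int if_flags;
	int epoch_depth;
	int max_epoch_depth;
};

void
mock_epoch_enter(struct mock_ifnet *ifp)
{
	ifp->epoch_depth++;
	if (ifp->epoch_depth > ifp->max_epoch_depth)
		ifp->max_epoch_depth = ifp->epoch_depth;
}

void
mock_epoch_exit(struct mock_ifnet *ifp)
{
	assert(ifp->epoch_depth > 0);
	ifp->epoch_depth--;
}

/* Enter the epoch unless the driver has declared that it already did. */
void
mock_ether_input(struct mock_ifnet *ifp)
{
	bool needs_epoch;

	assert(ifp != NULL);
	needs_epoch = (ifp->if_flags & MOCK_IFF_KNOWSEPOCH) == 0;
	if (needs_epoch)
		mock_epoch_enter(ifp);
	assert(ifp->epoch_depth > 0);	/* packet processing needs the epoch */
	if (needs_epoch)
		mock_epoch_exit(ifp);
}
```

An unmarked driver that already runs inside the epoch ends up at depth 2 in this model, which is the harmless recursion the message accepts in exchange for a safe default.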
Reviewed by: hselasky, jeff, bz, gallatin
Differential Revision: https://reviews.freebsd.org/D23674
supposedly may call into ether_input() without network epoch.
They all need to be reviewed before 13.0-RELEASE. Some may need
to be fixed. The flag is not planned to be used in the kernel for
a long time.
This change is based on Linux commit 40630f462824ee. csio.resid should
account for transfer_len only in the success and SRB_STATUS_DATA_OVERRUN
cases.
I am not sure how exactly this change works, but I have a report from a
user that they see lots of checksum errors when running a pool scrub
concurrently with iozone -l 1 -s 100G. After applying this patch the
problem cannot be reproduced.
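One reading of the residual rule is sketched below. The function name, the status values, and the treatment of all other statuses (the whole buffer counts as not transferred) are assumptions made for illustration, not a description of the actual change.

```c
#include <assert.h>
#include <stdint.h>

#define	MOCK_SRB_STATUS_SUCCESS		0x01	/* assumed value */
#define	MOCK_SRB_STATUS_DATA_OVERRUN	0x12	/* assumed value */

/*
 * Compute the residual byte count for a completed request.
 * transfer_len is trusted only for the two statuses named in the
 * message; the caller is assumed to have clamped it to dxfer_len.
 */
uint32_t
mock_compute_resid(uint8_t srb_status, uint32_t dxfer_len,
    uint32_t transfer_len)
{
	assert(transfer_len <= dxfer_len);
	if (srb_status == MOCK_SRB_STATUS_SUCCESS ||
	    srb_status == MOCK_SRB_STATUS_DATA_OVERRUN)
		return (dxfer_len - transfer_len);
	/* Assumption: on any other status, report nothing transferred. */
	return (dxfer_len);
}
```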
Reviewed by: nobody
Sponsored by: CyberSecure
Differential Revision: https://reviews.freebsd.org/D22312
r356087 made it rather innocuous to double-register built-in keyboard
drivers; we now set a flag to indicate that it's been registered and only
act once on a registration anyways. There is no misleading here, as the
follow-up kbd_delete_driver will actually remove the driver as needed now
that the linker set isn't also consulted after kbdinit.
Keyboard drivers are generally registered via linker set. In these cases,
they're also available as kmods which use KPI for registering/unregistering
keyboard drivers outside of the linker set.
For built-in modules, we still fire off MOD_LOAD and maybe even MOD_UNLOAD
if an error occurs, leading to registration both via linker set and at
MOD_LOAD time.
This is a minor optimization at best, but it keeps the internal kbd driver
tidy as a future change will merge the linker set driver list into its
internal keyboard_drivers list via SYSINIT and simplify driver lookup by
removing the need to consult the linker set.
Most keyboard drivers are using the genkbd implementations as it is;
formally use them for any that aren't set and make
genkbd_get_fkeystr/genkbd_diag private.
A future change will provide default implementations for some of these where
it makes sense and most of them are already using the genkbd
implementation (e.g. get_fkeystr, diag).
These invocations were directly calling genkbd_diag(), rather than
indirecting back through kbdd_diag/kbdsw. While they're functionally
equivalent, invoking kbdd_diag where feasible (i.e. not in a diag
implementation) makes it easier to visually identify locking needs in these
other drivers.
A SIM-private field is used for that.
The pointer can be useful when examining the state of a queued ccb,
e.g., a ccb on da_softc.pending_ccbs.
MFC after: 2 weeks
Add VMBus protocol versions 4.0 and 5.0 to support Windows 10 and newer Hyper-V hosts.
For VMBus 4.0 and newer Hyper-V, the netvsc gpadl teardown must be done after vmbus close.
Submitted by: whu
MFC after: 2 weeks
Sponsored by: Microsoft
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().
Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions. Use this flag to determine which
arena the kmem virtual addresses are returned to.
Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it
redundant.
Update the nearby comment for UMA_SLAB_KERNEL.
Reviewed by: kib, markj
Discussed with: jeff
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16845
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)
Reviewed by: kib, markj
Discussed with: jeff, re@
Differential Revision: https://reviews.freebsd.org/D16825
Summary:
Base gcc fails to compile `sys/dev/hyperv/pcib/vmbus_pcib.c` for i386,
with the following -Werror warnings:
cc1: warnings being treated as errors
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'new_pcichild_device':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:567: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_on_channel_callback':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:940: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_protocol_negotiation':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1012: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_pci_enter_d0':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1073: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'hv_send_resources_allocated':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1125: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c: In function 'vmbus_pcib_map_msi':
/usr/src/sys/dev/hyperv/pcib/vmbus_pcib.c:1730: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
This is because on i386, several casts from `uint64_t` to a pointer
reduce the value from 64 bit to 32 bit.
For gcc, this can be fixed by an intermediate cast to uintptr_t. Note
that I am assuming the incoming values will always fit into 32 bit!
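The intermediate cast can be sketched as below. The helper names are hypothetical, and the assertion makes the stated assumption explicit: the 64-bit value must fit in a pointer on the target.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 64-bit integer to a pointer via uintptr_t, so the
 * narrowing on 32-bit targets is explicit rather than implied.
 */
void *
u64_to_ptr(uint64_t v)
{
	assert(v <= UINTPTR_MAX);	/* would otherwise be truncated */
	return ((void *)(uintptr_t)v);
}

/* The reverse direction widens, so it is always lossless. */
uint64_t
ptr_to_u64(const void *p)
{
	return ((uint64_t)(uintptr_t)p);
}
```

Going through uintptr_t means both casts are between types of matching size, which is what silences the gcc warnings quoted above.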
Differential Revision: https://reviews.freebsd.org/D15753
MFC after: 3 days