The bus is independent of the device, so all devices can be attached to
either a PCI bus or an MMIO bus. For example, QEMU's virtio-rng-device
gives the MMIO variant of virtio-rng-pci, and is now detected.
Reviewed by: andrew, br, brooks (mentor)
Approved by: andrew, br, brooks (mentor)
Differential Revision: https://reviews.freebsd.org/D24730
The non-legacy virtio MMIO specification drops the use of PFNs and
replaces them with physical addresses. Whilst many implementations are
so-called transitional devices, also implementing the legacy
specification, TinyEMU[1] does not. Device-specific configuration
registers have also changed to being little-endian, and must be accessed
using a single aligned access for registers up to 32 bits, and two
32-bit aligned accesses for 64-bit registers.
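The access rule above can be sketched in portable C. This is only an illustration: cfg_read_4() and cfg_read_8() are hypothetical helpers operating on a byte array that stands in for the device-specific configuration window, and reading the low word first is an assumption, not something stated above.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for one 32-bit aligned access: decode four bytes
 * of a little-endian register window, independent of host byte order.
 */
static uint32_t
cfg_read_4(const uint8_t *cfg, unsigned off)
{
	return ((uint32_t)cfg[off] |
	    (uint32_t)cfg[off + 1] << 8 |
	    (uint32_t)cfg[off + 2] << 16 |
	    (uint32_t)cfg[off + 3] << 24);
}

/* Assemble a 64-bit register from two 32-bit accesses (low word first). */
static uint64_t
cfg_read_8(const uint8_t *cfg, unsigned off)
{
	uint64_t lo = cfg_read_4(cfg, off);
	uint64_t hi = cfg_read_4(cfg, off + 4);

	return (hi << 32 | lo);
}
```

A real driver would go through bus_space(9) accessors rather than a byte array; the array only keeps the sketch self-contained.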
[1] https://bellard.org/tinyemu/
Reviewed by: br, brooks (mentor)
Approved by: br, brooks (mentor)
Differential Revision: https://reviews.freebsd.org/D24681
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is a non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT.
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
switch over to opt-in instead of opt-out for epoch.
Instead of IFF_NEEDSEPOCH, provide IFF_KNOWSEPOCH. If driver marks
itself with IFF_KNOWSEPOCH, then ether_input() would not enter epoch
when processing its packets.
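The opt-in behaviour can be modelled with a small userland sketch. Everything here is hypothetical: struct ifnet_sketch, the flag value, and the depth counters exist only to show why an unflagged driver that already runs inside the epoch ends up entering it a second time.

```c
#define	IFF_KNOWSEPOCH_SKETCH	0x20	/* illustrative value, an assumption */

struct ifnet_sketch {
	int	if_flags;
	int	epoch_depth;	/* current nesting of the modelled epoch */
	int	max_depth;	/* deepest nesting observed so far */
};

static void
epoch_enter_sketch(struct ifnet_sketch *ifp)
{
	if (++ifp->epoch_depth > ifp->max_depth)
		ifp->max_depth = ifp->epoch_depth;
}

static void
epoch_exit_sketch(struct ifnet_sketch *ifp)
{
	ifp->epoch_depth--;
}

/* Opt-in: only drivers that declare the flag skip the extra entry. */
static void
ether_input_sketch(struct ifnet_sketch *ifp)
{
	int knows = (ifp->if_flags & IFF_KNOWSEPOCH_SKETCH) != 0;

	if (!knows)
		epoch_enter_sketch(ifp);
	/* ... packet processing would run here, inside the epoch ... */
	if (!knows)
		epoch_exit_sketch(ifp);
}
```

Calling ether_input_sketch() from a driver that has already entered the epoch yields a nesting depth of 2 without the flag and 1 with it.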
For now this causes recursive epoch entry in more than 90% of network
drivers, but it guarantees a safe transition.
Mark several tested drivers as IFF_KNOWSEPOCH.
Reviewed by: hselasky, jeff, bz, gallatin
Differential Revision: https://reviews.freebsd.org/D23674
When we register an interrupt handler we need to pass the intr_type along in
bus_setup_intr().
The interrupt type matters because it is used to decide if we need to enter
NET_EPOCH. That meant that vtmmio-based if_vtnet did not, which led to panics
with INVARIANTS set.
Sponsored by: Axiado
BIO_READ and BIO_WRITE, we've handled this expanded syntax poorly in
drivers when the driver doesn't support a particular command. Do a
sweep and fix that.
Reported by: imp
In legacy VirtIO drivers, the header must be PCI endianness (little) and the
device-specific region is encoded in the native endian of the guest.
This patch makes accesses (read/write) to the VirtIO header use little-endian
byte order. Other read and write accesses use native endianness. It also
sets the device's IO region to big endian on big-endian machines.
PR: 205178
Submitted by: Andre Silva <afscoelho@gmail.com>
Reported by: Kenneth Salerno <kennethsalerno@yahoo.com>
Reviewed by: bryanv, bdragon, luporl, alfredo
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D23401
This bus does not really have a concept of the initiator ID, so use
a guaranteed dummy one that won't conflict with any real target.
This change fixes a problem with virtio_scsi on GCE where disks get
sequential target IDs starting from one. If there are seven or more
disks, then a disk with the target ID of seven would not be discovered
by FreeBSD as that ID was reserved as the initiator ID -- see
scsi_scan_bus().
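A simplified model of the scan shows the effect. This is not the code of scsi_scan_bus(); scan_targets() is a hypothetical helper, and choosing an initiator ID one past the highest target is an assumption made for the illustration.

```c
/*
 * Count the disks a scan would find when target IDs 0..max_target are
 * walked and the initiator's own ID is skipped.
 */
static int
scan_targets(const int *disk_ids, int ndisks, int max_target,
    int initiator_id)
{
	int found = 0;

	for (int id = 0; id <= max_target; id++) {
		if (id == initiator_id)
			continue;	/* the initiator never probes itself */
		for (int i = 0; i < ndisks; i++)
			if (disk_ids[i] == id)
				found++;
	}
	return (found);
}
```

With eight disks at IDs 1 through 8, an initiator ID of 7 hides one disk, while an ID outside the scanned range hides none.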
Discussed with: bryanv
MFC after: 2 weeks
Sponsored by: Panzura
The patch could be simpler, using only the second chunk to
vtnet_rxq_eof(), which passes full mbufs to pfil(9). The packet
filter would m_free() them when returning PFIL_DROPPED.
However, we pretend to be a hardware driver, so we first try
to pass a memory buffer via PFIL_MEMPTR feature. This is mostly
done for debugging purposes, so that one can experiment in bhyve
with packet filters utilizing the same features as a true driver.
Don't wait until the vtnet_debugnet_init() call happens, because at that
point we might already have allocated something from
vtnet_tx_header_zone.
Some systems showed this panic:
vtnet0: link state changed to UP
panic: keg vtnet_tx_hdr initialization after use.
cpuid = 5
time = 1578427700
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe004db427f0
vpanic() at vpanic+0x17e/frame 0xfffffe004db42850
panic() at panic+0x43/frame 0xfffffe004db428b0
uma_zone_reserve() at uma_zone_reserve+0xf6/frame 0xfffffe004db428f0
vtnet_debugnet_init() at vtnet_debugnet_init+0x77/frame 0xfffffe004db42930
debugnet_any_ifnet_update() at debugnet_any_ifnet_update+0x42/frame 0xfffffe004db42980
do_link_state_change() at do_link_state_change+0x1b3/frame 0xfffffe004db429d0
taskqueue_run_locked() at taskqueue_run_locked+0x178/frame 0xfffffe004db42a30
taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe004db42a50
ithread_loop() at ithread_loop+0x1d6/frame 0xfffffe004db42ab0
fork_exit() at fork_exit+0x80/frame 0xfffffe004db42af0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe004db42af0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 12 tid 100011 ]
Stopped at kdb_enter+0x37: movq $0,0x1084eb6(%rip)
db>
Reviewed by: cem, markj
Differential Revision: https://reviews.freebsd.org/D23073
This is a 32-bit structure embedded in each vm_page, consisting mostly
of page queue state. The use of a structure makes it easy to store a
snapshot of a page's queue state in a stack variable and use cmpset
loops to update that state without requiring the page lock.
This change merely adds the structure and updates references to atomic
state fields. No functional change intended.
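The snapshot-and-cmpset pattern described above can be sketched with C11 atomics. The union layout and field names below are illustrative assumptions, and page_set_queue() is a hypothetical helper, not a function from the change.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 32-bit page state; the field layout is illustrative only. */
union page_state {
	struct {
		uint16_t flags;
		uint8_t	 queue;
		uint8_t	 act_count;
	};
	uint32_t bits;
};

_Static_assert(sizeof(union page_state) == sizeof(uint32_t),
    "page state must fit in one 32-bit word");

/* Change the queue field without a lock, retrying if the word changed. */
static void
page_set_queue(_Atomic uint32_t *statep, uint8_t newq)
{
	union page_state old, new;

	old.bits = atomic_load(statep);
	do {
		new = old;	/* the snapshot lives on the stack */
		new.queue = newq;
	} while (!atomic_compare_exchange_weak(statep, &old.bits, new.bits));
}
```

On a failed exchange old.bits is refreshed with the current value, so each retry recomputes the new state from an up-to-date snapshot.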
Reviewed by: alc, jeff, kib
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D22650
This patch is part of an effort to make bhyve networking (in particular TCP)
faster. The key strategy to enhance TCP throughput is to let the whole packet
datapath work with TSO/LRO packets (up to 64KB each), so that the per-packet
overhead is amortized over a large number of bytes.
This capability is supported in the guest by means of the vtnet(4) driver,
which is able to handle TSO/LRO packets leveraging the virtio-net header
(see struct virtio_net_hdr and struct virtio_net_hdr_mrg_rxbuf).
A bhyve VM exchanges packets with the host through a network backend,
which can be vale(4) or if_tap(4).
While vale(4) supports TSO/LRO packets, if_tap(4) does not.
This patch extends if_tap(4) with the ability to understand the virtio-net
header, so that a tapX interface can process TSO/LRO packets.
A couple of ioctl commands have been added to configure and probe the
virtio-net header. Once the virtio-net header is set, the tapX interface
acquires all the IFCAP capabilities necessary for TSO/LRO.
Reviewed by: kevans
Differential Revision: https://reviews.freebsd.org/D21263
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).
It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).
The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, to
the extent that it is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.
Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.
The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.
No other functional change intended.
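The high-water mark tracking can be sketched as follows. The structure, its two parameters, and reserve_update() are hypothetical names chosen for the illustration; the real code tracks whatever mbuf parameters the NIC reports.

```c
/* Hypothetical record of the largest reservation made so far. */
struct mbuf_reserve {
	int	nmbuf;		/* number of mbufs reserved */
	int	clsize;		/* cluster size reserved */
};

/*
 * Compare a NIC's request against the high-water mark. Returns 1 and
 * raises the mark when a reallocation is needed, 0 when the existing
 * pre-panic allocation already covers the request.
 */
static int
reserve_update(struct mbuf_reserve *hw, int nmbuf, int clsize)
{
	if (nmbuf <= hw->nmbuf && clsize <= hw->clsize)
		return (0);
	if (nmbuf > hw->nmbuf)
		hw->nmbuf = nmbuf;
	if (clsize > hw->clsize)
		hw->clsize = clsize;
	return (1);
}
```

A caller would invoke this once per link activation and reallocate only when it returns 1.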
Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
We want to allocate a contiguous memory block anywhere in memory, but
expressed this as having to be between 0 and 0xffffffff. This limits us
on 64-bit machines, and outright breaks on machines where memory is
mapped above that address range.
Allow the full address range to be used for this allocation.
Sponsored by: Axiado
Until r349278, bhyve presented a seg_max to the guest that was too large.
Detect this case and clamp it to the virtqueue size. Otherwise, we would
fail the "too many segments to enqueue" assertion in virtqueue_enqueue().
I hit this by running a guest with a MAXPHYS of 256 KB.
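The clamp itself is a one-liner. clamp_seg_max() is a hypothetical name, and the sketch ignores any fixed per-request segments the driver may add on top.

```c
#include <stdint.h>

/* Limit a device-advertised segment count to what the ring can hold. */
static uint32_t
clamp_seg_max(uint32_t seg_max, uint32_t vq_size)
{
	return (seg_max > vq_size ? vq_size : seg_max);
}
```

An oversized value is reduced to the virtqueue size, while a sane value passes through unchanged.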
Reviewed by: bryanv cem
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20703
Register MODULE_PNP_INFO for virtio devices using the newbus PNP information
provided by the previous commit. Matching can be quite simple; existing
probe routines only matched on bus (implicit) and device_type. The same
matching criteria are retained exactly, but are now also available to
devmatch(8).
Reviewed by: bryanv, markj; imp (earlier version)
Differential Revision: https://reviews.freebsd.org/D20407
Expose the same fields and widths from both virtio buses, even though they
don't quite line up; several virtio drivers can attach to both buses,
and sharing a PNP info table for both seems more convenient.
In practice, I doubt any virtio driver really needs to match on anything
other than bus and device_type (eliminating the unused entries for
vtmmio), and also in practice device_type is << 2^16 (so far, values
range from 1 to 20). So it might be fine to only expose a 16-bit
device_type for PNP purposes. On the other hand, I don't see much harm
in overkill here.
Reviewed by: bryanv, markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D20406
random(4) masks unregistered entropy sources. Prior to this revision,
virtio_random(4) did not correctly register a random_source and did not
function as a source of entropy.
Random source registration for loadable pure sources requires registering a
poll callback, which is invoked periodically by random(4)'s harvestq
kthread. The periodic poll makes virtio_random(4)'s periodic entropy
collection redundant, so this revision removes the callout.
The current random source API is somewhat limiting, so simply fail to attach
any virtio_random devices if one is already registered as a source. This
scenario is expected to be uncommon.
While here, handle the possibility of short reads from the hypervisor random
device gracefully / correctly. It is not clear why a hypervisor would
return a short read or if it is allowed by spec, but we may as well handle
it.
Reviewed by: bryanv (earlier version), markm
Security: yes (note: many other "pure" random sources remain broken)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20419
I introduced an obvious compiler error in r346282, so this change fixes
that.
Unfortunately, RANDOM_LOADABLE isn't covered by our existing tinderbox, and
it seems like there were existing latent linking problems. I believe these
were introduced by accident in r338324 during reduction of the boolean
expression(s) adjacent to randomdev.c and hash.c. It seems the
RANDOM_LOADABLE build breakage has gone unnoticed for nine months.
This change correctly annotates randomdev.c and hash.c with !random_loadable
to match the pre-r338324 logic; and additionally updates the HWRNG drivers
in MD 'files.*', which depend on random_device symbols, with
!random_loadable (it is invalid for the kernel to depend on symbols from a
module).
(The expression for both randomdev.c and hash.c was the same, prior to
r338324: "optional random random_yarrow | random !random_yarrow
!random_loadable". I.e., "random && (yarrow || !loadable)." When Yarrow
was removed ("yarrow := False"), the expression was incorrectly reduced to
"optional random" when it should have retained "random && !loadable".)
Additionally, I discovered that virtio_random was missing a MODULE_DEPEND on
random_device, which breaks kld load/link of the driver on RANDOM_LOADABLE
kernels. Address that issue as well.
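The reduction of the randomdev.c and hash.c expression described above can be checked by brute force. The three functions below are hypothetical names that encode the pre-r338324 expression, the intended reduction, and the incorrect reduction as plain booleans.

```c
#include <stdbool.h>

/* "random random_yarrow | random !random_yarrow !random_loadable" */
static bool
expr_before(bool random, bool yarrow, bool loadable)
{
	return ((random && yarrow) || (random && !yarrow && !loadable));
}

/* Correct reduction once yarrow is always false. */
static bool
expr_correct(bool random, bool loadable)
{
	return (random && !loadable);
}

/* The incorrect reduction: plain "optional random". */
static bool
expr_wrong(bool random)
{
	return (random);
}
```

With yarrow fixed to false, expr_before() and expr_correct() agree on all four inputs, whereas expr_wrong() differs when both random and loadable are set.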
PR: 238223
Reported by: Eir Nym <eirnym AT gmail.com>
Reviewed by: delphij, markm
Approved by: secteam(delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20466
Prior to this revision, vtpci's BUS_READ_IVAR method on VIRTIO_IVAR_SUBVENDOR
accidentally returned the PCI subdevice.
The typo seems to have been introduced with the original commit adding
VIRTIO_IVAR_{{SUB,}DEVICE,{SUB,}VENDOR} to virtio_pci. The commit log and code
strongly suggest that the ivar was intended to return the subvendor rather than
the subdevice; it was likely just a copy/paste mistake.
Go ahead and rectify that.
Checksum offloading for SCTP is not currently specified for virtio.
If the hypervisor announces checksum offloading support, it means TCP
and UDP checksum offload. If an SCTP packet is sent and the host announced
checksum offload support, the hypervisor inserts the IP checksum (16-bit)
at the correct offset, but this is not the right checksum, which is a CRC32c.
This results in all outgoing packets having the wrong checksum and therefore
breaking SCTP based communications.
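The two checksums are unrelated algorithms, which a short userland sketch makes concrete. inet_cksum() is a plain 16-bit one's-complement sum of the kind used by TCP and UDP, and crc32c() is a bitwise CRC32c as SCTP requires; both are illustrative helpers, not the kernel's implementations.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement Internet checksum over a byte buffer. */
static uint16_t
inet_cksum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;

	for (; len > 1; p += 2, len -= 2)
		sum += (uint32_t)p[0] << 8 | p[1];
	if (len == 1)
		sum += (uint32_t)p[0] << 8;	/* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
crc32c(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xffffffffu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1)));
	}
	return (~crc);
}
```

A 16-bit sum written into a packet that expects a 32-bit CRC cannot validate, which is why the offload has to be disabled for SCTP.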
This patch removes SCTP checksum offloading support from the virtio
network interface.
Thanks to Felix Weinrank for making me aware of the issue.
Reviewed by: bryanv@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20147
Because of a typo, the code was mistakenly resetting the
vtnrx_vq pointer rather than vtntx_tq.
Reviewed by: bryanv
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D19015
netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet into a functional state.
Changelist:
- handle errors returned by virtqueue_enqueue() properly (they were
previously ignored)
- make sure each virtqueue is accessed either by netmap or by the rest of
the kernel, never by both.
- compute the number of netmap slots for TX and RX separately, according to
whether indirect descriptors are used or not for a given virtqueue.
- make sure sglist are freed according to their type (mbufs or netmap
buffers)
- add support for multiqueue and netmap host (aka sw) rings.
- intercept VQ interrupts directly instead of intercepting them in txq_eof
and rxq_eof. This simplifies the code and makes it easier to make sure
taskqueues are not running for a VQ while it is in netmap mode.
- implement vtnet_netmap_config() to cope with changes in the number of queues.
Reviewed by: bryanv
Approved by: gnn (mentor)
MFC after: 3 days
Sponsored by: Sunny Valley Networks
Differential Revision: https://reviews.freebsd.org/D17916
This allows the memory mapped I/O virtio driver to attach when we boot
with ACPI tables, for example in some cases with QEMU emulating arm64.
MFC after: 1 month
given in random(4).
This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.
Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.
PR: 230870
Reviewed by: cem
Approved by: so(delphij,gtetlow)
Approved by: re(marius)
Differential Revision: https://reviews.freebsd.org/D16898
original initialization, so we don't miss a few registers to
configure.
This fixes vtnet(4) operation with QEMU's virtio-net-device.
Tested in QEMU with FreeBSD/RISC-V.
Reviewed by: bryanv
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15821
Uses of mallocarray(9).
The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.
Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.
Reported by: wosch
PR: 225197
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.
This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the surrounding code could handle NULL values.
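The multiplication check that an array allocator performs can be sketched in userland C. mallocarray_sketch() is a hypothetical stand-in built on malloc(3); returning NULL on overflow is an assumption of this sketch, and the kernel routine may react differently.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate nmemb elements of size bytes, refusing products that wrap. */
static void *
mallocarray_sketch(size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size)
		return (NULL);	/* nmemb * size would overflow size_t */
	return (malloc(nmemb * size));
}
```

Callers must already cope with a NULL return, which matches the M_NOWAIT call sites covered by this sweep.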
Since we have no control over the name, the MAKEDEV_CHECKNAME flag must be
used to return an error on an invalid (to devfs) name instead of panicking.
r305900 that originally added this feature also introduced a few other bugs:
- Proper locking not performed
- Theoretically broke the expectation that the control event buffer would
not span more than one page, but did not update the CTASSERT that was
in place to prevent this. However, since the struct virtio_console_control
and the bulk buffer together were quite small, this could not have happened.
Also work around a QEMU VirtIO spec violation in that it includes the NUL
terminator in the buffer length when the spec says it is not included.
PR: 223531
MFC after: 1 week
Mainly focus on files that use the BSD 2-Clause license. However, the tool I
was using misidentified many licenses, so this was mostly a manual (and
error-prone) task.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
Mainly focus on files that use the BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Currently, without TSO/GSO features enabled, the virtio driver allows at most
4 scatter-gather segments on the TX path, which prevents support for 9K jumbo
frames: a 9K jumbo frame needs more than 4 scatter-gather segments, so the
driver fails to send the frame down to the host OS. With TSO/GSO enabled the
maximum is 64 segments and 9K jumbo frames work, so the virtio driver
effectively supports jumbo frames only with TSO/GSO.
Increase VTNET_MIN_TX_SEGS, which applies to the non-TSO/GSO case, to 32 to
support jumbo frames of up to 64K to the host.
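A rough segment count illustrates why 4 is too small. The buffer size of 2048 bytes and the single extra header segment used in the note below are assumptions for the arithmetic, and tx_segs_needed() is a hypothetical helper.

```c
#include <stddef.h>

/*
 * Segments needed to send frame_len bytes when the payload sits in
 * buffers of buf_len bytes each and hdr_segs extra segments carry
 * the header.
 */
static size_t
tx_segs_needed(size_t frame_len, size_t buf_len, size_t hdr_segs)
{
	return (hdr_segs + (frame_len + buf_len - 1) / buf_len);
}
```

Under those assumptions a 9000-byte frame needs 1 + 5 = 6 segments, which exceeds a limit of 4 but fits comfortably within 32.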
Submitted by: Lohith Bellad <lohithbsd@gmail.com>
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D8803
The sim_vid, hba_vid, and dev_name fields of struct ccb_pathinq are
fixed-length strings. AFAICT the only place they're read is in
sbin/camcontrol/camcontrol.c, which assumes they'll be null-terminated.
However, the kernel doesn't null-terminate them. A bunch of copy-pasted code
uses strncpy to write them, and doesn't guarantee null-termination. For at
least 4 drivers (mpr, mps, ciss, and hyperv), the hba_vid field actually
overflows. You can see the result by doing "camcontrol negotiate da0 -v".
This change null-terminates those fields everywhere they're set in the
kernel. It also shortens a few strings to ensure they'll fit within the
16-character field.
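The termination problem can be reproduced and fixed in a few lines of portable C. set_vid() is a hypothetical helper; it uses strncpy(3) plus an explicit terminator so that the sketch builds everywhere, and is not necessarily the form the kernel change uses.

```c
#include <string.h>

#define	HBA_IDLEN	16	/* field width mentioned in the text */

/* Copy src into a fixed-width field, truncating and always terminating. */
static void
set_vid(char dst[HBA_IDLEN], const char *src)
{
	/* strncpy() alone leaves dst unterminated when src is too long. */
	strncpy(dst, src, HBA_IDLEN - 1);
	dst[HBA_IDLEN - 1] = '\0';
}
```

A source string longer than the field is cut to 15 characters plus the terminating NUL, so readers that assume null-termination stay within bounds.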
PR: 215474
Reported by: Coverity
CID: 1009997 1010000 1010001 1010002 1010003 1010004 1010005
CID: 1331519 1010006 1215097 1010007 1288967 1010008 1306000
CID: 1211924 1010009 1010010 1010011 1010012 1010013 1010014
CID: 1147190 1010017 1010016 1010018 1216435 1010020 1010021
CID: 1010022 1009666 1018185 1010023 1010025 1010026 1010027
CID: 1010028 1010029 1010030 1010031 1010033 1018186 1018187
CID: 1010035 1010036 1010042 1010041 1010040 1010039
Reviewed by: imp, sephe, slm
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9037
Differential Revision: https://reviews.freebsd.org/D9038