This only supports the legacy virtqueue format that is now called
"Split Virtqueues". Support for the new "Packed Virtqueues" described
in v1.1 is left for a later date.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27857
Use the existing legacy PCI driver as the basis for shared code
between the legacy and modern PCI drivers. The existing virtio_pci
kernel module will contain both the legacy and modern drivers.
Changes to the virtqueue and each device driver (network, block, etc)
for V1 support come in later commits.
Update the MMIO driver to reflect the VirtIO bus method changes, but
the modern compliance can be improved on later.
Note that the modern PCI driver requires bus_map_resource() to be
implemented, which is not the case on all archs.
The hw.virtio.pci.transitional tunable default value is zero so
transitional devices will continue to be driven via the legacy
driver.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27856
And subsequent fix 576b099a.
By adding the mergable header to the vtnet_rx_header structure, the size
was increased by 2 bytes, breaking the alignment of this structure as
described the in preceding comments.
Furthermore, the mergable header does not belong the structure. With the
mergable feature, the header is placed in line with the data, so there is
no need for a separate segment, and misleading to follow the mergable
header with any padding.
The V1 header is effectively identical to mergable header, and the driver
has long supported the mergable feature. Revert this so the later changes
that add V1 support can show how V1 is derived from the existing mergable
buffers support, and to facilitate a later MFC.
Reviewed by: grehan (mentor)
Differential Revision: https://reviews.freebsd.org/D27855
Move prison_hold, prison_hold_locked ,prison_proc_hold, and
prison_proc_free to a more intuitive part of the file (together with
with prison_free and prison_free_locked), and add or improve comments
to these and others, to better describe what's going in the prison
reference cycle.
No functional changes.
Use a machdep.nirq tunable intead of compile-time constant NIRQ
as a value for maximum number of interrupts. It allows keep a system
footprint small by default with an option to increase the limit
for large systems like server-grade ARM64
Reviewd by: mhorne
Differential Revision: https://reviews.freebsd.org/D27844
Submitted by: Klara, Inc.
Sponsored by: Ampere Computing
Rather than depending on malloc() returning 16-byte aligned chunks,
allocate some extra pad bytes and ensure that key schedules are
appropriately aligned.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D28157
The context record contains key material precomputed by the driver at
session creation time. Rather than storing various components of the
context record in each session, go a bit further and store the full
context record image so that safexcel_process() can simply copy the
image into each request submitted to the hardware. This simplifies the
data path and eliminates a bunch of unnecessary conditional logic that
was getting executed for each request.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Rather than preallocating a set of requests and moving them between
queues during state transitions, maintain a shadow of the command
descriptor ring to track the driver context of each request. This is
simpler and requires less synchronization between safexcel_process() and
the ring interrupt handler.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Rather than returning a hard error in this case, return ERESTART so that
upper layers get a chance to retry the request (or drop it, depending on
the desired policy).
This case is hard to hit due to the somewhat low bound on queued
requests, but that will no longer be true after an upcoming change.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
This gives better performance in some tests than the previous policy of
statically binding each session to a ring.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Use the number of items scanned to control the duration of the shrink
loop. Otherwise, if a consumer like TTM is not able to free the number
of items requested for some reason, the shrinker keeps looping forever.
Reviewed by: manu
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28224
Similar to the message printed by aesni(4), let the user know if the
driver is unsupported by their CPU.
PR: 252543
Reported by: gbe
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
prison_isvalid() checks if a prison record can be used at all, i.e.
pr_ref > 0. This filters out prisons that aren't fully created, and
those that are either in the process of being dismantled, or will be
at the next opportunity. While the check for pr_ref > 0 is simple
enough to make without a convenience function, this prepares the way
for other measures of prison validity.
prison_isalive() checks not only validity as far as the useablity of
the prison structure, but also whether the prison is visible to user
space. It replaces a test for pr_uref > 0, which is currently only
used within kern_jail.c, and not often there.
Both of these functions also assert that either the prison mutex or
allprison_lock is held, since it's generally the case that unlocked
prisons aren't guaranteed to remain useable for any length of time.
This isn't entirely true, for example a thread can assume its own
prison is good, but most exceptions will exist inside of kern_jail.c.
Based on discussions on freebsd-arch@, enable KERN_TLS in
GENERIC on amd64, but leave it disabled via the
sysctl kern.ipc.tls.enable. Users wishing to enable
ktls must set kern.ipc.tls.enable=1
While here, fix wording in NOTES to mention that KERN_TLS
also does receive now.
Sponsored by: Netflix
Reviewed by: allanjude
Differential Revision: https://reviews.freebsd.org/D28163
The original comment suggests an optimization, which was proven wrong.
Reported by: nc
Reviewed by: kp, nc
Approved by: kp (mentor)
Differential Revision: https://reviews.freebsd.org/D23727
ng_tag(4) operate on arbitrary data of mbuf_tags(9). Those structures
are padded to the next multiple of the alignment by the compiler.
Hence a valid argument has be at most as long as the data received.
PR: 241462
Reviewed by: kp
Approved by: kp (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22140
it gets unused.
Currently if_purgeifaddrs() uses in6_purgeaddr() to remove IPv6
ifaddrs. in6_purgeaddr() does not trrigger prefix removal if
number of linked ifas goes to 0, as this is a low-level function.
As a result, if_purgeifaddrs() purges all IPv4/IPv6 addresses but
keeps corresponding IPv6 prefixes.
Fix this by creating higher-level wrapper which handles unused
prefix usecase and use it in if_purgeifaddrs().
Differential revision: https://reviews.freebsd.org/D28128
This stops busdma bounce making assumptions about alignment of malloc(9)
results, which are no longer true.
Also add assert that the result of malloc_aligned() fits into single
page, which is the assumption of the code.
Reported by: dim
Reviewed by: andrew, jah, markj
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28147
Change the power-of-two malloc zones to require alignment equal to the
size [*]. Current uma allocator already provides such alignment, so in
fact this change does not change anything except providing future-proof
setup.
Suggested by: markj [*]
Reviewed by: andrew, jah, markj
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28147
Summary:
Linux's irq_work queue was created for asynchronous execution of code from contexts where spin_lock's are not available like "hardware interrupt context". FreeBSD's fast taskqueues was created for the same purposes.
Drm-kmod 5.4 uses irq_work_queue() at least in one place to schedule execution of task/work from the critical section that triggers following INVARIANTS-induced panic:
```
panic: acquiring blockable sleep lock with spinlock or critical section held (sleep mutex) linuxkpi_short_wq @ /usr/src/sys/kern/subr_taskqueue.c:281
cpuid = 6
time = 1605048416
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006b538c90
vpanic() at vpanic+0x182/frame 0xfffffe006b538ce0
panic() at panic+0x43/frame 0xfffffe006b538d40
witness_checkorder() at witness_checkorder+0xf3e/frame 0xfffffe006b538f00
__mtx_lock_flags() at __mtx_lock_flags+0x94/frame 0xfffffe006b538f50
taskqueue_enqueue() at taskqueue_enqueue+0x42/frame 0xfffffe006b538f70
linux_queue_work_on() at linux_queue_work_on+0xe9/frame 0xfffffe006b538fb0
irq_work_queue() at irq_work_queue+0x21/frame 0xfffffe006b538fd0
semaphore_notify() at semaphore_notify+0xb2/frame 0xfffffe006b539020
__i915_sw_fence_notify() at __i915_sw_fence_notify+0x2e/frame 0xfffffe006b539050
__i915_sw_fence_complete() at __i915_sw_fence_complete+0x63/frame 0xfffffe006b539080
i915_sw_fence_complete() at i915_sw_fence_complete+0x8e/frame 0xfffffe006b5390c0
dma_i915_sw_fence_wake() at dma_i915_sw_fence_wake+0x4f/frame 0xfffffe006b539100
dma_fence_signal_locked() at dma_fence_signal_locked+0x105/frame 0xfffffe006b539180
dma_fence_signal() at dma_fence_signal+0x72/frame 0xfffffe006b5391c0
dma_fence_is_signaled() at dma_fence_is_signaled+0x80/frame 0xfffffe006b539200
dma_resv_add_shared_fence() at dma_resv_add_shared_fence+0xb3/frame 0xfffffe006b539270
i915_vma_move_to_active() at i915_vma_move_to_active+0x18a/frame 0xfffffe006b5392b0
eb_move_to_gpu() at eb_move_to_gpu+0x3ad/frame 0xfffffe006b539320
eb_submit() at eb_submit+0x15/frame 0xfffffe006b539350
i915_gem_do_execbuffer() at i915_gem_do_execbuffer+0x7d4/frame 0xfffffe006b539570
i915_gem_execbuffer2_ioctl() at i915_gem_execbuffer2_ioctl+0x1c1/frame 0xfffffe006b539600
drm_ioctl_kernel() at drm_ioctl_kernel+0xd9/frame 0xfffffe006b539670
drm_ioctl() at drm_ioctl+0x5cd/frame 0xfffffe006b539820
linux_file_ioctl() at linux_file_ioctl+0x323/frame 0xfffffe006b539880
kern_ioctl() at kern_ioctl+0x1f4/frame 0xfffffe006b5398f0
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe006b5399c0
amd64_syscall() at amd64_syscall+0x121/frame 0xfffffe006b539af0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe006b539af0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800a6f09a, rsp = 0x7fffffffe588, rbp = 0x7fffffffe640 ---
KDB: enter: panic
```
Here, the dma_resv_add_shared_fence() performs a critical_enter() and following call of schedule_work() from semaphore_notify() triggers 'acquiring blockable sleep lock with spinlock or critical section held' panic.
Switching irq_work implementation to fast taskqueue fixes the panic for me.
Other report with the similar bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=247166
Reviewed By: hselasky
Differential Revision: https://reviews.freebsd.org/D27171
rtinit[1]() is a function used to add or remove interface address prefix routes,
similar to ifa_maintain_loopback_route().
It was intended to be family-agnostic. There is a problem with this approach
in reality.
1) IPv6 code does not use it for the ifa routes. There is a separate layer,
nd6_prelist_(), providing interface for maintaining interface routes. Its part,
responsible for the actual route table interaction, mimics rtenty() code.
2) rtinit tries to combine multiple actions in the same function: constructing
proper route attributes and handling iterations over multiple fibs, for the
non-zero net.add_addr_allfibs use case. It notably increases the code complexity.
3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag
for p2p connections, host routes and p2p routes are handled in the same way.
Additionally, mapping IFA flags to RTF flags makes the interface pretty messy.
It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface
aliases.
4) rtinit() is the last customer passing non-masked prefixes to rib_action(),
complicating rib_action() implementation.
5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive"
ifa messages in certain corner cases.
To address all these points, the following has been done:
* rtinit() has been split into multiple functions:
- Route attribute construction were moved to the per-address-family functions,
dealing with (2), (3) and (4).
- funnction providing net.add_addr_allfibs handling and route rtsock notificaions
is the new routing table inteface.
- rtsock ifa notificaion has been moved out as well. resulting set of funcion are only
responsible for the actual route notifications.
Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses
Differential revision: https://reviews.freebsd.org/D28186
AMD 10GbE hardware is designed to have two buffers per receive descriptor to
support split header feature. For this purpose, the driver was designed to use
2 iflib freelists per receive queue. So, that buffers from 2 freelists are used
to refill an entry in the receive descriptor. The current design holds good
with regular data traffic.
But, when netmap comes into play, the current design will not fit in. The
current netmap interfaces and netmap implementation in iflib doesn't seem
to accomodate the design of 2 freelists per receive queue. So, exercising
Netmap capability with inbuilt tools like bridge, pkt-gen doesn't work with
the 2 freelists driver design.
So, the driver design is changed to accomodate the current netmap interfaces
and netmap implementation in iflib by using single freelist per receive queue
approach when Netmap capability is exercised without disturbing the current
2 freelists approach.
The dev.ax.sph_enable tunable can be set to 0 to configure the single
free list mode.
Thanks to Stephan Dewt for his Initial set of code changes for the stated
problem.
Submitted by: rajesh1.kumar_amd.com
Approved by: vmaffione
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D27797
disk failure.
Each vnode has an embedded lock that controls access to its contents.
However vnodes describing a UFS snapshot all share a single snapshot
lock to coordinate their access and update. As part of mounting a
UFS filesystem with snapshots, each of the vnodes describing a
snapshot has its individual lock replaced with the snapshot lock.
When the filesystem is unmounted the vnode's original lock is
returned replacing the snapshot lock.
When a disk fails while the UFS filesystem it contains is still
mounted (for example when a thumb drive is removed) UFS forcibly
unmounts the filesystem. The loss of the drive causes the GEOM
subsystem to orphan the provider, but the consumer remains until
the filesystem has finished with the unmount. Information describing
the snapshot locks was being prematurely cleared during the orphaning
causing the return of the snapshot vnode's original locks to fail.
The fix is to not clear the needed information prematurely.
Sponsored by: Netflix
with snapshots.
Each vnode has an embedded lock that controls access to its contents.
However vnodes describing a UFS snapshot all share a single snapshot
lock to coordinate their access and update. As part of mounting a
UFS filesystem with snapshots, each of the vnodes describing a
snapshot has its individual lock replaced with the snapshot lock.
When the filesystem is unmounted the vnode's original lock is
returned replacing the snapshot lock.
The lock order reversal happens because vnode locks must be acquired
before snapshot locks. When unmounting we must lock both the snapshot
lock and the vnode lock before swapping them so that the vnode will
be continuously locked during the swap. For each vnode representing
a snapshot, we must first acquire the snapshot lock to ensure
exclusive access to it and its original lock. We then face a lock
order reversal when we try to acquire the original vnode lock. The
problem is eliminated by doing a non-blocking exclusive lock on the
original lock which will always succeed since there are no users
of that lock.
Sponsored by: Netflix
This allows us to use it when we only need to check if the virtual address
is valid. For example when checking if an address in the DMAP region is
mapped.
Reviewed by: kib, markj
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D27621
The old vendor tree was never fully merged and doing partial merge isn't
supported with git subtree merge so a new one was created.
Switch the build to use the new DTS from sys/contrib/device-tree
This also bump the DTS used to be in sync with Linux 5.9
While here change the way to get the linux version, simply hardcode
the value in sys/dts/freebsd-compatible.dts and use awk to get that
to put it in the CFLAGS.
As a bonus we now have the bindings docs available
in sys/contrib/device-tree/Bindings/ so no need to link to the Linux repo
or to the vendor tree.
It changed the #pinctrl-cells value to be equal to 2 and the macro
that generates the values.
Based on the bindings docs a value of 2 is only acceptable if the node
used pinctrl-single,bits and not pinctrl-single,pins
This allow booting further on the beaglebone black with 5.9 DTS
A more complete fix for this function is being worked on in D28054. Fix
the uninitialized variable error so that builds can at least proceed.
Reported by: several