This is compatible with Linux, and some driver error paths depend on it.
Reviewed by: bz, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32337
The flag should be accessible from non-current threads.
Reviewed by: markj
Tested by: trasz
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32252
We actually do not know is it safe or not to flush cache for random
BAR/register page existing in the system. It is well-known that for
instance LAPICs cannot tolerate cache flush. As report indicates,
there are more such devices.
This issue typically affects AMD machines which do not report self-snoop,
causing real CLFLUSH invocation on the mapped pages. Intels do self-snoop,
so this change should be nop for them, and unsafe devices, if any, are
already ignored.
Reported and tested by: manu
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
Similar to pmap_page_set_memattr() by setting MD page cache attribute
to the argument. Unlike pmap_page_set_memattr(), does not flush cache
for the direct mapping of the page.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
The default value for LAPIC registers page physical address
is usually right. Having this value available early makes
pmap_force_invalidate_cache_range(), used on non-self-snoop machines,
avoid flushing LAPIC range for early calls.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32318
At least for SAS that we only support now disks are typically
connected to the same bus as the enclosure. Limiting the search
scope makes it much faster on systems with multiple buses and
thousands of disks.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D32305
Remove *_MATCH_NONE enums, making no sense and so never used. Make
*_MATCH_ANY enums 0 (no any match flags set), previously used by
*_MATCH_NONE. Bump CAM_VERSION to 0x1a reflecting those changes and
add compat shims.
When traversing through buses and devices do not descend if we can
already see that requested pattern does not match the bus or device.
It allows to save significant amount of time on system with thousands
of disks when doing limited searches.
Reviewed by: imp
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D32304
This function is actively used by sbuf_vprintf(), so this simple
inlining in half reduces time of kern.geom.confxml generation.
MFC after: 2 weeks
Sponsored by: iXsystem, Inc.
In some contexts it is illegal to wait for memory allocation, causing
kernel panic. By default sbuf_new passes M_WAITOK to malloc,
which caused crashes when sdhci_dumpcaps or sdhci_dumpregs was callend in
non sutiable context.
Add SBUF_NOWAIT flag to sbuf_new to fix this.
Obtained from: Semihalf
Differential revision: https://reviews.freebsd.org/D32075
- Ensure that the computed MFT record size isn't negative or larger than
maxphys before trying to read $Volume.
- Guard against truncated records in volume metadata.
- Ensure that the record length is large enough to contain the volume
name.
- Verify that the (UTF-16-encoded) volume name's length is a multiple of
two.
PR: 258833, 258914
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
This is the same underlying problem as 2624598064, just for bus ranges
rather than windows. SiFive's HiFive Unmatched has the following
topology:
Root Port <---> Bridge <---> Bridge <-+-> Bridge <---> (Unused)
(pcib0) (pcib1) (pcib2) | (pcib3)
+-> Bridge <---> xHCI
| (pcib4)
+-> Bridge <---> M.2 E-key
| (pcib5)
+-> Bridge <---> M.2 M-key
| (pcib6)
+-> Bridge <---> x16 slot
(pcib7)
If a device is plugged into the x16 slot that itself has a bridge, such
as many graphics cards, we currently fail to allocate a bus number for
its child bus (and so pcib_attach_child skips adding a child bus for
further enumeration) as, when the new child bridge attaches, it attempts
to allocate a bus number from its parent (pcib7) which in turn attempts
to grow its own bus range by calling bus_adjust_resource on its own
parent (pcib2) whose bus rman cannot accommodate the request and needs
to itself be extended by calling its own parent (pcib1). Note that
pcib3-7 do not face the same issue when they attach since pcib1 ends up
managing bus numbers 1-255 from the beginning and so never needs to grow
its own range.
Reviewed by: jhb, mav
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32011
No in-tree drivers are supported for RISC-V (given it supports UEFI we
could enable the EFI framebuffer, but U-Boot has very limited hardware
support and EDK2 remains a work in progress), but drm-kmod exists with
drivers for video cards that can be used with the HiFive Unmatched.
Reviewed by: imp, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32001
One of the three uses is already guarded; this guards the remaining ones
to support architectures like riscv that do not provide write-combining,
and is needed to build drm-kmod on riscv.
Reviewed by: hselasky, manu
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31999
pmap_change_attr is required by drm-kmod so we need the function to
exist. Since the Svpbmt extension is on the horizon we will likely end
up with a real implementation of it, so this stub implementation does
all the necessary page table walking to validate the input, ensuring
that no new errors are returned once it's implemented fully (other than
due to out of memory conditions when demoting L2 entries) and providing
a skeleton for that future implementation.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31996
These began to become obsolete in d6d64f0f2c (r137739) and the deal
was later sealed in 003e18aef4 (r137801) when vfs.fifofs.fops was
dropped and vop-bypass for pipes became mandatory.
PR: 225934
Suggested by: markj
Reviewe by: kib, markj
Differential Revision: https://reviews.freebsd.org/D32270
Callout c_time is always bigger or equal than the scheduled time. It
is also smaller than sbinuptime() and can't change while the callback
is running. So we reliably can use it instead of sbinuptime() here.
In case there was a race and the callout was rescheduled to the later
time, the callback will be called again.
According to profiles it saves ~5% of the timer interrupt time even
with fast TSC timecounter.
MFC after: 1 month
The fsckpid mount option was introduced in 927a12ae16 along with a
couple sysctl's to support SU+J with snapshots. However, those sysctl's
were never used and eventually removed in f2620e9ceb.
There are no in-tree consumers of this mount option.
Reviewed by: mckusick, kib
Differential Revision: https://reviews.freebsd.org/D32015
For a pNFS server configuration, an NFSv4.2 Deallocate operation
is proxied to the DS(s). The code that parsed the reply for the
proxy RPC is broken and did not process the pre-operation attributes.
This patch fixes this problem.
This bug would only affect pNFS servers built from recent main/FreeBSD14
sources.
Before this change kern.sched.interact sysctl setting above 32 gave
all interactive threads identical priority of PRI_MIN_INTERACT due to
((PRI_MAX_INTERACT - PRI_MIN_INTERACT + 1) / sched_interact) turning
zero. Setting the sysctl lower reduced the range of used priority
levels up to half, that is not great either.
Change of the operations order should fix the issue, always using full
range of priorities, while overflow is impossible there since both
score and priority values are small. While there, make the variables
unsigned as they really are.
MFC after: 1 month
We only use nvme_completion_poll in the initialization path. The
commands they queue and wait for finish quickly as they involve no I/O
to the drive's media. These command take about 20-200 microsecnds
each. Set the wait time to 1us and then increase it by 1.5 each
successive iteration (max 1ms). This reduces initialization time by
80ms in cpervica's tests.
Use this same technique waiting for RDY state transitions. This saves
another 20ms. In total we're down from ~330ms to ~2ms.
Tested by: cperciva
Sponsored by: Netflix
Reviewed by: mav
Differential Review: https://reviews.freebsd.org/D32259
A core segment is bounded in size only by memory size. On 64-bit
architectures this means a segment can be much larger than 4GB.
However, compress_chunk() takes only a u_int, clamping segment size to
4GB-1, resulting in a truncated core. Everything else, including the
compressor internally, uses size_t, so use size_t at the boundary here.
This dates back to the original refactor back in 2015 (r279801 /
aa14e9b7).
MFC after: 1 week
Sponsored by: Juniper Networks, Inc.
NOTE_ABSTIME may also have a zero timeout, which indicates that we
should still fire immediately as an absolute time in the past. A test
has been added for this one as well.
Fixes: 9c999a259f ("kqueue: don't arbitrarily restrict long-past...")
Point hat: kevans
Reported by: syzbot+1c8d1154f560b3930042@syzkaller.appspotmail.com
That fixes memory leak on last GELI provider destroyed, introduced
in 2dbc9a388e. This patch was originally developed late 2019 and
the flag was necessary to prevent zone drainage under memory pressure.
Today, with f09cbea31a the UMA is fixed not to drain into reserves.
Discussed with: jtl, markj
Fixes: 2dbc9a388e
PR: 258787
The FreeBSD nvme driver has reset the nvme controller twice on attach to
address a theoretical issue assuring the hardware is in a known
state. However, exierence has shown the second reset is unnecessary and
increases the time to boot. Eliminate the second reset. Should there be
a situation when you need a second reset (for buggy or at least somewhat
out of the mainstream hardware), the hardware option NVME_2X_RESET will
restore the old behavior. Document this in nvme(4).
If there's any trouble at all with this, I'll add a sysctl tunable to
control it.
Sponsored by: Netflix
Reviewed by: cperciva, mav
Differential Revision: https://reviews.freebsd.org/D32241
After some study of the code and the standard, I think we can just drop
the pause(), unconditionally. If we're not initialized, then there's
nothing to wait for from a software perspective. If we are initialized,
then there might be outstanding I/O. If so, then the qpair 'recovery
state' will transition to WAITING in nvme_ctrlr_disable_qpairs, which
will ignore any interrupts for items that complete before we complete
the reset by setting cc.en=0.
If we go on to fail the controller, we'll cancel the outstanding I/O
transactions. If we reset the controller, the hardware throws away
pending transactions and we retry all the pending I/O transactions. Any
transactions that happend to complete before cc.en=0 will have the same
effect in the end (doing the same transaction twice is just inefficient,
it won't affect the state of the device any differently than having done
it once).
The standard imposes no wait times here, so it isn't needed from that
perspective.
Unanswered Question: Do we may need to disable interrupts while we
disable in legacy mode since those are level-sensitive.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D32248
The don't touch the mmio of the drive after we do a EN 1->0 transition
is only for a tiny number of dirves that have this unforunate issue.
Sponsored by: Netflix
Rewrite the nested if's using the preferred FreeBSD style for branches
of ifs that return. NFC. Minor tweaks to the comments to better fit new
code layout.
Sponsored by: Netflix
Reviewed by: mav, chuck (prior rev, but comments rolled in)
Differential Revision: https://reviews.freebsd.org/D32245
Remove the 5ms delays after writing the administrative queue
registers. These delays are from the very earliest days of the driver
(they are in the first commit) and were most likely vestiges of the
Chatham NVMe prototype card that was used to create this driver. Many of
the workarounds necessary for it aren't necessary for standards
compliant cards. The original driver had other areas marked for Chatham,
but these were not. They are unneeded. There's three lines of supporting
evidence.
First, the NVMe standards make no mention of a delay time after these
registers are written. Second, the Linux driver doesn't have them, even
as an option. Third, all my nvme cards work w/o them.
To be safe, add a write barrier between setting up the admin queue and
enabling the controller.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D32247