All I/O requests through the taste consumers are synchronous, done
with g_read_data() and without any locks held. It makes no sense
to delegate the I/O to g_down/g_up threads.
This removes many of context switches during disk retaste.
MFC after: 2 weeks
If the NVMe Controller doesn't support Namespace Management, it should
return "Invalid Namespace or Format" when the Host request Identify
Namespace with the global NSID value.
Fixes UNH IOL 16.0 Test 9.1, Case 6
Reviewed by: imp, allanjude
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33578
NVMe Controllers which do not support Endurance Groups must return an
error when the Endurance Group Event Aggregate Log Change Notices bit is
set in Set Features, Asynchronous Event Configuration.
Fixes UNH IOL Test 3.12, Case 8
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33577
The function which checks for a valid LBA range mistakenly named an
input value as NLB ("Number of Logical Blocks") instead of "number of
blocks". The NVMe specification defines NLB as a zero-based value (i.e.
NLB=0x0 represents 1 block, 0x1 is 2 blocks, etc.), but the passed
parameter is a 1's-based value.
Fix is to rename the variable to avoid future confusion.
While in the neighborhood, also check that the starting LBA is less than
the size of the backing storage to avoid an integer overflow.
Reviewed by: imp, allanjude, jhb
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33575
Implement basic support for the SEL field of Get Features. This returns
information about Namespace Specific features.
Fixes UNH ILO 16.0 Test 1.2, Case 13
Reviewed by: imp, allanjude
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33574
Compliant v1.4 Controllers must report a Controller Type (CNTRLTYPE).
Also, do not advertise secure erase functionality in the Format NVM
Attributes field of the Identify Controller data structure as the
Controller does not implement secure erase.
Fixes UNH ILO Test 1.1, Case 2
Reviewed by: imp, allanjude
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33573
This adds the ability for a guest OS to send Set / Get Feature,
Temperature Threshold commands. The implementation assumes a constant
temperature and will generate an Asynchronous Event Notification if the
specified threshold is above/below this value. Although the
specification allows 9 temperature values, this implementation only
implements the Composite Temperature.
While in the neighborhood, move the clear of the CSTS register in the
reset function after all other cleanup. This avoids a race with the
guest thinking the reset is complete (i.e. CSTS.RDY = 0) before the NVMe
emulation is actually complete with the reset.
Fixes UNH IOL 16.0 Test 1.7, cases 1, 2, and 4.
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33572
Be more conservative and only support the Features mandatory for an I/O
Controller.
Avoids a "hang" in UNH test 1.2.10 associated with Predictable Latency
Mode Configuration and Host Behavior Support features.
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33571
The NVMe emulation checked if the Asynchronous Event Request Limit
(a.k.a AERL) would be exceeded in pci_nvme_aer_add(), but this function
is only called from nvme_opc_async_event_req() which also checks for
exceeding the AERL.
Reviewed by: imp, allanjude
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33570
Modify the Get Log Page command to parse the Log Page Offset fields to
support more recent versions of the NVMe specification.
Fixes various tests for UNH Test 1.3.*
Reviewed by: imp, allanjude
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33568
Return an error if the feature specified in Set Features is Namespace
specific but the Namespace ID uses the Global Namespace tag.
Fixes UNH Test 1.2.7
Reviewed by: imp, allanjude
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33566
The NVM Format command is unique among the Admin commands in that it
needs to finish asynchronously. For this reason, the emulation code
invented a synthetic completion status (NVME_NO_STATUS) to indicate that
the command was still in progress and the command processing loop should
not generate a completion message. The implementation used the value
0xffff for the synthetic value as this set both the Status Code and
Status Code Type fields to reserved values.
Format initialized the completion status to this value and expected
error cases to override it with a status code/type appropriate to the
situation. The macros used to set the NVMe status are careful not to
modify bit 0 (i.e. the phase bit), which with the synthetic completion
status, causes the phase bit to get out of sync. When running tests in a
guest with illegal NVM Format commands, Admin commands would eventually
hang because it appeared there were no completions due to the incorrect
phase bit value.
Fix is to only set NVME_NO_STATUS if the blockif delete command
succeeds. While in the neighborhood, add a missing break statement when
NVM Format is not supported.
Reviewed by: imp, allanjude
Tested by: jason@tubnor.net
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D33565
Merge commit c7c84b90879f from llvm git (by Adrian Prantl):
[DwarfDebug] Refuse to emit DW_OP_LLVM_arg values wider than 64 bits
DwarfExpression::addUnsignedConstant(const APInt &Value) only supports
wider-than-64-bit values when it is used to emit a top-level DWARF
expression representing the location of a variable. Before this change,
it was possible to call addUnsignedConstant on >64 bit values within a
subexpression when substituting DW_OP_LLVM_arg values.
This can trigger an assertion failure (e.g. PR52584, PR52333) when it
happens in a fragment (DW_OP_LLVM_fragment) expression, as
addUnsignedConstant on >64 bit values splits the constant into separate
DW_OP_pieces, which modifies DwarfExpression::OffsetInBits.
This change papers over the assertion errors by bailing on overly wide
DW_OP_LLVM_arg values. A more comprehensive fix might be to be to split
wide values into pointer-sized fragments.
[0] https://github.com/llvm/llvm-project/blob/e71fa03/llvm/lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp#L799-L805
Patch by Ricky Zhou!
Differential Revision: https://reviews.llvm.org/D115343
MFC after: 3 days
- Use Ar macros for arguments
- Stylize the argument synopsis to the -d flag
- Change the width of the list to one of the actual tags in the list
- Stylize "ugen" and "/dev/ugen" with Cm as those are constant strings,
which are usually treated as command modifiers.
- Break long lines to reduce the number of warnings from linters
- Fix examples; the -d flag is now required when specifying the unit and
the address with the "dot notation".
MFC after: 2 weeks
The geom_gate API provides 2 distinct paths for exchanging error
details between the kernel and the userland client: Including an error
code in the g_gate_ctl_io structure passed in the ioctl(2) call or
having the ioctl(2) call return -1 with an error code in errno. The
latter reflects errors in the ioctl(2) call itself whilst the former
reflects errors within the geom_gate instance.
The G_GATE_CMD_START ioctl blocks waiting for an I/O request to be
directed to the geom_gate instance and the wait can fail
(necessitating an error return) if the geom_gate instance is destroyed
or if the msleep(9) fails. The code previously treated both error
cases indentically: Returning ECANCELED as a geom_gate instance error
(which the ggatec treats as a fatal error). Whilst this is the correct
behaviour if the geom_gate instance is destroyed, a msleep(9) failure
is unrelated to the geom_gate instance itself and should be reported
as an ioctl(2) "failure". The distinction is important because
msleep(9) can return ERESTART, which means the system call should be
retried (and this will occur automatically as part of the generic
syscall return processing).
This change alters the msleep(9) handling to directly return the error
code from msleep(9), which ensures ERESTART is correctly handled,
rather than being treated as a fatal error.
Reviewed by: Johannes Totz <jo@bruelltuete.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33996
Providing a timestamp with seconds granularity helps make it obvious
that the display is updating.
Reviewed by: mckusick
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29181
- stop on first error
- improve awk script: print the last two characters for bigram - not the second word
- remove unnecessary checks
- use mktemp
- refactor
If there are no more entries, or if we fail to restore the rcvif of a
queued mbuf dn_dequeue() can return NULL.
Cope with this.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34078
When we remove an interface it is first removed from the interface list
V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for
any possible references to stop being used (i.e.
epoch_wait/epoch_drain_callbacks) before we tear it fully down.
However, the index in ifindex_table is not removed, so m_rcvif_restore()
can still find the (now dying) interface.
This results in panics, for example when dummynet restores the rcvif
pointer and passes a packet to ip6_input() we can panic because the
AF_INET6 domain has already been removed (so we end up dereferencing a
NULL pointer there).
Check that the interface is not dying before we restore it, which is
equivalent to checking its presence in V_ifnet, and thus ensures that
future accesses (while in NET_EPOCH) are safe.
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34076
This was prompted by the recent pkexec vulnerability (CVE-2021-4034).
This change is being made on general principle for setuid/setgid
binaries and is not in response to an actual issue.
Reviewed by: kevans, markj (both earlier)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34087
- Check that catopen() succeeded before calling catclose(). musl will
crash in the latter if the catalogue descriptor is -1.
- Keep the message catalogue open for most of sort(1)'s actual
operation.
- Don't use catgets(3) to print error messages if catopen(3) had failed.
Reviewed by: arichardson, emaste
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34081
Add a nested include of <sys/systm.h> for recently added assertions.
Without this, existing code (such as in drm-kmod) needs to be patched
to add the newly required header.
While here, rewrite the assertions using KASSERT().
Reviewed by: dougm, alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D34070
When an iSCSI target session is terminated, an internal nexus reset
task is posted to abort existing tasks belonging to the session.
Previously, the ctl_io for this internal nexus reset stored a pointer
to the session in the slot that normally holds a pointer to the PDU
from the initiator that triggered the I/O request. The completion
handler then assumed that any nexus reset I/O was due to an internal
request and fetched the session pointer (instead of the PDU pointer)
from the ctl_io. However, it is possible to trigger a nexus reset via
an on-the-wire task management PDU. If such a PDU were sent to the
target, then the completion handler would incorrectly treat this
request as an internal request and treat the pointer to the received
PDU as a pointer to the session instead.
To fix, allocate a dummy PDU for the internal reset task and use an
invalid opcode to differentiate internal nexus resets from resets
requested by the initiator.
PR: 260449
Reported by: Robert Morris <rtm@lcs.mit.edu>
Reviewed by: mav
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D34055
All I/O requests through the taste consumer are synchronous, done
with g_read_data() and without any locks held. It makes no sense
to delegate the I/O to g_down/g_up threads.
This removes many of context switches during disk retaste.
MFC after: 2 weeks
The only cases when direct dispatch does not make sense is for I/O
submission from down thread and for completion from up thread. In
all other cases, if both consumer and producer are OK about it, we
can save on context switches.
MFC after: 2 weeks
Unlike normal consumers all taste consumer I/O is synchronous, done
with g_read_data() and without any locks held. It makes no sense to
delegate I/O submission to g_down thread.
This should remove number of context switches during disk retaste.
MFC after: 2 weeks
Per RFC2822 the maximum transmitted line length is "998 characters...
excluding the CRLF." In a file the maximum is 999 with the \n included.
Previously mail containing a line with exactly 999 characters would
bounce.
PR: 208261
Reported by: Helge Oldach
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Don't emit messages; this isn't any different from a Linux kernel
built without OPTIONS_SECCOMP, so the userspace already needs to know
how to deal with it. This is also similar with how we handle seccomp
in linux_prctl().
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D33808
Import bsddialog 0.1 Utility and Library, fully refatorized, API stable,
manuals completed, easier to maintain and improve.
Update deps for new API:
add mixedgauge consts, delete __DECONST and add bsddialog_geterror()
info to avoid silent errors
* tzsetup
* kbdmap
* distextract
Differential Revision: https://reviews.freebsd.org/D34066
When a filesystem is mounted all of its associated snapshots must
be activated. It first allocates a snapshot lock (snaplk) that will
be shared by all the snapshot vnodes associated with the filesystem.
As part of each snapshot file activation, it must replace its own
ufs vnode lock with the snaplk. In this way acquiring the snaplk
gives exclusive access to all the snapshots for the filesystem.
A write to a ufs vnode first acquires the ufs vnode lock for the
file to be written then acquires the snaplk. Once it has the snaplk,
it can check all the snapshots to see if any of them needs to make
a copy of the block that is about to be written. This ffs_copyonwrite()
code path establishes the ufs vnode followed by snaplk locking
order.
When a filesystem is unmounted it has to release all of its snapshot
vnodes. Part of doing the release is to revert the snapshot vnode
from using the snaplk to using its original vnode lock. While holding
the snaplk, the vnode lock has to be acquired, the vnode updated
to reference it, then the snaplk released. Acquiring the vnode lock
while holding the snaplk violates the ufs vnode then snaplk order.
Because the vnode lock is unused, using LK_EXCLUSIVE | LK_NOWAIT
to acquire it will always succeed and the LK_NOWAIT prevents the
reverse lock order from being recorded.
This change was made in January 2021 (173779b98f) to avoid an LOR
violation in ffs_snapshot_unmount(). The same LOR issue was recently
found again when removing a snapshot in ffs_snapremove() which must
also revert the snaplk to the original vnode lock as part of freeing it.
The unwind in ffs_snapremove() deals with the case in which the
snaplk is held as a recursive lock holding multiple references.
Specifically an equal number of references are made on the vnode
lock. This change factors out the lock reversion operations into a
new function revert_snaplock() which handles both the recursive
locks and avoids the LOR. The new revert_snaplock() function is
then used in both ffs_snapshot_unmount() and in ffs_snapremove().
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33946