This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.
The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.
Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:
- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
(?)
Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.
PR: 240874
Reviewed by: kib, trasz
Differential Revision: https://reviews.freebsd.org/D21845
The NVMe specification requires unused entries in the Identify, Active
Namespace ID data to be zero. Fix is bzero the provided page, similar to
what is done for the Namespace Descriptors list.
Fixes UNH Tests 2.6 and 2.9
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24901
Dataset Management range specifications may have a zero length (a.k.a.
an empty range definition). Handle the case of all ranges being empty by
completing with Success (DSM commands are advisory only). For
Deallocate, skip empty range definitions when sending TRIM's to the
backing storage.
Fixes UNH Test 2.2.4
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24900
If the Predictable Latency Mode is not supported, NVMe Controllers must
return Invalid Field in Command status for the Get Features command
with IDs:
- Predictable Latency Mode Config
- Predictable Latency Mode Window
Fixes UNH Tests 3.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24899
This adds support for NVMe Get Features, Interrupt Vector Config
parameter error checking done by the UNH compliance tests.
Fixes UNH Tests 1.6.8 and 5.5.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24898
This commit updates the Identify Controller data to advertise the
Controller supports a single firmware slot and that firmware slot 1 is
read-only. Additionally, it returns an "Invalid Firmware Slot" error
when the host issues any Firmware Commit command (a.k.a. Firmware
Activate).
Fixes UNH Test 5.5.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24897
This adds support to bhyve's NVMe device emulation for processing Async
Event Requests but not returning them (i.e. Async Event Notifications).
Fixes UNH Test 5.5.2
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24896
Add checks that the combination of Starting LBA and Number of Logical
Blocks in a command will not exceed the range of the underlying storage.
Note that because NVMe specifices the Starting LBA as a uint64_t, care
must be taken when converting it and the block count to avoid an integer
overflow.
Fixes UNH Tests 2.2.3, 2.3.2, and 2.4.2
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24895
SMART data in NVMe includes statistics for number of read and write
commands issued as well as the number of "data units" read and written.
NVMe defines "data unit" as thousands of 512 byte blocks (e.g. 1 data
unit is 1-1,000 512 byte blocks, 3 data units are 2,001-3,000 512 byte
blocks).
This patch implements counters for:
- Data Units Read
- Data Units Written
- Host Read Commands
- Host Write Commands
and exposes the values when the guest reads the SMART/Health Log Page.
Fixes UNH Test 1.3.8
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24894
For NVMe emulation, validate the Data Set Management LBA ranges do not
exceed the capacity of the backing storage. If they do, return an "LBA
Out of Range" error.
Fixes UNH Test 2.2.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24893
NVMe controllers advertise their Max Data Transfer Size (MDTS) to limit
the number of page descriptors in an I/O request. Take advantage of this
and size the struct pci_nvme_ioreq accordingly.
Ensuring these values match both future-proofs the code and allows
removing some complexity which only exists to handle this possibility.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24891
Split the NVM I/O function (i.e. nvme_opc_write_read) into separate
functions - one for RAM based backing-store and another for disk based
backing-store for easier maintenance. No functional changes.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24890
The Format NVM command mainly allows the host to specify the block size
and protection information used for the Namespace. As the bhyve
implementation simply maps the capabilities of the backing storage
through to the guest, there isn't anything to implement. But a side
effect of the format is the NVMe Controller shall not return any data
previously written (i.e. erase previously written data). This patch
implements this later behavior to provide a compliant implementation.
Fixes UNH Test 1.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24889
Create a generic Get/Set Features by saving off the contents of CDW11
from the Set command and returning the saved value in the completion of
the Get command. Implementation allows providing optional implementation
for both Set and Get.
Add infrastructure to determine which feature ID's are namespace
specific and flag violations of this category of error.
Also adds the feature specific behavior of Set Features, Number of
Queues to only allow this command once per Controller reset.
Fixes UNH Tests 1.2, 5.4, and 5.5.6
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24887
Fix the logic in nvme_opc_get_log_page to calculate the number of DWORDS
(uint32_t) instead of WORDS (uint16_t) for the byte length. And only
return the allowed number of Log Page bytes as determined by the user
request and actual size of the requested log page.
Fixes UNH Test 1.3
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24885
Consolidate the code which writes Completion Queue entries and updates
the CQ doorbell value. While in the neighborhood, convert the "toggle CQ
phase bit" code to use an XOR operation instead of an "if/else" branch.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24882
The NVMe code attempted to ensure thread safety through a combination of
using atomics and a "busy" flag. But this approach leads to unavoidable
race conditions.
Fix is to use per-queue mutex locks to ensure thread safety within the
queue processing code. While in the neighborhood, move all the queue
initialization code to a common function.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19841
This adds support for the NVMe I/O command Flush. For block-based
devices, submit a DIOCGFLUSH to the backing storage. Otherwise, command
is treated like a NOP and completes with a Successful status.
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24880
This refactors the NVMe I/O command processing function to make adding
new commands easier. The main change is to move command specific
processing (i.e. Read/Write) to separate functions for each NVMe I/O
command and leave the common per-command processing in the existing
pci_nvme_handle_io_cmd() function.
While here, add checks for some common errors (invalid Namespace ID,
invalid opcode, LBA out of range).
Add myself to the Copyright holders
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24879
Convert the debug and warning logging macros to be parameterized and
correctly use bhyve's PRINTLN macro.
Reviewed by: imp
Tested by: Jason Tubnor
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24878
Suppose a thread is running on a CPU in a NUMA domain with no physical
RAM. When an item is freed to a first-touch zone, it ends up in the
cross-domain bucket. When the bucket is full, it gets placed in another
domain's bucket queue. However, when allocating an item, UMA will
always go to the keg upon a per-CPU cache miss because the empty
domain's bucket queue will always be empty. This means that a non-empty
domain's bucket queues can grow very rapidly on such systems. For
example, it can easily cause mbuf allocation failures when the zone
limit is reached.
Change cache_alloc() to follow a round-robin policy when running on an
empty domain.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25355
When job control is not enabled, the shell ignores SIGINT while waiting for
a foreground process unless that process exits on SIGINT. In this case, the
foreground process is sleep and it does not exit on SIGINT because the
signal is only sent to the shell. Depending on order of events, this could
cause the SIGINT to be unexpectedly ignored.
On lightly loaded bare metal, the chance of this happening tends to be less
than 0.01% but with higher loads and/or virtualization it becomes more
likely.
Starting the sleep in background and using the wait builtin ensures SIGINT
will not be ignored.
PR: 247559
Reported by: lwhsu
MFC after: 1 week
For 1000Mb mode to work reliably TX/RX delays need to be configured
between the TX/RX clock and the respective signals on the PHY
to compensate for differing trace lengths on the PCB.
Reviewed by: manu
MFC after: 1 week
with python3.8 from Focal triggers those.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25491
AcpiOsMapMemory is used for device memory when e.g. an _INI method wants
to access physical memory, however, aarch64 pmap_mapbios is hardcoded to
writeback. Search for the correct memory type to use in pmap_mapbios.
Submitted by: Greg V <greg_unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D25201
llvmorg-10.0.1-rc2-0-g77d76b71d7d.
Also add a few more llvm utilities under WITH_CLANG_EXTRAS:
* llvm-dwp, a utility for merging DWARF 5 Split DWARF .dwo files into
.dwp (DWARF package files)
* llvm-size, a size(1) replacement
* llvm-strings, a strings(1) replacement
MFC after: 3 weeks
Posix says that the interpretation of the locale string is
"implementation-defined", so we ought to document what is
actually recognized.
Also add a cross reference to locale(1).
PR: 247553
MFC after: 1 week
This flag will now show the processor number on which a process is running.
This change was inspired by PR129965. Initially I didn't think that the
patch attached to it was correct -- it sacrificed ki_estcpu use in "cpu"
for ki_lastcpu and I thought that the old functionality should be kept and
the new (cpu#) one added to it. But I've since discovered that ki_estcpu is
sched_4bsd-specific. What's worse, it represents the same thing as
ki_pctcpu, except ki_pctcpu is universal -- so "%cpu" has been using it
successfully. Therefore, I've decided to replace information based on
ki_estcpu with information based on ki_oncpu/ki_lastcpu.
Key parts of the code and manual changes were borrowed from top(1).
PR: 129965
Reported by: Nikola Knežević
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25377