Add ICL_NOCOPY flag to icl_pdu_append_data(), specifying that the method
can just reference the data buffer instead of immediately copying it.
Extend the offload KPI with optional PDU queue method, allowing to specify
completion callback, called when all the data referenced by above has been
transferred and won't be accessed any more (the buffers can be freed).
Implement the above functionality in software iSCSI driver using mbufs
with external storage and reference counter. Note that some NICs (ixl(4))
may keep the mbuf in TX queue for a long time, so CTL has to be ready.
Add optional method to struct ctl_scsiio for buffer reference counting.
Implement it for CTL block backend, allowing to delay free of the struct
ctl_be_block_io and memory it references as needed. In first reincarnation
of the patch I tried to delay whole I/O as it is done for FibreChannel,
that was cleaner, but due to the above callback delays I had to rewrite
it this way to not leave LUN referenced potentially for hours or more.
All together on sequential read from ZFS ARC this saves about 30% of CPU
time and memory bandwidth by avoiding one of 3 memory copies (the other
two are from ZFS ARC to DMU cache and then from DMU cache to CTL buffers).
On tests with 2x Xeon Silver 4114 this allows to reach full line rate of
100GigE NIC. Tests with Gold CPUs and two 100GigE NICs are stil TBD,
but expectations to saturate them are pretty high. ;)
Discussed with: Chelsio
Sponsored by: iXsystems, Inc.
VHDX is the successor of Microsoft's VHD file format. It increases
maximum capacity of the virtual drive to 64TB and introduces features
to better handle power/system failures.
VHDX is the required format for 2nd generation Hyper-V VMs.
Reviewed by: marcel
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25184
Rework to simplify and impose sane url syntax.
That is we allow for file://[devname[:fstype]]/package
Reviewed by: stevek
MFC after: 1 week
Sponsored by: Juniper Networks
Differential Revision: https://reviews.freebsd.org//D25134
Currently we only call sbi_shutdown in cpu_reset, which means we reach
"Please press any key to reboot." even when RB_POWEROFF is set, and only
once the user presses a key do we then shutdown. Instead, register a
shutdown_final event handler and make an SBI shutdown call if
RB_POWEROFF is set.
Reviewed by: br, jhb (mentor), kp
Approved by: br, jhb (mentor), kp
Differential Revision: https://reviews.freebsd.org/D25183
As of r359457 we removed the GDB_LIBEXEC option, always installing in-tree
gdb into /usr/libexec/. Thus, there is now no need for crashinfo to include
/usr/bin/gdb in the list of pathnames to check when looking for gdb.
is to know the time to first byte in and time to first byte out. Currently we
have no way to know these all we have is t_starttime. That (t_starttime) tells us
what time the 3 way handshake completed. We don't know when the first
request came in or how quickly we responded. Nor from a client perspective
do we know how long from when we sent out the first byte before the
server responded.
This small change adds the ability to track the TTFB's. This will show up in
BB logging which then can be pulled for later analysis. Note that currently
the tracking is via the ticks variable of all three variables. This provides
a very rough estimate (hz=1000 its 1ms). A follow-on set of work will be
to change all three of these values into something with a much finer resolution
(either microseconds or nanoseconds), though we may want to make the resolution
configurable so that on lower powered machines we could still use the much
cheaper ticks variable.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D24902
This can happen with very large kernels (e.g. ones embedding a root
filesystem). The DTB written by OpenSBI/BBL is quite small so this is
unlikely to hit important data, but if it does this can result in very
confusing and hard-to-debug crashes. Add a KASSERT() and a verbose print
to catch this problem with debug kernels.
While this will not print any output by default if it fails (that would
depend on EARLY_PRINTF), at least the kernel now halts reliably instead
of randomly crashing.
Reviewed By: mhorne
Differential Revision: https://reviews.freebsd.org/D25153
By default OpenSBI and BBL will pass the DTB at a 2MB-aligned address.
However, by default there are no 2MB aligned regions between the SBI and
the kernel, so we have to choose a 2MB aligned region after the kernel.
OpenSBI defaults to placing the DTB 32MB after the start of the kernel but
this is not sufficient for a kernel with a large MFS embedded.
We could increase this offset to a larger number (e.g. 64/128/256) but that
imposes restrictions on the minimum RAM size.
Another solution would be to place the DTB between OpenSBI and the kernel
at 1MB alignment, but current locore.S code assumes 2MB alignment.
With this change I can now boot on QEMU with an OpenSBI configured to
store the DTB at an offset of 1MB.
See also https://github.com/riscv/opensbi/issues/169
Reviewed By: mhorne
Differential Revision: https://reviews.freebsd.org/D25151
If the POWER firmware detects a bad CPU core, it will "GUARD" it out,
marking it disabled. Any attempt to spin up a bad CPU will trigger a panic
later on when waiting for threads on said core to wake up. Support limping
along on fewer cores instead.
x86 boot uses loader(8) and the boot2-direct-to-kernel process is not
supported. Remove the documentation, which doesn't document a working
process and leads to confusion.
PR: 247074
Reported by: Alex K.
This was intended to test the locking used in the MacOS X kernel on a
FreeBSD system, to make use of WITNESS and other debugging infrastructure.
This hasn't been used for ages, to take it out to reduce the #ifdef
complexity.
MFC after: 1 week
This test should no longer provoke large amounts of traffic, which can
overwhelm single-core systems, preventing them from making progress in the
tests.
The test can now be re-enabled.
PR: 246448
An EXAMPLE section was adding with some basic examples that show the use of
uniq(1) with various flags.
Submitted by: fernape@
Approved by: bcr@
MFC after: 4 days
Relnotes: yes (EXAMPLE section for uniq(1))
Differential Revision: https://reviews.freebsd.org/D25149
As timeout(9) was removed and all consumers were converted to
callout(9), reference it instead for the description of sbt, pr,
and flags arguments.
Reviewed by: trasz
Differential Revision: https://reviews.freebsd.org/D25165
Use %hs (locale-based encoding) instead of %s (UTF-8) format for
strings that are expected to be in current locale encoding (date/time,
process names/argument list).
PR: 241491
Reviewed by: phil
Differential Revision: https://reviews.freebsd.org/D22160
This logic is running the beacon receive bits in STA+AP mode on both the
STA and AP side. The STA side sees its beacons from the BSS fine; the
AP side is seeing other beacons on the same channel but with the BSS
node for some odd reason. (I think it's a valid reason, but I currently
forget what that valid reason is.)
So, just to be cleaner about things, don't run the nexttbtt/etc bits
at all if we're in hostap mode. If I ever get mesh working then maybe
I'll make sure it works right on mesh+ap and mesh+sta modes.
Whilst here, log the VAP i'm being called on to make it clearer what
is going on. I may end up adding a VAP dprintf version of this at
some point.
Tested:
* AR9380, STA (DWDS client) + hostap on the same NIC
This removes the requirement to know what's in the ifp.
(If someone wants a quick clean-up task, it'd be nice to convert instances
of ifp dereferencing for if_xname over to this method.)
Somewhat predictably, software often wants to use \x27/\x24 among others so
that they can decline worrying about ugly escaping, if said escaping is even
possible. Right now, this software is using these and getting the wrong
results, as we'll interpret those as x27 and x24 respectively. Some examples
of this, when an exp-run was ran, were science/octopus and misc/vifm.
Go ahead and process these at all times. We allow either one or two digits,
and the tests account for both. If extra digits are specified, e.g. \x2727,
then the third and fourth digits are interpreted literally as one might
expect.
PR: 229925
MFC after: 2 weeks
As of r361857 all BINUTILS options are disabled by default - ports
have been changed to depend on binutils if they require GNU as, and
all base system assembly files have been switched to use Clang's
integrated assembler.
Relnotes: Yes
Sponsored by: The FreeBSD Foundation
This updates the logic to allow:
* A-MPDU if available;
* A-MSDU if available and A-MPDU is off/NACKed;
* A-MPDU+A-MSDU if it's available and negotiated;
* Fast frames if the node is 11abg (and not HT/VHT.)
This allows for things to fail back to A-MSDU or fast frames
if A-MPDU isn't available rather than needing to be non-HT/non-VHT.
It also allows A-MPDU+A-MSDU to work if it's negotiated.
Tested:
* AR9380, STA + AP mode (A-MPDU, A-MSDU, FF, A-MPDU+A-MSDU)
* RT5350, STA mode (A-MSDU, FF)
* AR9170, STA mode (A-MSDU, FF)
synchronous inode update.
The IN_SIZEMOD and IN_IBLKDATA flags indicate changes to the
file size and block pointer fields in the inode. When these
fields have been changed, the fsync() and fsyncdata() system
calls must write the inode to ensure their semantics that the
file is on stable store.
The IN_SIZEMOD and IN_IBLKDATA flags cannot be cleared until
a synchronous write of the inode is done. If they are cleared
on an asynchronous write, then the inode may not yet have been
written to the disk when an fsync() or fsyncdata() call is done.
Absent these flags, these calls would not know that they needed
to write the inode. Thus, these flags only can be cleared on
synchronous writes of the inode. Since the inode will be locked
for the duration of the I/O that writes it to disk, no fsync()
or fsyncdata() will be able to run before the on-disk inode
is complete.
Reviewed by: kib
MFC with: -r361785
Differential revision: https://reviews.freebsd.org/D25072
Add xref to all SIM devices we currently have (including a rough indication
which ones are likely to fail).
Update to include all the CAM options.
Fix a few igor nits while I'm here.
Some automation tries to detect if nvd or nda is in used, and the presence of
both confuses it. Provide a knob to turn off nvd alias creation
(kern.cam.nda.nvd_compat=0) for these situations. The default is the same:
create the nvd compat link.
Summary:
Radix on AIM, and all of Book-E (currently), can do direct addressing of
user space, instead of needing to map user addresses into kernel space.
Take advantage of this to optimize the copy(9) functions for this
behavior, and avoid effectively NOP translations.
Test Plan: Tested on powerpcspe, powerpc64/booke, powerpc64/AIM
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D25129
Summary:
The point of this addition is to cache CPU behavior 'features', to avoid
having to recompute based on CPU, etc.
The first such use case is to avoid the unnecessary manipulation of the
SLBs (Segment Lookaside Buffers) when using the Radix pmap on POWER9.
Since we already get the PCPU pointer wherever we swap the SLB entries,
we can use a cached flag to check if it's necessary to perform the
operation anyway, and skip it when not.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D24908