This commit adds BLAKE3 checksums to OpenZFS, it has similar
performance to Edon-R, but without the caveats around the latter.
Homepage of BLAKE3: https://github.com/BLAKE3-team/BLAKE3
Wikipedia: https://en.wikipedia.org/wiki/BLAKE_(hash_function)#BLAKE3
Short description of Wikipedia:
BLAKE3 is a cryptographic hash function based on Bao and BLAKE2,
created by Jack O'Connor, Jean-Philippe Aumasson, Samuel Neves, and
Zooko Wilcox-O'Hearn. It was announced on January 9, 2020, at Real
World Crypto. BLAKE3 is a single algorithm with many desirable
features (parallelism, XOF, KDF, PRF and MAC), in contrast to BLAKE
and BLAKE2, which are algorithm families with multiple variants.
BLAKE3 has a binary tree structure, so it supports a practically
unlimited degree of parallelism (both SIMD and multithreading) given
enough input. The official Rust and C implementations are
dual-licensed as public domain (CC0) and the Apache License.
Along with adding the BLAKE3 hash into the OpenZFS infrastructure a
new benchmarking file called chksum_bench was introduced. When read
it reports the speed of the available checksum functions.
On Linux: cat /proc/spl/kstat/zfs/chksum_bench
On FreeBSD: sysctl kstat.zfs.misc.chksum_bench
This is an example output of an i3-1005G1 test system with Debian 11:
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 1196 1602 1761 1749 1762 1759 1751
skein-generic 546 591 608 615 619 612 616
sha256-generic 240 300 316 314 304 285 276
sha512-generic 353 441 467 476 472 467 426
blake3-generic 308 313 313 313 312 313 312
blake3-sse2 402 1289 1423 1446 1432 1458 1413
blake3-sse41 427 1470 1625 1704 1679 1607 1629
blake3-avx2 428 1920 3095 3343 3356 3318 3204
blake3-avx512 473 2687 4905 5836 5844 5643 5374
Output on Debian 5.10.0-10-amd64 system: (Ryzen 7 5800X)
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 1840 2458 2665 2719 2711 2723 2693
skein-generic 870 966 996 992 1003 1005 1009
sha256-generic 415 442 453 455 457 457 457
sha512-generic 608 690 711 718 719 720 721
blake3-generic 301 313 311 309 309 310 310
blake3-sse2 343 1865 2124 2188 2180 2181 2186
blake3-sse41 364 2091 2396 2509 2463 2482 2488
blake3-avx2 365 2590 4399 4971 4915 4802 4764
Output on Debian 5.10.0-9-powerpc64le system: (POWER 9)
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 1213 1703 1889 1918 1957 1902 1907
skein-generic 434 492 520 522 511 525 525
sha256-generic 167 183 187 188 188 187 188
sha512-generic 186 216 222 221 225 224 224
blake3-generic 153 152 154 153 151 153 153
blake3-sse2 391 1170 1366 1406 1428 1426 1414
blake3-sse41 352 1049 1212 1174 1262 1258 1259
Output on Debian 5.10.0-11-arm64 system: (Pi400)
implementation 1k 4k 16k 64k 256k 1m 4m
edonr-generic 487 603 629 639 643 641 641
skein-generic 271 299 303 308 309 309 307
sha256-generic 117 127 128 130 130 129 130
sha512-generic 145 165 170 172 173 174 175
blake3-generic 81 29 71 89 89 89 89
blake3-sse2 112 323 368 379 380 371 374
blake3-sse41 101 315 357 368 369 364 360
Structurally, the new code is mainly split into these parts:
- 1x cross platform generic c variant: blake3_generic.c
- 4x assembly for X86-64 (SSE2, SSE4.1, AVX2, AVX512)
- 2x assembly for ARMv8 (NEON converted from SSE2)
- 2x assembly for PPC64-LE (POWER8 converted from SSE2)
- one file for switching between the implementations
Note the PPC64 assembly requires the VSX instruction set and the
kfpu_begin() / kfpu_end() calls on PowerPC were updated accordingly.
Reviewed-by: Felix Dörre <felix@dogcraft.de>
Reviewed-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de>
Co-authored-by: Rich Ercolani <rincebrain@gmail.com>
Closes#10058Closes#12918
The latest version of the ec2-scripts package includes a completely
rewritten "use EC2 ephemeral disks for swap space" script. Now that
we have something which works on recent versions of FreeBSD, turn it
on since it's a great way to use the ephemeral disks.
Note that the option for controlling this, ec2_ephemeral_swap_enable,
is not the same as the option (ec2_ephemeralswap_enable) used with the
previous ephemeral-swap script; this change was deliberate to avoid
astonishment for users who upgraded their ec2-scripts package and had
a setting left behind in rc.conf.
Prior to 9b6edf364e WITHOUT_KERNEL_SYMBOLS split kernel debug data
into standalone debug files at build time, but did not install those
files. As of 9b6edf364e it stopped splitting the debug data, leaving
it in the kernel and modules (the default kernel configs include
DEBUG=-g).
Revert 9b6edf364e and introduce a new build-time SPLIT_KERNEL_DEBUG
knob, as some people rely on the pre-9b6edf364eb0 WITHOUT_KERNEL_SYMBOLS
behaviour and that was imp's original intent.
PR: 264433
Reviewed by: eugen, imp
MFC after: 3 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35437
Passwords may be accepted by both the PasswordAuthentication and
KbdInteractiveAuthentication authentication schemes. Add a reference to
the latter in the description/comment for PasswordAuthentication, as it
otherwise may seem that "PasswordAuthentication no" implies passwords
will be disallowed.
This situation should be clarified with more extensive documentation on
the authentication schemes and configuration options, but that should be
done in coordination with upstream OpenSSH. This is a minimal change
that will hopefully clarify the situation without requiring an extensive
local patch set.
PR: 263045
Reviewed by: manu (earlier version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35272
The KASSERT criteria needs to be checked against the
sendbuffer so_snd in a subsequent version.
Reviewed By: tuexen, #transport
PR: 263445
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35431
While at it, fix double initialization of the "drv_ioctl_data_d" structure
and the "mask" variable.
Reviewed by: kib@
MFC after: 1 week
Sponsored by: NVIDIA Networking
Missed another NULL dereference during KASSERTS after traversing
the scoreboard. While at it, scratch the goto by making the
traversal conditional, and remove duplicate checks using an
unconditional loop with all checks inside.
Reviewed By: hselasky
PR: 263445
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35428
Do this by reducing the size of the MBUF_PEXT_MAX_PGS, causing "struct mbuf" to
be bigger than M_SIZE, and also add a missing padding field to ensure 64-bit
alignment.
Reviewed by: gallatin@
Reported by: Elliott Mitchell
Differential revision: https://reviews.freebsd.org/D35339
MFC after: 1 week
Sponsored by: NVIDIA Networking
Debug data is enabled via `makeoptions DEBUG=-g` in the kernel config
file (e.g. GENERIC).
If debug data is enabled and WITHOUT_KERNEL_SYMBOLS is set then debug
data is included in the kernel and module files.
PR: 264433
Discussed with: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
mlx4_QP_FLOW_STEERING_DETACH_wrapper first removes the steering
rule (which results in freeing the rule structure), and then
references a field in this struct (the qp number) when releasing the
busy-status on the rule's qp.
Since this memory was freed, it could reallocated and changed.
Therefore, the qp number in the struct may be incorrect,
so that we are releasing the incorrect qp. This leaves the rule's qp
in the busy state (and could possibly release an incorrect qp as well).
Fix this by saving the qp number in a local variable, for use after
removing the steering rule.
Linux commit:
3b01fe7f91c8e4f9afc4fae3c5af72c14958d2d8
PR: 264469
MFC after: 1 week
Sponsored by: NVIDIA Networking
Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet
header to figure out if an incoming mbuf has been fully offloaded or
not. This information follows the packet stream via the LRO engine, IP
stack and finally to the TCP stack. The TCP stack preserves the mbuf
packet header also when re-assembling packets after packet loss. When
the mbuf goes into the socket buffer the packet header is demoted and
the offload information is transferred to "m_flags" . Later on a
worker thread will analyze the mbuf flags and decide if the mbufs
making up a TLS record indicate a fully-, partially- or not decrypted
TLS record. Based on these three cases the worker thread will either
pass the packet on as-is or recrypt the decrypted bits, if any, or
decrypt the packet as usual.
During packet loss the kernel TLS code will call back into the network
driver using the send tag, informing about the TCP starting sequence
number of every TLS record that is not fully decrypted by the network
interface. The network interface then stores this information in a
compressed table and starts asking the hardware if it has found a
valid TLS header in the TCP data payload. If the hardware has found a
valid TLS header and the referred TLS header is at a valid TCP
sequence number according to the TCP sequence numbers provided by the
kernel TLS code, the network driver then informs the hardware that it
can resume decryption.
Care has been taken to not merge encrypted and decrypted mbuf chains,
in the LRO engine and when appending mbufs to the socket buffer.
The mbuf's leaf network interface pointer is used to figure out from
which network interface the offloading rule should be allocated. Also
this pointer is used to track route changes.
Currently mbuf send tags are used in both transmit and receive
direction, due to convenience, but may get a new name in the future to
better reflect their usage.
Reviewed by: jhb@ and gallatin@
Differential revision: https://reviews.freebsd.org/D32356
Sponsored by: NVIDIA Networking
So that the asserts and the actual code see the same values.
Differential revision: https://reviews.freebsd.org/D32356
MFC after: 1 week
Sponsored by: NVIDIA Networking
When packets are received they may traverse several network interfaces like
vlan(4) and lagg(9). When doing receive side offloads it is important to
know the first network interface entry point, because that is where all
offloading is taking place. This makes it possible to track receive
side route changes for multiport setups, for example when lagg(9) receives
traffic from more than one port. This avoids having to install multiple
offloading rules for the same stream.
This field works similar to the existing "rcvif" mbuf packet header field.
Submitted by: jhb@
Reviewed by: gallatin@ and gnn@
Differential revision: https://reviews.freebsd.org/D35339
Sponsored by: NVIDIA Networking
Sponsored by: Netflix
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.
Differential revision: https://reviews.freebsd.org/D32356
Sponsored by: NVIDIA Networking
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.
Differential revision: https://reviews.freebsd.org/D32356
Sponsored by: NVIDIA Networking
Remove bounce buffering code for blkback and only attach if Xen
creates IOMMU entries for grant mapped pages.
Such bounce buffering consumed a non trivial amount of memory and CPU
resources to do the memory copy, when it's been a long time since Xen
has been creating IOMMU entries for grant maps.
Refuse to attach blkback if Xen doesn't advertise that IOMMU entries
are created for grant maps.
Sponsored by: Citrix Systems R&D
Handle tearing down a blkback that hasn't been fully initialized. This
requires carefully checking that fields are allocated before trying to
access them. Also communication memory is allocated before setting
XBBF_RING_CONNECTED, so gating it's freeing on XBBF_RING_CONNECTED
being set is wrong and will lead to memory leaks.
Also stop using xbb_disconnect() in error paths. Use xenbus_dev_fatal
and let the normal disconnection procedure take care of the cleanup.
Reported by: Ze Dupsys <zedupsys@gmail.com>
Sponsored by: Citrix Systems R&D
xenbus needs to keep track of the devices exposed on xenstore, so that
it can trigger frontend and backend device creation.
Removal of backend devices is currently detected by checking the
existence of the device (backend) xenstore directory, but that's prone
to races as the device driver would usually add entries to such
directory itself, so under certain circumstances it's possible for a
driver to add node to the directory after the toolstack has removed
it. This leads to devices not removed, which can eventually exhaust
the memory of FreeBSD.
Fix this by checking for the existence of the 'state' node instead of
the directory, as such node will always be present when a device is
active, and will be removed by the toolstack when the device is shut
down. In order to avoid any races with the updating of the 'state'
node by FreeBSD and the toolstack removing it use a transaction in
xenbusb_write_ivar() for that purpose.
Reported by: Ze Dupsys <zedupsys@gmail.com>
Sponsored by: Citrix Systems R&D
Adding a few KASSERT() to validate sanity of sack holes, and
bail out if sack hole is inconsistent to avoid panicing non-invariant builds.
Reviewed By: hselasky, glebius
PR: 263445
MFC after: 1 week
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D35387
This allows to profile already running high-priority threads, that
otherwise by blocking thread migration to respective CPUs blocked PMC
management, i.e. profiling could start only when workload completed.
While there, return the thread to its original CPU after iterating
the list. Otherwise all threads using PMC end up on the last CPU.
MFC after: 1 month
This ensures read-only PT_LOAD segments are not marked as writable in
the phdr flags.
Reviewed by: markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D35398
In iommu_gas_lowermatch and iommu_gas_uppermatch, a subtree search is
quickly terminated if the largest available free space in the subtree
is below a limit, where that limit is related to the size of the
allocation request. However, that limit is too small; it does not
account for both of the guard pages that will surround the allocated
space, but only for one of them. Consequently, it permits the search
to proceed through nodes that cannot produce a successful allocation
for all the requested space. Fix that limit to improve search
performance.
Reviewed by: alc, kib
Submitted by: Weixi Zhu (wxzhu@rice.edu)
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D35414
Make it a complex, but a single for(;;) statement. The previous cycle
with some loop logic in the beginning and some loop logic at the end
was confusing. Both me and markj@ were misleaded to a conclusion that
some checks are unnecessary, while they actually were necessary.
While here, handle an edge case found by Mark, when on 64-bit platform
an incorrect message from userland would underflow length counter, but
return without any error. Provide a test case for such message.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35375
Summary:
It can be useful to see a summary of CPU caches on bootup. This is done
for most platforms already, so add this to arm64, in the form of (taken
from Apple M1 pro test):
L1 cache: 192KB (instruction), 128KB (data)
L2 cache: 12288KB (unified)
This is printed out per-CPU, only under bootverbose.
Future refinements could instead determine if a cache level is shared
with other cores (L2 is shared among cores on some SoCs, for instance),
and perform a better calculation to the full true cache sizes. For
instance, it's known that the M1 pro, on which this test was done, has 2
12MB L2 clusters, for a total of 24MB. Seeing each CPU with 12288KB L2
would make one think that there's 12MB * NCPUs, for possibly 120MB
cache, which is incorrect.
Sponsored by: Juniper Networks, Inc.
Reviewed by: #arm64, andrew
Differential Revision: https://reviews.freebsd.org/D35366
Make sure to check for NULL pointers and also check all search criterias,
not only the first one!
Bump the FreeBSD version.
Reviewed by: manu@
Differential Revision: https://reviews.freebsd.org/D35403
MFC after: 1 week
Sponsored by: NVIDIA Networking
In lkpi_sta_assoc_to_run() we are going through some code segments
twice (auth->assoc, assoc->authorized). The 2nd time we shall not
re-gain a reference on the net80211 node as otherwise it'll leak.
Likewise we do not have to re-set lsta and sta.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Based on manual inspection the skbs are not freed in those unlikely
cases, though all would give an error message so would have gone noticed
if happened.
While here fix a typo in one of these error messages.
MFC after: 3 days
Manually free the mbuf in certain error cases from net80211 to not
leak it.
Note that the differences between ieee80211_input_mimo() and
ieee80211_input_mimo_all(), the former not consuming the mbuf while
the later does, is confusing.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days