This change introduces loadable fib lookup modules based on
DPDK rte_lpm lib targeted for high-speed lookups in large-scale tables.
It is based on the lookup framework described in D27401.
IPv4 module is called dpdk_lpm4. It wraps around rte_lpm [1] library.
This library implements variation of DIR24-8 [2] lookup algorithm.
Module provide lockless route lookups and in-place incremental updates,
allowing for good RIB performance.
IPv6 module is called dpdk_lpm6. It wraps around rte_lpm6 [3] library.
Implementation can be seen as multi-bit trie where the stride or number of bits
inspected on each level varies from level to level.
It can vary from 1 to 14 memory accesses, with 5 being the average value
for the lengths that are most commonly used in IPv6.
Module provide lockless route lookups for global unicast addresses
and in-place incremental updates, allowing for good RIB performance.
Implementation details:
* wrapper code lives in `sys/contrib/dpdk_rte_lpm/dpdk_lpm[6].c`.
* rte_lpm[6] implementation contains both RIB and FIB code.
. RIB ("rule_") code, backed by array of hash tables part has been commented out,
as base radix already provides all the necessary primitives.
* link-local lookups are currently implemented as base radix lookup.
This part should be converted to something like read-only radix trie.
Usage detail:
Compile kernel with option FIB_ALGO and load dpdk_lpm4/dpdk_lpm6
module at any time. They will be picked up automatically when
amount of routes raises to several thousand.
[1]: https://doc.dpdk.org/guides/prog_guide/lpm_lib.html
[2]: http://yuba.stanford.edu/~nickm/papers/Infocom98_lookup.pdf
[3]: https://doc.dpdk.org/guides/prog_guide/lpm6_lib.html
Differential Revision: https://reviews.freebsd.org/D27412
POSIX O_DSYNC means that writes include an implicit fdatasync(2), just
as O_SYNC implies fsync(2).
VOP_WRITE() functions that understand the new IO_DATASYNC flag can act
accordingly, but we'll still pass down IO_SYNC so that file systems that
don't understand it will continue to provide the stronger O_SYNC
behaviour.
Flag also applies to fcntl(2).
Reviewed by: kib, delphij
Differential Revision: https://reviews.freebsd.org/D25090
Fix non-FreeBSD CI build after v1.4.8. This definition was only used in
zstd(1), which isn't part of non-FreeBSD CI (I guess). The ifdef was
added in v1.4.5 import.
Upstream does not currently support shared-linked zstd(1), but I have
proposed https://github.com/facebook/zstd/pull/2450 . If that is
adopted, we can add -DZSTD_PROGRAMS_LINK_SHARED to our libzstd build and
drop some diffs.
Reported by: uqs
lua: avoid gcc -Wreturn-local-addr bug
Avoid a bug with gcc's -Wreturn-local-addr warning with some
obfuscation. In buggy versions of gcc, if a return value is an
expression that involves the address of a local variable, and even if
that address is legally converted to a non-pointer type, a warning may
be emitted and the value of the address may be replaced with zero.
Howerver, buggy versions don't emit the warning or replace the value
when simply returning a local variable of non-pointer type.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90737
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes#11337
spa: avoid type narrowing warning
Building the spa module for i386 caused gcc to emit
-Wint-to-pointer-cast "cast to pointer from integer of different size"
because spa.spa_did was uint64_t but pthread_join (via thread_join in
spa_deactivate) takes a pointer (32-bit on i386). Define spa_did to be
pointer-size instead. For now spa_did is in fact never non-zero and the
thread_join could instead be ifdef'd out, but changing the size of
spa_did may be more useful for the future.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes#11336
FreeBSD libzfs: gcc requires __thread after static
Building libzfs with gcc on FreeBSD failed because gcc is picky about
the order of keywords in declarations with __thread, whereas clang is
more relaxed.
https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Ryan Libby <rlibby@FreeBSD.org>
Closes#11331
Fix compiling on FreeBSD + gcc - don't assume illmnos bits
This looks like it was once from the illumnos compat code.
FreeBSD doesn't have cmn_err as a compiler format attribute, so
it definitely errors out.
It doesn't show up on LLVM because it doesn't trigger at all.
Add in the format flags but keep them behind #if 0 for now;
there are too many format issues that trigger when one does
format checking in the shared code.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes#11068Closes#11069
Fix pointer-is-uint64_t-sized assumption in the ioctl path
This shows up when compiling freebsd-head on amd64 using gcc-6.4.
The lib32 compat build ends up tripping over this assumption.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: adrian chadd <adrian@freebsd.org>
Closes#11068Closes#11069
Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225
* Use the new API of ena_trace_*
* Fix typo syndrom --> syndrome
* Remove validation of the Rx req ID (already performed in the ena-com)
* Remove usage of deprecated ENA_ASSERT macro
Submitted by: Ido Segev <idose@amazon.com>
Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27115
The latest generation hardware requires IO CQ (completion queue)
descriptors memory to be aligned to a 4K. It needs that feature for
the best performance.
Allocating unaligned descriptors will have a big performance impact as
the packet processing in a HW won't be optimized properly. For that
purpose adjust ena_dma_alloc() to support it.
It's a critical fix, especially for the arm64 EC2 instances.
Submitted by: Ido Segev <idose@amazon.com>
Obtained from: Amazon, Inc
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D27114
They are only there to provide less innacurate statistics for debuggers.
However, this is quite heavy-weight and instead it would be better to
teach debuggers how to obtain the necessary information.
This deduplicates 2 sets of caches using the same sizes.
Memory savings fluctuate a lot, one sample result is buildworld on zfs
saving ~180MB RAM in reduced page count associated with zio caches.
The ZIL will be opened on the first write, not earlier.
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Mariusz Zaborski <oshogbo@vexillium.org>
OpenZFS Pull Request: https://github.com/openzfs/zfs/pull/11152
PR: 250934
My script to convert git commits to svn patch does not handle binary
files correctly, and r367387 committed a set of empty files as a result.
MFC with: r367387
Sponsored by: Rubicon Communications, LLC (Netgate)
lz4 port from illumos to Linux added a 16KB per-CPU cache to accommodate for
the missing 16KB malloc. FreeBSD supports this size, making the extra cache
harmful as it can't share buckets.
The use of atomic_sub_64() in zfs_zstd.c was breaking the 32-bit build on
platforms without native 64-bit atomics due to atomic_sub_64() not being
available, and no fallback being provided in _STANDALONE.
Provide a standalone stub to match atomic_add_64() using simple math.
While this is not actually atomic, it does not matter in libsa context,
since it always runs single-threaded and does not run under a scheduler.
Reviewed by: mjg (in email)
This applies:
commit c4ede65bdf
Author: Mateusz Guzik <mjguzik@gmail.com>
Date: Fri Oct 30 23:26:10 2020 +0100
zstd: track allocator statistics
Note that this only tracks sizes as requested by the caller.
Actual allocated space will almost always be bigger (e.g., rounded up to
the next power of 2 or page size). Additionally the allocated buffer may
be holding other areas hostage. Nonetheless, this is a starting point
for tracking memory usage in zstd.
from openzfs
Foundation copyrights, approved by emaste@. It does not include
files which carry other people's copyrights; if you're one
of those people, feel free to make similar change.
Reviewed by: emaste, imp, gbe (manpages)
Differential Revision: https://reviews.freebsd.org/D26980
hese kstats are often expensive to compute so we want to avoid them
unless specifically requested.
The following kstats are affected by this change:
kstat.zfs.${pool}.multihost
kstat.zfs.${pool}.misc.state
kstat.zfs.${pool}.txgs
kstat.zfs.misc.fletcher_4_bench
kstat.zfs.misc.vdev_raidz_bench
kstat.zfs.misc.dbufs
kstat.zfs.misc.dbgmsg
PR: 249258
Reported by: mjg
Reviewed by: mjg, allanjude
Obtained from: https://github.com/openzfs/zfs/pull/11099
Sponsored by: iXsystems, Inc.
- fix panic due to tqid overflow
- Improve libzfs_error_init messages
- Expose zfetch_max_idistance tunable
- Make dbufstat work on FreeBSD
- Fix EIO after resuming receive of new dataset over an existing one
Some vnodes come with a hack which inherits the fplookup flag despite having vops
which don't provide the routine.
Reported by: YAMAMOTO Shigeru <shigeru@os-hackers.jp>