Notable upstream pull request merges:
#12285 Introduce a tunable to exclude special class buffers from L2ARC
#12689 Check l2cache vdevs pending list inside the vdev_inuse()
#12735 Enable edonr in FreeBSD
#12743 FreeBSD: fix world build after de198f2
#12745 Restore dirty dnode detection logic
Obtained from: OpenZFS
OpenZFS commit: 269b5dadcfd1d5732cf763dddcd46009a332eae4
DIOCKEEPCOUNTERS used to overlap with DIOCGIFSPEEDV0, which has been
fixed in 14, but remains in stable/12 and stable/13.
Support the old, overlapping, call under COMPAT_FREEBSD13.
Reviewed by: jhb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33001
Turn on compat option for older FreeBSD versions (i.e. 12). We do not
enable the compat options for 11 or older because riscv was never
supported in those versions.
Reviewed by: jrtc27 (previous version)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33015
Since vfs.nfsd.srvmaxio can only be set when nfsd.ko
is loaded, but nfsd is not running, setting it in
/etc/sysctl.conf is not feasible when "options NFSD"
was not specified for the kernel.
This patch adds a new rc variable nfs_server_maxio,
which sets vfs.nfsd.srvmaxio at the correct time.
rc.conf.5 will be patched separately.
Reviewed by: 0mp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32997
There likely should be a macro for the ports that support lto, but I'm
making sure that all the mips things build before decommissioning it and
this is the only thing that's broken...
Sponsored by: Netflix
This reverts commit 020f4112559ebf7e94665c9a69f89d21929ce82a.
Because now ASLR is enabled by default for 64-bit architectures
and the purpose of the installation menu is to allow choosing
additional 'mitigation'/'hardening' options that are originally
disabled, remove the ASLR knob from bsdinstall.
Discussed with: emaste
Obtained from: Semihalf
Sponsored by: Stormshield
Address Space Layout Randomization (ASLR) is an exploit mitigation
technique implemented in the majority of modern operating systems.
It involves randomly positioning the base address of an executable
and the position of libraries, heap, and stack, in a process's address
space. Although over the years ASLR proved to not guarantee full OS
security on its own, this mechanism can make exploitation more difficult.
Tests on the tier 1 64-bit architectures demonstrated that the ASLR is
stable and does not result in noticeable performance degradation,
therefore it should be safe to enable this mechanism by default.
Moreover its effectiveness is increased for PIE (Position Independent
Executable) binaries. Thanks to commit 9a227a2fd642 ("Enable PIE by
default on 64-bit architectures"), building from src is not necessary
to have PIE binaries. It is enough to control usage of ASLR in the
OS solely by setting the appropriate sysctls.
This patch toggles the kernel settings to use address map randomization
for PIE & non-PIE 64-bit binaries. It also disables SBRK, in order
to allow utilization of the bss grow region for mappings. The latter
has no effect if ASLR is disabled, so apply it to all architectures.
As for the drawbacks, a consequence of using the ASLR is more
significant VM fragmentation, hence the issues may be encountered
in the systems with a limited address space in high memory consumption
cases, such as buildworld. As a result, although the tests on 32-bit
architectures with ASLR enabled were mostly on par with what was
observed on 64-bit ones, the defaults for the former are not changed
at this time. Also, for the sake of safety keep the feature disabled
for 32-bit executables on 64-bit machines, too.
The committed change affects the overall OS operation, so the
following should be taken into consideration:
* Address space fragmentation.
* A changed ABI due to modified layout of address space.
* More complicated debugging due to:
* Non-reproducible address space layout between runs.
* Some debuggers automatically disable ASLR for spawned processes,
making target's environment different between debug and
non-debug runs.
In order to confirm/rule-out the dependency of any encountered issue
on ASLR it is strongly advised to re-run the test with the feature
disabled - it can be done by setting the following sysctls
in the /etc/sysctl.conf file:
kern.elf64.aslr.enable=0
kern.elf64.aslr.pie_enable=0
Co-developed by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: emaste, kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D27666
The code is integrated, builds fine, runs fine, there's not really
any reason not to.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes#12735
m_apply() works on unmapped mbufs, so this will let us elide
mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in
the network stack.
Modify sctp_calculate_cksum() to assume it's passed an mbuf header.
This assumption appears to be true in practice, and we need to know the
full length of the chain.
No functional change intended.
Reviewed by: tuexen, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32941
Some upcoming changes will modify software checksum routines like
in_cksum() to operate using m_apply(), which uses the direct map to
access packet data for unmapped mbufs. This approach of course does not
work on platforms without a direct map, so we have to disallow the use
of unmapped mbufs on such platforms.
I believe this is the right tradeoff: we only configure KTLS on amd64
and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map
already), and the use of unmapped mbufs with plain sendfile is a recent
optimization. If need be, m_apply() could be modified to create
CPU-private mappings of extpg mbuf pages as a fallback.
So, change mb_use_ext_pgs to be hard-wired to zero on systems without a
direct map. Note that PMAP_HAS_DMAP is not a compile-time constant on
some systems, so the default value of mb_use_ext_pgs has to be
determined during boot.
Reviewed by: jhb
Discussed with: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32940
In accordance with a SHOULD in RFC 4861, rtsol and rtsold wait a
random time between zero and one (aka MAX_RTR_SOLICITATION_DELAY)
seconds before sending a Router Solicitation, in order to avoid
network congestion if many hosts come online at once. (The
question of how many hosts would be required to cause congestion
by each sending a single packet on a Gbps+ network is left to the
reader.)
The new option -i disables this wait and instructs rtsol and rtsold
to send the Router Solicitation immediately.
Reviewed by: bz, kp (earlier version)
MFC after: 1 week
Relnotes: yes
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32956
An interface was added to derive an implied TSC frequency from pvclock
in 2015, but this interface was never exposed anywhere user-visible.
Reviewed by: kib, bryanv
Differential Revision: https://reviews.freebsd.org/D32974
This was introduced in 2014 along with the comment (which has since
been deleted):
/* Introduce an annoying delay to stop swamping */
Modern cryptographic random number generators can ingest arbitrarily
large amounts of non-random (or even maliciously selected) input
without losing their security.
Depending on the number of "boot entropy files" present on the system,
this can speed up the boot process by up to 1 second.
Reviewed by: cem
MFC ater: 1 week
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32984
The mntfs vnode lock should be before topology, as established in
ffs_mountfs(). Extend the locked region in ffs_unmount().
Reported and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33013
The TLS header length field is set by the kernel, so if it is
incorrect that is an indication of a kernel bug, not an internal error
in the tests.
Prompted by: markj (comment in an earlier review)
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D33003
VOP_FSYNC() asserts that the vnode is exclusively locked for NFS.
If we try to execute file with recently modified content, the assert is
triggered.
Reviewed by: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32999
The inline function vn_flush_cached_data() in vnode.h
must not be compiled when building BASE.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Martin Matuska <mm@FreeBSD.org>
Closes#12743
Don't exit early in find_rootfs() when zpool.bootfs
is set to `zfs:AUTO`.
Reviewed-by: Richard Laager <rlaager@wiktel.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: szubersk <szuberskidamian@gmail.com>
Closes#12658
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@ixsystems.com>
Signed-off-by: Pawel Jakub Dawidek <pawel@dawidek.net>
Closes#12748
Special allocation class or dedup vdevs may have roughly the same
performance as L2ARC vdevs. Introduce a new tunable to exclude those
buffers from being cacheable on L2ARC.
Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
Closes#11761Closes#12285
The l2cache device could be added twice because vdev_inuse() does not
check spa_l2cache for added devices. Make l2cache vdevs inuse checking
logic more closer to spare vdevs.
Reviewed-by: George Amanakis <gamanakis@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#9153Closes#12689
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. The zhack with newly added cli options could
be used to restore label checksums and make pool importable again.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#2510Closes#12686
When ZFS is on root, /tmp is a ZFS. This causes zfs_list_004_neg to
fail since `zfs list` on /tmp passes when the test expects it not to.
The fix is to exclude paths that belong to ZFS.
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Palash Gandhi <pbg4930@rit.edu>
Closes#12744
In addition to flushing memory mapped regions when checking holes,
commit de198f2d95 modified the dirty dnode detection logic to check
the dn->dn_dirty_records instead of the dn->dn_dirty_link. Relying
on the dirty record has not be reliable, switch back to the previous
method.
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #11900Closes#12745
This test case may fail on 5.13 and newer Linux kernels if the
/dev/zvol/ device is not created by udev.
Reviewed-by: Rich Ercolani <rincebrain@gmail.com>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #12301
Closes #12738
In case if all label checksums will be invalid on any vdev, the pool
will become unimportable. From other side zdb with -l option will not
provide any useful information why it happened. Add notifications
about corrupted label checksums.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Fedor Uporov <fuporov.vstack@gmail.com>
Closes#2509Closes#12685
Summary: The manual pages need a bit of editing for language and clarity.
Reviewers: oshogbo, #manpages
Subscribers: imp
Differential Revision: https://reviews.freebsd.org/D32976
This makes GOP not probed on some situation (AMD Card on PCIe slot
with EDK2 as we have a SERIAL_IO_PROTOCOL compatible uart).
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D32992
Sponsored by: Beckhoff Automation GmbH & Co. KG
Printing the EFI_HANDLE pointer isn't very useful.
If the handle have a IMAGE_DEVICE_PATH or a DEVICE_PATH protocol print it.
This makes it easier to see which devices are present and what protocol they
expose.
Reviewed by: imp, tsoome
Differential Revision: https://reviews.freebsd.org/D32991
Sponsored by: Beckhoff Automation GmbH & Co. KG
Minor reformatting nits to make mprsas_scsiio_timeout match
mpssas_scsiio_timeout more closely. The differences aren't necessary and
are distracting when comparing the routines. No functional changes.
Sponsored by: Netflix
Normally a UFS/FFS filesystem with a bad check hash can only be
mounted read only. With this commit the mount(8) -f (force) option
can be used to force a read-write mount of a UFS/FFS filesystem with
a bad check hash. Conveniently the filesystem will proceed to
update its on-disk superblock with a corrected check hash.
Sponsored by: Netflix
When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.
Sanity check: kib
Sponsored by: Netflix
There is no universal way to find the TSC frequency. Newer Intel CPUs
may report it via CPUID leaves 0x15 and 0x16. Sometimes it can be
obtained from the PLATFORM_INFO MSR as well, though we never use that.
On older platforms we derive the frequency using a DELAY(1000000) call,
which uses the 8254 PIT. On some newer platforms the 8254 is apparently
non-functional, leading to bogus calibration results. On such platforms
the TSC frequency must be available from CPUID. It is also possible to
disable calibration with a tunable, in which case we try to parse the
brand string if the TSC freq is not available from CPUID.
CPUID 0x15 provides an authoritative TSC frequency value, but even that
is not always available on new Intel platforms. CPUID 0x16 provides the
specified processor base frequency, which is not the same as the TSC
frequency. Empirically, it is close enough for early boot, but too far
off for timekeeping: on a Comet Lake NUC, CPUID 0x16 yields 1600MHz but
the TSC frequency is rougly 1608MHz, leading to frequent clock stepping
when NTP is in use.
Thus we have a situation where we cannot calibrate using the PIT and
cannot obtain a precise frequency from CPUID (or MSRs). This change
seeks to address that by using the CPUID 0x16 value during early boot
and refining the calibration later once ACPI-based timecounters are
available. TSC frequency detection is thus split into two phases:
Early phase:
- On Intel platforms, query CPUID 0x15 and 0x16 and use that value
initially if available.
- Otherwise, get an estimate using the PIT, reducing the delay loop to
100ms from 1s.
- Continue to register the TSC as the CPU ticks provider early, even
though the frequency may be off. Otherwise any code executed during
boot that uses cpu_ticks() (e.g., context switching) gets tripped up
when the ticks provider changes.
Later phase:
- In SI_SUB_CLOCKS, once the timehands are initialized, load the current
TSC and timecounter (sbinuptime()) values at the beginning and end of
a 1s interval and use the timecounter frequency (typically from
kvmclock, HPET or the ACPI PM timer) to estimate the TSC frequency.
- Update the TSC timecounter, global tsc_freq and CPU ticker with the
new frequency and finally register the TSC as a timecounter.
Reviewed by: kib, jhb (previous version)
Discussed with: imp, cperciva
MFC after: 6 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32512
This is how most SYSINITs are defined. Also annotate the dummy
parameter with __unused. No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
The -a option also requires passing specific environment variables to
instance of rtld doing tracing.
PR: 259069
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
For some cloud/virtualization use cases it can be convenient to grow the
filesystem on boot any time the disk/partition happens to be larger, but
not fail if it remains the same size.
Continue to emit a message if we have no action to take, but exit with
status 0 if the size remains the same.
Reviewed by: trasz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32856
Modify the GPIO pins only on the Base-T cards and even there drive all
of them low instead of putting them in hi-z state. For the rest (this
is the common case), directly power off the PLLs of the high speed
serdes. This is the simplest method that does not involve or conflict
with the firmware but still works with all T4-T6 cards regardless of
what's plugged into the port.
This fixes a problem where the peer wouldn't always see a link down if
it is connected to the device using a -CR4 copper cable.
MFC after: 3 weeks
Sponsored by: Chelsio Communications