This fixes -Wint-to-pointer-cast errors with GCC when compiling on
i386 where physical addresses are not the same size as pointers.
Reviewed by: mav, imp
Differential Revision: https://reviews.freebsd.org/D36751
The telnetd codebase is unmaintained and has a number of quality
issues. Telnet has been largely supplanted by ssh. If needed, a port is
available (net/freebsd-telnetd), but a more maintained implementation
should be prefered.
While the telnet client suffers from the same issues, it is deemed
to be of lower risk and is required to connect to legacy devices, so
it remains.
Reviewed by: emaste, imp
Differential Revision: https://reviews.freebsd.org/D36620
ZED does not take any action for disk removal events if there is no
spare VDEV available. Added zpool_vdev_remove_wanted() in libzfs
and vdev_remove_wanted() in vdev.c to remove the VDEV through ZED
on removal event. This means that if you are running zed and
remove a disk, it will be properly marked as REMOVED.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Ameer Hamza <ahamza@ixsystems.com>
Closes#13797
Scoreboard is needed a moment when smp_started == true. If some kernel
daemon thread is started before scoreboard is inited, and does some pmap
operation that requires TLB maintanence, which races with SMP startup,
we might dereference NULL invl_scoreboard. This is particularly easy
to trigger when EARLY_AP_STARTUP is not defined.
Reported by: glebius
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36766
Protocols such as netlink may need a large socket receive buffer,
measured in tens of megabytes. This change allows netlink to
set larger socket buffers (given the privs are in place), without
requiring user to manuall bump maxsockbuf.
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D36747
Allow to set custom per-protocol handlers for the socket buffers
ioctls by introducing pr_setsbopt callback with the default value
set to the currently-used sbsetopt().
Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D36746
On systems with different CPUs we may print all the ID registers for
all CPUs. Reduce this to just print them when they change from the
previous CPU.
Sponsored by: The FreeBSD Foundation
The implementation of sysctl_search_oid no longer relies on the
initial value of nodes to be all NULL, so remove the comment that
demands it and let the caller stop enforcing it.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36768
This cherry-picks upstream eb9bec0a5d
The current value causes significant artificial slowdown during mass
parallel file removal, which can be observed both on FreeBSD and Linux
when running real workloads.
Sample results from Linux doing make -j 96 clean after an allyesconfig
modules build:
before: 4.14s user 6.79s system 48% cpu 22.631 total
after: 4.17s user 6.44s system 153% cpu 6.927 total
FreeBSD results in the ticket.
See https://github.com/openzfs/zfs/issues/13932
The current value causes significant artificial slowdown during mass
parallel file removal, which can be observed both on FreeBSD and Linux
when running real workloads.
Sample results from Linux doing make -j 96 clean after an allyesconfig
modules build:
before: 4.14s user 6.79s system 48% cpu 22.631 total
after: 4.17s user 6.44s system 153% cpu 6.927 total
FreeBSD results in the ticket.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Closes#13932Closes#13938
Add missing header.
Properly ignore return values.
Memory leak/unchecked malloc. We do allocate a bit too early (and
fail to validate the result). From this, smatch is angry when we
overwrite the value of 'node' later.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Signed-off-by: Toomas Soome <tsoome@me.com>
Closes#13941
Bit 28 is used by an internal Nutanix feature which might be
upstreamed in the future.
Bit 29 is the last unused bit. It is reserved to indicate a
to-be-designed extension to the stream format which will accomodate
more feature flags.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Issue #13795Closes#13796
Commit 985c33b132 added DMU_BACKUP_FEATURE_BLAKE3 but it is not used by
the code.
Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Christian Schwarz <christian.schwarz@nutanix.com>
Issue #13795Closes#13796
Clang's static analyzer found that config.uid is uninitialized when
zfs_key_config_load() returns an error.
Oddly, this was not included in the unchecked return values that
Coverity found.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13957
Coverity caught unsafe use of `strcpy()` in `ztest_dmu_objset_own()`,
`nfs_init_tmpfile()` and `dump_snapshot()`. It also caught an unsafe use
of `strlcat()` in `nfs_init_tmpfile()`.
Inspired by this, I did an audit of every single usage of `strcpy()` and
`strcat()` in the code. If I could not prove that the usage was safe, I
changed the code to use either `strlcpy()` or `strlcat()`, depending on
which function was originally used. In some cases, `snprintf()` was used
to replace multiple uses of `strcat` because it was cleaner.
Whenever I changed a function, I preferred to use `sizeof(dst)` when the
compiler is able to provide the string size via that. When it could not
because the string was passed by a caller, I checked the entire call
tree of the function to find out how big the buffer was and hard coded
it. Hardcoding is less than ideal, but it is safe unless someone shrinks
the buffer sizes being passed.
Additionally, Coverity reported three more string related issues:
* It caught a case where we do an overlapping memory copy in a call to
`snprintf()`. We fix that via `kmem_strdup()` and `kmem_strfree()`.
* It caught `sizeof (buf)` being used instead of `buflen` in
`zdb_nicenum()`'s call to `zfs_nicenum()`, which is passed to
`snprintf()`. We change that to pass `buflen`.
* It caught a theoretical unterminated string passed to `strcmp()`.
This one is likely a false positive, but we have the information
needed to do this more safely, so we change this to silence the false
positive not just in coverity, but potentially other static analysis
tools too. We switch to `strncmp()`.
* There was a false positive in tests/zfs-tests/cmd/dir_rd_update.c. We
suppress it by switching to `snprintf()` since other static analysis
tools might complain about it too. Interestingly, there is a possible
real bug there too, since it assumes that the passed directory path
ends with '/'. We add a '/' to fix that potential bug.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13913
Coverity complains about a possible NULL pointer dereference. This is
impossible, but it suspects it because we do a NULL check against
`spa->spa_root_vdev`. This NULL check was never necessary and makes the
code harder to understand, so we drop it.
In particular, we dereference `spa->spa_root_vdev` when `new_state !=
POOL_STATE_UNINITIALIZED && !hardforce`. The first is only true when
spa_reset is called, which only occurs under fault injection. The
second is true unless `zpool export -F $POOLNAME` is used. Therefore,
we effectively *always* dereference the pointer. In the cases where we
do not, there is no reason to think it is unsafe. Therefore this change
is safe to make.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13905
Apply the fix from upstream.
http://www.lua.org/bugs.html#5.2.2-1https://www.opencve.io/cve/CVE-2014-5461
It should be noted that exploiting this requires the `SYS_CONFIG`
privilege, and anyone with that privilege likely has other opportunities
to do exploits, so it is unlikely that bad actors could exploit this
unless system administrators are executing untrusted ZFS Channel
Programs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13949
In #13871, zfs_vdev_aggregation_limit_non_rotating and
zfs_vdev_aggregation_limit being signed was pointed out as a possible
reason not to eliminate an unnecessary MAX(unsigned, 0) since the
unsigned value was assigned from them.
There is no reason for these module parameters to be signed and upon
inspection, it was found that there are a number of other module
parameters that are signed, but should not be, so we make them unsigned.
Making them unsigned made it clear that some other variables in the code
should also be unsigned, so we also make those unsigned. This prevents
users from setting negative values that could potentially cause bad
behaviors. It also makes the code slightly easier to understand.
Mostly module parameters that deal with timeouts, limits, bitshifts and
percentages are made unsigned by this. Any that are boolean are left
signed, since whether booleans should be considered signed or unsigned
does not matter.
Making zfs_arc_lotsfree_percent unsigned caused a
`zfs_arc_lotsfree_percent >= 0` check to become redundant, so it was
removed. Removing the check was also necessary to prevent a compiler
error from -Werror=type-limits.
Several end of line comments had to be moved to their own lines because
replacing int with uint_t caused us to exceed the 80 character limit
enforced by cstyle.pl.
The following were kept signed because they are passed to
taskq_create(), which expects signed values and modifying the
OpenSolaris/Illumos DDI is out of scope of this patch:
* metaslab_load_pct
* zfs_sync_taskq_batch_pct
* zfs_zil_clean_taskq_nthr_pct
* zfs_zil_clean_taskq_minalloc
* zfs_zil_clean_taskq_maxalloc
* zfs_arc_prune_task_threads
Also, negative values in those parameters was found to be harmless.
The following were left signed because either negative values make
sense, or more analysis was needed to determine whether negative values
should be disallowed:
* zfs_metaslab_switch_threshold
* zfs_pd_bytes_max
* zfs_livelist_min_percent_shared
zfs_multihost_history was made static to be consistent with other
parameters.
A number of module parameters were marked as signed, but in reality
referenced unsigned variables. upgrade_errlog_limit is one of the
numerous examples. In the case of zfs_vdev_async_read_max_active, it was
already uint32_t, but zdb had an extern int declaration for it.
Interestingly, the documentation in zfs.4 was right for
upgrade_errlog_limit despite the module parameter being wrongly marked,
while the documentation for zfs_vdev_async_read_max_active (and friends)
was wrong. It was also wrong for zstd_abort_size, which was unsigned,
but was documented as signed.
Also, the documentation in zfs.4 incorrectly described the following
parameters as ulong when they were int:
* zfs_arc_meta_adjust_restarts
* zfs_override_estimate_recordsize
They are now uint_t as of this patch and thus the man page has been
updated to describe them as uint.
dbuf_state_index was left alone since it does nothing and perhaps should
be removed in another patch.
If any module parameters were missed, they were not found by `grep -r
'ZFS_MODULE_PARAM' | grep ', INT'`. I did find a few that grep missed,
but only because they were in files that had hits.
This patch intentionally did not attempt to address whether some of
these module parameters should be elevated to 64-bit parameters, because
the length of a long on 32-bit is 32-bit.
Lastly, it was pointed out during review that uint_t is a better match
for these variables than uint32_t because FreeBSD kernel parameter
definitions are designed for uint_t, whose bit width can change in
future memory models. As a result, we change the existing parameters
that are uint32_t to use uint_t.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Neal Gompa <ngompa@datto.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13875
Coverity found a bug in `zfs_secpolicy_create_clone()` where it is
possible for us to pass an unterminated string when `zfs_get_parent()`
returns an error. Upon inspection, it is clear that using `strlcpy()`
would have avoided this issue.
Looking at the codebase, there are a number of other uses of `strncpy()`
that are unsafe and even when it is used safely, switching to
`strlcpy()` would make the code more readable. Therefore, we switch all
instances where we use `strncpy()` to use `strlcpy()`.
Unfortunately, we do not portably have access to `strlcpy()` in
tests/zfs-tests/cmd/zfs_diff-socket.c because it does not link to
libspl. Modifying the appropriate Makefile.am to try to link to it
resulted in an error from the naming choice used in the file. Trying to
disable the check on the file did not work on FreeBSD because Clang
ignores `#undef` when a definition is provided by `-Dstrncpy(...)=...`.
We workaround that by explictly including the C file from libspl into
the test. This makes things build correctly everywhere.
We add a deprecation warning to `config/Rules.am` and suppress it on the
remaining `strncpy()` usage. `strlcpy()` is not portably avaliable in
tests/zfs-tests/cmd/zfs_diff-socket.c, so we use `snprintf()` there as a
substitute.
This patch does not tackle the related problem of `strcpy()`, which is
even less safe. Thankfully, a quick inspection found that it is used far
more correctly than strncpy() was used. A quick inspection did not find
any problems with `strcpy()` usage outside of zhack, but it should be
said that I only checked around 90% of them.
Lastly, some of the fields in kstat_t varied in size by 1 depending on
whether they were in userspace or in the kernel. The origin of this
discrepancy appears to be 04a479f706 where
it was made for no apparent reason. It conflicts with the comment on
KSTAT_STRLEN, so we shrink the kernel field sizes to match the userspace
field sizes.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Ryan Moeller <ryan@iXsystems.com>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13876
When receiving full/newfs on existing dataset, then it should be done
with "-F" flag. Its enforced for initial receive in checks done in
zfs_receive_one function of libzfs. Similarly, on resuming full/newfs
recv on existing dataset, it should be done with "-F" flag.
When dataset doesn't exist, then full/new recv is done on newly created
dataset and it's marked INCONSISTENT. But when receiving on existing
dataset, recv is first done on %recv and its marked INCONSISTENT.
Existing dataset is not marked INCONSISTENT. Resume of full/newfs
receive with dataset not INCONSISTENT indicates that its resuming newfs
on existing dataset. So, enforce "-F" flag in this case.
Also return an error from dmu_recv_resume_begin_check() in zfs kernel,
when its resuming full/newfs recv without force.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Chunwei Chen <david.chen@nutanix.com>
Signed-off-by: Jitendra Patidar <jitendra.patidar@nutanix.com>
Closes#13856Closes#13857
Merge commit 307ace7f20d5 from llvm git (by David Sherwood):
[LoopVectorize] Ensure the VPReductionRecipe is placed after all it's inputs
When vectorising ordered reductions we call a function
LoopVectorizationPlanner::adjustRecipesForReductions to replace the
existing VPWidenRecipe for the fadd instruction with a new
VPReductionRecipe. We attempt to insert the new recipe in the same
place, but this is wrong because createBlockInMask may have
generated new recipes that VPReductionRecipe now depends upon. I
have changed the insertion code to append the recipe to the
VPBasicBlock instead.
Added a new RUN with tail-folding enabled to the existing test:
Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
Differential Revision: https://reviews.llvm.org/D129550
Reported by: yuri
PR: 264834
MFC after: 3 days
The SAN interceptors simply take a bus_space_tag_t, so we're
dropping qualifiers here. const semantics with a ptr typedef mean we'd
have to drop or change the bus_space_tag_t abstraction used in the SAN
sanitizers in order to make a compatible change there, which likely
isn't worth it.
Reviewed by: andrew, markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36764
and we checked that it is not reclaimed.
Reviewed by: markj, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36722
Clang's static analyzer found a bad free caused by skein_mac_atomic().
It will allocate a context on the stack and then pass it to
skein_final(), which attempts to free it. Upon inspection,
skein_digest_atomic() also has the same problem.
These functions were created to match the OpenSolaris ICP API, so I was
curious how we avoided this in other providers and looked at the SHA2
code. It appears that SHA2 has a SHA2Final() helper function that is
called by the exported sha2_mac_final()/sha2_digest_final() as well as
the sha2_mac_atomic() and sha2_digest_atomic() functions. The real work
is done in SHA2Final() while some checks and the free are done in
sha2_mac_final()/sha2_digest_final().
We fix the use after free in the skein code by taking inspiration from
the SHA2 code. We introduce a skein_final_nofree() that does most of the
work, and make skein_final() into a function that calls it and then
frees the memory.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Closes#13954
sysctl_search_old makes several tests in a loop that can be removed.
The first test in the loop is only ever true on the first loop
iteration, and is always true on that iteration, so its work can be
done before the loop begins.
The upper and lower bounds on the loop variable 'indx' are each tested
on each iteration, but 'indx' is changed in one direction or the other
only once within the loop, so only one bound needs to be checked.
Two ways remain in the loop that nodes[indx] can change (after one of
them is put before the loop start), and one of them applies exactly
when indx has been incremented, so no separate test for that case
requires testing.
Restructure and add comments that makes clearer that this is a basic
depth-first search.
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D36741
Rack has had the ability to timeout connections that just sit idle automatically. This
feature of course is off by default and requires the user set it on (though the socket option
has been missing in tcp_usrreq.c). Lets get the progress timeout fully supported in
the base stack as well as rack.
Reviewed by: tuexen
Sponsored by: Netflix Inc
Differential Revision: https://reviews.freebsd.org/D36716
sysctl_register_oid must check the uniqueness of any newly computed
oid_number in sysctl_register_oid.
Reviewed by: asomers
MFC with: d3f96f6610
Differential Revision: https://reviews.freebsd.org/D36743
To avoid using the sysctl list macros directly in external kernel modules.
Reviewed by: asomers, manu and asiciliano
Differential Revision: https://reviews.freebsd.org/D36748
MFC after: 1 week
Sponsored by: NVIDIA Networking
Specifically, when we receive a violation and we're configured to panic,
kasan_enabled gets unset before we descend into panic(). At this point,
there's no longer any reason to allow marking as kasan_shadow_check() is
disabled -- we have some inherent risk of faulting or panicking if the
system's in a bad enough state with no benefit.
Reviewed by: markj
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D36742
The "update" mount option must be specified when the "snapshot"
mount option is used. Return EINVAL if the "snapshot" option is
specified without the "update" option also requested.
Reported by: Robert Morris
Reviewed by: kib
PR: 265362
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
As with amd64 use a per-domain pv chunk lock to reduce contention as
chunks get created and removed all the time.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36307
As with amd64 batch chunk removal in pmap_remove_pages to move it out
of the pv list lock. This is one of the main contested locks when
running poudriere on a 160 core Ampere Altra server.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36305
Reduce holes in pmap_bootstrap_state by moving freemempos after the
pointers as they are more likely to change size in any future ABI.
Sponsored by: The FreeBSD Foundation
* Factor out queue selection (epair_select_queue()) and mbuf
preparation (epair_prepare_mbuf()) from epair_menq(). It simplifies
epair_menq() implementation and reduces the amount of dependencies
on the neighbouring epair.
* Use dedicated epair_set_state() instead of 2-lines copy-paste
* Factor out unit selection code (epair_handle_unit()) from
epair_clone_create(). It simplifies the clone creation logic.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D36689
Summary:
Move SIOCAIFADDR_IN6 (current "primary" ioctl to add an IPv6
interface address) handling code to the dedicated in6_addifaddr()
function and make it a part of KPI. This allows in-kernel users to
add/delete interfaces addresses without relying on ioctl interface.
Subscribers: imp, ae, glebius
Differential Revision: https://reviews.freebsd.org/D36713
Rework the pmap_bootstrap table generation so we can use it with
partially filled out page tables after the DMAP has been bootstrapped.
This allows it to be reused by the later bootstrap code.
Sponsored by: The FreeBSD Foundation
* change current NTP services offered by the FreeBSD Installer;
* no longer offer ntpdate to be enabled and started on boot;
* start offering the option to make ntpd set the date and time on boot itself.
The motivation for this change comes from the ntpdate(8) manpage:
Note: The functionality of this program is now available in the ntpd(8)
program. See the -q command line option in the ntpd(8) page. After a
suitable period of mourning, the ntpdate utility is to be retired from
this distribution.
Approved by: cy (src), dteske (src)
Differential Revision: https://reviews.freebsd.org/D36206