As long as we support ZFS on 32-bit platforms we should do this for all
64-bit variables that are modified in a lockless fashion using atomic
operations. Otherwise, there is a risk of a reading a torn value.
Here is a rationale for why I am doing this in dmu_object_alloc_impl:
- it's very recent code
- the code deals with object IDs and a number of objects in a file
system can overflow 32 bits
- incorrect allocation of an object ID may result in hard to debug
problems
- fixing all plain reads of 64-bit atomic variables is not a trivial
undertaking to do in one shot, so I chose to do it incrementally
MFC after: 3 weeks
X-MFC after: r353301, r353176
Make sure the vnet_shutdown field is not set until after all
VNET_SYSUNINIT()'s in the SI_SUB_VNET_DONE subsystem have been
executed. Especially the vnet_if_return() functions requires that
if_move() is still operational.
Reported by: lwhsu@
MFC after: 1 week
Sponsored by: Mellanox Technologies
At the moment i386 does not provide 64-bit atomic operations in
userland. Exposing some atomic_*_64 defines can cause unnecessary
confusion.
Discussed with: kib
MFC after: 2 weeks
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882
Add generic PVR values for PowerISA 2.07 and 3.00. This allows booting pseries
in QEMU with compatibilty mode enabled.
Submitted by: Shawn Anastasio <shawn@anastas.io>
|
This adds two implementations for each atomic_fcmpset_ and atomic_cmpset_
short and char functions, selectable at compile time for the target
architecture. By default, it uses a generic shift-and-mask to perform atomic
updates to sub-components of 32-bit words from <sys/_atomic_subword.h>.
However, if ISA_206_ATOMICS is defined it uses the ll/sc instructions for
halfword and bytes, introduced in PowerISA 2.06. These instructions are
supported by all IBM processors from POWER7 on, as well as the Freescale/NXP
e6500 core. Although the e5500 and e500mc both implement PowerISA 2.06 they
do not implement these instructions.
As part of this, clean up the atomic_(f)cmpset_acq and _rel wrappers, by
using macros to reduce code duplication.
ISA_206_ATOMICS requires clang or newer binutils (2.20 or later).
Differential Revision: https://reviews.freebsd.org/D21682
Acquire the inp lock before checking whether the socket is already bound,
and around updates to the inp_vflag field.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21867
About 11 minutes of poudriere -s -j 104 and probing on return value of
trylocks reveals that over 10% of attempts fail, which in turn means
there are more atomics performed than necessary.
Trylocking was there to try preventing migration, but it's not very likely
to happen if the lock is uncontested.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21925
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page, and removes one left over trailing whitespace.
Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619
* Fix force_sync_path, which ensures that a file is fully flushed to disk.
Apparently "zpool history"'s performance has improved, but exporting and
importing the pool still works.
* Fix file_dva by using undocumented zdb syntax to clarify that we're
interested in the pool's root file system, not the pool itself. This
should also fix the zpool_clear_001_pos test.
* Remove a redundant cleanup step
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D21901
These tests have never worked correctly
* Replace runwattr with sudo
* Fix a scoping bug with the "dtst" variable
* Cleanup user properties created during tests
* Eliminate the checks for refreservation and send support. They will always
be supported.
* Fix verify_fs_snapshot. It seemed to assume that permissions would not yet
be delegated, but that's not how it's actually used.
* Combine verify_fs_promote with verify_vol_promote
* Remove some useless sleeps
* Fix backwards condition in verify_vol_volsize
* Remove some redundant cleanup steps in the tests. cleanup.ksh will handle
everything.
* Disable some parts of the tests that FreeBSD doesn't support:
* Creating snapshots with mkdir
* devices
* shareisci
* sharenfs
* xattr
* zoned
The sharenfs parts could probably be reenabled with more work to remove the
Solarisms.
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D21898
ZFS has grown some additional properties that hadn't been added to the
config file yet. While I'm here, improve the error message, and remove a
superfluous command.
MFC after: 2 weeks
Sponsored by: Axcient
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
This test attempts to corrupt a file-backed vdev by deleting it and then
recreating it with truncate. But that doesn't work, because the pool
already has the vdev open, and it happily hangs on to the open-but-deleted
file. Fix by truncating the file without deleting it.
MFC after: 2 weeks
Sponsored by: Axcient
* Adapt zvol_misc_001_neg to use dumpon instead of Solaris's dumpadm
* Disable zvol_misc_003_neg, zvol_misc_005_neg, and zvol_misc_006_pos,
because they involve using a zvol as a dump device, which FreeBSD does not
yet support.
MFC after: 2 weeks
Sponsored by: Axcient
Remove the now obsolete vnet_state field. This greatly simplifies the
detection of VNET shutdown and avoids code duplication.
Discussed with: bz@
MFC after: 1 week
Sponsored by: Mellanox Technologies
The compatibility code for the atomic operations in ZFS code is a bit
messy. In some cases the native definitions are directly made
available, in some cases there are emulated operations in
opensolaris_atomic.c and in yet other cases there are atomic operations
implemented in assembly that were obtained from OpenSolaris / illumos.
This commit adds atomic_swap_64 for use with i386 userland.
The code is copied from illumos.
I am not sure why FreeBSD does not provide that operation natively.
Maybe because we try (or pretend) to support processors that did not
have the necessary instructions.
While here I also added atomic_load_64 for the same reasons.
This is original code based on iilumos atomic_swap_64 and FreeBSD
atomic_load_acq_64_i586.
Pointyhat to: avg
MFC after: 1 week
8423 8199 7432 Implement large_dnode pool feature
7432 Large dnode pool feature
8199 multi-threaded dmu_object_alloc()
8423 Implement large_dnode pool feature
10406 large_dnode changes broke zfs recv of legacy stream
llumos/illumos-gate@54811da5ac54811da5achttps://www.illumos.org/issues/8423https://www.illumos.org/issues/8199https://www.illumos.org/issues/7432illumos/illumos-gate@811964cd9f811964cd9fhttps://www.illumos.org/issues/10406
ZoL issues:
Improved dnode allocation #6564
Clean up large dnode code #6262
Fix dnode_hold() freeing dnode behavior #8172
Fix dnode allocation race #6414, #6439
Partial: Raw sends must be able to decrease nlevels #6821, #6864
Remove unnecessary txg syncs from receive_object() Closes#7197
This updates FreeBSD large_dnode code (that was imported from ZoL) to a
version that was committed to illumos. It has some cleanups,
improvements and fixes comparing to what we have in FreeBSD now.
I think that the most significant update is 8199 multi-threaded
dmu_object_alloc().
This commit reverts r351077 that was a revert of r351074 and r351076 and
restores those changes. Required atomic operations should be available
now on all platforms where we build ZFS.
Obtained from: illumos
MFC after: 3 weeks
DTS Import of Linux 5.3 added a patch that rework the L3 mmc instance
in the AM335x SoC but removed the status = 'disabled' on the node.
This cause the kernel to probe the device even if the board doesn't
have this mmc used and since we don't correctly activate the clock
for this module we panic with an external data abort.
Beaglebone(s) don't have this device anyway so simply disabling it.
Patch for the DTS was sent upstream.
https://patchwork.kernel.org/patch/11176921/
PR: 241089
Reported by: phk
Previously, the code used a plain store on platforms that lacked
atomic_swap_64 and possibly some other platforms as the condition worked
only if atomic_swap_64 was a macro.
MFC after: 1 week
X-MFC after: r353166, r353167
Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all. We emulate those operations for
those platforms using a mutex. That is not entirely correct and it's
very efficient. Besides, the loads are plain loads, so torn values are
possible.
Nevertheless, the emulation seems to work for some definition of work.
This change adds atomic_swap_64, which is already used in ZFS code, and
atomic_load_64 that can be used to prevent torn reads.
MFC after: 1 week
According to ian, the only armv6 cpu we support is the 1176, so this
change is effectively a no-op.
The change is just to make the code more self-consistent.
The issue was noticed by a standalone module build for armv6.
Reviewed by: ian
MFC after: 3 weeks
This was committed due to what was later diagnosed as an msdosfs bug
preventing in-place strip. This bug was fixed in r352564, and we agreed to
keep the workaround in for a bit to allow the driver fix a suitable amount
of propagation time for folks building/installing powerpc/ubldr, seeing as
how we were not in any hurry to revert.
Logic was backwards. The function returns true if it *is* running as a
hypervisor, whereas we want to only call the CAS utility if we're running as a
guest.
Reported by: Shawn Anastasio <shawn@anastas.io>
no longer worked. The problem was that the defines used the
same space as the VLAN id. This commit does three things.
1) Move the LRO used fields to the PH_per fields. This is
safe since the entire PH_per is used for IP reassembly
which LRO code will not hit.
2) Remove old unused pace fields that are not used in mbuf.h
3) The VLAN processing is not in the mbuf queueing code. Consequently
if a VLAN submits to Rack or BBR we need to bypass the mbuf queueing
for now until rack_bbr_common is updated to handle the VLAN properly.
Reported by: Brad Davis
Root vnodes looekd up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.
Reviewed by: jeff (previous version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21646
The current 256-lock sized array is a problem in the following ways:
- it's way too small
- there are 2 locks per cacheline
- it is not NUMA-aware
Solve these issues by introducing per-superpage locks backed by pages
allocated from respective domains.
This significantly reduces contention e.g. during poudriere -j 104.
See the review for results.
Reviewed by: kib
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21833
As pointed out by mjg, without the parentheses the calculations done against
these macros are incorrect, resulting in only 1/3 of locks being used.
Reported by: mjg
Since local UEFI console is implemented on top of framebuffer,
we need to avoid redrawing the whole screen ourselves, but let
Simple Text Mode to do the scroll for us.
that this path is taken by setting the tail pointer correctly.
There is still bug related to handling unordered unfragmented messages
which were delayed in deferred handling.
This issue was found by OSS-Fuzz testing the usrsctp stack and reported in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=17794
MFC after: 3 days
Diff partially stolen from CheriBSD; these bits need -Wl,-z,notext in order
to build in an LLVM world. They are needed for all flavors/sizes of MIPS.
This will eventually get fixed in LLVM, but it's unclear when.
Reported by: arichardson, emaste
Differential Revision: https://reviews.freebsd.org/D21696
At least some of the characters in E000-F8FF range are used by Powerline
fonts, and having no attributes for these ranges in UnicodeData.txt
other than "Other, Private Use" it should be safe to mark all of them as
printable. Some actually were before r340491, so this fixes the
regression introduced there as well.
PR: 240911
Reviewed by: bapt
Tested by: Daniel Ponte <amigan@gmail.com>
Differential Revision: https://reviews.freebsd.org/D21850