POOL_STATE_SPARE and POOL_STATE_L2CACHE were not handled correctly
and thus the cache and spare disks would not be correctly probed.
Reported by: Michael Schmiedgen <schmiedgen@gmx.net>,
Matthew D. Fuller <fullermd@over-yonder.net>
Tested by: Michael Schmiedgen <schmiedgen@gmx.net>,
flo
MFC after: 5 days
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.
Conducted and reviewed by: attilio
Tested by: pho
... otherwise the current thread might be holding ARC locks and thus run
into a deadlock. This happens, for example, when a thread does memory
allocation in the ARC code and runs into KVA shortage.
Also, it really makes the most sense to wait in pageproc, so that the
results of ARC reclamation are seen before the page cache is acted upon.
In other cases where vm_lowmem is invoked, e.g. on KVA space shortage,
the callers perform multiple attempts (up to 8) and wait for rather
long intervals between them (up to 4 seconds), so ARC reclaim results
should become visible even without explicit waiting on the ARC thread.
Note that this is not a critical issue for typical ZFS usages where KVA
space should already be large enough. On amd64 systems setting KVA size
to twice the physical memory size is known to mitigate KVA fragmentation
issues in practice.
Side note: perhaps vm_lowmem 'how' parameter should be used to
differentiate between causes of the event.
Reported by: Nikolay Denev <ndenev@gmail.com>
MFC after: 19 days
getnewvnode_reserve helps to avoid "recursing" back into zfs code
via getnewvnode when that latter needs to reclaim some vnodes.
zfs code may hold a number of locks around getnewvnode and doesn't
expect any recursion to happen on those locks, because that never
happens in solaris.
I believe that this change also eleiminates a need for the delayed
znode destruction via the taskqueue.
Many thanks to kib for devising getnewvnode_reserve.
Reported by: flo
Tested by: bapt, kwm, swills
MFC after: 2 weeks
X-MFC after: r241556
... instead of deferring the action until first open.
Unlike upstream this has no benefit on FreeBSD.
We know that as soon as the provider is created it is going to be tasted
and thus opened. Initial mediasize of zero causes tasting failure
and subsequent retasting because of the size change.
MFC after: 14 days
This should allow to mount a dataset as a root filesystem even if
it belongs to a pool that is not described in zpool.cache.
This adds some overhead to the boot process though.
If the root filesystem's pool is found in zpool.cache, the by default
its cached configuration will be used for import.
vfs.zfs.rootpool.prefer_cached_config could be set to zero to force
the config to be retasted.
Discussed with: gibbs, pjd, des
MFC after: 25 days
doesn't exist on a dataset we are starting from. For example if we
have the following configuration:
tank
tank/foo
tank/foo@snap
tank/bar
tank/bar@snap
We can execute:
# zfs destroy -t tank@snap
eventhough tank@snap doesn't exit.
Unfortunately it is not possible to do the same with recursive rename:
# zfs rename -r tank@snap tank@pans
cannot open 'tank@snap': dataset does not exist
...until now. This change allows to recursively rename snapshots even if
snapshot doesn't exist on the starting dataset.
Sponsored by: rsync.net
MFC after: 2 weeks
The code builds a map of regions that were freed. On every write the
code consults the map and eventually removes ranges that were freed
before, but are now overwritten.
Freed blocks are not TRIMed immediately. There is a tunable that defines
how many txg we should wait with TRIMming freed blocks (64 by default).
There is a low priority thread that TRIMs ranges when the time comes.
During TRIM we keep in-flight ranges on a list to detect colliding
writes - we have to delay writes that collide with in-flight TRIMs in
case something will be reordered and write will reached the disk before
the TRIM. We don't have to do the same for in-flight writes, as
colliding writes just remove ranges to TRIM.
Sponsored by: multiplay.co.uk
This work includes some important fixes and some improvements obtained
from the zfsonlinux project, including TRIMming entire vdevs on pool
create/add/attach and on pool import for spare and cache vdevs.
Obtained from: zfsonlinux
Submitted by: Etienne Dechamps <etienne.dechamps@ovh.net>
Do this by checking if spa_namespace_lock is already held and not taking
it again in that case.
Add a comment explaining why that is done and why it is safe.
Reviewed by: pjd
MFC after: 24 days
Since all attribute values start at 8-byte aligned boundary, we would
previously incorrectly calculate dn_bonuslen if any attribute but the
last had a variable-length value with length not multiple of 8.
Reported by: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Tested by: Nicolas Rachinsky <fbsd-mas-0@ml.turing-complete.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> (for upstream)
MFC after: 2 weeks
- skip length_idx index for a replaced variable-sized attribute
- skip length_idx index for a removed variable-sized attribute
- also re-arranged code to make sure that length_idx is always
incremented for variable-sized attributes
- additionally add an assertion that the number of actually produced
attributes is the same as the expected number of resulting
attributes
In cooperation with: Matthew Ahrens <mahrens@delphix.com>
Tested by: Trent Nelson <trent@snakebite.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com> (for upstream)
To do: get this upstreamed
MFC after: 2 weeks
When calling a revoke(2) on a dtrace device, dtrace_close() could be
called, even if threads are still stuck in the device. Defer the actual
deallocation of datastructures to the cdevpriv destructor.
While there, remove the unneeded D_TRACKCLOSE and D_NEEDMINOR flags. For
the helper device, we never need it. For the regular dtrace devices, we
only need these flags on FreeBSD pre-8.
MFC after: 1 month
1) It is not useful to call "devfs_clear_cdevpriv()" from
"d_close" callbacks, hence for example read, write, ioctl and
so on might be sleeping at the time of "d_close" being called
and then then freed private data can still be accessed.
Examples: dtrace, linux_compat, ksyms (all fixed by this patch)
2) In sys/dev/drm* there are some cases in which memory will
be freed twice, if open fails, first by code in the open
routine, secondly by the cdevpriv destructor. Move registration
of the cdevpriv to the end of the drm open routines.
3) devfs_clear_cdevpriv() is not called if the "d_open" callback
registered cdevpriv data and the "d_open" callback function
returned an error. Fix this.
Discussed with: phk
MFC after: 2 weeks
Bryan Cantrill implemented the equivalent of semi-log graph
paper for Dtrace so llquantize will use one logarithmic and
one linear scale.
Special thanks to Mark Peek for providing fix to an
assertion and to Fabian Keill for testing the port.
Illumos Revision: 13355:15b74a2a9a9d
Reference:
https://www.illumos/issues/905
Obtained from: Illumos
Tested by: Fabian Keill, mp
MFC after: 4 days
differentiate between an incremental and full stream.
Be sure not to generate guid equal to 0.
Reported by: someone who saw 0 being generated as 64bit random guid
MFC after: 3 days
1948 zpool list should show more detailed pool information
Display per-vdev information with "zpool list -v".
The added expandsize property has currently no value on FreeBSD.
This changeset allows adding expansion support to individual vdevs
in the future.
References:
https://www.illumos.org/issues/1948
Obtained from: illumos (issue #1948)
MFC after: 2 weeks
vn_rlimit_fsize takes uio->uio_offset and uio->uio_resid into account
when determining whether given write would exceed RLIMIT_FSIZE.
When APPEND flag is specified, ZFS updates uio->uio_offset to point to the
end of file.
But this happens after a call to vn_rlimit_fsize, so vn_rlimit_fsize check
can be rendered ineffective by thread that opens some file with O_APPEND
and lseeks below RLIMIT_FSIZE before calling write.
Submitted by: Mateusz Guzik <mjguzik at gmail dot com>
MFC after: 2 weeks
2703 add mechanism to report ZFS send progress
If the zfs send command is used with the -v flag, the amount of bytes
transmitted is reported in per second updates.
References:
https://www.illumos.org/issues/2703
Obtained from: illumos (issue #2703)
MFC after: 2 weeks
to follow the example of OpenSolaris and its descendants, which implemented
cpu as an inline that took a value out of curthread. At certain points in
the FreeBSD scheduler curthread->td_oncpu will no longer be valid (in
particukar, just before the thread gets descheduled) so instead I have
implemented this as its own built-in variable.
Sponsored by: Sandvine Inc.
MFC after: 1 week
accesses of the cache member of vm_object objects.
- Use novel vm_page_is_cached() for checks outside of the vm subsystem.
Reviewed by: alc
MFC after: 2 weeks
X-MFC: r234039
On FreeBSD the direct ioctl argument is automatically copied in/out
as necesary by the kernel ioctl entry point.
PR: kern/164445
Submitted by: Luis Garces-Erice <lge@ieee.org>
Tested by: Attila Nagy <bra@fsn.hu>
MFC after: 5 days
- header and stub .c file for fasttrap module. It's not supported on
MIPS yet, but there is no way to disable support completely
- Do as amd64 trying to limit allocated memory