illumos/illumos-gate@1fd3785ff6
spa_load_impl has grown out of proportion. It is currently over 700
lines long, which makes the import process very hard to follow or debug,
even for experienced ZFS developers. The objective is to split it up
into a series of well-commented functions.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Pavel Zakharov <pavel.zakharov@delphix.com>
illumos/illumos-gate@36a64e6284
To prevent kmem_cache reaping from blocking other system resources, turn
kmem_cache_reap_now() (which blocks) into kmem_cache_reap_soon(). Callers
to kmem_cache_reap_soon() should use kmem_cache_reap_active(), which
exploits #9017's new taskq_empty().
Reviewed by: Bryan Cantrill <bryan@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Yuri Pankov <yuripv@yuripv.net>
Author: Tim Kordas <tim.kordas@joyent.com>
FreeBSD does not use a taskqueue for kmem cache reaping, so this change
is less dramatic than it is on illumos: it just limits reaping to once
per second. It may be improved later, if needed.
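The once-per-second limit mentioned above can be pictured with a minimal
userland sketch. This is not the FreeBSD kernel code; struct reap_limiter
and reap_allowed() are hypothetical names used only for illustration, and
the caller is assumed to pass in the current time in seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical limiter state; not a structure from the FreeBSD tree. */
struct reap_limiter {
	time_t	last;	/* time of the last permitted reap */
	bool	ever;	/* false until the first reap has run */
};

/*
 * Return true if a reap may start at time `now` and record it.
 * A request arriving less than one second after the previous permitted
 * reap is skipped rather than made to wait.
 */
static bool
reap_allowed(struct reap_limiter *rl, time_t now)
{
	assert(rl != NULL);
	if (rl->ever && now - rl->last < 1)
		return (false);
	rl->last = now;
	rl->ever = true;
	return (true);
}
```

A caller would test reap_allowed() first and simply return when it
reports false, so a burst of reap requests costs at most one reap per
second.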
illumos/illumos-gate@f06dce2c1f
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Gordon Ross <gordon.w.ross@gmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Andrew Stormont <astormont@racktopsystems.com>
Mostly const-correctness fixes. There were also some variable-shadowing and
unused-variable fixes, and a couple of sockaddr type-correctness changes. I also had
trouble with cast-align warnings. I was able to prove that one of them was a
false positive. But ultimately I had to disable the warning program-wide to
deal with the others.
Reviewed by: cem
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D14460
To minimize the time spent scanning all of the directories in pass 2
(Check Pathnames), fsck uses a search order based on the location
of their first block. Zero length directories have no first block,
so the array being used to hold the block numbers of directory
inodes was of zero length. Thus a lookup was done past the end of
the array, yielding at best a random value and at worst a segmentation
fault. For zero length directories, this change allocates a one
element block array and initializes it to zero. The effect is that
all zero length directories are handled first in pass 2.
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D14163
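The zero length directory fix described above can be sketched as follows.
This is a simplified illustration, not the fsck source: struct dirinfo,
dirinfo_alloc() and dirinfo_cmp() are hypothetical stand-ins for fsck's
per-directory record, its allocation, and the pass 2 sort comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified per-directory record. */
struct dirinfo {
	size_t	nblks;		/* number of blocks the directory has */
	int64_t	blks[];		/* block numbers; blks[0] is the sort key */
};

/*
 * Allocate a record. A zero length directory still gets one zeroed
 * slot, so that reading blks[0] never runs past the allocation.
 */
static struct dirinfo *
dirinfo_alloc(const int64_t *blks, size_t nblks)
{
	size_t slots = nblks > 0 ? nblks : 1;
	struct dirinfo *d;

	assert(nblks == 0 || blks != NULL);
	d = calloc(1, sizeof(*d) + slots * sizeof(d->blks[0]));
	if (d == NULL)
		return (NULL);
	d->nblks = nblks;
	if (nblks > 0)
		memcpy(d->blks, blks, nblks * sizeof(d->blks[0]));
	return (d);
}

/* qsort(3) comparison: order directories by their first block. */
static int
dirinfo_cmp(const void *a, const void *b)
{
	const struct dirinfo *da = *(const struct dirinfo *const *)a;
	const struct dirinfo *db = *(const struct dirinfo *const *)b;

	return ((da->blks[0] > db->blks[0]) - (da->blks[0] < db->blks[0]));
}
```

Because the single slot of an empty directory is zero, sorting with
dirinfo_cmp() places such directories ahead of any directory whose first
block number is non-zero.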
This seems to have been arbitrary; bootlock_password and password don't seem
to have any documented length restrictions, and loader(8) probably shouldn't
care about whatever GELI passphrase length restrictions might exist.
Reported by: Kalle Carlbark <kalle.carlbark+freebsd@kcbark.net>
As noted in D14267 load_elf.c has a variety of indentation styles. Move
to standard 8 column hard tab indents, 4 space second level indents.
Also includes some whitespace cleanups found by clang-format.
The current route(8) manpage shows that "flush" is an argument to
the optional -n flag, rather than a separate subcommand. Correct
this to properly show flush as a route subcommand.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Reviewed by: rgrimes
Differential Revision: https://reviews.freebsd.org/D14401
Only require a gateway to be specified on a route add request. On
a route change request that does not specify the gateway, the
gateway will remain the same. This allows changing other route
parameters without having to re-specify the gateway, as in
"route change 10.0.0.0/8 -mtu 9000".
Update the route(8) manpage to explicitly call out this usage
as being supported.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Reviewed By: eugen (rtsock.c change), rgrimes
Differential Revision: https://reviews.freebsd.org/D14291
illumos/illumos-gate@0fb055e81f
At present it is possible to boot from a root pool that is on RAIDZ but not
one that is on RAIDZ2 or RAIDZ3. This is because, at the time the pool
version is checked to ensure support for dual/triple parity, the uberblock
has not yet been loaded into the SPA and therefore the code determines that
the pool version is too old and returns ENOTSUP.
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Andy Stormont <astormont@racktopsystems.com>
Reviewed by: Toomas Soome <tsoome@me.com>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Andy Fiddaman <omnios@citrus-it.co.uk>
FreeBSD already had this fixed, so this is just a diff reduction.
Curiously, changing whitespace seems to cause the md5 of the .o files to differ
these days, hence the following testing strategy:
Tested by: objdump -d | md5 (both in-tree clang and lang/gcc6)
Attempt to autoboot when we open the default menu, and only when we open the
default menu. This alleviates the need for checking menu.already_autoboot,
because we're not trying to autoboot every time we open a submenu.
I note that escaping to loader prompt and going back to the menu (by running
require('menu').run() at the loader prompt) will happily work and not
re-initiate the autoboot sequence since "Escape to loader prompt" disables
the autoboot_delay.
Instead of basing it on whether 'kernels' was specified, base it on a
new variable: kernels_autodetect. If set to yes, we'll run the autodetection
bits and add any detected kernels to the already existing list *after* both
'kernel' and 'kernels'.
illumos/illumos-gate@5cabbc6b49
https://www.illumos.org/issues/7614:
This project allows top-level vdevs to be removed from the storage pool with
“zpool remove”, reducing the total amount of storage in the pool. This
operation copies all allocated regions of the device to be removed onto other
devices, recording the mapping from old to new location. After the removal is
complete, read and free operations to the removed (now “indirect”) vdev must
be remapped and performed at the new location on disk. The indirect mapping
table is kept in memory whenever the pool is loaded, so there is minimal
performance overhead when doing operations on the indirect vdev.
The size of the in-memory mapping table will be reduced when its entries
become “obsolete” because they are no longer used by any block pointers in
the pool. An entry becomes obsolete when all the blocks that use it are
freed. An entry can also become obsolete when all the snapshots that
reference it are deleted, and the block pointers that reference it have been
“remapped” in all filesystems/zvols (and clones). Whenever an indirect block
is written, all the block pointers in it will be “remapped” to their new
(concrete) locations if possible. This process can be accelerated by using
the “zfs remap” command to proactively rewrite all indirect blocks that
reference indirect (removed) vdevs.
Note that when a device is removed, we do not verify the checksum of the data
that is copied. This makes the process much faster, but if it were used on
redundant vdevs (i.e. mirror or raidz vdevs), it would be possible to copy
the wrong data, when we have the correct data on e.g. the other side of the
mirror. Therefore, mirror and raidz devices can not be removed.
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: John Kennedy <john.kennedy@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Reviewed by: Tim Chase <tim@chase2k.com>
Approved by: Garrett D'Amore <garrett@damore.org>
Author: Prashanth Sreenivasa <pks@delphix.com>
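The remapping of reads and frees on a removed (indirect) vdev, as
described in the device removal message above, can be pictured with a
small sketch. It assumes, purely for illustration, that the in-memory
mapping is a table sorted by old offset and searched by bisection; struct
map_entry, remap_lookup() and remap_offset() are hypothetical names, and
the real ZFS data structures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: one copied region of the removed vdev. */
struct map_entry {
	uint64_t src_offset;	/* start of the region on the removed vdev */
	uint64_t size;		/* length of the region in bytes */
	uint64_t dst_vdev;	/* vdev that now holds the data */
	uint64_t dst_offset;	/* start of the region on dst_vdev */
};

/*
 * Find the entry covering `off` in a table sorted by src_offset with
 * non-overlapping regions. Returns NULL if no entry covers the offset.
 */
static const struct map_entry *
remap_lookup(const struct map_entry *tbl, size_t n, uint64_t off)
{
	size_t lo = 0, hi = n;

	assert(tbl != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct map_entry *e = &tbl[mid];

		if (off < e->src_offset)
			hi = mid;
		else if (off - e->src_offset >= e->size)
			lo = mid + 1;
		else
			return (e);
	}
	return (NULL);
}

/* Translate an old offset into its new (vdev, offset) location. */
static int
remap_offset(const struct map_entry *tbl, size_t n, uint64_t off,
    uint64_t *vdev, uint64_t *new_off)
{
	const struct map_entry *e = remap_lookup(tbl, n, off);

	if (e == NULL)
		return (-1);
	*vdev = e->dst_vdev;
	*new_off = e->dst_offset + (off - e->src_offset);
	return (0);
}
```

In this sketch, dropping an obsolete entry from the table is what would
shrink the in-memory mapping over time.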
This looks a little different from the forth version for the time
being, just to get off the ground: rather than a paging system, it's
implemented as a simple carousel like the kernel selector.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D14436
Having these compiled into the module causes the kobj method descriptors
to be resolved incorrectly (by the compile-time linker instead of the
kernel linker), which then leads to hours of frustrating debugging.
For the benefit of lualoader, add all bootenvs to environment when
init_zfs_bootenv is invoked. All of the boot environment logic can then be
implemented in pure lua, rather than going back and forth with C to
implement paging.
This stores all boot environments in bootenvs[idx] and the final count of
bootenvs in bootenvs_count.
While here, make a copy of currdev for init_zfs_bootenv since it will be
modifying it and the caller may not necessarily want that. Some of the logic
was shifted around so that the 'currdev' pointer remains at the beginning of
the string and 'beroot' is moved around as needed to modify it or ultimately
store it in zfs_be_root.
The original zfs_bootenv that this was copied from will be able to go away
only if/when forth eventually goes away.
Tested with: lualoader (and local changes to add boot env. support)
Tested with: forth
Reviewed by: cem (earlier version), imp
Differential Revision: https://reviews.freebsd.org/D14435
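The bootenvs[idx] / bootenvs_count layout described above can be sketched
in userland C. This is not the loader's code: it uses POSIX setenv(3) in
place of the loader's environment routines, export_bootenvs() is a
hypothetical helper, and starting the index at 0 is an assumption.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Export each boot environment name as bootenvs[idx] and the total as
 * bootenvs_count. Returns 0 on success, -1 if a variable cannot be set.
 */
static int
export_bootenvs(const char *const *names, int count)
{
	char key[32], val[16];
	int i;

	assert(count >= 0);
	for (i = 0; i < count; i++) {
		snprintf(key, sizeof(key), "bootenvs[%d]", i);
		if (setenv(key, names[i], 1) != 0)
			return (-1);
	}
	snprintf(val, sizeof(val), "%d", count);
	return (setenv("bootenvs_count", val, 1));
}
```

A consumer such as a menu script can then read bootenvs_count and walk
the indexed variables without calling back into C.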
The Makefile gives the impression that ext2fs and msdos were excluded
(they weren't) and that you could exclude cd9660 and ufs support (you
couldn't). Allow those to be excluded.
We need to look, in the future, at trimming the number of supported
filesystems, and this will make that easier.
There's no reason to have multiple copies of lszfs and
reloadbe. Consolidate them into one location. Also ldi_get_size is the
same everywhere (except sparc64). Make it the same everywhere as the
common definition is more general and will work on sparc64.
With all values identical it was possible for Var() to return a negative
value due to limited floating point precision, resulting in "nan"
reported as Stddev.
Variance cannot actually be negative, so just return 0. We can later
investigate alternate algorithms for calculating variance to reduce the
effect of catastrophic cancellation here.
Reported by: Arshan Khanifar <arshankhanifar_gmail.com>
Approved by: phk
Sponsored by: The FreeBSD Foundation
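The clamp described in the Var() message above can be illustrated with a
minimal sketch. It assumes the variance is computed from running sums of
the values and of their squares, which is the kind of formula prone to
the cancellation mentioned; variance() here is an illustrative function,
not the actual Var().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sample variance from the sum and the sum of squares. With identical
 * (or nearly identical) values, rounding can push the numerator
 * slightly below zero, so the result is clamped to 0.
 */
static double
variance(const double *x, size_t n)
{
	double sy = 0.0, syy = 0.0, v;
	size_t i;

	assert(n >= 2);
	for (i = 0; i < n; i++) {
		sy += x[i];
		syy += x[i] * x[i];
	}
	v = (syy - sy * sy / (double)n) / ((double)n - 1.0);
	return (v < 0.0 ? 0.0 : v);
}
```

Taking the square root of the clamped value then cannot produce "nan"
for the standard deviation.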
illumos/illumos-gate@95643f75d2
https://www.illumos.org/issues/8520
lzc_rollback_to() should support rolling back to a clone's origin.
The current checks in zfs_ioc_rollback() would not allow that because the
origin snapshot belongs to a different filesystem.
The overly restrictive check was introduced in 7600, but it was not a
regression as none of the existing tools provided a way to roll back to the
origin.
https://www.illumos.org/issues/7198
EINVAL is returned when a dataset does not have any snapshots, so there is
nothing to roll back to.
Although the code in zfs_do_rollback checks for that condition in advance, it's
still possible that the snapshot(s) gets removed after the check and before the
rollback sync task is executed.
At the moment the zfs command would crash when that happens.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
illumos/illumos-gate@f864f99efe
https://www.illumos.org/issues/8997
When dmu_tx_assign is called from zil_lwb_write_issue, it's possible
for either ERESTART or EIO to be returned.
If ERESTART is returned, this will cause an assertion to fail directly
in zil_lwb_write_issue, where the code assumes the return value is
EIO if dmu_tx_assign returns a non-zero value. This can occur if the
SPA is suspended when dmu_tx_assign is called, and most often occurs
when running zloop.
If EIO is returned, this can cause assertions to fail elsewhere in the
ZIL code. For example, zil_commit_waiter_timeout contains the
following logic:
    lwb_t *nlwb = zil_lwb_write_issue(zilog, lwb);
    ASSERT3S(lwb->lwb_state, !=, LWB_STATE_OPENED);
In this case, if dmu_tx_assign returned EIO from within
zil_lwb_write_issue, the lwb variable passed in will not be issued
to disk. Thus, its lwb_state field will remain LWB_STATE_OPENED and
this assertion will fail. zil_commit_waiter_timeout assumes that after
it calls zil_lwb_write_issue, the lwb will be issued to disk, and
doesn't handle the case where this is not true; i.e. it doesn't handle
the case where dmu_tx_assign returns EIO.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Andriy Gapon <avg@FreeBSD.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Prakash Surya <prakash.surya@delphix.com>
MFC after: 3 weeks
This matches forth behavior. Hitting "6" when autobooting at the welcome
menu will now take you directly to the "Boot Options" menu.
We likely have some slight optimizations we should make, like not checking
autoboot every time we open a new menu and things of this nature. Further
work will go towards this end.
illumos/illumos-gate@a6c1eb3c08
https://www.illumos.org/issues/8731
annotate_ecksum() asserts that nui64s, calculated as nui64s = size / sizeof
(uint64_t), is not greater than UINT16_MAX.
This restriction is needed because histograms of incorrectly set and cleared
bits have 16 bit counters and if the buffer consists of too many 64-bit words,
then a counter can potentially overflow producing an incorrect result.
When the largest buffer size was 128KB the greatest value of nui64s was 16K,
well within the limit.
But now we have support for large buffers and for buffer sizes of 512KB and
above the restriction is violated.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
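The arithmetic behind the annotate_ecksum() restriction above can be
checked with a short sketch. The helper names words_in_buffer() and
fits_16bit_counter() are hypothetical; only the nui64s = size /
sizeof (uint64_t) calculation and the UINT16_MAX bound come from the
message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 64-bit words in a buffer of `size` bytes (nui64s). */
static size_t
words_in_buffer(size_t size)
{
	assert(size % sizeof(uint64_t) == 0);
	return (size / sizeof(uint64_t));
}

/* True if a per-bit histogram with 16-bit counters cannot overflow. */
static bool
fits_16bit_counter(size_t size)
{
	return (words_in_buffer(size) <= UINT16_MAX);
}
```

A 128KB buffer holds 16384 words, well under the limit of 65535, while a
512KB buffer holds 65536 words, one more than a 16-bit counter can count.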
When a processor enters the power-save state, it releases resources shared with
other CPU threads, which lets the other threads run much faster.
This patch also implements saving and restoring registers that might get
corrupted in power-save state.
Submitted by: Patryk Duda <pdk@semihalf.com>
Obtained from: Semihalf
Reviewed by: jhibbits, nwhitehorn, wma
Sponsored by: IBM, QCM Technologies
Differential revision: https://reviews.freebsd.org/D14330
binary mode user-space emulation layer. This is a regression issue after
r328436, when LinuxKPI character devices started to use DTYPE_DEV in
the "f_type" field of the associated file structure(s).
MFC after: 3 days
Found by: Johannes Lundberg <johalun0@gmail.com>
Sponsored by: Mellanox Technologies