multiple devices was changed. However, swapoff_one() was not fully and
correctly converted. In particular, with r118390's introduction of a per-
device blist, the maximum swap block size, "dmmax", became irrelevant to
swapoff_one()'s operation. Moreover, swapoff_one() was performing out-of-
range operations on the per-device blist that were silently ignored by
blist_fill().
This change corrects both of these problems with swapoff_one(), which will
allow us to potentially increase MAX_PAGEOUT_CLUSTER. Previously,
swapoff_one() would panic inside of blist_fill() if you increased
MAX_PAGEOUT_CLUSTER.
Reviewed by: kib, markj
MFC after: 3 days
While it sounds like a good idea to extract the RFC1048 data from PXE, in the
end it is not and it is causing lots of issues. Our pxeloader might need
options which are incompatible with other pxe servers (for example iPXE, but
not only).
Our pxe loaders are also now settings their own user class, so it is useful to
issue our own pxe request at startup
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D10953
All the code are now only issueing one single dhcp request at startup of the
loader meaning we can always request a the PXE informations from the
dhcp server.
Previous code lost that information, meaning no option 55 anymore (meaning not
working with the kea dhcp server) and no request for rootpath etc, no user class
Remove the flags from the bootp function which is not needed anymore
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D10952
Rather that previous attempts to add tftpfs support at the same time as NFS
support. This time decide on a proper URI parser rather than hacks.
root-path can now be define the following way:
For tftpfs:
tftp://ip/path
tftp:/path (this one will consider the tftp server is the same as the one where
the pxeboot file was fetched from)
For nfs:
nfs:/path
nfs://ip/path
The historical
ip:/path
/path
are kept on NFS
Reviewed by: tsoom, rgrimes
Differential Revision: https://reviews.freebsd.org/D10947
rfc3004 allows to pass multiple user classes on dhcp requests
this is used by dhcp servers to differentiate the caller if needed.
As an example with isc dhcp server it will be possible to make options
only for the FreeBSD loaders:
if exists user-class and option user-class = "FREEBSD" {
option root-path "tftp://192.168.42.1/FreeBSD;
}
Reviewed by: tsoome
Differential Revision: https://reviews.freebsd.org/D10951
- Add the missing option 'n' to the getopt(3) string
- Add the missing options 'libxo' and 'N' to the usage message
- Add the missing options 'M' and 'N' to the man-page
Submitted by: Keegan Drake H.P. <kdrakehp@zoho.com>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D10915
The adapter firmware in general does not accept PDUs larger than 64k - 1
bytes in size. Sending crypto requests larger than this size result in
hangs or incorrect output, so reject them with EFBIG. For requests
chaining an AES cipher with an HMAC, the firmware appears to require
slightly smaller requests (around 512 bytes).
Sponsored by: Chelsio Communications
Some NICs have some capabilities dependent, so that disabling one require
disabling some other (TXCSUM/RXCSUM on em). This code tries to reach the
consensus more insistently.
PR: 219453
MFC after: 1 week
The motivation for this is two-fold.
1. Some old WD SATA disks may appear as if they need to be spun up
when they are already spinning. Those disks would respond with
an error to the spin-up request.
2. Even if we really fail to spin up the disk, we still can try to
proceed to the subsequent phases. If we fail later on, then no
difference. Otherwise we get a chance to communicate with the
disk which is better than completely ignoring it, because a user
can try to recover the disk.
Reviewed by: mav
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D10896
This modification adds the capability to newsyslog to write the
rotation message in a format that is compliant with RFC5424. This
capability is enabled on a per-log file basis through a new value
("T") in the flags field in newsyslog.conf. This is useful on systems
that use the RFC5424 format for log files so that the rotation message
format matches that of the other log messages. There has been recent
mention of adding an RFC5424 compliant mode to syslogd and at least
one alternative system log daemon (rsyslogd) that already has the
capability to use that format.
Reviewed by: vangyzen, ngie
Approved by: vangyzen (mentor)
MFC after: 2 months
Relnotes: yes
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10253
This is similar to r318912, except that ptrace.[sS] was previously a
file in the source tree, not a generated assembly wrapper.
Check for the existence of ptrace.[sS] in the .depend file to determine
if we have to clean it up. This is a bit hackish and will not be left
in place indefinitely, but provides a useful example case when
investigating a better solution in bmake.
Reviewed by: bdrewery
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D10930
Fix warnings about redundant declarations in rtld
when libthr in increased to WARNS=6.
Reviewed by: kib
MFC after: 3 days
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10934
__thr_setcontext() mistakenly tested for the presence of SIGCANCEL
in its local ucontext_t instead of the parameter. Therefore,
if a thread calls setcontext() with a context whose signal mask
contains SIGTHR (a.k.a. SIGCANCEL), that signal will be blocked,
preventing the thread from being cancelled or suspended.
Reported by: gcc 6.1 via RISC-V tinderbox
Reviewed by: kib
MFC after: 3 days
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D10933
illumos/illumos-gate@bc83969fdbbc83969fdbhttps://www.illumos.org/issues/8265
Reserve bit 23 in the zfs send stream flags for the large
dnode feature which has been implemented for Linux.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brian Behlendorf <behlendorf1@llnl.gov>
MFC after: 1 week
illumos/illumos-gate@bc83969fdbbc83969fdbhttps://www.illumos.org/issues/8265
Reserve bit 23 in the zfs send stream flags for the large
dnode feature which has been implemented for Linux.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Brian Behlendorf <behlendorf1@llnl.gov>
illumos/illumos-gate@2d2f193a212d2f193a21https://www.illumos.org/issues/8166
If we do a scrub while a leaf device is offline (via "zpool offline"),
we will inadvertently clear the DTL (dirty time log) of the offline
device, even though it is still damaged. When the device comes back
online, we will incompletely resilver it, thinking that the scrub
repaired blocks written before the scrub was started. The incomplete
resilver can lead to data loss if there is a subsequent failure of a
different leaf device.
The fix is to never clear the DTL of offline devices. Note that if a
device is onlined while a scrub is in progress, the scrub will be
restarted.
The problem can be worked around by running "zpool scrub" after
"zpool online".
See also https://github.com/zfsonlinux/zfs/issues/5806
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@2d2f193a212d2f193a21https://www.illumos.org/issues/8166
If we do a scrub while a leaf device is offline (via "zpool offline"),
we will inadvertently clear the DTL (dirty time log) of the offline
device, even though it is still damaged. When the device comes back
online, we will incompletely resilver it, thinking that the scrub
repaired blocks written before the scrub was started. The incomplete
resilver can lead to data loss if there is a subsequent failure of a
different leaf device.
The fix is to never clear the DTL of offline devices. Note that if a
device is onlined while a scrub is in progress, the scrub will be
restarted.
The problem can be worked around by running "zpool scrub" after
"zpool online".
See also https://github.com/zfsonlinux/zfs/issues/5806
Reviewed by: George Wilson george.wilson@delphix.com
Reviewed by: Brad Lewis <brad.lewis@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@7855d95b307855d95b30https://www.illumos.org/issues/7446
Since we support whole-disk configuration for boot pool, we also will need
whole disk support with UEFI boot and for this, zpool create should create efi-
system partition.
I have borrowed the idea from oracle solaris, and introducing zpool create -
B switch to provide an way to specify that boot partition should be created.
However, there is still an question, how big should the system partition be.
For time being, I have set default size 256MB (thats minimum size for FAT32
with 4k blocks). To support custom size, the set on creation "bootsize"
property is created and so the custom size can be set as: zpool create B -
o bootsize=34MB rpool c0t0d0
After pool is created, the "bootsize" property is read only. When -B switch is
not used, the bootsize defaults to 0 and is shown in zpool get output with
value ''. Older zfs/zpool implementations are ignoring this property.
https://www.illumos.org/rb/r/219/
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Approved by: Dan McDonald <danmcd@kebe.com>
Author: Toomas Soome <tsoome@me.com>
illumos/illumos-gate@6401734d546401734d54https://www.illumos.org/issues/6418
An easy, direct means of sanitizing pool vdevs can be helpful for management
purposes.
FreeBSD has had a 'zpool labelclear' for some time, see: https://
svnweb.freebsd.org/base?view=revision&revision=224171
SpectraBSD has a slightly updated version, which I propose for inclusion.
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Will Andrews <will@firepipe.net>
Note: the bulk of the change has been already imported, this is a follow
up that imports zpool.1m changes.
illumos/illumos-gate@e4cb59f791e4cb59f791https://www.illumos.org/issues/6781
cache
A device used to cache storage pool data. A cache device cannot
be cannot be configured as a mirror or raidz group. For more
information, see the "Cache Devices" section.
needs changed to
cache
A device used to cache storage pool data. A cache device cannot
be configured as a mirror or raidz group. For more
information, see the "Cache Devices" section.
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Alexander Pyhalov <apyhalov@gmail.com>
illumos/illumos-gate@879bece34e879bece34ehttps://www.illumos.org/issues/2897
Found this option in some Oracle documentation and wanted to check out the
zpool manpage on it in OI. Unfortunately it seems to be missing from the
manpage, so I first thought it was unsupported. However, "# zpool split" does
print the correct usage. Testing with the "-n" switch makes me believe it is
supported (I don't actually need to split my pool).
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Steven Burgess <sburgess@datto.com>
5659 in the manual page for zpool(1M), one misuse of the word 'zpool' to describe a pool
illumos/illumos-gate@c8323d4323c8323d4323https://www.illumos.org/issues/4465
zpool(1M) is able to offline cache vdevs despite man page saying that it isn't:
zpool offline [-t] pool device ...
Takes the specified physical device offline. While the device is
offline, no attempt is made to read or write to the device.
This command is not applicable to spares or cache devices.
altair:root:~# zpool create testoff c9t67d0 cache c9t71d0
altair:root:~# zpool status testoff
pool: testoff
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
testoff ONLINE 0 0 0
c9t67d0 ONLINE 0 0 0
cache
c9t71d0 ONLINE 0 0 0
errors: No known data errors
altair:root:~# zpool offline testoff c9t71d0
altair:root:~# zpool status testoff
pool: testoff
state: ONLINE
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: none requested
https://www.illumos.org/issues/5659
At https://github.com/illumos/illumos-gate/blob/master/usr/src/man/man1m/
zpool.1m#L931
Do not add a disk that is currently configured as a quorum device to
a zpool.
– should be:
Do not add a disk that is currently configured as a quorum device to
a pool.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Hans Rosenfeld <rosenfeld@grumpf.hope-2000.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
illumos/illumos-gate@40713f2b2440713f2b24https://www.illumos.org/issues/8070
Add some ZFS comments left by various developers at different times
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
MFC after: 1 week
illumos/illumos-gate@40713f2b2440713f2b24https://www.illumos.org/issues/8070
Add some ZFS comments left by various developers at different times
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Alan Somers <asomers@gmail.com>
illumos/illumos-gate@ade42b557aade42b557ahttps://www.illumos.org/issues/8064
It's currently nearly impossible to trace what process places a hold on
a vnode, as the only ways holds are place is via the `VN_HOLD()` and
`VN_HOLD_CALLER()` macros, which inline the bumping of `v_count`. Adding
static DTrace probes to these macros would enable tracing of where
specific vnode references come from.
For completeness and symmetry, a similar static probe should be added to
`vn_rele()` and `vn_rele_dnlc()`.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Sebastien Roy <seb@delphix.com>
illumos/illumos-gate@b7b2590dd9b7b2590dd9https://www.illumos.org/issues/8063
A standard practice in ZFS is to keep track of "per-txg" state. Any of
the 3 active TXG's (open, quiescing, syncing) can have different values
for this state. We should assert that we do not attempt to modify other
(inactive) TXG's.
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
illumos/illumos-gate@b7b2590dd9b7b2590dd9https://www.illumos.org/issues/8063
A standard practice in ZFS is to keep track of "per-txg" state. Any of
the 3 active TXG's (open, quiescing, syncing) can have different values
for this state. We should assert that we do not attempt to modify other
(inactive) TXG's.
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@5f368aef865f368aef86https://www.illumos.org/issues/7786
Currently, vdev_online() will only post sysevent if previous state was
"offline". It should also post the event when the state changes from "removed"
or "faulted" to "healthy" or "degraded".
This will fix the following scenario:
- pull disk from slot A
- check that hotspare has taken its place (if available)
- insert disk into slot B
- check that hotspare moved back to "avail" state (if spare was used)
The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail
to update the vdev FRU.
Reviewed by: Matthew Ahrens mahrens@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Approved by: Albert Lee <trisk@forkgnu.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
MFC after: 1 week
illumos/illumos-gate@5f368aef865f368aef86https://www.illumos.org/issues/7786
Currently, vdev_online() will only post sysevent if previous state was
"offline". It should also post the event when the state changes from "removed"
or "faulted" to "healthy" or "degraded".
This will fix the following scenario:
- pull disk from slot A
- check that hotspare has taken its place (if available)
- insert disk into slot B
- check that hotspare moved back to "avail" state (if spare was used)
The problem here is that we don't get any ESC_ZFS_VDEV_* notification and fail
to update the vdev FRU.
Reviewed by: Matthew Ahrens mahrens@delphix.com
Reviewed by: George Wilson george.wilson@delphix.com
Approved by: Albert Lee <trisk@forkgnu.org>
Author: Yuri Pankov <yuri.pankov@nexenta.com>
illumos/illumos-gate@def4fac588def4fac588https://www.illumos.org/issues/8025
dbuf_read() creates a zio_root() to track and wait for all the zio's
that may happen as part of this call. However, if the blkptr_t for
this buffer is NULL or a hole, we will not create any more zio's, so
this zio_root() is unnecessary. This is always the case when calling
dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
the containing dnode). For workloads that read a lot of bonus buffers
(e.g. file creation and removal), creating and destroying these
unnecessary zio's can decrease performance by around 3%.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
illumos/illumos-gate@def4fac588def4fac588https://www.illumos.org/issues/8025
dbuf_read() creates a zio_root() to track and wait for all the zio's
that may happen as part of this call. However, if the blkptr_t for
this buffer is NULL or a hole, we will not create any more zio's, so
this zio_root() is unnecessary. This is always the case when calling
dbuf_read() on a bonus buffer, because it has no blkptr (it's part of
the containing dnode). For workloads that read a lot of bonus buffers
(e.g. file creation and removal), creating and destroying these
unnecessary zio's can decrease performance by around 3%.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>