40 Commits

Author SHA1 Message Date
Toomas Soome
467c82cb84 loader: Replace EFI part devices.
Rewrite EFI part device interface to present disk devices in more
user friendly way.

We keep list of three types of devices: floppy, cd and disk, the
visible names: fdX: cdX: and diskX:

Use common/disk.c and common/part.c interfaces to manage the
partitioning.

The lsdev -l will additionally list the device path.

Reviewed by:	imp, allanjude
Approved by:	imp (mentor), allanjude (mentor)
Differential Revision:	https://reviews.freebsd.org/D8581
2017-02-06 09:18:47 +00:00
Toomas Soome
c12dbfe608 loader: Implement disk_ioctl() to support DIOCGSECTORSIZE and DIOCGMEDIASIZE.
Need interface to extract information about disk abstraction,
to read disk or partition size depending on the provided argument
and adjust disk size based on information in partition table.

The disk handle from disk_open() has d_offset field to point to
partition start. So we can use this fact to return either whole disk
size or partition size. For this we only need to record partition size
we get from disk_open() anyhow.

In addition, this will also make it possible to adjust the disk media size
based on information from partition table. The problem with disk size is
about some BIOS systems reporting bogus disk size for 2+TB disks, but
since such disks are using GPT partitioning, and GPT does have information
about disk size (alternate LBA + 1), we can use this fact to record disk
size based on partition table.

This patch does exactly this: implements DIOCGSECTORSIZE and DIOCGMEDIASIZE
ioctl, and DIOCGMEDIASIZE will report either disk media size or partition size.

Adds ptable_getsize() call to read partition size in bytes from ptable pointer.
Updates disk_open() to use ptable_getsize() to update mediasize value.

Implements GPT detection function to update ptable size (used by
ptable_getsize()) according to alternate lba (which is location of backup copy
of GPT header table).

Reviewed by:	allanjude
Approved by:	allanjude (mentor)
Differential Revision:	https://reviews.freebsd.org/D8594
2017-02-06 08:26:45 +00:00
Toomas Soome
151139ad9e loader: disk/part api needs to use uint64_t offsets
The disk_* and part_* api is using 64bit values for media size and
offsets. However, the current api is using off_t type, which is signed
64-bit int.

In this context the signed media size does not make any sense, and
the offsets are used to mark absolute, not relative locations.

Also, the data from GPT partition table and some other sources is
already using uint64_t data type, so using signed off_t can cause sign
issues.

Reviewed by:	imp
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D8710
2017-02-01 20:10:56 +00:00
Toomas Soome
1ecc859193 dosfs support in libstand is broken since r298230
Apparently the libstand dosfs optimization is a bit too optimistic
and did introduce possible memory corruption.

This patch is backing out the bad part and since this results in
dosfs reading full blocks now, we can also remove extra offset argument
from dv_strategy callback.

The analysis of the issue and the backout patch is provided by Mikhail Kupchik.

PR:		214423
Submitted by:	Mikhail Kupchik
Reported by:	Mikhail Kupchik
Reviewed by:	bapt, allanjude
Approved by:	allanjude (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D8644
2016-12-30 19:06:29 +00:00
Toomas Soome
77d81a4dab lsdev device name section headers should be printed by dv_print callback.
lsdev command does walk over devsw list, prints list element name and
will use dv_print() callback to print the device list.
Unfortunately this approach will add unneeded noise when there are no
particular devices detected.

To remove "empty" device section headers, the dv_print() callback
should print the header instead.

In addition, fixed dv_print callback for md module.

Reviewed by:	imp
Approved by:	imp (mentor)
Differential Revision:	https://reviews.freebsd.org/D8551
2016-11-19 08:54:21 +00:00
Toomas Soome
cbd6713146 Loader paged/pageable data is not always paged.
This change does modify devsw dv_print() to return the int value,
enabling walkers to interrupt the walk on non zero value from dv_print().

This will allow the pager_print actually to stop displaying data on
user input, and additionally pager is used in various *dev_print callbacks,
where it was missing.

For test, lsdev [-v] command should display data by screenfuls and should
stop when the key 'q' is pressed on pager prompt.

Reviewed by:	allanjude
Approved by:	allanjude (mentor)
Differential Revision:	https://reviews.freebsd.org/D5461
2016-11-08 06:50:18 +00:00
Allan Jude
4deb8929ea Make boot code and loader check for unsupported ZFS feature flags
OpenZFS uses feature flags instead of a zpool version number to track
features since the split from Oracle. In addition to avoiding confusion
on ZFS vs OpenZFS version numbers, this also allows features to be added
to different operating systems that use OpenZFS in different order.

The previous zfs boot code (gptzfsboot) and loader (zfsloader) blindly
tries to read the pool, and if failed provided only a vague error message.

With this change, both the boot code and loader check the MOS features
list in the ZFS label and compare it against the list of features that
the loader supports. If any unsupported feature is active, the pool is
not considered as a candidate for booting, and a helpful diagnostic
message is printed to the screen. Features that are merely enabled via
zpool upgrade, but not in use, do not block booting from the pool.

Submitted by:	Toomas Soome <tsoome@me.com>
Reviewed by:	delphij, mav
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D6857
2016-08-01 19:37:43 +00:00
Andriy Gapon
dc87c22a61 fix a zfs boot regression introduced in r300117 by accident
There is no reason to return non-zero value from zfs_probe_partition()
as that causes following partitions to not be probed for ZFS vdevs.
A particular scenario that I encountered is a GPT partitioned disk
where several partitions have freebsd-zfs type.  A partition with a lower
index is used as a cache (l2arc) vdev and in that case case zfs_probe()
returned a non-zero status.  That status was returned to ptable_iterate()
and caused it to abort the iteration.  Because of that the subsequent
partitions were not probed and a root pool was not discovered resulting
in a boot failure.

While there fix the style for nearby return statements.

Approved by:	re (kib)
2016-06-16 07:45:57 +00:00
Warner Losh
9a0b26ec6f Fix several instances where the boot loader ignored pager_output
return value when it could return 1 (indicating we should stop).
Fix a few instances of pager_open() / pager_close() not being called.
Actually use these routines for the environment variable printing code
I just committed.
2016-05-18 05:59:05 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
Allan Jude
87ed2b7f5a A new implementation of the loader block cache
The block cache implementation in loader has proven to be almost useless, and in worst case even slowing down the disk reads due to insufficient cache size and extra memory copy.
Also the current cache implementation does not cache reads from CDs, or work with zfs built on top of multiple disks.
Instead of an LRU, this code uses a simple hash (O(1) read from cache), and instead of a single global cache, a separate cache per block device.
The cache also implements limited read-ahead to increase performance.
To simplify read ahead management, the read ahead will not wrap over bcache end, so in worst case, single block physical read will be performed to fill the last block in bcache.

Booting from a virtual CD over IPMI:
0ms latency, before: 27 second, after: 7 seconds
60ms latency, before: over 12 minutes, after: under 5 minutes.

Submitted by:	Toomas Soome <tsoome@me.com>
Reviewed by:	delphij (previous version), emaste (previous version)
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D4713
2016-04-18 23:09:22 +00:00
Allan Jude
9cdff681a4 Do not set vfs.root.mountfrom unnecessarily
This causes boot from external media (installer USB image) to mount / from
the default ZFS BE, rather than the USB device.

Reported by:	kmoore
MFC after:	5 days
Sponsored by:	ScaleEngine Inc.
2016-02-07 00:49:15 +00:00
Allan Jude
fbe958861e Move init_zfs_bootenv to sys/boot/zfs/zfs.c instead of having a copy in each loader
While here, add a filter to ignore special datasets

MFC after:	3 days
Sponsored by:	ScaleEngine Inc.
2016-01-15 05:45:10 +00:00
Allan Jude
076b613091 DIOCGSECTORSIZE expects to write to a u_int, but struct zfs_probe_args
member secsz was a uint16_t

sys/boot/zfs/zfs.c has a probe args structure member, secsz, that is a
uint16_t for media sector size; it is used as an argument for ioctl()
at line 484. however, this ioctl writes 32 bits of data (u_int *) and
therefore this ioctl will overwrite and corrupt 16 bits of memory.
other use cases seem to use correct u_int type for secsz.

PR:		204358
Submitted by:	Toomas Soome <tsoome at me.com>
Reviewed by:	asomers, delphij, smh
MFC after:	5 days
Differential Revision:	https://reviews.freebsd.org/D4811
2016-01-11 15:35:29 +00:00
Allan Jude
fce8d0e350 Only call init_zfs_bootenv() when the system was booted with ZFS
Add a few other safeguards to ensure things do not break when the
boot device cannot be determined

Reported by:	flo
MFC after:	3 days
Sponsored by:	ScaleEngine Inc.
2016-01-09 00:54:08 +00:00
Steven Hartland
5d07d143e6 Fix return from zfs_probe_dev
Ensure zfs_probe_dev returns the correct value.

Also fix a style(9) trailing whitespace issue while here.

MFC after:	2 weeks
X-MFC-With:	r293268
Sponsored by:	Multiplay
2016-01-06 20:25:41 +00:00
Allan Jude
f1aba489c1 Introduce the ZFS Boot Environments menu to the loader menu
If the system was booted with ZFS, a new menu item (#7) appears
It contains an autogenerated list of ZFS Boot Environments

This allows the user to switch to an alternate root file system
Use Cases:
 - Revert a failed upgrade
 - Concurrently run different versions of FreeBSD with common home directory
 - Easier integration with the sysadmin/beadm utility

Requested by:	many
Reviewed by:	dteske
MFC after:	10 days
Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3167
2015-12-31 20:00:53 +00:00
Andriy Gapon
d39075208e zfs loader: treat plain pool name as a name of its root dataset
... as opposed to the previous behavior of treating it as boot
dataset (specified by bootfs or default)

MFC after:	19 days
2012-10-06 19:42:50 +00:00
Andriy Gapon
164efe4010 boot/zfs: a small whitespace cleanup
MFC after:	5 days
2012-10-06 19:41:11 +00:00
Andriy Gapon
62c725a9db boot/zfs: call zfs_spa_init for all found pools
... and drop those for which it fails.
Also, add more sanity checking to the function.

MFC after:	16 days
2012-10-06 19:40:12 +00:00
Andriy Gapon
74b3e265c7 zfs boot: add code for listing child datasets of a given dataset
- only filesystem datasets are supported
- children names are printed to stdout

To do: allow to iterate over the list and fetch names programatically

MFC after:	17 days
2012-10-06 19:27:04 +00:00
Andriy Gapon
84b339ac4c zfs boot: chose a "first" pool if none is explicitly requested
MFC after:	8 days
2012-10-06 19:25:40 +00:00
Andriy Gapon
b6910e777e zfs boot: bring zap_leaf_chunk field names in sync with kernel code
This change is cosmetic.

MFC after:	10 days
2012-09-11 07:11:32 +00:00
Andrey V. Elsukov
a188e43ef3 Explicitly terminate the string after strncpy(3). 2012-08-15 09:18:49 +00:00
Andrey V. Elsukov
4c89da6c18 Teach the ZFS use new partitions API when probing.
Note: now ZFS does probe only for partitions with type "freebsd-zfs"
and "freebsd".
2012-08-05 14:48:28 +00:00
Andriy Gapon
0a56cdf866 zfs boot: cleanup remnants of temporary compat code
MFC after:	1 month
2012-05-13 10:54:43 +00:00
Andriy Gapon
dfc351d8e9 zfs boot code: mark spa_t arguments as const where they are used as such
MFC after:	1 month
2012-05-13 09:22:18 +00:00
Andriy Gapon
32df866434 sparc64/zfs boot: take advantage of new libzfsboot capabilities
Also drop the now unneeded compatibility shims.

Tested by:	marius
MFC after:	1 month
2012-05-12 20:27:33 +00:00
Andriy Gapon
f7f7b759f6 zfs boot code: use %j and uintmax_t instead %ll and uint64_t in printfs
This is to silence warnings that result from different definitions of
uint64_t on different architectures, specifically i386 and sparc64.

MFC after:	1 month
2012-05-12 20:23:30 +00:00
Andriy Gapon
1702e62f67 zfsboot/zfsloader: support accessing filesystems within a pool
In zfs loader zfs device name format now is "zfs:pool/fs",
fully qualified file path is "zfs:pool/fs:/path/to/file"
loader allows accessing files from various pools and filesystems as well
as changing currdev to a different pool/filesystem.

zfsboot accepts kernel/loader name in a format pool:fs:path/to/file or,
as before, pool:path/to/file; in the latter case a default filesystem
is used (pool root or bootfs).  zfsboot passes guids of the selected
pool and dataset to zfsloader to be used as its defaults.

zfs support should be architecture independent and is provided
in a separate library, but architectures wishing to use this zfs support
still have to provide some glue code and their devdesc should be
compatible with zfs_devdesc.
arch_zfs_probe method is used to discover all disk devices that may
be part of ZFS pool(s).

libi386 unconditionally includes zfs support, but some zfs-specific
functions are stubbed out as weak symbols.  The strong definitions
are provided in libzfsboot.
This change mean that the size of i386_devspec becomes larger
to match zfs_devspec.

Backward-compatibility shims are provided for recently added sparc64
zfs boot support.  Currently that architecture still works the old
way and does not support the new features.

TODO:
- clear up pool root filesystem vs pool bootfs filesystem distinction
- update sparc64 support
- set vfs.root.mountfrom based on currdev (for zfs)

Mid-future TODO:
- loader sub-menu for selecting alternative boot environment

Distant future TODO:
- support accessing snapshots, using a snapshot as readonly root

Reviewed by:	marius (sparc64),
		Gavin Mu <gavin.mu@gmail.com> (sparc64)
Tested by:	Florian Wagner <florian@wagner-flo.net> (x86),
		marius (sparc64)
No objections:	fs@, hackers@
MFC after:	1 month
2012-05-12 09:03:30 +00:00
Marius Strobl
2d75b8321f Add initial support for booting from ZFS on sparc64. At least on Sun Fire
V100, the firmware is known to be broken and not allowing to simultaneously
open disk devices, causing attempts to boot from a mirror or RAIDZ to cause
a crash. This will be worked around later. The firmwares of newer sun4u models
don't seem to exhibit this problem though.

Steps for ZFS booting:

1. create VTOC8 label
# gpart create -s vtoc8 da0

2. add partitions, f.e.:
# gpart add -t freebsd-zfs -s 60g da0
# gpart add -t freebsd-swap da0
resulting in something like:
# gpart show
=>        0  143331930  da0  VTOC8  (68G)
          0  125821080    1  freebsd-zfs  (60G)
  125821080   17510850    2  freebsd-swap  (8.4G)

3. create zpool
# zpool create bunker da0a
or for mirror/RAIDZ (after preparing additional disks as in steps 1. + 2.):
# zpool create bunker mirror da0a da1a
# zpool create bunker raidz da0a da1a da2a ...

4. set bootfs
# zpool set bootfs=bunker bunker

5. install zfsboot
# zpool export bunker
# gpart bootcode -p /boot/zfsboot da0

6. write zfsloader to the ZFS Boot Block (so far, there's no dedicated tool
for this, so dd(1) has to be used for this purpose)
When using mirror/RAIDZ, step 4. and the dd(1) invocation should be repeated
for the additional disks in order to be able to boot from another disk in
case of failure.
# sysctl kern.geom.debugflags=0x10
# dd if=/boot/zfsloader of=/dev/da0a bs=512 oseek=1024 conv=notrunc
# zpool import bunker

7. install system on ZFS filesystem
Don't forget to set 'zfs_load="YES"' and vfs.root.mountfrom="zfs:bunker" in
loader.conf as well as 'zfs_enable="YES"'in rc.conf.

8. copy zpool.cache to the ZFS filesystem
cp -p /boot/zfs/zpool.cache /bunker/boot/zfs/zpool.cache

9. set mountpoint
# zfs set mountpoint=/ bunker

10. Now, given that aliases for all disks in the zpool exists (check with
the `devalias` command on the boot monitor prompt) and disk0 corresponds
to da0 (likewise for additional disks), the system can be booted from the
ZFS with:
{1} ok boot disk0

PR:             165025
Submitted by:   Gavin Mu
2012-05-01 17:16:01 +00:00
Pawel Jakub Dawidek
10b9d77bf1 Finally... Import the latest open-source ZFS version - (SPA) 28.
Few new things available from now on:

- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows to rewind corrupted pool to earlier
  transaction group.
- Possibility to import pool in read-only mode.

MFC after:	1 month
2011-02-27 19:41:40 +00:00
Pawel Jakub Dawidek
701c50988f Remove magic value. 2010-09-17 22:51:45 +00:00
Andriy Gapon
4b0562e63c zfs boot: fix error handling in zfs_readdir
Found by:	clang static analyzer
MFC after:	4 days
2010-05-31 09:06:03 +00:00
Pawel Jakub Dawidek
736c8a338c Update comment. We also look for GPT partitions. 2010-02-18 22:23:30 +00:00
Robert Noland
14c436e101 Correct some issues with zfs boot.
- Teach it to read gang blocks. (essentially untested)
   If you see "ZFS: gang block detected!", please let
   me know, so we can either remove the printf if it
   works, or fix it if it doesn't.

 - If multiple partitions exist on a disk, probe them all.
   We also need to reset dsk->start to 0 to read the right
   sector here.

 - With GPT, we can have 128 partitions.

 - If the bootfs property has ever been set on a pool
   it seems that it never goes away.  zpool won't allow
   you to add to the pool with the bootfs property set.
   However, if you clear the property back to default
   we end up getting 0 for the object number and read
   a bogus block pointer and fail to boot.

 - Fix some error printfs. The printf in the loader is
   only capable of c,s and u formats.

 - Teach printf how to display %llu

Reviewed by:	dfr, jhb
MFC after:	2 weeks
2009-10-23 18:44:53 +00:00
Paul Saab
5ee5aed0a3 Fix a leak introduced in r185902. We should free the devspec if
we've successfully found a zfs pool.
2008-12-11 16:48:35 +00:00
Paul Saab
390edcc5b9 Avoid a double free in devopen by not freeing the device structure
in zfs_dev_open.  This stops a panic in the loader when trying to
read from a zfs device and no zfs devices exist.
2008-12-11 02:23:49 +00:00
Doug Rabson
51f0d2e192 Add a GPT-aware variant of zfsboot which should be used in a similar manner
to gptboot, i.e. installed in a freebsd-boot partition using /sbin/gpart or
/sbin/gpt.

Tweak the /boot/loader ZFS support so that it can find ZFS pools that are
contained in GPT partitions.
2008-11-19 16:39:01 +00:00
Pawel Jakub Dawidek
1ba4a712dd Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

	Allows regular users to perform ZFS operations, like file system
	creation, snapshot creation, etc.

- L2ARC

	Level 2 cache for ZFS - allows to use additional disks for cache.
	Huge performance improvements mostly for random read of mostly
	static content.

- slog

	Allow to use additional disks for ZFS Intent Log to speed up
	operations like fsync(2).

- vfs.zfs.super_owner

	Allows regular users to perform privileged operations on files stored
	on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

	Not all the flags are supported. This still needs work.

- ZFSBoot

	Support to boot off of ZFS pool. Not finished, AFAIK.

	Submitted by:	dfr

- Snapshot properties

- New failure modes

	Before if write requested failed, system paniced. Now one
	can select from one of three failure modes:
	- panic - panic on write error
	- wait - wait for disk to reappear
	- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

	Just quota and reservation properties, but don't count space consumed
	by children file systems, clones and snapshots.

- Sparse volumes

	ZVOLs that don't reserve space in the pool.

- External attributes

	Compatible with extattr(2).

- NFSv4-ACLs

	Not sure about the status, might not be complete yet.

	Submitted by:	trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from:	OpenSolaris
2008-11-17 20:49:29 +00:00