Commit Graph

43 Commits

Author SHA1 Message Date
mm
cc61ab2f13 Introduce "feature flags" for ZFS pools (bump SPA version to 5000).
Add first feature "com.delphix:async_destroy" (asynchronous destroy
of ZFS datasets).
Implement features support in ZFS boot code.

Illumos revisions merged:
13700:2889e2596bd6
13701:1949b688d5fb
2619 asynchronous destruction of ZFS file systems
2747 SPA versioning with zfs feature flags

References:
https://www.illumos.org/issues/2619
https://www.illumos.org/issues/2747

Obtained from:	illumos (issue #2619, #2747)
MFC after:	1 month
2012-06-11 11:35:22 +00:00
avg
7a12082a7a zfs boot: cleanup remnants of temporary compat code
MFC after:	1 month
2012-05-13 10:54:43 +00:00
avg
64734e6382 zfs boot code: mark spa_t arguments as const where they are used as such
MFC after:	1 month
2012-05-13 09:22:18 +00:00
avg
5a967fbfcc sparc64/zfs boot: take advantage of new libzfsboot capabilities
Also drop the now unneeded compatibility shims.

Tested by:	marius
MFC after:	1 month
2012-05-12 20:27:33 +00:00
avg
1218acb302 zfs boot code: use %j and uintmax_t instead %ll and uint64_t in printfs
This is to silence warnings that result from different definitions of
uint64_t on different architectures, specifically i386 and sparc64.

MFC after:	1 month
2012-05-12 20:23:30 +00:00
avg
a1cf7817fd zfsboot/zfsloader: support accessing filesystems within a pool
In zfs loader zfs device name format now is "zfs:pool/fs",
fully qualified file path is "zfs:pool/fs:/path/to/file"
loader allows accessing files from various pools and filesystems as well
as changing currdev to a different pool/filesystem.

zfsboot accepts kernel/loader name in a format pool:fs:path/to/file or,
as before, pool:path/to/file; in the latter case a default filesystem
is used (pool root or bootfs).  zfsboot passes guids of the selected
pool and dataset to zfsloader to be used as its defaults.

zfs support should be architecture independent and is provided
in a separate library, but architectures wishing to use this zfs support
still have to provide some glue code and their devdesc should be
compatible with zfs_devdesc.
arch_zfs_probe method is used to discover all disk devices that may
be part of ZFS pool(s).

libi386 unconditionally includes zfs support, but some zfs-specific
functions are stubbed out as weak symbols.  The strong definitions
are provided in libzfsboot.
This change mean that the size of i386_devspec becomes larger
to match zfs_devspec.

Backward-compatibility shims are provided for recently added sparc64
zfs boot support.  Currently that architecture still works the old
way and does not support the new features.

TODO:
- clear up pool root filesystem vs pool bootfs filesystem distinction
- update sparc64 support
- set vfs.root.mountfrom based on currdev (for zfs)

Mid-future TODO:
- loader sub-menu for selecting alternative boot environment

Distant future TODO:
- support accessing snapshots, using a snapshot as readonly root

Reviewed by:	marius (sparc64),
		Gavin Mu <gavin.mu@gmail.com> (sparc64)
Tested by:	Florian Wagner <florian@wagner-flo.net> (x86),
		marius (sparc64)
No objections:	fs@, hackers@
MFC after:	1 month
2012-05-12 09:03:30 +00:00
marius
c79645eb6d Add initial support for booting from ZFS on sparc64. At least on Sun Fire
V100, the firmware is known to be broken and not allowing to simultaneously
open disk devices, causing attempts to boot from a mirror or RAIDZ to cause
a crash. This will be worked around later. The firmwares of newer sun4u models
don't seem to exhibit this problem though.

Steps for ZFS booting:

1. create VTOC8 label
# gpart create -s vtoc8 da0

2. add partitions, f.e.:
# gpart add -t freebsd-zfs -s 60g da0
# gpart add -t freebsd-swap da0
resulting in something like:
# gpart show
=>        0  143331930  da0  VTOC8  (68G)
          0  125821080    1  freebsd-zfs  (60G)
  125821080   17510850    2  freebsd-swap  (8.4G)

3. create zpool
# zpool create bunker da0a
or for mirror/RAIDZ (after preparing additional disks as in steps 1. + 2.):
# zpool create bunker mirror da0a da1a
# zpool create bunker raidz da0a da1a da2a ...

4. set bootfs
# zpool set bootfs=bunker bunker

5. install zfsboot
# zpool export bunker
# gpart bootcode -p /boot/zfsboot da0

6. write zfsloader to the ZFS Boot Block (so far, there's no dedicated tool
for this, so dd(1) has to be used for this purpose)
When using mirror/RAIDZ, step 4. and the dd(1) invocation should be repeated
for the additional disks in order to be able to boot from another disk in
case of failure.
# sysctl kern.geom.debugflags=0x10
# dd if=/boot/zfsloader of=/dev/da0a bs=512 oseek=1024 conv=notrunc
# zpool import bunker

7. install system on ZFS filesystem
Don't forget to set 'zfs_load="YES"' and vfs.root.mountfrom="zfs:bunker" in
loader.conf as well as 'zfs_enable="YES"'in rc.conf.

8. copy zpool.cache to the ZFS filesystem
cp -p /boot/zfs/zpool.cache /bunker/boot/zfs/zpool.cache

9. set mountpoint
# zfs set mountpoint=/ bunker

10. Now, given that aliases for all disks in the zpool exists (check with
the `devalias` command on the boot monitor prompt) and disk0 corresponds
to da0 (likewise for additional disks), the system can be booted from the
ZFS with:
{1} ok boot disk0

PR:             165025
Submitted by:   Gavin Mu
2012-05-01 17:16:01 +00:00
avg
2d5c2df342 zfs boot: allow file vdevs to be used in testing (e.g. with zfsboottest)
MFC after:	1 week
2011-12-04 21:29:56 +00:00
pjd
57635fa52e - Correctly read gang header from raidz.
- Decompress assembled gang block data if compressed.
- Verify checksum of a gang header.
- Verify checksum of assembled gang block data.
- Verify checksum of uber block.

Submitted by:	avg
MFC after:	3 days
2011-10-20 15:42:38 +00:00
pjd
2b80d6e6dd Always pass data size for checksum verification function, as using
physical block size declared in bp may not always be what we want.
For example in case of gang block header physical block size declared
in bp is much larger than SPA_GANGBLOCKSIZE (512 bytes) and checksum
calculation failed. This bug could lead to accessing unallocated
memory and resets/failures during boot.

MFC after:	3 days
2011-10-19 23:44:38 +00:00
pjd
5418d81b38 Never pass NULL block pointer when reading. This is neither expected nor
handled by lower layers like vdev_raidz, which uses bp for checksum
verification. This bug could lead to NULL pointer reference and resets
during boot.

MFC after:	3 days
2011-10-19 23:40:37 +00:00
pjd
dca375324c Don't mark vdev as healthy too soon, so we won't try to use invalid vdevs.
MFC after:	3 days
2011-10-19 23:37:30 +00:00
avg
ff769d30aa zfstest: rename to zfsboottest and move to tools
Approved by:	re (kib)
MFC after:	1 week
2011-09-16 08:22:48 +00:00
avg
dab0468c87 zfstest: cleanup the code, improve functionality and diagnostics
The utility is not connected to the build, so it should be safe
to update it.
To do: move the utility to tools/.
Some code is provided by Peter Jeremy <peterjeremy@acm.org>

Tested by:	Sebastian Chmielewski <chmielsster@gmail.com>,
		Peter Jeremy <peterjeremy@acm.org> (earlier versions)
Approved by:	re (kib)
MFC after:	4 days
2011-09-13 14:01:35 +00:00
pjd
1b03c5bf41 Finally... Import the latest open-source ZFS version - (SPA) 28.
Few new things available from now on:

- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows to rewind corrupted pool to earlier
  transaction group.
- Possibility to import pool in read-only mode.

MFC after:	1 month
2011-02-27 19:41:40 +00:00
dim
2543f7030b On i386 and amd64, consistently use the following options whenever we
want to avoid using any "advanced" CPU features:

  -mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3 -msoft-float
2011-01-05 22:24:33 +00:00
dim
ec93578a69 In lib/libstand, sys/boot/ficl and sys/boot/zfs, -mno-sse3 should also
be used for amd64, not just for i386.
2011-01-05 22:00:37 +00:00
pjd
891c7fcf8c - Split code shared by almost any boot loader into separate files and
clean up most layering violations:

	sys/boot/i386/common/rbx.h:

		RBX_* defines
		OPT_SET()
		OPT_CHECK()

	sys/boot/common/util.[ch]:

		memcpy()
		memset()
		memcmp()
		bcpy()
		bzero()
		bcmp()
		strcmp()
		strncmp() [new]
		strcpy()
		strcat()
		strchr()
		strlen()
		printf()

	sys/boot/i386/common/cons.[ch]:

		ioctrl
		putc()
		xputc()
		putchar()
		getc()
		xgetc()
		keyhit() [now takes number of seconds as an argument]
		getstr()

	sys/boot/i386/common/drv.[ch]:

		struct dsk
		drvread()
		drvwrite() [new]
		drvsize() [new]

	sys/boot/common/crc32.[ch] [new]

	sys/boot/common/gpt.[ch] [new]

- Teach gptboot and gptzfsboot about new files. I haven't touched the
  rest, but there is still a lot of code duplication to be removed.

- Implement full GPT support. Currently we just read primary header and
  partition table and don't care about checksums, etc. After this change we
  verify checksums of primary header and primary partition table and if
  there is a problem we fall back to backup header and backup partition
  table.

- Clean up most messages to use prefix of boot program, so in case of an
  error we know where the error comes from, eg.:

	gptboot: unable to read primary GPT header

- If we can't boot, print boot prompt only once and not every five
  seconds.

- Honour newly added GPT attributes:

	bootme - this is bootable partition
	bootonce - try to boot from this partition only once
	bootfailed - we failed to boot from this partition

- Change boot order of gptboot to the following:

	1. Try to boot from all the partitions that have both 'bootme'
	   and 'bootonce' attributes one by one.
	2. Try to boot from all the partitions that have only 'bootme'
	   attribute one by one.
	3. If there are no partitions with 'bootme' attribute, boot from
	   the first UFS partition.

- The 'bootonce' functionality is implemented in the following way:

	1. Walk through all the partitions and when 'bootonce'
	   attribute is found without 'bootme' attribute, remove
	   'bootonce' attribute and set 'bootfailed' attribute.
	   'bootonce' attribute alone means that we tried to boot from
	   this partition, but boot failed after leaving gptboot and
	   machine was restarted.
	2. Find partition with both 'bootme' and 'bootonce' attributes.
	3. Remove 'bootme' attribute.
	4. Try to execute /boot/loader or /boot/kernel/kernel from that
	   partition. If succeeded we stop here.
	5. If execution failed, remove 'bootonce' and set 'bootfailed'.
	6. Go to 2.

   If whole boot succeeded there is new /etc/rc.d/gptboot script coming
   that will log all partitions that we failed to boot from (the ones with
   'bootfailed' attribute) and will remove this attribute. It will also
   find partition with 'bootonce' attribute - this is the partition we
   booted from successfully. The script will log success and remove the
   attribute.

   All the GPT updates we do here goes to both primary and backup GPT if
   they are valid. We don't touch headers or partition tables when
   checksum doesn't match.

Reviewed by:	arch (Message-ID: <20100917234542.GE1902@garage.freebsd.pl>)
Obtained from:	Wheel Systems Sp. z o.o. http://www.wheelsystems.com
MFC after:	2 weeks
2010-09-24 19:49:12 +00:00
pjd
1fabf17c4c Remove magic value. 2010-09-17 22:51:45 +00:00
pjd
c934528337 Remove empty lines committed by accident.
MFC after:	2 weeks
2010-09-09 21:32:09 +00:00
pjd
beba890b25 Ignore log vdevs.
MFC after:	2 weeks
2010-09-09 21:19:09 +00:00
pjd
4cc2af01fb Allow to boot from a pool within which replacing is in progress.
Before the change it wasn't possible and the following error was printed:

	ZFS: can only boot from disk, mirror or raidz vdevs

Now if the original vdev (the one we are replacing) is still present we will
read from it, but if it is not present we won't read from the new vdev, as it
might not have enough valid data yet.

MFC after:	2 weeks
2010-09-09 21:18:00 +00:00
pjd
87570d6f03 Remove duplicated code.
MFC after:	2 weeks
2010-09-09 21:15:16 +00:00
imp
86c4c6b1df MF tbemd: Minor tweaks, prefer MACHINE_CPUARCH generally to MACHINE_ARCH (which simplifies some powerpc/powerpc64 ifs) 2010-08-23 01:50:34 +00:00
mm
fe97da49a3 Return EIO if vdev->v_phys_read is NULL.
This fixes booting from a ZFS mirror with a unavailable primary device.

PR:		kern/148655
Reviewed by:	avg
Approved by:	delphij (mentor)
MFC after:	3 days
2010-08-09 06:36:11 +00:00
dfr
0a6d03ea0c A simple test harness to help debug problems with the ZFS boot code. 2010-07-30 13:54:15 +00:00
avg
bec30888be zfs boot: fix error handling in zfs_readdir
Found by:	clang static analyzer
MFC after:	4 days
2010-05-31 09:06:03 +00:00
avg
5a45693652 boot/zfs: fix gang block reading code
- use correct size (512) while reading a gang block
- skip holes while reading child blocks
- advance buffer pointer while reading child blocks

PR:		144214
MFC after:	10 days
2010-05-28 07:34:20 +00:00
pjd
438d612346 Update comment. We also look for GPT partitions. 2010-02-18 22:23:30 +00:00
delphij
f487d8fd62 Space cleanup for revision 201689 committed separately for easier review.
This commit is purely space changes.

Submitted by:	Matt Reimer
Sponsored by:	VPOP Technologies, Inc.
MFC after:	2 weeks
2010-01-06 23:11:56 +00:00
delphij
66f8e0d24f Instead of assuming all vdevs are healthy, check the newest vdev label
for each vdev's status.  Booting from a degraded vdev should now be
more robust.

Submitted by:	Matt Reimer <mattjreimer at gmail.com>
Sponsored by:	VPOP Technologies, Inc.
MFC after:	2 weeks
2010-01-06 23:09:23 +00:00
jhb
7e12f26423 - Port bios_getmem() from libi386 to {gpt,}zfsboot() and use it to
safely allocate a heap region above 1MB.  This enables {gpt,}zfsboot()
  to allocate much larger buffers than before.
- Use a larger buffer (1MB instead of 128K) for temporary ZFS buffers.  This
  allows more reliable reading of compressed files in a raidz/raidz2 pool.

Submitted by:	Matt Reimer  mattjreimer of gmail
MFC after:	1 week
2009-12-09 20:36:56 +00:00
rnoland
8a200b8ecf Correct some issues with zfs boot.
- Teach it to read gang blocks. (essentially untested)
   If you see "ZFS: gang block detected!", please let
   me know, so we can either remove the printf if it
   works, or fix it if it doesn't.

 - If multiple partitions exist on a disk, probe them all.
   We also need to reset dsk->start to 0 to read the right
   sector here.

 - With GPT, we can have 128 partitions.

 - If the bootfs property has ever been set on a pool
   it seems that it never goes away.  zpool won't allow
   you to add to the pool with the bootfs property set.
   However, if you clear the property back to default
   we end up getting 0 for the object number and read
   a bogus block pointer and fail to boot.

 - Fix some error printfs. The printf in the loader is
   only capable of c,s and u formats.

 - Teach printf how to display %llu

Reviewed by:	dfr, jhb
MFC after:	2 weeks
2009-10-23 18:44:53 +00:00
dfr
0db82eb221 Add support for booting from raidz1 and raidz2 pools. 2009-05-16 10:48:20 +00:00
dfr
5823684a8b Use full 64bit arithmetic when converting file offsets to block numbers - fixes
booting on filesystems with inode numbers with values above 4194304.

Submitted by:	ps
2008-12-17 18:12:01 +00:00
ps
1b2020cf17 Fix a leak introduced in r185902. We should free the devspec if
we've successfully found a zfs pool.
2008-12-11 16:48:35 +00:00
ps
dcd4d48fc8 Avoid a double free in devopen by not freeing the device structure
in zfs_dev_open.  This stops a panic in the loader when trying to
read from a zfs device and no zfs devices exist.
2008-12-11 02:23:49 +00:00
dfr
5319dab7a9 Don't get confused if we encounter a device which is part of a raidz or raidz2
pool while probing for vdevs.

PR:		129539
Submitted by:	Paul Wootton (paul at fletchermoorland dot co dot uk)
2008-12-10 10:46:34 +00:00
ps
78e3ca9845 Correct include path for i386 specific includes. This allows zfs
to boot on systems where the loader is built on amd64 systems.
2008-12-06 14:45:03 +00:00
dfr
fbf7bda4ae Fix amd64 build and re-enable gptzfsboot. 2008-11-22 14:24:55 +00:00
dfr
f0b4df1a33 Some zfsboot fixes from Norikatsu Shigemura:
1. zfsboot2 (boot2) doesn't %d (printf), so change %d to %u.
2. chase new zpool versioning as SPA_VERSION.
   Obtained from: sys/cddl/contrib/opensolaris/uts/common/sys/fs/zfs.h

Submitted by:	nork
2008-11-19 16:59:19 +00:00
dfr
d6f289d443 Add a GPT-aware variant of zfsboot which should be used in a similar manner
to gptboot, i.e. installed in a freebsd-boot partition using /sbin/gpart or
/sbin/gpt.

Tweak the /boot/loader ZFS support so that it can find ZFS pools that are
contained in GPT partitions.
2008-11-19 16:39:01 +00:00
pjd
bbe899b96e Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.
This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

	Allows regular users to perform ZFS operations, like file system
	creation, snapshot creation, etc.

- L2ARC

	Level 2 cache for ZFS - allows to use additional disks for cache.
	Huge performance improvements mostly for random read of mostly
	static content.

- slog

	Allow to use additional disks for ZFS Intent Log to speed up
	operations like fsync(2).

- vfs.zfs.super_owner

	Allows regular users to perform privileged operations on files stored
	on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

	Not all the flags are supported. This still needs work.

- ZFSBoot

	Support to boot off of ZFS pool. Not finished, AFAIK.

	Submitted by:	dfr

- Snapshot properties

- New failure modes

	Before if write requested failed, system paniced. Now one
	can select from one of three failure modes:
	- panic - panic on write error
	- wait - wait for disk to reappear
	- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

	Just quota and reservation properties, but don't count space consumed
	by children file systems, clones and snapshots.

- Sparse volumes

	ZVOLs that don't reserve space in the pool.

- External attributes

	Compatible with extattr(2).

- NFSv4-ACLs

	Not sure about the status, might not be complete yet.

	Submitted by:	trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from:	OpenSolaris
2008-11-17 20:49:29 +00:00