running time for a full fsck. It also reduces the random access time
for large files and speeds the traversal time for directory tree walks.
The key idea is to reserve a small area in each cylinder group
immediately following the inode blocks for the use of metadata,
specifically indirect blocks and directory contents. The new policy
is to preferentially place metadata in the metadata area and
everything else in the blocks that follow the metadata area.
The size of this area can be set when creating a filesystem using
newfs(8) or changed in an existing filesystem using tunefs(8).
Both utilities use the `-k held-for-metadata-blocks' option to
specify the amount of space to be held for metadata blocks in each
cylinder group. By default, newfs(8) sets this area to half of
minfree (typically 4% of the data area).
This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker
Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 4 weeks
extended using growfs(8). The problem here is that geom_label checks if
the filesystem size recorded in UFS superblock is equal to the provider
(i.e. device) size. This check cannot be removed due to backward
compatibility. On the other hand, in most cases growfs(8) cannot set
fs_size in the superblock to match the provider size, because, differently
from newfs(8), it cannot recompute cylinder group sizes.
To fix this problem, add another superblock field, fs_providersize, used
only for this purpose. The geom_label(4) will attach if either fs_size
(filesystem created with newfs(8)) or fs_providersize (filesystem expanded
using growfs(8)) matches the device size.
PR: kern/165962
Reviewed by: mckusick
Sponsored by: FreeBSD Foundation
These tools declare global variables without using the static keyword,
even though their use is limited to a single C-file, or without placing
an extern declaration of them in the proper header file.
This avoids a potentially many-hours-long loop of failed writes if newfs
finds a partially-overwritten superblock (or, for that matter, random
garbage which happens to have superblock magic bytes); on one occasion I
found newfs trying to zero 800 million superblocks on a 50 MB disk.
Reviewed by: mckusick
MFC after: 1 week
include sys/time.h instead of time.h. This include is incorrect as
per the manpages for the APIs and the POSIX definitions. This commit
replaces sys/time.h where necessary with time.h.
The commit also includes some minor style(9) header fixup in newfs.
This commit is part of a larger effort by Garrett Cooper started in
//depot/user/gcooper/posix-conformance-work/ -- to make FreeBSD more
POSIX compliant.
Submitted by: Garrett Cooper yanegomi at gmail dot com
Large (60GB) filesystems created using "newfs -U -O 1 -b 65536 -f 8192"
show incorrect results from "df" for free and used space when mounted
immediately after creation. fsck on the new filesystem (before ever
mounting it once) gives a "SUMMARY INFORMATION BAD" error in phase 5.
This error hasn't occurred in any runs of fsck immediately after
"newfs -U -b 65536 -f 8192" (leaving out the "-O 1" option).
Solution:
The default UFS1 superblock is located at offset 8K in the filesystem
partition; the default UFS2 superblock is located at offset 64K in
the filesystem partition. For UFS1 filesystems with a blocksize of
64K, the first alternate superblock resides at 64K which is the the
location used for the default UFS2 superblock. By default, the
system first checks for a valid superblock at the default location
for a UFS2 filoesystem. For a UFS1 filesystem with a blocksize of
64K, there is a valid UFS1 superblock at this location. Thus, even
though it is expected to be a backup superblock, the system will
use it as its default superblock. So, we have to ensure that all the
statistcs on usage are correct in this first alternate superblock
as it is the superblock that will actually be used.
While tracking down this problem, another limitation of UFS1 became
evident. For UFS1, the number of inodes per cylinder group is stored
in an int16_t. Thus the maximum number of inodes per cylinder group
is limited to 2^15 - 1. This limit can easily be exceeded for block
sizes of 32K and above. Thus when building UFS1 filesystems, newfs
must limit the number of inodes per cylinder group to 2^15 - 1.
Reported by: Guy Helmer<ghelmer@palisadesys.com>
Followup by: Bruce Cran <brucec@freebsd.org>
PR: 107692
MFC after: 4 weeks
inodes by cutting back on the number of inodes per cylinder group if
necessary to stay under the limit. For a default (16K block) file
system, this limit begins to take effect for file systems above 32Tb.
This fix is in addition to -r203763 which corrected a problem in the
kernel that treated large inode numbers as negative rather than unsigned.
For a default (16K block) file system, this bug began to show up at a
file system size above about 16Tb.
Reported by: Scott Burns, John Kilburg, Bruce Evans
Followup by: Jeff Roberson
PR: 133980
MFC after: 2 weeks
want to prepare disk images for emulators (though 'makefs' in port
can do something similar).
This relies on:
+ minor changes to pass the consistency checks even when working on a file;
+ an additional option, '-p partition' , to specify the disk partition to
initialize;
+ some changes on the I/O routines to deal with partition offsets.
The latter was a bit tricky to implement, see the details in newfs.h:
in newfs, I/O is done through libufs which assumes that the file
descriptor refers to the whole partition. Introducing support for
the offset in libufs would require a non-backward compatible change
in the library, to be dealt with a version bump or with symbol
versioning.
I felt both approaches to be overkill for this specific application,
especially because there might be other changes to libufs that might
become necessary in the near future.
So I used the following trick:
- read access is always done by calling bread() directly, so we just add
the offset in the (few) places that call bread();
- write access is done through bwrite() and sbwrite(), which in turn
calls bwrite(). To avoid rewriting sbwrite(), we supply our own version
of bwrite() here, which takes precedence over the version in libufs.
MFC after: 4 weeks
Implement -E option which will erase the filesystem sectors before
making the new filesystem. Reserved space in front of the superblock
(bootcode) is not erased.
NB: Erasing can take as long time as writing every sector sequentially.
This is relevant for all flash based disks which use wearlevelling.
affect the largest file size that is allowed by the file system.
On the other hand, when creating a snapshot, the snapshot file will
appear as it is as big as the file system itself. Hence we will not
be able to create a file system on large file systems with small
block sizes.
Add a warning about this, and gives some hints to correct the issue.
Reviewed by: mckusick
MFC After: 1 week
the new filesystem. This is intended for memory and vnode filesystems
that will never be fsck'ed or dumped.
Obtained from: St. Bernard Software RAPID
MFC after: 2 weeks
has only been partly initialized via newfs(8) so that it applies to both
UFS1 and UFS2.
Submitted by: "Xin LI" delphij at frontfree dot net
MFC: maybe?
permits users of newfs to set the multilabel flag on UFS1 and UFS2
file systems from inception without using tunefs.
Obtained from: TrustedBSD Project
Sponsored by: DARPA, McAfee Research
of newfs, to signify the newfs operation has not yet completed. Re-
write the superblock with the correct magic number once all of the
cylinder groups have been created to show the operation has finished.
Sponsored by: St. Bernard Software
a new filesystem. Dump and fsck will create snapshots in this
directory rather than in the root for two reasons:
1) For terabyte-sized filesystems, the snapshot may require many
minutes to build. Although the filesystem will not be suspended
during most of the snapshot build, the snapshot file itself is
locked during the entire snapshot build period. Thus, if it is
accessed during the period that it is being built, the process
trying to access it will block holding its containing directory
locked. If the snapshot is in the root, the root will lock and
the system will come to a halt until the snapshot finishes. By
putting the snapshot in a subdirectory, it is out of the likely
path of any process traversing through the root and hence much
less likely to cause a lock race to the root.
2) The dump program is usually run by a non-root user running with
operator group privilege. Such a user is typically not permitted
to create files in the root of a filesystem. By having a directory
in group operator with group write access available, such a user
will be able to create a snapshot there. Having the dump program
create its snapshot in a subdirectory below the root will benefit
from point (1) as well.
Sponsored by: DARPA & NAI Labs.
The old way of just returning could result in a file system
extremely likely to panic the kernel. The warning printed
wouldn't help much since tools invoking newfs(8), e.g., mdmfs(8),
couldn't detect the error.
PR: bin/55078
MFC after: 1 week
with UFS1, the UFS1 superblocks were not deleted. This allowed any
RELENG_4 (or other non-UFS2-aware) fsck to think it knew how to "fix"
the file system, resulting in severe data scrambling.
This patch is a more advanced version than the one originally submitted.
Lukas improved it based on feedback from Kirk, and testing by me. It
blanks all UFS1 superblocks (if any) during a UFS2 newfs, thereby causing
fsck's that are not UFS2 aware to generate the "SEARCH FOR ALTERNATE
SUPER-BLOCK FAILED" message, and exit without damaging the fs.
PR: bin/51619
Submitted by: Lukas Ertl <l.ertl@univie.ac.at>
Reviewed by: kirk
Approved by: re (scottl)
changed to use libufs in revision 1.71. Without this, any write
failures in newfs were silently ignored.
Note that this will display a meaningless errno string in the case
of a short write as opposed to a write error, since bwrite()'s
return value does not allow the caller to determine if errno is
valid.
Reported by: Lukas Ertl <l.ertl@univie.ac.at>
Reviewed by: jmallett
Approved by: re (bmah)
values for the initial inode generation numbers in newfs and for
newly allocated inode generation numbers in the kernel.
Submitted by: Theo de Raadt <deraadt@cvs.openbsd.org>
Sponsored by: DARPA & NAI Labs.
version of such. Differences in filesystems generated were found to be
from 1) sbwrite with the "all" parameter 2) removal of writecache. The
sbwrite call was made to perform as the original version, and otherwise
this was checked against a version of newfs with the write cache removed.
so that fsck does not complain with `SUMMARY BLK COUNT(S) WRONG IN
SUPERBLK' the first time it is run on a new filesystem.
Reported by: Poul-Henning Kamp <phk@freebsd.org>
Sponsored by: DARPA & NAI Labs.
that the kernel will refuse to mount. Specifically it now enforces
the MAXBSIZE blocksize limit. This update also fixes a problem where
newfs could segment fault if the selected fragment size was too large.
PR: bin/30959
Submitted by: Ceri Davies <setantae@submonkey.net>
Sponsored by: DARPA & NAI Labs.
the old 8-bit fs_old_flags to the new location the first time that the
filesystem is mounted by a new kernel. One of the unused flags in
fs_old_flags is used to indicate that the flags have been moved.
Leave the fs_old_flags word intact so that it will work properly if
used on an old kernel.
Change the fs_sblockloc superblock location field to be in units
of bytes instead of in units of filesystem fragments. The old units
did not work properly when the fragment size exceeeded the superblock
size (8192). Update old fs_sblockloc values at the same time that
the flags are moved.
Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk>
Sponsored by: DARPA & NAI Labs.