in the Pass 5 checks. The manifestation was fsck_ffs exiting with this error:
** Phase 5 - Check Cyl groups
fsck_ffs: inoinfo: inumber 18446744071562087424 out of range
The error only manifests itself for filesystems bigger than about 100Tb.
Reported by: Nikita Grechikhin <ngrechikhin at yandex.ru>
MFC after: 2 weeks
Sponsored by: Netflix
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.
Convert the utilities that had been using the undocumented
interface to use the new documented interface.
No functional change (as for now the libufs library does not
do inode check-hashes).
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
folks running filesystems created on check-hash enabled kernels
(which I will call "new") on a non-check-hash enabled kernels (which
I will call "old). The idea here is to detect when a filesystem is
run on an old kernel and flag the filesystem so that when it gets
moved back to a new kernel, it will not start getting a slew of
check-hash errors.
Back when the UFS version 2 filesystem was created, it added a file
flag FS_INDEXDIRS that was to be set on any filesystem that kept
some sort of on-disk indexing for directories. The idea was precisely
to solve the issue we have today. Specifically that a newer kernel
that supported indexing would be able to tell that the filesystem
had been run on an older non-indexing kernel and that the indexes
should not be used until they had been rebuilt. Since we have never
implemented on-disk directory indicies, the FS_INDEXDIRS flag is
cleared every time any UFS version 2 filesystem ever created is
mounted for writing.
This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH
flag. Thus, the FS_METACKHASH is definitively known to have always
been cleared. The FS_INDEXDIRS flag has been moved to a new block
of flags that will always be cleared starting with this commit
(until they get used to implement some future feature which needs
to detect that the filesystem was mounted on a kernel that predates
the new feature).
If a filesystem with check-hashes enabled is mounted on an old
kernel the FS_METACKHASH flag is cleared. When that filesystem is
mounted on a new kernel it will see that the FS_METACKHASH has been
cleared and clears all of the fs_metackhash flags. To get them
re-enabled the user must run fsck (in interactive mode without the
-y flag) which will ask for each supported check hash whether it
should be rebuilt and enabled. When fsck is run in its default preen
mode, it will just ignore the check hashes so they will remain
disabled.
The kernel has always disabled any check hash functions that it
does not support, so as more types of check hashes are added, we
will get a non-surprising result. Specifically if filesystems get
moved to kernels supporting fewer of the check hashes, those that
are not supported will be disabled. If the filesystem is moved back
to a kernel with more of the check-hashes available and fsck is run
interactively to rebuild them, then their checking will resume.
Otherwise just the smaller subset will be checked.
A side effect of this commit is that filesystems running with
cylinder-group check hashes will stop having them checked until
fsck is run to re-enable them (since none of them currently have
the FS_METACKHASH flag set). So, if you want check hashes enabled
on your filesystems after booting a kernel with these changes, you
need to run fsck to enable them. Any newly created filesystems will
have check hashes enabled. If in doubt as to whether you have check
hashes emabled, run dumpfs and look at the list of enabled flags
at the end of the superblock details.
previous behavior is preserved (the CG checksum is fixed). We're just
noisy about it now.
Reviewed by: kirk@
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13884
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
1200046, the first version that supports this feature. If we set it,
then use an old kernel, we'll break the 'contract' of having
checksummed cylinder groups this flag signifies. To avoid creating
something with an inconsistent state, don't turn the flag on in these
cases. The first full fsck with a new kernel will turn this on.
Spnsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D13114
check hash to cylinder groups. If a check hash fails when a cylinder
group is read, no further allocations are attempted in that cylinder
group until it has been fixed by fsck. This avoids a class of
filesystem panics related to corrupted cylinder group maps. The
hash is done using crc32c.
Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.
Specifics of the changes:
sys/sys/buf.h:
Add BX_FSPRIV to reserve a set of eight b_xflags that may be used
by individual filesystems for their own purpose. Their specific
definitions are found in the header files for each filesystem
that uses them. Also add fields to struct buf as noted below.
sys/kern/vfs_bio.c:
It is only necessary to compute a check hash for a cylinder
group when it is actually read from disk. When calling bread,
you do not know whether the buffer was found in the cache or
read. So a new flag (GB_CKHASH) and a pointer to a function to
perform the hash has been added to breadn_flags to say that the
function should be called to calculate a hash if the data has
been read. The check hash is placed in b_ckhash and the B_CKHASH
flag is set to indicate that a read was done and a check hash
calculated. Though a rather elaborate mechanism, it should
also work for check hashing other metadata in the future. A
kernel internal API change was to change breada into a static
fucntion and add flags and a function pointer to a check-hash
function.
sys/ufs/ffs/fs.h:
Add flags for types of check hashes; stored in a new word in the
superblock. Define corresponding BX_ flags for the different types
of check hashes. Add a check hash word in the cylinder group.
sys/ufs/ffs/ffs_alloc.c:
In ffs_getcg do the dance with breadn_flags to get a check hash and
if one is provided, check it.
sys/ufs/ffs/ffs_vfsops.c:
Copy across the BX_FFSTYPES flags in background writes.
Update the check hash when writing out buffers that need them.
sys/ufs/ffs/ffs_snapshot.c:
Recompute check hash when updating snapshot cylinder groups.
sys/libkern/crc32.c:
lib/libufs/Makefile:
lib/libufs/libufs.h:
lib/libufs/cgroup.c:
Include libkern/crc32.c in libufs and use it to compute check
hashes when updating cylinder groups.
Four utilities are affected:
sbin/newfs/mkfs.c:
Add the check hashes when building the cylinder groups.
sbin/fsck_ffs/fsck.h:
sbin/fsck_ffs/fsutil.c:
Verify and update check hashes when checking and writing cylinder groups.
sbin/fsck_ffs/pass5.c:
Offer to add check hashes to existing filesystems.
Precompute check hashes when rebuilding cylinder group
(although this will be done when it is written in fsutil.c
it is necessary to do it early before comparing with the old
cylinder group)
sbin/dumpfs/dumpfs.c
Print out the new check hash flag(s)
sbin/fsdb/Makefile:
Needs to add libufs now used by pass5.c imported from fsck_ffs.
Reviewed by: kib
Tested by: Peter Holm (pho)
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
Clang errors around printf could be trivially fixed, but the breakage in
sbin/fsdb were to significant for this type of change.
Submitter of this changeset has been notified and hopefully this can be
restored soon.
that they do not need to be read again in pass5. As this nearly
doubles the memory requirement for fsck, the cache is thrown away
if other memory needs in fsck would otherwise fail. Thus, the
memory footprint of fsck remains unchanged in memory constrained
environments.
This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker
Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.
Reviewed by: kib
Tested by: Peter Holm
MFC after: 4 weeks
that was built before ffs grew support for TRIM, your filesystem will have
plenty of free blocks that the flash chip doesn't know are free, so it
can't take advantage of them for wear leveling. Once you've upgraded your
kernel, you enable TRIM on the filesystem (tunefs -t enable), then run
fsck_ffs -E on it before mounting it.
I tested this patch by half-filling an mdconfig'ed filesystem image,
running fsck_ffs -E on it, then verifying that the contents were not
damaged by comparing them to a pristine copy using rsync's checksum
functionality. There is no reliable way to test it on real hardware.
Many thanks to mckusick@, who provided the tricky parts of this patch and
reviewed the final version.
Reviewed by: mckusick@
MFC after: 3 weeks
brings in support for an optional intent log which eliminates the need
for background fsck on unclean shutdown.
Sponsored by: iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
background fsck on the same file system might then print negative
numbers for reclaimed directories/files/fragments.
Address the issue in a limited degree, by using old summary data for
cg when bgfsck is performed.
Submitted by: tegge
MFC after: 1 week
systems less than 1 TB, due to using 32-bits integers for file system block
numbers. This also causes incorrect error reporting for foreground fsck.
Convert it to use ufs2_daddr_t for block numbers.
PR: kern/127951
Submitted by: tegge
MFC after: 1 week
and -p flag was given perform fast file system checking (bascially only
garbage collecting of orphaned objects).
Rename bread() to blread() and bwrite() to blwrite() as we now link to
the libufs library, which also implement functions with that names.
Sponsored by: home.pl
initializing the sysctl mibs data before actually using them.
The original patchset (which is the actual version that is running
on my testboxes) have checked whether all of these sysctls and
refuses to do background fsck if we don't have them. Kirk has
pointed out that refusing running fsck on old kernels is pointless,
as old kernels will recompute the summary at mount time, so I
have removed these checks.
Unfortunatelly, as the checks will initialize the mib values of
those sysctl's, and which are vital for the runtime summary
adjustment to work, we can not simply remove the check, which
will lead to problem when running background fsck over a dirty
volume. Add these checks in a different way: give a warning rather
than refusing to work, and complain if the functionality is not
available when adjustments are necessary.
Noticed by: A power failure at my lab
Pointy hat: me
MFC After: 3 days
very slow process, especially for large file systems that is just
recovered from a crash.
Since the summary is already re-sync'ed every 30 second, we will
not lag behind too much after a crash. With this consideration
in mind, it is more reasonable to transfer the responsibility to
background fsck, to reduce the delay after a crash.
Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to
control this behavior. When set to nonzero, we will get the
"old" behavior, that the summary is computed immediately at mount
time.
Add five new sysctl variables to adjust ndir, nbfree, nifree,
nffree and numclusters respectively. Teach fsck_ffs about these
API, however, intentionally not to check the existence, since
kernels without these sysctls must have recomputed the summary
and hence no adjustments are necessary.
This change has eliminated the usual tens of minutes of delay of
mounting large dirty volumes.
Reviewed by: mckusick
MFC After: 1 week
count of zero and instead encode this information in the inode state.
Pass 4 performed a linear search of this list for each inode in
the file system, which performs poorly if the list is long.
Reviewed by: sam & keramida (an earlier version of the patch), mckusick
MFC after: 1 month
It seems a common corruption to have them -ve (I've seen it several times)
and if fsck doesn't fix it, it leads to a kernel pagefault.
Reviewd by: kirk
Submitted by: Eric Jacobs <eaja@erols.com> and me independently.
MFC in: 2 days
PR: bin/40967
Approved by: re
UFS2 commit.
These bits in essence made any instance of "softupdates expected
corrution", (ie blocks marked allocated but not referenced by an
inode etc) result in a exit value for fsck_ffs of 2.
2 is part of the magic and appearantly undocumented protocol between
fsck_FOO and fsck and means "dump into single user mode ASAP.
Sponsored by: DARPA & NAI Labs.
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.
Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.
Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
It does not help modern compilers, and some may take some hit from it.
(I also found several functions that listed *every* of its 10 local vars with
"register" -- just how many free registers do people think machines have?)
These were mainly missing casts or wrong format strings in printf
statements, but there were also missing includes, unused variables,
functions and arguments.
The choice of `long' vs `int' still seems almost random in a lot
of places though.
1) Set the FS_NEEDSFSCK flag when unexpected problems are encountered.
2) Clear the FS_NEEDSFSCK flag after a successful foreground cleanup.
3) Refuse to run in background when the FS_NEEDSFSCK flag is set.
4) Avoid taking and removing a snapshot when the filesystem is already clean.
5) Properly implement the force cleaning (-f) flag when in preen mode.
Note that you need to have revision 1.21 (date: 2001/04/14 05:26:28) of
fs.h installed in <ufs/ffs/fs.h> defining FS_NEEDSFSCK for this to compile.
affect current systems until fsck is modified to use these new
facilities. To try out this change, set the fsck passno to zero
in /etc/fstab to cause the filesystem to be mounted without running
fsck, then run `fsck_ffs -p -B <filesystem>' after the system has
been brought up multiuser to run a background cleanup on <filesystem>.
Note that the <filesystem> in question must have soft updates enabled.
a SIGINFO (normally via Ctrl-T), a line will be output indicating
the current phase number and progress information relevant to the
current phase.
Approved by: mckusick
effect on operation of fsck on filesystems without snapshots.
If you get compilation errors, be sure that you have copies of
/usr/include/sys/mount.h (1.94), /usr/include/sys/stat.h (1.21),
and /usr/include/ufs/ffs/fs.h (1.16) as of July 4, 2000 or later.