Identify each of the superblock validation checks as either a
warning or a fatal error. Any integrity check that can cause a
system hang or crash is marked as fatal. Those that may simply
lead to poor file layout or other suboptimal operating conditions
are marked as warning.
Normally both fatal and warning are treated as errors and prevent
the superblock from being loaded. A new flag, UFS_NOWARNFAIL, is
added. When passed to ffs_sbget() it will note warnings that it
finds, but will still proceed with loading the superblock. Note
that when UFS_NOWARNFAIL is used, it also includes UFS_NOHASHFAIL.
No legitimate superblocks should fail as a result of these changes.
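The policy above can be sketched roughly as follows. This is a minimal illustration, not the kernel code: the EX_ flag values, the ex_severity enum, and ex_load_decision() are hypothetical names invented here, and the real flag definitions live in fs.h.

```c
#include <stdbool.h>

/* Illustrative values only; not copied from fs.h. */
#define EX_NOHASHFAIL	0x0001
#define EX_NOWARNFAIL	(0x0002 | EX_NOHASHFAIL)  /* implies EX_NOHASHFAIL */

enum ex_severity { EX_OK, EX_WARNING, EX_FATAL };

/*
 * Decide whether a superblock may be loaded, given the worst
 * severity reported by the checks and the caller's flags.
 */
static bool
ex_load_decision(enum ex_severity worst, int flags)
{
	if (worst == EX_FATAL)
		return (false);	/* could hang or crash: always refuse */
	if (worst == EX_WARNING)
		return ((flags & EX_NOWARNFAIL) == EX_NOWARNFAIL);
	return (true);
}
```

With no flags, a warning still prevents loading; only a caller that passes the (hypothetical) EX_NOWARNFAIL gets the superblock back despite warnings.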
Further updates based on analysis of the way the fields are used
in the various filesystem macros defined in fs.h.
Eliminate several checks for non-negative values where the fields
are checked for specific values. Since these specific values are
non-negative, if the value is a verified positive value then it
cannot be negative and such a check is redundant and unnecessary.
No legitimate superblocks should fail as a result of these changes.
Rather than trying to shoehorn flags into the requested superblock
address, create a separate flags parameter to the ffs_sbget()
function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is
used both in the kernel and in user-level utilities through export
to the sbget() function in the libufs(3) library (see sbget(3)
for details). The kernel uses ffs_sbget() when mounting UFS
filesystems, in the glabel(8) and gjournal(8) GEOM utilities,
and in the standalone library used when booting the system
from a UFS root filesystem.
The ffs_sbget() function reads the superblock located at the byte
offset specified by its sblockloc parameter. The value UFS_STDSB
may be specified for sblockloc to request that the standard
location for the superblock be read.
The two existing options are now flags:
UFS_NOHASHFAIL will note if the check hash is wrong but will still
return the superblock. This is used by the bootstrap code to
give the system a chance to come up so that fsck can be run to
correct the problem.
UFS_NOMSG indicates that superblock inconsistency error messages
should not be printed. It is used by programs like fsck that
want to print their own error message and programs like glabel(8)
that just want to know if a UFS filesystem exists on a partition.
One additional flag is added:
UFS_NOCSUM causes only the superblock itself to be returned, but does
not read in any auxiliary data structures like the cylinder group
summary information. It is used by clients like glabel(8) that
just want to check for possible filesystem types. Using UFS_NOCSUM
skips the superblock checks for csum data which allows superblocks
that have corrupted csum data to be read and used.
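As a rough sketch of why a separate flags parameter is cleaner than encoding options in the location, the following shows how a reader might interpret the two arguments independently. The EX_ constants, struct ex_plan, and ex_plan_sbget() are hypothetical stand-ins; the values are illustrative and not taken from fs.h.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_STDSB	(-1)	/* stand-in for UFS_STDSB */
#define EX_NOHASHFAIL	0x0001
#define EX_NOMSG	0x0004
#define EX_NOCSUM	0x0008

/* What the reader intends to do for a given request. */
struct ex_plan {
	bool search_std_locations;	/* look in the standard place */
	bool fail_on_bad_hash;		/* bad check hash is an error */
	bool print_messages;		/* report inconsistencies */
	bool read_csum;			/* load cylinder group summary */
};

static struct ex_plan
ex_plan_sbget(int64_t sblockloc, int flags)
{
	struct ex_plan p;

	p.search_std_locations = (sblockloc == EX_STDSB);
	p.fail_on_bad_hash = (flags & EX_NOHASHFAIL) == 0;
	p.print_messages = (flags & EX_NOMSG) == 0;
	p.read_csum = (flags & EX_NOCSUM) == 0;
	return (p);
}
```

A glabel-style probe would then pass something like EX_NOMSG | EX_NOCSUM with EX_STDSB, while the bootstrap code would pass EX_NOHASHFAIL.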
The validate_sblock() function checks that the superblock has not
been corrupted in a way that can crash or hang the system. Unless
the UFS_NOMSG flag is specified, it will print out any errors that
it finds. Prior to this commit, validate_sblock() returned as soon
as it found an inconsistency, so it would print at most one message.
It now performs all its checks, so when UFS_NOMSG has not been
specified it prints everything that it finds inconsistent.
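The difference between stopping at the first failure and running every check can be sketched as below. The struct, the three checks, and the ex_ names are hypothetical and far simpler than the real validate_sblock(); the point is only the accumulate-and-report pattern.

```c
#include <stdint.h>
#include <stdio.h>

#define EX_NOMSG	0x0004	/* illustrative value */

struct ex_sb {
	int32_t bsize;
	int32_t fsize;
	int32_t ncg;
};

/* Record one failed check; print it unless EX_NOMSG is set. */
static int
ex_fail(int flags, const char *what)
{
	if ((flags & EX_NOMSG) == 0)
		fprintf(stderr, "superblock check failed: %s\n", what);
	return (1);
}

/* Run every check and return the number of failures found. */
static int
ex_validate(const struct ex_sb *sb, int flags)
{
	int failures = 0;

	if (sb->bsize <= 0)
		failures += ex_fail(flags, "bsize <= 0");
	if (sb->fsize <= 0 || sb->fsize > sb->bsize)
		failures += ex_fail(flags, "fsize out of range");
	if (sb->ncg < 1)
		failures += ex_fail(flags, "ncg < 1");
	return (failures);
}
```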
Sponsored by: The FreeBSD Foundation
Reorder a few checks to ensure fields have been checked before
using them to check other fields.
Add eight new checks mostly checking for non-negative values.
No legitimate superblocks should fail as a result of these changes.
A better fix to commit 9e1f44d044. Rather than coping with the case
where a backup superblock is used, catch the case when the superblock
is being read in and ensure that the standard one is used rather than
the backup one.
Older versions of growfs(8) failed to correctly update fs_dsize.
Filesystems that have been grown fail the test for fs_dsize's correct
value. For now we exclude the fs_dsize test from the requirements.
Reported by: Edward Tomasz Napierała
Tested by: Edward Tomasz Napierała
Tested by: Peter Holm
MFC after: 1 month (with 076002f24d)
Differential Revision: https://reviews.freebsd.org/D35219
The original check verified that, if an alternate superblock has not
been selected, the superblock is located in its standard location.
For UFS1 with a 65536 block size, the first backup superblock
is at the same location as the UFS2 superblock. Since SBLOCK_UFS2
is the first location checked, the first backup is the superblock
that will be used for a UFS1 filesystem with a 65536 block size.
This patch allows the use of the first backup superblock in that
situation.
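The coincidence can be checked with a little arithmetic. The sketch below assumes that the first backup superblock is placed at the first block boundary past the primary superblock, which is modeled on how newfs lays out a filesystem; that placement formula and the ex_ names are assumptions made for illustration.

```c
#include <stdint.h>

#define EX_SBLOCK_UFS1	8192	/* UFS1 superblock byte offset */
#define EX_SBLOCK_UFS2	65536	/* UFS2 superblock byte offset */
#define EX_SBLOCKSIZE	8192	/* maximum superblock size */

#define EX_HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))
#define EX_ROUNDUP(x, y)	(EX_HOWMANY(x, y) * (y))

/*
 * Byte offset of the first backup superblock under the assumed
 * placement rule. Requires 0 < fsize <= bsize.
 */
static int64_t
ex_first_backup_offset(int64_t sblockloc, int64_t bsize, int64_t fsize)
{
	int64_t frag = bsize / fsize;	/* fragments per block */
	int64_t sblkno;			/* location in fragments */

	sblkno = EX_ROUNDUP(EX_HOWMANY(sblockloc + EX_SBLOCKSIZE, fsize),
	    frag);
	return (sblkno * fsize);
}
```

For UFS1 with a 65536 block size and 8192 fragment size this yields 65536, which equals EX_SBLOCK_UFS2; with a 32768 block size it yields 32768, so the collision does not arise.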
Reported by: Peter Holm
Tested by: Peter Holm
MFC after: 1 month (with 076002f24d)
Differential Revision: https://reviews.freebsd.org/D35219
The tests for number of cylinder groups (fs_ncg), inodes per cylinder
group (fs_ipg), and the size and layout of the cylinder group summary
information (fs_csaddr and fs_cssize) were overly restrictive and
would exclude some valid filesystems. These updates avoid precluding
valid filesystems while still detecting rogue values that can crash or
hang the kernel.
Reported by: Chuck Silvers
Tested by: Peter Holm
MFC after: 1 month (with 076002f24d)
Differential Revision: https://reviews.freebsd.org/D35219
One of the checks was that the cylinder group size (fs_cgsize)
matched that calculated by CGSIZE(). The value calculated by CGSIZE()
has changed over time as the filesystem has evolved. Thus the value
computed by the current CGSIZE() may not match the size computed by
the CGSIZE() that was in effect at the time an older filesystem was
created. Therefore the check for
fs_cgsize is changed to simply ensure that it is not larger than
the filesystem blocksize (fs_bsize).
Reported by: Martin Birgmeier
Tested by: Martin Birgmeier
MFC after: 1 month (with 076002f24d)
PR: 264450
Differential Revision: https://reviews.freebsd.org/D35219
Two bugs have been reported with the UFS/FFS superblock integrity
checks that were added in commit 076002f24d.
The code checked that fs_sblockactualloc was properly set to the
location of the superblock. The fs_sblockactualloc field was an
addition to the superblock in commit dffce2150e on Jan 26 2018
and used a field that was zero in filesystems created before it
was added. The integrity check had to be expanded to accept the
fs_sblockactualloc field being zero so as not to reject filesystems
created before Jan 26 2018.
The integrity check set an upper bound on the value of fs_maxcontig
based on the maximum transfer size supported by the kernel. It
required that fs->fs_maxcontig <= maxphys / fs->fs_bsize. The kernel
variable maxphys defines the maximum transfer size permitted by the
controllers and/or buffering. The fs_maxcontig parameter controls the
maximum number of blocks that the filesystem will read or write in
a single transfer. It is calculated when the filesystem is created
as maxphys / fs_bsize. The bug appeared in the loader because it
uses a maxphys of 128K even when running on a system that supports
larger values. If the filesystem was built on a system that supports
a larger maxphys (1M is typical) it will have configured fs_maxcontig
for that larger system so would fail the test when run with the smaller
maxphys used by the loader. So we bound the upper allowable limit
for fs_maxcontig to be able to at least work with a 1M maxphys on the
smallest block size filesystem: 1M / 4096 == 256. We then use the
limit for fs_maxcontig as fs_maxcontig <= MAX(256, maxphys / fs_bsize).
There is no harm in allowing the mounting of filesystems that make larger
than maxphys I/O requests because those systems (mostly 32-bit machines) can
(very slowly) handle I/O requests that exceed maxphys.
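The two relaxed checks can be sketched as small predicates. These are illustrations under stated assumptions, not the kernel's code: the ex_ names are hypothetical, and "readloc" stands for the location the superblock is expected to occupy.

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_MAX(a, b)	((a) > (b) ? (a) : (b))

/* Accept zero for filesystems that predate the field. */
static bool
ex_actualloc_ok(int64_t actualloc, int64_t readloc)
{
	return (actualloc == 0 || actualloc == readloc);
}

/* Upper bound from the text: MAX(256, maxphys / bsize). */
static bool
ex_maxcontig_ok(int32_t maxcontig, int64_t maxphys, int32_t bsize)
{
	return (maxcontig <= EX_MAX(256, maxphys / bsize));
}
```

For example, a filesystem with 32K blocks built under a 1M maxphys has fs_maxcontig of 32. Under the loader's 128K maxphys the old bound would be 128K / 32K == 4, which rejects it; the new bound of MAX(256, 4) == 256 accepts it.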
Thanks to everyone who helped sort out the problems and the fixes.
Reported by: Cy Schubert, David Wolfskill
Diagnosis by: Mark Johnston, John Baldwin
Reviewed by: Warner Losh
Tested by: Cy Schubert, David Wolfskill
MFC after: 1 month (with 076002f24d)
Differential Revision: https://reviews.freebsd.org/D35219
Historically only minimal checks were made of a superblock when it
was read in as it was assumed that fsck would have been run to
correct any errors before attempting to use the filesystem. Recently
several bug reports have been submitted reporting kernel panics
that can be triggered by deliberately corrupting filesystem superblocks,
see Bug 263979 - [meta] UFS / FFS / GEOM crash (panic) tracking
which is tracking the reported corruption bugs.
This change upgrades the checks that are performed. These additional
checks should prevent panics from a corrupted superblock. Although
it appears in only one place, the new code will apply to the kernel
modules and (through libufs) user applications that read in superblocks.
Reported by: Robert Morris and Neeraj
Reviewed by: kib
Tested by: Peter Holm
PR: 263979
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D35219
When the kernel is requested to mount a filesystem with a bad superblock
check hash, it would set the flag in the superblock requesting that the
fsck(8) program be run. The flag is only written to disk as part of a
superblock update. Since the superblock always has its check hash updated
when it is written to disk, the problem for which the flag has been set
will no longer exist. Hence, it is counter-productive to set the flag
as it will just cause an unnecessary run of fsck if it ever gets written.
Sponsored by: Netflix
When reading UFS/FFS superblocks that have check hashes, both the kernel
and libufs print an error message if the check hash is incorrect. This
commit adds the ability to request that the error message not be made.
It is intended for use by programs like fsck that want to print their
own error message and by kernel subsystems like glabel that just want
to check for possible filesystem types.
This capability will be used in followup commits.
Sponsored by: Netflix
The STDSB macro is passed to the ffs_sbget() routine to fetch a
UFS/FFS superblock "from the standard place". It was identically defined
in lib/libufs/libufs.h, stand/libsa/ufs.c, sys/ufs/ffs/ffs_extern.h,
and sys/ufs/ffs/ffs_subr.c. Delete it from these four files and
define it instead in sys/ufs/ffs/fs.h. All existing uses of this macro
already include sys/ufs/ffs/fs.h so no include changes need to be made.
No functional change intended.
Sponsored by: Netflix
over various major releases. Superblock check hashes were added for
the 12 release and cylinder-group and inode check hashes will appear
in the 13 release.
When a disk with a UFS filesystem is writably mounted, the kernel
clears the feature flags for anything that it does not support. For
example, if a UFS disk from a 12-stable kernel is mounted on an
11-stable system, the 11-stable kernel will clear the flag in the
filesystem superblock that indicates that superblock check-hashes
are being maintained. Thus if the disk is later moved back to a
12-stable system, the 12-stable system will know to ignore its
incorrect check-hash.
If the only filesystem modification done on the earlier kernel is
to run a utility such as growfs(8) that modifies the superblock but
neither updates the check-hash nor clears the feature flag indicating
that it does not support the check-hash, the disk will fail to mount
if it is moved back to its original newer kernel.
This patch moves the code that clears the filesystem feature flags
from the mount code (ffs_mountfs()) to the code that reads the
superblock (ffs_sbget()). As ffs_sbget() is used by the kernel mount
code and is imported into libufs(3), all the filesystem utilities
will now also clear these flags when they make modifications to the
filesystem.
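The flag clearing amounts to masking the on-disk feature bits with the set that the running code can maintain. A minimal sketch follows; the EX_CK_ bit values and ex_clear_unsupported() are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Illustrative feature bits for metadata check hashes. */
#define EX_CK_SUPERBLOCK	0x0001
#define EX_CK_CYLGRP		0x0002
#define EX_CK_INODE		0x0004

/*
 * Return the feature bits to keep: only those that this
 * implementation ("supported") knows how to maintain.
 */
static uint32_t
ex_clear_unsupported(uint32_t on_disk, uint32_t supported)
{
	return (on_disk & supported);
}
```

Doing this in the superblock read path rather than in the mount path means a utility built from the same source gets the same masking whenever it loads a superblock that it may later write back.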
As suggested by John Baldwin, fsck_ffs(8) has been changed to accept
and repair bad superblock check-hashes rather than refusing to run.
This change allows fsck to recover filesystems that have been impacted
by utilities older than those created after this change and is a
sensible thing to do in any event.
Reported by: John Baldwin (jhb@)
MFC after: 2 weeks
Sponsored by: Netflix
out verbatim to the disk: see ffs_sbput() in sys/ufs/ffs/ffs_subr.c.
It contains a pointer to the fs_summary_info structure. This pointer
value inadvertently causes garbage to be stored. It is garbage because
the pointer to the fs_summary_info structure is an address in the
then-current stack or heap. Although a mere pointer does not reveal anything
useful (like a part of a private key) to an attacker, garbage output
deteriorates reproducibility.
This commit zeros out the pointer to the fs_summary_info structure
before writing out the superblock.
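One way to picture the fix: the image handed to the disk is the in-memory superblock with its in-memory-only pointer cleared. The sketch below works on a copy passed by value; the struct layout and the ex_ names are hypothetical and much smaller than the real superblock.

```c
#include <stddef.h>

struct ex_summary {
	int placeholder;
};

struct ex_sb {
	int magic;
	struct ex_summary *si;	/* in-memory only; meaningless on disk */
};

/* Stand-in for summary information allocated at mount time. */
static struct ex_summary ex_live_summary;

/*
 * Return the image to write: identical to the in-memory
 * superblock except that the pointer field is zeroed.
 * The caller's copy keeps its pointer because sb is by value.
 */
static struct ex_sb
ex_disk_image(struct ex_sb sb)
{
	sb.si = NULL;
	return (sb);
}
```

Two writes of the same logical superblock then produce identical bytes for that field, regardless of where the summary structure happened to be allocated.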
Reviewed by: kib
Tested by: Peter Holm
PR: 246983
Sponsored by: Netflix
fs_summary_info structure. This change was originally done
by the CheriBSD project as they need larger pointers that
do not fit in the existing superblock.
This cleanup of the superblock eases the task of the commit
that immediately follows this one.
Suggested by: brooks
Reviewed by: kib
PR: 246983
Sponsored by: Netflix
module from that file into ffs_vfsops.c. This fixes the build for kernel
configs that don't include FFS.
PR: 247256
Submitted by: glebius
Reviewed by: mckusick (earlier version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25285
the underlying media fails or becomes inaccessible. For example
when a USB flash memory card hosting a UFS filesystem is unplugged.
The strategy for handling disk I/O errors when soft updates are
enabled is to stop writing to the disk of the affected file system
but continue to accept I/O requests and report that all future
writes by the file system to that disk actually succeed. Then
initiate an asynchronous forced unmount of the affected file system.
There are two cases for disk I/O errors:
- ENXIO, which means that this disk is gone and the lower layers
of the storage stack already guarantee that no future I/O to
this disk will succeed.
- EIO (or most other errors), which means that this particular
I/O request has failed but subsequent I/O requests to this
disk might still succeed.
For ENXIO, we can just clear the error and continue, because we
know that the file system cannot affect the on-disk state after we
see this error. For EIO or other errors, we arrange for the geom_vfs
layer to reject all future I/O requests with ENXIO just like is
done when the geom_vfs is orphaned. In both cases, the file system
code can just clear the error and proceed with the forcible unmount.
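The two cases can be summarized as a small decision function. This is a sketch of the policy only: struct ex_action and ex_on_write_error() are hypothetical, and "fence_disk" stands for arranging that the geom_vfs layer rejects later requests with ENXIO.

```c
#include <errno.h>
#include <stdbool.h>

struct ex_action {
	bool fence_disk;	/* make later I/O to the disk fail with ENXIO */
	bool force_unmount;	/* schedule an asynchronous forced unmount */
	int reported_error;	/* what the filesystem code sees */
};

static struct ex_action
ex_on_write_error(int error)
{
	struct ex_action a = { false, false, 0 };

	if (error == 0)
		return (a);
	/*
	 * ENXIO: the disk is already gone, nothing more to fence.
	 * Anything else: the disk might still accept I/O, so fence it.
	 */
	a.fence_disk = (error != ENXIO);
	a.force_unmount = true;
	a.reported_error = 0;	/* the error is cleared in both cases */
	return (a);
}
```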
This new treatment of I/O errors is needed for writes of any buffer
that is involved in a dependency. Most dependencies are described
by a structure attached to the buffer's b_dep field. But some are
created and processed as a result of the completion of the dependencies
attached to the buffer.
Clearing of some dependencies requires a read. For example if there
is a dependency that requires an inode to be written, the disk block
containing that inode must be read, the updated inode copied into
place in that buffer, and the buffer then written back to disk.
Often the needed buffer is already in memory and can be used. But
if it needs to be read from the disk, the read will fail, so we
fabricate a buffer full of zeroes and pretend that the read succeeded.
This zero'ed buffer can be updated and written back to disk.
The only case where a buffer full of zeros causes the code to do
the wrong thing is when reading an inode buffer containing an inode
that still has an inode dependency in memory that will reinitialize
the effective link count (i_effnlink) based on the actual link count
(i_nlink) that we read. To handle this case we now store the i_nlink
value that we wrote in the inode dependency so that it can be
restored into the zero'ed buffer thus keeping the tracking of the
inode link count consistent.
Because applications depend on knowing when an attempt to write
their data to stable storage has failed, the fsync(2) and msync(2)
system calls need to return errors if data fails to be written to
stable storage. So these operations return ENXIO for every call
made on files in a file system where we have otherwise been ignoring
I/O errors.
Coauthored by: mckusick
Reviewed by: kib
Tested by: Peter Holm
Approved by: mckusick (mentor)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24088
when a superblock check-hash error is detected. This change distinguishes
a mount that failed due to media hardware failures (EIO) from a mount
that failed due to media errors (EINTEGRITY) that can be corrected by
running fsck(8).
Sponsored by: Netflix
filesystems that have block pointers that are out-of-range for their
filesystem. These out-of-range block pointers are corrected by
fsck(8) so are only encountered when an unchecked filesystem is
mounted.
A new "untrusted" flag has been added to the generic mount interface
that can be set when mounting media of unknown provenance or integrity.
For example, a daemon that automounts a filesystem on a flash drive
when it is plugged into a system.
This commit adds a test to UFS/FFS that validates all block numbers
before using them. Because checking for out-of-range blocks adds
unnecessary overhead to normal operation, the tests are only done
when the filesystem is mounted as an "untrusted" filesystem.
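The gating described above can be sketched as follows. The struct and function are hypothetical, and the range test is simplified to "non-negative and below the filesystem size"; the real code has more cases to consider.

```c
#include <stdbool.h>
#include <stdint.h>

struct ex_fs {
	int64_t size;		/* filesystem size in fragments */
	bool untrusted;		/* mounted with the "untrusted" flag */
};

/* Return true if the block number may be used. */
static bool
ex_blkno_ok(const struct ex_fs *fs, int64_t blkno)
{
	if (!fs->untrusted)
		return (true);	/* trusted mount: skip the extra check */
	return (blkno >= 0 && blkno < fs->size);
}
```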
Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE
Reported as: FS-14-UFS-3: Out of bounds read in write-2 (ffs_alloccg)
Reviewed by: kib
Sponsored by: Netflix
rename the source to gsb_crc32.c.
This is a prerequisite of unifying kernel zlib instances.
PR: 229763
Submitted by: Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision: https://reviews.freebsd.org/D20193
the check-hash fails. Prior to the fix in -r342133 the inode with the
zeroed out check-hash was written back to disk causing further confusion.
Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix
before copying in the inode so that the mode and link-count are not set
if the check-hash fails. This change ensures that the vnode will be properly
unwound and recycled rather than being held in the cache.
Initialize the file mode to zero so that if the loading of the inode
fails (for example because of a check-hash failure), the vnode will be
properly unwound and recycled.
Reported by: Gary Jennejohn (gj)
Sponsored by: Netflix
"panic: softdep_update_inodeblock: bad link count" when releasing
a partially initialized vnode after an inode check-hash failure.
Reported by: Gary Jennejohn <gljennjohn@gmail.com>
Reported by: Peter Holm (pho)
Sponsored by: Netflix
check hash to the filesystem inodes. Access attempts to files
associated with an inode with an invalid check hash will fail with
EINVAL (Invalid argument). Access is reestablished after an fsck
is run to find and validate the inodes with invalid check-hashes.
This check avoids a class of filesystem panics related to corrupted
inodes. The hash is done using crc32c.
Note this check-hash is for the inode itself and not any of its
indirect blocks. Check-hash validation may be extended to also
cover indirect block pointers, but that will be a separate (and
more costly) feature.
Check hashes are added only to UFS2 and not to UFS1 as UFS1 is
primarily used in embedded systems with small memories and low-powered
processors which need as light-weight a filesystem as possible.
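The general technique of a self-describing check hash can be sketched as below: hash the structure with its own hash field treated as zero, then compare. The struct ex_dinode layout is a hypothetical miniature, and the CRC-32C routine is a plain bitwise version using the standard initial value and final inversion; the kernel's seed conventions and the exact fields covered may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
ex_crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return (~crc);
}

struct ex_dinode {
	uint16_t mode;
	uint16_t nlink;
	uint32_t ckhash;	/* check hash of this structure */
	uint64_t size;
};

/* Hash the inode with its own ckhash field treated as zero. */
static uint32_t
ex_inode_hash(const struct ex_dinode *dp)
{
	struct ex_dinode tmp;

	memcpy(&tmp, dp, sizeof(tmp));
	tmp.ckhash = 0;
	return (ex_crc32c(&tmp, sizeof(tmp)));
}

/* Nonzero if the stored hash matches the recomputed one. */
static int
ex_inode_hash_ok(const struct ex_dinode *dp)
{
	return (dp->ckhash == ex_inode_hash(dp));
}
```

A reader would call ex_inode_hash_ok() after loading the inode and refuse access on a mismatch; a writer would store ex_inode_hash() into ckhash just before writing.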
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
superblock has a check-hash error, an error message noting the
superblock check-hash failure is printed and the mount fails. The
administrator then runs fsck to repair the filesystem and when
successful, the filesystem can once again be mounted.
This approach fails if the filesystem in question is a root filesystem
from which you are trying to boot. Here, the loader fails when trying
to access the filesystem to get the kernel to boot. So it is necessary
to allow the loader to ignore the superblock check-hash error and make
a best effort to read the kernel. The filesystem may be sufficiently
corrupted that the read attempt fails, but there is no harm in trying
since the loader makes no attempt to write to the filesystem.
Once the kernel is loaded and starts to run, it attempts to mount its
root filesystem. Once again, failure means that it breaks to its prompt
to ask where to get its root filesystem. Unless you have an alternate
root filesystem, you are stuck.
Since the root filesystem is initially mounted read-only, it is
safe to make an attempt to mount the root filesystem with the failed
superblock check-hash. Thus, when asked to mount a root filesystem
with a failed superblock check-hash, the kernel prints a warning
message that the root filesystem superblock check-hash needs repair,
but notes that it is ignoring the error and proceeding. It does
mark the filesystem as needing an fsck which prevents it from being
enabled for writing until fsck has been run on it. The net effect
is that the reboot fails to single user, but at least at that point
the administrator has the tools at hand to fix the problem.
Reported by: Rick Macklem (rmacklem@)
Discussed with: Warner Losh (imp@)
Sponsored by: Netflix
predates metadata check hashes so that it is done before deciding
whether to compute a check-hash of the superblock.
Reported by: Rick Macklem <rmacklem@uoguelph.ca>
Sponsored by: Netflix
This corrects a bug that prevented snapshots from being mounted due to a
superblock check-hash failure.
Reported by: Brennan Vincent <brennan@umanwizard.com>
Tested by: Peter Holm (pho@)
Sponsored by: Netflix
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.
Convert the utilities that had been using the undocumented
interface to use the new documented interface.
No functional change (as for now the libufs library does not
do inode check-hashes).
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
check hash to the superblock. If a check hash fails when an attempt
is made to mount a filesystem, the mount fails with EINVAL (Invalid
argument). This avoids a class of filesystem panics related to
corrupted superblocks. The hash is done using crc32c.
Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
Avoid Undefined Behavior in ffs_clusteracct()
Change the type of 'bit' variable from int to unsigned int and use unsigned
values consistently.
sys/ufs/ffs/ffs_subr.c:336:10, shift exponent -1 is negative
Detected with Kernel Undefined Behavior Sanitizer.
Reported by: Harry Pantazis
Submitted by: Pedro Giffuni
to fix the memory leak that I introduced in r328426. Instead of
trying to clear up the possible memory leak in all the clients, I
ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures
that memory is always freed if it returns an error).
The original change in r328426 was a bit sparse in its description.
So I am expanding on its description here (thanks cem@ and rgrimes@
for your encouragement for my longer commit messages).
In preparation for adding check hashing to superblocks, r328426 is
a refactoring of the code to get the reading/writing of the superblock
into one place. Unlike the cylinder group reading/writing which
ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel
and cgget/cgput in libufs), I have the core superblock functions
just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is
already imported into utilities like fsck_ffs as well as libufs to
implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions
take a function pointer to do the actual I/O for which there are
four variants:
ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem
g_use_g_read_data / g_use_g_write_data for kernel geom clients
ufs_use_sa_read for the standalone code (stand/libsa/ufs.c
but not stand/libsa/ufsread.c which is size constrained)
use_pread / use_pwrite for libufs
Uses of these interfaces are in the UFS filesystem, geoms journal &
label, libsa changes, and libufs. They also permeate out into the
filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck,
fsirand, fstyp, and quot. Some of these utilities should probably be
converted to directly use libufs (like dumpfs was for example), but
there does not seem to be much win in doing so.
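The function-pointer arrangement can be illustrated with a toy version: one core routine, with the environment supplying the I/O callback. Everything here is hypothetical, including the callback signature, ex_sbfetch(), and the in-memory "disk" backend; it only shows the shape of the design.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* I/O callback: read "size" bytes at byte offset "off" into "buf". */
typedef int (*ex_readfunc_t)(void *devfd, int64_t off, void *buf,
    size_t size);

/* Core routine shared by every environment; only the callback differs. */
static int
ex_sbfetch(void *devfd, int64_t off, void *buf, size_t size,
    ex_readfunc_t readfunc)
{
	return ((*readfunc)(devfd, off, buf, size));
}

/* Hypothetical backend that reads from an in-memory disk image. */
struct ex_memdisk {
	const uint8_t *data;
	size_t len;
};

static int
ex_use_memread(void *devfd, int64_t off, void *buf, size_t size)
{
	const struct ex_memdisk *d = devfd;

	if (off < 0 || (uint64_t)off > d->len ||
	    size > d->len - (size_t)off)
		return (EIO);	/* request falls outside the image */
	memcpy(buf, d->data + off, size);
	return (0);
}
```

A kernel, GEOM, standalone, or libufs client would each supply its own callback in place of ex_use_memread() while sharing the core logic.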
Tested by: Peter Holm (pho@)
ffs_sbget() may return a superblock buffer even if it fails, so the
caller must be prepared to free it in this case. Moreover, when tasting
alternate superblock locations in a loop, ffs_sbget()'s readfunc
callback must free the previously allocated buffer.
Reported and tested by: pho
Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D14390
Specifically reading is done in ffs_sbget() and writing is done
in ffs_sbput(). These functions are exported to libufs via the
sbget() and sbput() functions which are then used in the various
filesystem utilities. This work is in preparation for adding
superblock check hashes.
No functional change intended.
Reviewed by: kib
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
open source licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
Remove redundant i_dev and i_fs pointers, which are available as
ip->i_ump->um_dev and ip->i_ump->um_fs, and reorder members by size to
reduce padding. To compensate for the added dereferences, the most
frequent i_ump access, which differentiates between the UFS1 and UFS2
dinode layouts, is removed by the addition of the new i_flag IN_UFS2.
Overall, this actually reduces the number of memory dereferences.
On a 64-bit machine, the original struct inode size is 176 bytes,
reduced to 152 bytes with the change.
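The effect of ordering members by size can be seen with a toy pair of structures. These are hypothetical and unrelated to the real struct inode; on a typical LP64 ABI the first occupies 40 bytes and the second 24, though exact sizes depend on the platform's alignment rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Members in an order that forces padding between them. */
struct ex_loose {
	uint8_t a;
	uint64_t b;
	uint16_t c;
	uint64_t d;
	uint32_t e;
};

/* Same members, largest first, so padding collects at the end. */
struct ex_packed {
	uint64_t b;
	uint64_t d;
	uint32_t e;
	uint16_t c;
	uint8_t a;
};

/* Bytes saved per object by the reordering on this platform. */
static size_t
ex_bytes_saved(void)
{
	return (sizeof(struct ex_loose) - sizeof(struct ex_packed));
}
```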
Tested by: pho (previous version)
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks