The segfault was being hit in ckfini() (sbin/fsck_ffs/fsutil.c)
while attempting to traverse the buffer cache to flush dirty buffers.
The tail queue used for the buffer cache was not initialized before
dropping into gjournal_check(). Move the buffer initialization earlier
so that it has been done before calling gjournal_check().
Reported by: crypt47, nvass
Fix by: Robert Wing
Tested by: Robert Wing
PR: 255030
PR: 255979
MFC after: 3 days
Sponsored by: Netflix
Pass 1b of fsck_ffs runs only when Pass 1 has found duplicate blocks.
Pass 1 only knows that a block is duplicate when it finds the second
instance of its use. The role of Pass 1b is to find the first use
of all the duplicate blocks. It makes a pass over the cylinder groups
looking for these blocks. When moving to the next cylinder group,
Pass 1b failed to properly calculate the starting inode number for
the cylinder group resulting in the above error message when it
tried to read the first inode in the cylinder group.
Reported by: Px
Tested by: Px
PR: 255979
MFC after: 3 days
Sponsored by: Netflix
UFS does not allow files to end with a hole; it requires that the
last block of a file be allocated. As fsck_ffs(8) initially scans
each allocated inode, it tracks the last allocated block in the
inode. It then checks that the inode's size falls in the last
allocated block. If the last allocated block falls before the size,
a `file size beyond end of allocated file' warning is issued and
the file is shortened to reference the last allocated block (to avoid
having it reference a hole at its end). If the last allocated block
falls after the size, a `partially truncated file' warning is issued
and all blocks following the block referenced by the size are freed.
Because of an incorrect unsigned comparison, this test was failing
to handle files with no allocated blocks but non-zero size (which
should have had their size reset to zero). Once that was fixed the
test started incorrectly complaining about short symbolic links
that place the link path in the inode rather than in a disk block.
Because these symbolic links have a non-zero size, but no allocated
blocks, fsck_ffs wanted to zero out their size. This patch has to
detect and avoid changing the size of such symbolic links.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 1 week
Sponsored by: Netflix
When fsck_ffs is creating a lost+found directory it must allocate
an inode and a filesystem block. If it encounters a cylinder group
with a bad check hash, it complains twice: once for the inode and
again for the filesystem block.
This change suppresses the second complaint.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 1 week
Sponsored by: Netflix
When fsck_ffs is running in interactive mode and finds unlinked files,
it offers to either unlink them or place them in a lost+found directory.
If the lost+found directory option is requested and no lost+found
directory exists, fsck_ffs offers to create one. When creating one,
it must allocate an inode and a filesystem block. It attempts to
allocate them from the first cylinder group. If the first cylinder
group has a bad check hash, it gives up.
This change expands the search into later cylinder groups when the
first one fails with a bad check hash.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 1 week
Sponsored by: Netflix
Several large data structures are allocated by fsck_ffs to track
resource usage. Most but not all were deallocated at the end of
checking each filesystem. This commit consolidates the freeing
of all data structures in one place and adds one that had previously
been missing.
It is important to clean up these data structures as they can be
large. If the previous allocations have not been freed, fsck_ffs
can run out of address space when many large filesystems are being
checked. An alternative would be to fork a new instance of fsck_ffs
for each filesystem to be checked, but we choose to free the small
set of large structures to save the fork overhead.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 7 days
Sponsored by: Netflix
This fixes a long-standing but very obscure bug in fsck_ffs when
it is run with the -R (rerun after unexpected errors). It only
occurs if fsck_ffs finds duplicate blocks and they are all contained
in inodes that reside in the first block of inodes (typically among
the first 128 inodes).
Rather than use the usual ginode() interface to walk through the
inodes in pass1, there is a special optimized `getnextinode()'
routine for walking through all the inodes. It has its own private
buffer for reading the inode blocks. If pass 1 finds duplicate
blocks it runs pass 1b to find all the inodes that contain these
duplicate blocks. Pass 1b also uses the `getnextinode()' to search
for the inodes with duplicate blocks. Pass 1b stops when all the
duplicate blocks have been found. If all the duplicate blocks are
found in the first block of inodes, then the getnextinode cache
holds this block of bad inodes. The subsequent cleanup of the inodes
in passes 2-5 is done using ginode() which uses the regular fsck_ffs
cache.
When fsck_ffs restarts, pass1() calls setinodebuf() to point at the
first block of inodes. When it calls getnextinode() to get inode
2, getnextino() sees that its private cache already has the first
set of inodes loaded and starts using them. They are of course the
trashed inodes left over from the previous run of pass1b().
The fix is to always invalidate the getnextinode cache when calling
setinodebuf().
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 3 days
Sponsored by: Netflix
Pass 1b of fsck_ffs runs only when Pass 1 has found duplicate blocks.
When starting up, Pass 1b failed to properly skip over the two unused
inodes at the beginning of the filesystem resulting in the above error
message when it tried to read the filesystem root inode.
Reported by: Chuck Silvers
Tested by: Chuck Silvers
MFC after: 3 days
Sponsored by: Netflix
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.
Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it override struct buf with its own
definition. Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D28679
A long-standing bug in Pass 1 of fsck_ffs in which it is reading in
blocks of inodes to check their block pointers. It failed to round
up the size of the read to a disk block size. When disks would
accept 512-byte aligned reads, the bug rarely manifested itself.
But many recent disks will no longer accept 512-byte aligned reads
but require 4096-byte aligned reads, so the failure to properly
round-up read sizes to multiples of 4096 bytes makes the error
much more likely to occur.
Reported by: Peter Holm and others
Tested by: Peter Holm and Rozhuk Ivan
MFC after: 3 days
Sponsored by: Netflix
making fsck_ffs(8) run faster, there should be no functional change.
The original fsck_ffs(8) had its own disk I/O management system.
When gjournal(8) was added to FreeBSD 7, code was added to fsck_ffs(8)
to do the necessary gjournal rollback. Rather than use the existing
fsck_ffs(8) disk I/O system, it wrote its own from scratch. Similarly
when journalled soft updates were added in FreeBSD 9, code was added
to fsck_ffs(8) to do the necessary journal rollback. And once again,
rather than using either of the existing fsck_ffs(8) disk I/O
systems, it wrote its own from scratch. Lastly the fsdb(8) utility
uses the fsck_ffs(8) disk I/O management system. In preparation for
making the changes necessary to enable snapshots to be taken when
using journalled soft updates, it was necessary to have a single
disk I/O system used by all the various subsystems in fsck_ffs(8).
This commit merges the functionality required by all the different
subsystems into a single disk I/O system that supports all of their
needs. In so doing it picks up optimizations from each of them
with the results that each of the subsystems does fewer reads and
writes than it did with its own customized I/O system. It also
greatly simplifies making changes to fsck_ffs(8) since everything
goes through a single place. For example the ginode() function
fetches an inode from the disk. When inode check hashes were added,
they previously had to be checked in the code implementing inode
fetch in each of the three different disk I/O systems. Now they
need only be checked in ginode().
Tested by: Peter Holm
Sponsored by: Netflix
of its lost+found directory by allocating direct block pointers. The
effect was that it was limited to about 19,000 files. One of Peter Holm's
tests produced a filesystem with about 23,000 lost files which meant
that fsck_ffs was unable to recover it. This update allows lost+found
to be expanded into a single indirect block which allows it to store
up to about 6,573,000 lost files.
Reported by: Peter Holm
Sponsored by: Netflix
The new name more accurately describes what it does and the file move
puts it with other similar functions. Done in preparation for future
cleanups. No functional differences intended.
Sponsored by: Netflix
Historic Footnote: my last FreeBSD svn commit
in the Pass 5 checks. The manifestation was fsck_ffs exiting with this error:
** Phase 5 - Check Cyl groups
fsck_ffs: inoinfo: inumber 18446744071562087424 out of range
The error only manifests itself for filesystems bigger than about 100Tb.
Reported by: Nikita Grechikhin <ngrechikhin at yandex.ru>
MFC after: 2 weeks
Sponsored by: Netflix
over various major releases. Superblock check hashes were added for
the 12 release and cylinder-group and inode check hashes will appear
in the 13 release.
When a disk with a UFS filesystem is writably mounted, the kernel
clears the feature flags for anything that it does not support. For
example, if a UFS disk from a 12-stable kernel is mounted on an
11-stable system, the 11-stable kernel will clear the flag in the
filesystem superblock that indicates that superblock check-hashs
are being maintained. Thus if the disk is later moved back to a
12-stable system, the 12-stable system will know to ignore its
incorrect check-hash.
If the only filesystem modification done on the earlier kernel is
to run a utility such as growfs(8) that modifies the superblock but
neither updates the check-hash nor clears the feature flag indicating
that it does not support the check-hash, the disk will fail to mount
if it is moved back to its original newer kernel.
This patch moves the code that clears the filesystem feature flags
from the mount code (ffs_mountfs()) to the code that reads the
superblock (ffs_sbget()). As ffs_sbget() is used by the kernel mount
code and is imported into libufs(3), all the filesystem utilities
will now also clear these flags when they make modifications to the
filesystem.
As suggested by John Baldwin, fsck_ffs(8) has been changed to accept
and repair bad superblock check-hashes rather than refusing to run.
This change allows fsck to recover filesystems that have been impacted
by utilities older than those created after this change and is a
sensible thing to do in any event.
Reported by: John Baldwin (jhb@)
MFC after: 2 weeks
Sponsored by: Netflix
API to the sbget() and sbput() interfaces. Specifically they take
a file descriptor pointer rather than the struct uufsd *disk pointer
used by the libufs cgread() and cgwrite() interfaces. Update fsck_ffs
to use these revised interfaces.
No functional changes intended.
Sponsored by: Netflix
The only output from fsck that should go to stderr is the usage message.
if setup() fails then exit with EEXIT rather than 0.
Reviewed by: mckusick
Sponsored by: Netflix
Trace the cause down to journalled soft updates recovery code in
fsck failing to recompute the check-hash after updating an inode.
As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.
Reported by: Chuck Silvers
Sponsored by: Netflix
soft update recovery code with the debugging (-d) option.
As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.
Reported by: Chuck Silvers
Sponsored by: Netflix
This one is also a small list:
- 3x duplicate definition (ufs2_zino, returntosingle, nflag)
- 5x 'needs extern', 3/5 of which are referenced in fsdb
-fno-common will become the default in GCC10/LLVM11.
MFC after: 1 week
USE JOURNAL? [yn]
when the journal timestamp does not match the filesystem mount time
as we are just going to print an error and fall through to a full fsck.
Instead, just run a full fsck.
Requested by: Bjoern A. Zeeb (bz)
MFC after: 7 days
Make a note in the newfs.8 manual page to update the first backup
superblock location when changing the default fragment size for
the filesystem.
Reported by: O. Hartmann
having their check hashes recomputed which resulted in spurious inode
check-hash errors when the system came back up after a crash.
Reported by: Alan Somers
Sponsored by: Netflix
found by Coverity. However, upon closer inspection the implementation of
fsck_ffs's fsck_readdir() and dircheck() functions is both nearly impossible
to follow and fails to check / fix directories in several cases. So, this
revision is an entire rewrite of these two functions to clarify what they
are doing and also to get something that works properly.
Referred by: cem
Reviewed by: kib, David G Lawrence
MFC after: 3 days
CID 1401317: namlen may be used uninitialized
directory entries that is caused by uninitialized directory entry
padding written to the disk. It can be viewed by any user with read
access to that directory. Up to 3 bytes of kernel stack are disclosed
per file entry, depending on the the amount of padding the kernel
needs to pad out the entry to a 32 bit boundry. The offset in the
kernel stack that is disclosed is a function of the filename size.
Furthermore, if the user can create files in a directory, this 3
byte window can be expanded 3 bytes at a time to a 254 byte window
with 75% of the data in that window exposed. The additional exposure
is done by removing the entry, creating a new entry with a 4-byte
longer name, extracting 3 more bytes by reading the directory, and
repeating until a 252 byte name is created.
This exploit works in part because the area of the kernel stack
that is being disclosed is in an area that typically doesn't change
that often (perhaps a few times a second on a lightly loaded system),
and these file creates and unlinks themselves don't overwrite the
area of kernel stack being disclosed.
It appears that this bug originated with the creation of the Fast
File System in 4.1b-BSD (Circa 1982, more than 36 years ago!), and
is likely present in every Unix or Unix-like system that uses
UFS/FFS. Amazingly, nobody noticed until now.
This update also adds the -z flag to fsck_ffs to have it scrub
the leaked information in the name padding of existing directories.
It only needs to be run once on each UFS/FFS filesystem after a
patched kernel is installed and running.
Submitted by: David G. Lawrence <dg@dglawrence.com>
Reviewed by: kib
MFC after: 1 week
last allocated block of the file and if that is found, shortens the
file to reference the last allocated block thus avoiding having it
reference a hole at its end.
This update corrects an error where fsck_ffs miscalculated the last
logical block of the file when the file contained a large hole.
Reported by: Jamie Landeg-Jones
Tested by: Peter Holm
MFC after: 2 weeks
Sponsored by: Netflix
inodes that reference directories. While here tighten the check for
comparing the last logical block with the end of the file.
Reported by: Peter Holm
Tested by: Peter Holm
Sponsored by: Netflix
shorter than its size resulting in a hole as its final block (which
is a violation of the invarients of the UFS filesystem).
Soft updates will always ensure that the file size is correct when
writing inodes to disk for files that contain only direct block
pointers. However soft updates does not roll back sizes for files
with indirect blocks that it has set to unallocated because their
contents have not yet been written to disk. Hence, the file can
appear to have a hole at its end because the block pointer has been
rolled back to zero when its inode was written to disk. Thus,
fsck_ffs calculates the last allocated block in the file. For files
that extend into indirect blocks, fsck_ffs checks for a size past
the last allocated block of the file and if that is found, shortens
the file to reference the last allocated block thus avoiding having
it reference a hole at its end.
Submitted by: Chuck Silvers <chs@netflix.com>
Tested by: Chuck Silvers <chs@netflix.com>
MFC after: 1 week
Sponsored by: Netflix
pass of fsck_ffs. Some changes, such as check-hash corrections were
being lost.
Reported by: Michael Tuexen (tuexen@)
Tested by: Michael Tuexen (tuexen@)
MFC after: 3 days
If requested to fix the inode check-hash it would confirm having done
it, but then fail to make the fix. The same code is used in fsdb which,
unlike fsck, would actually fix the inode check-hash.
The discrepancy occurred because fsck has two ways to fetch inodes.
The inode by number function ginode() and the streaming inode
function getnextinode() used during pass1. Fsdb uses the ginode()
function which correctly does the fix, while fsck first encounters
the bad inode check-hash in pass1 where it is using the getnextinode()
function that failed to make the correction. This patch corrects
the getnextinode() function so that fsck now correctly fixes inodes
with incorrect inode check-hashs.
Reported by: Gary Jennejohn <gljennjohn@gmail.com>
Sponsored by: Netflix
check hash to the filesystem inodes. Access attempts to files
associated with an inode with an invalid check hash will fail with
EINVAL (Invalid argument). Access is reestablished after an fsck
is run to find and validate the inodes with invalid check-hashes.
This check avoids a class of filesystem panics related to corrupted
inodes. The hash is done using crc32c.
Note this check-hash is for the inode itself and not any of its
indirect blocks. Check-hash validation may be extended to also
cover indirect block pointers, but that will be a separate (and
more costly) feature.
Check hashes are added only to UFS2 and not to UFS1 as UFS1 is
primarily used in embedded systems with small memories and low-powered
processors which need as light-weight a filesystem as possible.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
superblock has a check-hash error, an error message noting the
superblock check-hash failure is printed and the mount fails. The
administrator then runs fsck to repair the filesystem and when
successful, the filesystem can once again be mounted.
This approach fails if the filesystem in question is a root filesystem
from which you are trying to boot. Here, the loader fails when trying
to access the filesystem to get the kernel to boot. So it is necessary
to allow the loader to ignore the superblock check-hash error and make
a best effort to read the kernel. The filesystem may be suffiently
corrupted that the read attempt fails, but there is no harm in trying
since the loader makes no attempt to write to the filesystem.
Once the kernel is loaded and starts to run, it attempts to mount its
root filesystem. Once again, failure means that it breaks to its prompt
to ask where to get its root filesystem. Unless you have an alternate
root filesystem, you are stuck.
Since the root filesystem is initially mounted read-only, it is
safe to make an attempt to mount the root filesystem with the failed
superblock check-hash. Thus, when asked to mount a root filesystem
with a failed superblock check-hash, the kernel prints a warning
message that the root filesystem superblock check-hash needs repair,
but notes that it is ignoring the error and proceeding. It does
mark the filesystem as needing an fsck which prevents it from being
enabled for writing until fsck has been run on it. The net effect
is that the reboot fails to single user, but at least at that point
the administrator has the tools at hand to fix the problem.
Reported by: Rick Macklem (rmacklem@)
Discussed with: Warner Losh (imp@)
Sponsored by: Netflix
report the check-hash failure and offer to search for and use
alternate superblocks. Prior to this fix fsck_ffs would simply
report the check-hash failure and exit.
Reported by: Julian H. Stacey <jhs@berklix.com>
Tested by: Peter Holm
Sponsored by: Netflix
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.
Convert the utilities that had been using the undocumented
interface to use the new documented interface.
No functional change (as for now the libufs library does not
do inode check-hashes).
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
check hash to the superblock. If a check hash fails when an attempt
is made to mount a filesystem, the mount fails with EINVAL (Invalid
argument). This avoids a class of filesystem panics related to
corrupted superblocks. The hash is done using crc32c.
Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.
Our kernel currently defines two-argument versions of timespecadd and
timespecsub. NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions. Solaris also defines a three-argument version, but
only in its kernel. This revision changes our definition to match the
common three-argument version.
Bump _FreeBSD_version due to the breaking KPI change.
Discussed with: cem, jilles, ian, bde
Differential Revision: https://reviews.freebsd.org/D14725