Commit Graph

81 Commits

Author SHA1 Message Date
Kirk McKusick
fc56fd262d Ensure that all allocated data structures in fsck_ffs are freed.
Several large data structures are allocated by fsck_ffs to track
resource usage. Most but not all were deallocated at the end of
checking each filesystem. This commit consolidates the freeing
of all data structures in one place and adds one that had previously
been missing.

It is important to clean up these data structures as they can be
large. If the previous allocations have not been freed, fsck_ffs
can run out of address space when many large filesystems are being
checked. An alternative would be to fork a new instance of fsck_ffs
for each filesystem to be checked, but we choose to free the small
set of large structures to save the fork overhead.

Reported by:  Chuck Silvers
Tested by:    Chuck Silvers
MFC after:    7 days
Sponsored by: Netflix
2021-04-02 11:58:49 -07:00
Kirk McKusick
5cc52631b3 Rewrite the disk I/O management system in fsck_ffs(8). Other than
making fsck_ffs(8) run faster, there should be no functional change.

The original fsck_ffs(8) had its own disk I/O management system.
When gjournal(8) was added to FreeBSD 7, code was added to fsck_ffs(8)
to do the necessary gjournal rollback. Rather than use the existing
fsck_ffs(8) disk I/O system, it wrote its own from scratch. Similarly
when journalled soft updates were added in FreeBSD 9, code was added
to fsck_ffs(8) to do the necessary journal rollback. And once again,
rather than using either of the existing fsck_ffs(8) disk I/O
systems, it wrote its own from scratch. Lastly the fsdb(8) utility
uses the fsck_ffs(8) disk I/O management system. In preparation for
making the changes necessary to enable snapshots to be taken when
using journalled soft updates, it was necessary to have a single
disk I/O system used by all the various subsystems in fsck_ffs(8).

This commit merges the functionality required by all the different
subsystems into a single disk I/O system that supports all of their
needs. In so doing it picks up optimizations from each of them
with the results that each of the subsystems does fewer reads and
writes than it did with its own customized I/O system. It also
greatly simplifies making changes to fsck_ffs(8) since everything
goes through a single place. For example the ginode() function
fetches an inode from the disk. When inode check hashes were added,
they previously had to be checked in the code implementing inode
fetch in each of the three different disk I/O systems. Now they
need only be checked in ginode().

Tested by:    Peter Holm
Sponsored by: Netflix
2021-01-07 15:03:15 -08:00
Kirk McKusick
68dc94c7d3 Correct and add some comments.
Sponsored by: Netflix
2020-12-31 15:36:33 -08:00
Kirk McKusick
7180f1ab40 Rename pass4check() to freeblock() and move from pass4.c to inode.c.
The new name more accurately describes what it does and the file move
puts it with other similar functions. Done in preparation for future
cleanups. No functional differences intended.

Sponsored by: Netflix
Historic Footnote: my last FreeBSD svn commit
2020-12-18 23:28:27 +00:00
Kyle Evans
c3e9752ea1 fsck_ffs/fsdb: fix -fno-common build
This one is also a small list:

- 3x duplicate definition (ufs2_zino, returntosingle, nflag)
- 5x 'needs extern', 3/5 of which are referenced in fsdb

-fno-common will become the default in GCC10/LLVM11.

MFC after:	1 week
2020-03-29 20:03:46 +00:00
Kirk McKusick
0061238fb0 This update eliminates a kernel stack disclosure bug in UFS/FFS
directory entries that is caused by uninitialized directory entry
padding written to the disk. It can be viewed by any user with read
access to that directory. Up to 3 bytes of kernel stack are disclosed
per file entry, depending on the the amount of padding the kernel
needs to pad out the entry to a 32 bit boundry. The offset in the
kernel stack that is disclosed is a function of the filename size.
Furthermore, if the user can create files in a directory, this 3
byte window can be expanded 3 bytes at a time to a 254 byte window
with 75% of the data in that window exposed. The additional exposure
is done by removing the entry, creating a new entry with a 4-byte
longer name, extracting 3 more bytes by reading the directory, and
repeating until a 252 byte name is created.

This exploit works in part because the area of the kernel stack
that is being disclosed is in an area that typically doesn't change
that often (perhaps a few times a second on a lightly loaded system),
and these file creates and unlinks themselves don't overwrite the
area of kernel stack being disclosed.

It appears that this bug originated with the creation of the Fast
File System in 4.1b-BSD (Circa 1982, more than 36 years ago!), and
is likely present in every Unix or Unix-like system that uses
UFS/FFS. Amazingly, nobody noticed until now.

This update also adds the -z flag to fsck_ffs to have it scrub
the leaked information in the name padding of existing directories.
It only needs to be run once on each UFS/FFS filesystem after a
patched kernel is installed and running.

Submitted by: David G. Lawrence <dg@dglawrence.com>
Reviewed by:  kib
MFC after:    1 week
2019-05-03 21:54:14 +00:00
Kirk McKusick
d483391306 Followup to -r344552 in which fsck_ffs checks for a size past the
last allocated block of the file and if that is found, shortens the
file to reference the last allocated block thus avoiding having it
reference a hole at its end.

This update corrects an error where fsck_ffs miscalculated the last
logical block of the file when the file contained a large hole.

Reported by:  Jamie Landeg-Jones
Tested by:    Peter Holm
MFC after:    2 weeks
Sponsored by: Netflix
2019-04-13 13:31:06 +00:00
Kirk McKusick
ac4b20a0a7 After a crash, a file that extends into indirect blocks may end up
shorter than its size resulting in a hole as its final block (which
is a violation of the invarients of the UFS filesystem).

Soft updates will always ensure that the file size is correct when
writing inodes to disk for files that contain only direct block
pointers. However soft updates does not roll back sizes for files
with indirect blocks that it has set to unallocated because their
contents have not yet been written to disk. Hence, the file can
appear to have a hole at its end because the block pointer has been
rolled back to zero when its inode was written to disk. Thus,
fsck_ffs calculates the last allocated block in the file. For files
that extend into indirect blocks, fsck_ffs checks for a size past
the last allocated block of the file and if that is found, shortens
the file to reference the last allocated block thus avoiding having
it reference a hole at its end.

Submitted by: Chuck Silvers <chs@netflix.com>
Tested by:    Chuck Silvers <chs@netflix.com>
MFC after:    1 week
Sponsored by: Netflix
2019-02-25 21:58:19 +00:00
Kirk McKusick
8ebae128be Ensure that cylinder-group check-hashes are properly updated when first
creating them and when correcting them when they are found to be corrupted.

Reported by:  Don Lewis (truckman@)
Sponsored by: Netflix
2018-12-05 06:31:50 +00:00
Kirk McKusick
9fc5d538fc In preparation for adding inode check-hashes, clean up and
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.

Convert the utilities that had been using the undocumented
interface to use the new documented interface.

No functional change (as for now the libufs library does not
do inode check-hashes).

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-11-13 21:40:56 +00:00
Kirk McKusick
2c288c95d9 In preparation for adding inode check-hashes, change the fsck_ffs
inodirty() function to have a pointer to the inode being dirtied.
No functional change (as for now the parameter is ununsed).

Sponsored by: Netflix
2018-10-31 05:17:53 +00:00
Kirk McKusick
31461aa2f1 Include files missed in 329051. 2018-02-08 23:14:24 +00:00
Kirk McKusick
dffce2150e Refactoring of reading and writing of the UFS/FFS superblock.
Specifically reading is done if ffs_sbget() and writing is done
in ffs_sbput(). These functions are exported to libufs via the
sbget() and sbput() functions which then used in the various
filesystem utilities. This work is in preparation for adding
subperblock check hashes.

No functional change intended.

Reviewed by: kib
2018-01-26 00:58:32 +00:00
Kirk McKusick
a6bbdf81b5 More throughly integrate libufs into fsck_ffs by using its cgput()
routine to write out the cylinder groups rather than recreating the
calculation of the cylinder-group check hash in fsck_ffs.

No functional change intended.
2018-01-24 23:57:40 +00:00
Kirk McKusick
957fc241ec Rename cgget => cglookup to clear name space for new libufs function cgget.
No functional change.
2018-01-17 06:31:21 +00:00
David Bright
469759f8e4 Exit fsck_ffs with non-zero status when file system is not repaired.
When the fsck_ffs program cannot fully repair a file system, it will
output the message PLEASE RERUN FSCK. However, it does not exit with a
non-zero status in this case (contradicting the man page claim that it
"exits with 0 on success, and >0 if an error occurs."  The fsck
rc-script (when running "fsck -y") tests the status from fsck (which
passes along the exit status from fsck_ffs) and issues a "stop_boot"
if the status fails. However, this is not effective since fsck_ffs can
return zero even on (some) errors. Effectively, it is left to a later
step in the boot process when the file systems are mounted to detect
the still-unclean file system and stop the boot.

This change modifies fsck_ffs so that when it cannot fully repair the
file system and issues the PLEASE RERUN FSCK message it also exits
with a non-zero status.

While here, the fsck_ffs man page has also been updated to document
the failing exit status codes used by fsck_ffs. Previously, only exit
status 7 was documented. Some of these exit statuses are tested for in
the fsck rc-script, so they are clearly depended upon and deserve
documentation.

Reviewed by:	mckusick, vangyzen, jilles (manpages)
MFC after:	1 week
Sponsored by:	Dell EMC
Differential Revision:	https://reviews.freebsd.org/D13862
2018-01-15 19:25:11 +00:00
Pedro F. Giffuni
8a16b7a18f General further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:49:47 +00:00
Pedro F. Giffuni
f671769766 fsck_ffs: Unsign some variables and make use of reallocarray(3).
Instead of casting listmax and numdirs to unsigned values just define
them as unsigned and avoid the casts. Use reallocarray(3).

While here, fs_ncg is already unsigned so the cast is unnecessary.

Reviewed by:	mckusick
MFC after:	2 weeks
2017-04-22 14:50:11 +00:00
Warner Losh
fbbd9655e5 Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by:	Jan Schaumann <jschauma@stevens.edu>
Pull Request:	https://github.com/freebsd/freebsd/pull/96
2017-02-28 23:42:47 +00:00
Kirk McKusick
6a5972db72 Fsck_ufs was using an int rather than a ufs2_daddr_t to store the
alternate superblock location when given in the -b option. When int
is 32-bits, block numbers larger than 2^32 would get truncated. This
commit changes the storage fpr the alternate superblock location
to a ufs2_daddr_t.

Submitted by: Dmitry Sivachenko <trtrmitya@gmail.com>
2016-08-19 00:03:41 +00:00
Eitan Adler
463a577b27 Fix a ton of speelling errors
arc lint is helpful

Reviewed By: allanjude, wblock, #manpages, chris@bsdjunk.com
Differential Revision: https://reviews.freebsd.org/D3337
2015-10-21 05:37:09 +00:00
Kirk McKusick
eff68496e2 Arguments for malloc and calloc should be size_t, not int.
Use proper bounds check when trying to free cached memory.

Spotted by: Xin Li
Tested by:  Dmitry Sivachenko
MFC after:  2 weeks
2014-02-25 18:25:27 +00:00
Scott Long
7703a6ff27 Add the -R option to allow fsck_ffs to restart itself when too many critical
errors have been detected in a particular run.

Clean up the global state variables so that a restart can happen correctly.

Separate the global variables in fsck_ffs and fsdb to their own file.  This
fixes header sharing with fscd.

Correctly initialize, static-ize, and remove global variables as needed in
dir.c.  This fixes a problem with lost+found directories that was causing
a segfault.

Correctly initialize, static-ize, and remove global variables as needed in
suj.c.

Initialize the suj globals before allocating the disk object, not after.
Also ensure that 'preen' mode doesn't conflict with 'restart' mode

Submitted by:	scottl, max
Reviewed by:	max, mckusick (earlier version)
Obtained from:	Netflix
MFC after:	3 days
2013-12-30 01:16:08 +00:00
Scott Long
ce779f3756 Add a 'surrender' mode to fsck_ffs. With the -S flag, once hard read errors
are encountered, the fsck will stop instead of wasting time chewing through
possibly other errors.

Obtained from:	Netflix
MFC after:	3 days
2013-07-30 22:57:12 +00:00
Dag-Erling Smørgrav
2b5373de83 Add a -Z option which zeroes unused blocks. It can be combined with -E,
in which case unused blocks are first zeroed and then erased.

Reviewed by:	mckusick
MFC after:	3 weeks
2013-04-29 20:13:09 +00:00
Kirk McKusick
81fbded23f Revert 248634 and 248643 (e.g., restoring 248625 and 248639).
Build verified by: Glen Barber (gjb@)
2013-03-23 20:00:02 +00:00
Sean Bruno
115f80b8d3 Revert svn r248625
Clang errors around printf could be trivially fixed, but the breakage in
sbin/fsdb were to significant for this type of change.

Submitter of this changeset has been notified and hopefully this can be
restored soon.
2013-03-23 04:26:13 +00:00
Kirk McKusick
776816d32b Speed up fsck by caching the cylinder group maps in pass1 so
that they do not need to be read again in pass5. As this nearly
doubles the memory requirement for fsck, the cache is thrown away
if other memory needs in fsck would otherwise fail. Thus, the
memory footprint of fsck remains unchanged in memory constrained
environments.

This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker

Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   4 weeks
2013-03-22 21:50:43 +00:00
Kirk McKusick
ed75b5a156 When running with the -d option, instrument fsck_ffs to track the number,
data type, and running time of its I/O operations.

No functional changes.
2013-02-24 06:44:29 +00:00
Kirk McKusick
2ec5c91481 Update fsck_ffs buffer cache manager to use TAILQ macros.
No functional changes.
2013-02-15 01:00:48 +00:00
David E. O'Brien
4a83537505 Remove needless (int) casts of write(2)'s 3rd argument.
Also change blwrite() 'size' parameter to a ssize_t to better match
write(2).
2012-09-12 15:36:44 +00:00
Ulrich Spörlein
4b85a12f71 Spelling fixes for sbin/ 2012-01-07 16:09:33 +00:00
Konstantin Belousov
6f100596e5 Change the type of real_dev_bsize variable from long to u_int.
The DIOCGSECTORSIZE takes u_int * as an argument, using long *
causes failures on big-endian targets.

Diagnosed by:	Michiel Boland <boland37 xs4all nl>
PR:	sparc64/163460
Tested by:	pho (x86), flo (sparc64)
MFC after:	1 week
2011-12-20 20:39:00 +00:00
Kirk McKusick
d2404f0470 Break out the pass 5 inode and block map updating into a separate function
so that the function can be used by the journaling soft updates recovery.
2011-07-15 15:43:40 +00:00
Dag-Erling Smørgrav
8d3dfc2691 Add an -E option to mirror newfs's. The idea is that if you have a system
that was built before ffs grew support for TRIM, your filesystem will have
plenty of free blocks that the flash chip doesn't know are free, so it
can't take advantage of them for wear leveling.  Once you've upgraded your
kernel, you enable TRIM on the filesystem (tunefs -t enable), then run
fsck_ffs -E on it before mounting it.

I tested this patch by half-filling an mdconfig'ed filesystem image,
running fsck_ffs -E on it, then verifying that the contents were not
damaged by comparing them to a pristine copy using rsync's checksum
functionality.  There is no reliable way to test it on real hardware.

Many thanks to mckusick@, who provided the tricky parts of this patch and
reviewed the final version.

Reviewed by:	mckusick@
MFC after:	3 weeks
2011-04-29 23:00:23 +00:00
Konstantin Belousov
0947d19a09 In checker, read journal by sectors.
Due to UFS insistence to pretend that device sector size is 512 bytes,
sector size is obtained from ioctl(DIOCGSECTORSIZE) for real devices,
and from the label otherwise. The file images without label have to
be made with 512 sector size.

In collaboration with:	pho
Reviewed by:	jeff
Tested by:	bz, pho
2011-02-12 13:17:14 +00:00
Pawel Jakub Dawidek
d00690aec5 Protect fsck.h from being included twice. 2010-04-24 07:54:49 +00:00
Jeff Roberson
113db2dddb - Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
   for background fsck on unclean shutdown.

Sponsored by:   iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
2010-04-24 07:05:35 +00:00
Ulf Lilleengen
fd02a3b5c9 - Use volatile for signal variables.
Suggested by:	Jaakko Heinonen <jh -at- saunalahti.fi>
2009-06-02 17:57:24 +00:00
Ulf Lilleengen
a0f163fd83 - Use sig_atomic_t for signal handler variables.
MFC after:	1 week
2009-05-29 20:01:50 +00:00
Kirk McKusick
910b491e7e Update the actions previously attempted by the -D option to make them
robust. With these changes fsck is now able to detect and reliably
rebuild corrupted cylinder group maps. The -D option is no longer
necessary as it has been replaced by a prompt asking whether the
corrupted cylinder group should be rebuilt and doing so when requested.
These actions are only offered and taken when running fsck in manual
mode. Corrupted cylinder groups found during preen mode cause the fsck
to fail.

Add the -r option to free up excess unused inodes. Decreasing the
number of preallocated inodes reduces the running time of future
runs of fsck and frees up space that can allocated to files. The -r
option is ignored when running in preen mode.

Reviewed by: Xin LI <delphij@>
Sponsored by: Rsync.net
2009-02-04 01:02:56 +00:00
David E. O'Brien
111a52201c Add the '-C' "check clean" flag. If the FS is marked clean, skip file
system checking.  However, if the file system is not clean, perform a
full fsck.

Reviewed by:	delphij
Obtained from:	Juniper Networks
2009-01-30 18:33:05 +00:00
Xin LI
7f94ca7233 Rename option 'C' to 'D' (damaged) in order to avoid a conflict with upcoming
Juniper 'C' (clean) flag.

Requested by:	obrien
MFC after:	1 week
2009-01-20 22:49:49 +00:00
Xin LI
14320f1e7f Add a new flag, '-C' which enables a special mode that is intended for
catastrophic recovery.  Currently, this mode only validates whether a
cylindergroup has good signature data, and prompts the user to decide
whether to clear it as a whole.

This mode is useful when there is data damage on a disk and you are
working on copy of the original disk, as fsck_ffs(8) tends to abnormally
exit in such case, as a last resort to recover data from the disk.
2008-04-10 23:49:23 +00:00
Pawel Jakub Dawidek
aef8d2449b Implements gjournal support. If file system has gjournal support enabled
and -p flag was given perform fast file system checking (bascially only
garbage collecting of orphaned objects).

Rename bread() to blread() and bwrite() to blwrite() as we now link to
the libufs library, which also implement functions with that names.

Sponsored by:	home.pl
2006-10-31 22:06:56 +00:00
Xin LI
c0ed8991fb Make background fsck based summary adjustments actually work by
initializing the sysctl mibs data before actually using them.

The original patchset (which is the actual version that is running
on my testboxes) have checked whether all of these sysctls and
refuses to do background fsck if we don't have them.  Kirk has
pointed out that refusing running fsck on old kernels is pointless,
as old kernels will recompute the summary at mount time, so I
have removed these checks.

Unfortunatelly, as the checks will initialize the mib values of
those sysctl's, and which are vital for the runtime summary
adjustment to work, we can not simply remove the check, which
will lead to problem when running background fsck over a dirty
volume.  Add these checks in a different way: give a warning rather
than refusing to work, and complain if the functionality is not
available when adjustments are necessary.

Noticed by:	A power failure at my lab
Pointy hat:	me
MFC After:	3 days
2005-03-07 08:42:49 +00:00
Xin LI
a16baf37b9 The recomputation of file system summary at mount time can be a
very slow process, especially for large file systems that is just
recovered from a crash.

Since the summary is already re-sync'ed every 30 second, we will
not lag behind too much after a crash.  With this consideration
in mind, it is more reasonable to transfer the responsibility to
background fsck, to reduce the delay after a crash.

Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to
control this behavior.  When set to nonzero, we will get the
"old" behavior, that the summary is computed immediately at mount
time.

Add five new sysctl variables to adjust ndir, nbfree, nifree,
nffree and numclusters respectively.  Teach fsck_ffs about these
API, however, intentionally not to check the existence, since
kernels without these sysctls must have recomputed the summary
and hence no adjustments are necessary.

This change has eliminated the usual tens of minutes of delay of
mounting large dirty volumes.

Reviewed by:	mckusick
MFC After:	1 week
2005-02-20 08:02:15 +00:00
Robert Watson
60c9762920 Explicitly break out NETA license from Berkeley license to clearly
indicate license grant, as well as to indicate that NETA is asserting
only two clauses, not four clauses.

Requested by:	imp
2004-10-20 08:05:02 +00:00
Don Lewis
af6726e657 Eliminate linked list used to track inodes with an initial link
count of zero and instead encode this information in the inode state.
Pass 4 performed a linear search of this list for each inode in
the file system, which performs poorly if the list is long.

Reviewed by:    sam & keramida (an earlier version of the patch), mckusick
MFC after:	1 month
2004-10-08 20:44:47 +00:00
Scott Long
c3b2344b93 Create DIP_SET() and IBLK_SET() macros to fix lvalue warnings.
Inspired by: kan
2004-09-01 05:48:06 +00:00