Commit Graph

43 Commits

Author SHA1 Message Date
Konstantin Belousov
d485c77f20 Remove #define _KERNEL hacks from libprocstat
Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.

Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it override struct buf with its own
definition.  Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.

Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D28679
2021-02-21 11:38:21 +02:00
Kirk McKusick
5cc52631b3 Rewrite the disk I/O management system in fsck_ffs(8). Other than
making fsck_ffs(8) run faster, there should be no functional change.

The original fsck_ffs(8) had its own disk I/O management system.
When gjournal(8) was added to FreeBSD 7, code was added to fsck_ffs(8)
to do the necessary gjournal rollback. Rather than use the existing
fsck_ffs(8) disk I/O system, it wrote its own from scratch. Similarly
when journalled soft updates were added in FreeBSD 9, code was added
to fsck_ffs(8) to do the necessary journal rollback. And once again,
rather than using either of the existing fsck_ffs(8) disk I/O
systems, it wrote its own from scratch. Lastly the fsdb(8) utility
uses the fsck_ffs(8) disk I/O management system. In preparation for
making the changes necessary to enable snapshots to be taken when
using journalled soft updates, it was necessary to have a single
disk I/O system used by all the various subsystems in fsck_ffs(8).

This commit merges the functionality required by all the different
subsystems into a single disk I/O system that supports all of their
needs. In so doing it picks up optimizations from each of them
with the results that each of the subsystems does fewer reads and
writes than it did with its own customized I/O system. It also
greatly simplifies making changes to fsck_ffs(8) since everything
goes through a single place. For example the ginode() function
fetches an inode from the disk. When inode check hashes were added,
they previously had to be checked in the code implementing inode
fetch in each of the three different disk I/O systems. Now they
need only be checked in ginode().

Tested by:    Peter Holm
Sponsored by: Netflix
2021-01-07 15:03:15 -08:00
Kirk McKusick
85ee267a3e Update the libufs cgget() and cgput() interfaces to have a similar
API to the sbget() and sbput() interfaces. Specifically they take
a file descriptor pointer rather than the struct uufsd *disk pointer
used by the libufs cgread() and cgwrite() interfaces. Update fsck_ffs
to use these revised interfaces.

No functional changes intended.

Sponsored by: Netflix
2020-09-19 22:48:30 +00:00
Kirk McKusick
0c08ecdff3 Inode check-hash errors were being reported after system crashes.
Trace the cause down to journalled soft updates recovery code in
fsck failing to recompute the check-hash after updating an inode.

As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.

Reported by:  Chuck Silvers
Sponsored by: Netflix
2020-04-10 23:58:07 +00:00
Kirk McKusick
2a18059670 Add an inode check-hash verification when running the journalled
soft update recovery code with the debugging (-d) option.

As inode check-hash was first introduced to UFS in FreeBSD 13,
there is no need to MFC this commit.

Reported by:  Chuck Silvers
Sponsored by: Netflix
2020-04-10 23:49:34 +00:00
Kirk McKusick
c094263a24 When running fsck_ffs manually, do not ask:
USE JOURNAL? [yn]

when the journal timestamp does not match the filesystem mount time
as we are just going to print an error and fall through to a full fsck.
Instead, just run a full fsck.

Requested by: Bjoern A. Zeeb (bz)
MFC after:    7 days
2019-12-24 23:03:12 +00:00
Kirk McKusick
e39c92986d Replace an uninitialized variable with the correct element from the
superblock when doing recovery with journalled soft updates.

Reported by:  Chuck Silvers
MFC after:    3 days
Sponsored by: Netflix
2019-10-22 22:23:59 +00:00
Kirk McKusick
3bd88193c6 When running with journaled soft updates, some updated inodes were not
having their check hashes recomputed which resulted in spurious inode
check-hash errors when the system came back up after a crash.

Reported by:  Alan Somers
Sponsored by: Netflix
2019-07-20 21:20:40 +00:00
Ed Maste
d8ba45e213 Revert r313780 (UFS_ prefix) 2018-03-17 12:59:55 +00:00
Ed Maste
1e2b9afca9 Prefix UFS symbols with UFS_ to reduce namespace pollution
Followup to r313780.  Also prefix ext2's and nandfs's versions with
EXT2_ and NANDFS_.

Reported by:	kib
Reviewed by:	kib, mckusick
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D9623
2018-03-17 01:48:27 +00:00
Kirk McKusick
26772fefc1 Use sbput(3) rather than sbwrite(3) to ensure that the updated copy of
the superblock gets written.

Reported by: Mark Johnston <markj@FreeBSD.org>
2018-02-02 00:07:38 +00:00
Kirk McKusick
a6bbdf81b5 More throughly integrate libufs into fsck_ffs by using its cgput()
routine to write out the cylinder groups rather than recreating the
calculation of the cylinder-group check hash in fsck_ffs.

No functional change intended.
2018-01-24 23:57:40 +00:00
Kirk McKusick
72f854ce8f Correct fsck journal-recovery code to update a cylinder-group
check-hash after making changes to the cylinder group. The problem
was that the journal-recovery code was calling the libufs bwrite()
function instead of the cgput() function. The cgput() function updates
the cylinder-group check-hash before writing the cylinder group.

This change required the additions of the cgget() and cgput() functions
to the libufs API to avoid a gratuitous bcopy of every cylinder group
to be read or written. These new functions have been added to the
libufs manual pages. This was the first opportunity that I have had
to use and document the use of the EDOOFUS error code.

Reviewed by: kib
Reported by: emaste and others
2018-01-17 17:58:24 +00:00
Pedro F. Giffuni
1de7b4b805 various: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:37:16 +00:00
John Baldwin
ed8d06aa19 Use UFS_LINK_MAX instead of LINK_MAX.
Submitted by:	bde
Sponsored by:	Chelsio Communications
2017-09-21 22:33:59 +00:00
Konstantin Belousov
6992112349 Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints.  Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment.  Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks.  Unfortunately, not everything can be
fixed, especially outside the base system.  For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING.  Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb).  Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver.  Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem).  Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by:	The FreeBSD Foundation (emaste, kib)
Differential revision:	https://reviews.freebsd.org/D10439
2017-05-23 09:29:05 +00:00
Ed Maste
1dc349ab95 prefix UFS symbols with UFS_ to reduce namespace pollution
Specifically:
  ROOTINO -> UFS_ROOTINO
  WINO -> UFS_WINO
  NXADDR -> UFS_NXADDR
  NDADDR -> UFS_NDADDR
  NIADDR -> UFS_NIADDR
  MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency)

Also prefix ext2's and nandfs's NDADDR and NIADDR with EXT2_ and NANDFS_

Reviewed by:	kib, mckusick
Obtained from:	NetBSD
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D9536
2017-02-15 19:50:26 +00:00
Konstantin Belousov
1c32456953 Use type-independent formats for printing nlink_t and ino_t.
Extracted from:	ino64 work by gleb, mckusick
Discussed with:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2017-01-06 16:59:33 +00:00
Pedro F. Giffuni
f32d2926b0 sbin: ake use of our rounddown() macro when sys/param.h is available.
No functional change.
2016-05-01 02:24:05 +00:00
Pedro F. Giffuni
11ec5dd04b fsck_ffs: Revert partially the unsigned changes.
Any value of uint16_t will be internally promoted to int so
changing them to an unsigned value doesn't help.

Missing revert value in suj_read().

X-MFC with:	r298551
2016-04-27 01:36:25 +00:00
Pedro F. Giffuni
4235bafa7e fsck_ffs: Revert partially the unsigned changes.
Any value of uint16_t will be internally promoted to int so
changing them to an unsigned value doesn't help.

Make clear we want to use uint32_t for closedisk()

X-MFC with:	r298551
2016-04-27 01:32:11 +00:00
Pedro F. Giffuni
0f8c0b0ae4 fsck_ffs: Adopt some type safety for the journalling checks.
fs_ncg is of type uint32, and we were indexing it with an int.
Fixed this using an unsigned type and adopt some other unsigned
indexes to remind us when we are dealing with unsigned numbers.

Reviewed by:	mckusick
MFC after:	5 days
2016-04-24 20:31:22 +00:00
Scott Long
7703a6ff27 Add the -R option to allow fsck_ffs to restart itself when too many critical
errors have been detected in a particular run.

Clean up the global state variables so that a restart can happen correctly.

Separate the global variables in fsck_ffs and fsdb to their own file.  This
fixes header sharing with fscd.

Correctly initialize, static-ize, and remove global variables as needed in
dir.c.  This fixes a problem with lost+found directories that was causing
a segfault.

Correctly initialize, static-ize, and remove global variables as needed in
suj.c.

Initialize the suj globals before allocating the disk object, not after.
Also ensure that 'preen' mode doesn't conflict with 'restart' mode

Submitted by:	scottl, max
Reviewed by:	max, mckusick (earlier version)
Obtained from:	Netflix
MFC after:	3 days
2013-12-30 01:16:08 +00:00
Kirk McKusick
81fbded23f Revert 248634 and 248643 (e.g., restoring 248625 and 248639).
Build verified by: Glen Barber (gjb@)
2013-03-23 20:00:02 +00:00
Sean Bruno
115f80b8d3 Revert svn r248625
Clang errors around printf could be trivially fixed, but the breakage in
sbin/fsdb were to significant for this type of change.

Submitter of this changeset has been notified and hopefully this can be
restored soon.
2013-03-23 04:26:13 +00:00
Kirk McKusick
776816d32b Speed up fsck by caching the cylinder group maps in pass1 so
that they do not need to be read again in pass5. As this nearly
doubles the memory requirement for fsck, the cache is thrown away
if other memory needs in fsck would otherwise fail. Thus, the
memory footprint of fsck remains unchanged in memory constrained
environments.

This work was inspired by a paper presented at Usenix's FAST '13:
www.usenix.org/conference/fast13/ffsck-fast-file-system-checker

Details of this implementation appears in the April 2013 of ;login:
www.usenix.org/publications/login/april-2013-volume-38-number-2.
A copy of the April 2013 ;login: paper can also be downloaded
from: www.mckusick.com/publications/faster_fsck.pdf.

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   4 weeks
2013-03-22 21:50:43 +00:00
Jeff Roberson
2db62a6b1f - blk_equals() is too strict. If the journal entry defines more frags
than we're claiming it should still be considered an exact match.  This
   would previously leak frags that had been extended.
 - If there is a sequence number problem in the journal print the sequence
   numbers we've seen so far for debugging.
 - Clean up the block mask related debuging printfs.  Some are redundant.

MFC after:	1 week
2012-11-14 06:31:47 +00:00
Matthew D Fleming
a1c9ec3ce0 Fix some nearby type and style errors.
Pointed out by:	bde
2012-09-28 17:34:34 +00:00
Matthew D Fleming
e25a029eb2 Fix sbin/ build with a 64-bit ino_t.
Original code by:	Gleb Kurtsou
2012-09-27 23:31:06 +00:00
Matthew D Fleming
623d7cb663 Fix fsck_ffs build with a 64-bit ino_t.
Original code by:	Gleb Kurtsou
2012-09-27 23:30:58 +00:00
Andrey Zonov
0cf7f1865b - Fix a typo in debug message.
Approved by:	kib (mentor)
MFC after:	3 days
2012-09-13 12:55:10 +00:00
Konstantin Belousov
2db8baa956 fsck_ffs shall accept the configured journal size, and not refuse to
operate on it if journal size is greater then SUJ_MAX. The later
constant is only to select maximal journal size when user did not
specified size explicitely.

Submitted by:	Andrey Zonov <andrey@zonov.org>
Reviewed by:	mckusick
MFC after:	1 week
2012-08-02 10:39:54 +00:00
Konstantin Belousov
364e72457f For incompleted block allocations or frees, the inode block count usage
must be recalculated. The blk_check pass of suj checker explicitely marks
inodes which owned such blocks as needing block count adjustment. But
ino_adjblks() is only called by cg_trunc pass, which is performed before
blk_check. As result, the block use count for such inodes is left wrong.
This causes full fsck run after journaled run to still find inconsistencies
like 'INCORRECT BLOCK COUNT I=14557 (328 should be 0)' in phase 1.

Fix this issue by running additional adj_blk pass after blk_check, which
updates the field.

Reviewed by:	jeff, mckusick
MFC after:	1 week
2012-06-12 21:37:27 +00:00
Konstantin Belousov
6f100596e5 Change the type of real_dev_bsize variable from long to u_int.
The DIOCGSECTORSIZE takes u_int * as an argument, using long *
causes failures on big-endian targets.

Diagnosed by:	Michiel Boland <boland37 xs4all nl>
PR:	sparc64/163460
Tested by:	pho (x86), flo (sparc64)
MFC after:	1 week
2011-12-20 20:39:00 +00:00
Jeff Roberson
85e9da38fe - Handle the JOP_SYNC case as appropriate.
Reported by:	pho
2011-06-30 05:28:10 +00:00
Jeff Roberson
280e091a99 Implement fully asynchronous partial truncation with softupdates journaling
to resolve errors which can cause corruption on recovery with the old
synchronous mechanism.

 - Append partial truncation freework structures to indirdeps while
   truncation is proceeding.  These prevent new block pointers from
   becoming valid until truncation completes and serialize truncations.
 - On completion of a partial truncate journal work waits for zeroed
   pointers to hit indirects.
 - softdep_journal_freeblocks() handles last frag allocation and last
   block zeroing.
 - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it
   is only implemented in one place.
 - Block allocation failure handling moved up one level so it does not
   proceed with buf locks held.  This permits us to do more extensive
   reclaims when filesystem space is exhausted.
 - softdep_sync_metadata() is broken into two parts, the first executes
   once at the start of ffs_syncvnode() and flushes truncations and
   inode dependencies.  The second is called on each locked buf.  This
   eliminates excessive looping and rollbacks.
 - Improve the mechanism in process_worklist_item() that handles
   acquiring vnode locks for handle_workitem_remove() so that it works
   more generally and does not loop excessively over the same worklist
   items on each call.
 - Don't corrupt directories by zeroing the tail in fsck.  This is only
   done for regular files.
 - Push a fsync complete record for files that need it so the checker
   knows a truncation in the journal is no longer valid.

Discussed with:	mckusick, kib (ffs_pages_remove and ffs_truncate parts)
Tested by:	pho
2011-06-10 22:48:35 +00:00
Dag-Erling Smørgrav
d40c066473 Mechanical whitespace cleanup.
MFC after:	3 weeks
2011-04-27 02:55:03 +00:00
Konstantin Belousov
0947d19a09 In checker, read journal by sectors.
Due to UFS insistence to pretend that device sector size is 512 bytes,
sector size is obtained from ioctl(DIOCGSECTORSIZE) for real devices,
and from the label otherwise. The file images without label have to
be made with 512 sector size.

In collaboration with:	pho
Reviewed by:	jeff
Tested by:	bz, pho
2011-02-12 13:17:14 +00:00
Kirk McKusick
7649cb0043 The dump, fsck_ffs, fsdb, fsirand, newfs, makefs, and quot utilities
include sys/time.h instead of time.h. This include is incorrect as
per the manpages for the APIs and the POSIX definitions. This commit
replaces sys/time.h where necessary with time.h.

The commit also includes some minor style(9) header fixup in newfs.

This commit is part of a larger effort by Garrett Cooper started in
//depot/user/gcooper/posix-conformance-work/ -- to make FreeBSD more
POSIX compliant.

Submitted by:  Garrett Cooper   yanegomi at gmail dot com
2011-01-24 06:17:05 +00:00
Jeff Roberson
24d37c1eec - Permit zero length directories as a handled inconsistency. This allows
directory truncation to proceed before the link has been cleared.  This
   is accomplished by detecting a directory with no . or .. links and
   clearing the named directory entry in the parent.
 - Add a new function ino_remref() which handles the details of removing
   a reference to an inode as a result of a lost directory.  There were
   some minor errors in various subcases of this routine.
2010-07-06 07:07:29 +00:00
Xin LI
edad602637 Improve fsck robustness for SU+J cases:
- Use err/errx only when the case is really fatal.  For other
   cases, fall back to full fsck instead of quiting fsck.
 - Plug a memory leak.
 - Avoid divide by zero when printing summary.
 - Output "FILE SYSTEM IS MARKED CLEAN" when a successful
   journal recovering is done.
 - When -f is specified, do full fsck instead of journal recovery.
2010-06-22 00:26:07 +00:00
Pawel Jakub Dawidek
132f1ed585 suj.c seems to contain two versions of the code.
Remove the one that doesn't compile.
2010-04-24 07:58:59 +00:00
Jeff Roberson
113db2dddb - Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
   for background fsck on unclean shutdown.

Sponsored by:   iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm
2010-04-24 07:05:35 +00:00