Commit Graph

1440 Commits

Author SHA1 Message Date
Kirk McKusick
1fd136ec5e When a process attempts to allocate space on a full filesystem, a
filesystem full message is sent to the offending process or the
kernel log if the offending process cannot be identified.

To prevent an explotion of messages, the kernel ppsratecheck()
function is used to limit the messages to one per second. This
revision changes the variable that tracks the rate of these messages
from a systemwide limit to a per-filesystem limit by moving it from
a global variable to a variable in the ufsmount structure.

Suggested by: kib
Reviewed by:  kib
Sponsored by: Netflix
2019-07-16 23:12:27 +00:00
Kirk McKusick
daba4da81d Add a new "untrusted" option to the mount command. Its purpose
is to notify the kernel that the file system is untrusted and it
should use more extensive checks on the file-system's metadata
before using it. This option is intended to be used when mounting
file systems from untrusted media such as USB memory sticks or other
externally-provided media.

It will initially be used by the UFS/FFS file system, but should
likely be expanded to be used by other file systems that may appear
on external media like msdosfs, exfat, and ext2fs.

Reviewed by:  kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20786
2019-07-01 23:22:26 +00:00
Mark Johnston
6137883ff3 Remove references to splbio in ffs_softdep.c.
Assert that the per-mountpoint softdep mutex is held in modified
functions that do not already have this assertion.  No functional
change intended.

Reviewed by:	kib, mckusick (previous version)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20741
2019-06-26 16:28:42 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Kirk McKusick
e94828443c Add a missing bresle() in seldom-used error return. 2019-05-28 17:31:35 +00:00
Kirk McKusick
af6aeacb3e Convert use of UFS-specific #ifdef DEBUG to DIAGNOSTIC or INVARIANTS
as appropriate. No functional change intended.

Suggested-by: markj
2019-05-28 16:32:04 +00:00
Kirk McKusick
298184acb8 Add function name and line number debugging information to softupdates
worklist structures to help track their movement between work lists.
No functional change to the operation of soft updates intended.
2019-05-27 06:22:43 +00:00
Alan Somers
65417f5e27 Remove "struct ucred*" argument from vtruncbuf
vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever
since that function was first added in r34611. Remove it.  Also, remove some
"struct ucred" arguments from fuse and nfs functions that were only used by
vtruncbuf.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20377
2019-05-24 20:27:50 +00:00
Conrad Meyer
daec92844e Include ktr.h in more compilation units
Similar to r348026, exhaustive search for uses of CTRn() and cross reference
ktr.h includes.  Where it was obvious that an OS compat header of some kind
included ktr.h indirectly, .c files were left alone.  Some of these files
clearly got ktr.h via header pollution in some scenarios, or tinderbox would
not be passing prior to this revision, but go ahead and explicitly include it
in files using it anyway.

Like r348026, these CUs did not show up in tinderbox as missing the include.

Reported by:	peterj (arm64/mp_machdep.c)
X-MFC-With:	r347984
Sponsored by:	Dell EMC Isilon
2019-05-21 20:38:48 +00:00
Konstantin Belousov
5ffc99e2e4 Handle races when remounting UFS volume from ro to rw.
In particular, ensure that writers are not unleashed before SU
structures are initialized.  Also, correctly handle MNT_ASYNC before
this.

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-04-08 15:20:05 +00:00
Mariusz Zaborski
a1304030b8 Introduce funlinkat syscall that always us to check if we are removing
the file associated with the given file descriptor.

Reviewed by:	kib, asomers
Reviewed by:	cem, jilles, brooks (they reviewed previous version)
Discussed with:	pjd, and many others
Differential Revision:	https://reviews.freebsd.org/D14567
2019-04-06 09:34:26 +00:00
Kirk McKusick
69166928c7 This is an additional and hopefully final fix for bug report 230962.
This bug was introduced with the change to use softdep_bp_to_mp()
in January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp()
function failed to include VSOCK as one of the valid cases.

Although local-domain sockets do not allocate blocks in the filesystem,
they will allocate blocks if they use extended attributes (such as
ACLs). Thus, softdep_bp_to_mp() needs to return a non-NULL mount
pointer when presented with a socket vnode so that the soft updates
write complete will properly process the soft updates structures
associated with the extended attribute blocks. It was the failure
to process these soft updates structures, thus leaving them hanging
off the buffer, which lead to the "panic: softdep_deallocate_dependencies:
dangling deps" when trying to clean up the buffer after it was written.

PR:           230962
Reported by:  2t8mr7kx9f@protonmail.com
Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    1 week
Sponsored by: Netflix
2019-03-20 23:11:05 +00:00
Kirk McKusick
42a5a356a8 Add KASSERT to the softdep_disk_write_complete() function in the
soft dependency code to ensure that it will be able to avoid a
dangling dependency.

Sponsored by: Netflix
2019-03-12 00:10:31 +00:00
Kirk McKusick
3532718257 Give more complete information in INVARIANTS panic messages at end of
the ffs_truncate() function.

Sponsored by: Netflix
2019-03-11 23:53:56 +00:00
Jason A. Harmening
4775b07ebd FFS: allow sendfile(2) to work with block sizes greater than the page size
Implement ffs_getpages_async(), which when possible calls the asynchronous
flavor of the generic pager's getpages function. When the underlying
block size is larger than the system page size, however, it will invoke
the (synchronous) buffer cache pager, followed by a call to the client
completion routine. This retains true asynchronous completion in the most
common (block size <= page size) case, which is important for the performance
of the new sendfile(2). The behavior in the larger block size case mirrors
the default implementation of VOP_GETPAGES_ASYNC, which most other
filesystems use anyway as they do not override the getpages_async method.

PR:		235708
Reported by:	pho
Reviewed by:	kib, glebius
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19340
2019-02-26 04:56:10 +00:00
Kirk McKusick
ac4b20a0a7 After a crash, a file that extends into indirect blocks may end up
shorter than its size resulting in a hole as its final block (which
is a violation of the invarients of the UFS filesystem).

Soft updates will always ensure that the file size is correct when
writing inodes to disk for files that contain only direct block
pointers. However soft updates does not roll back sizes for files
with indirect blocks that it has set to unallocated because their
contents have not yet been written to disk. Hence, the file can
appear to have a hole at its end because the block pointer has been
rolled back to zero when its inode was written to disk. Thus,
fsck_ffs calculates the last allocated block in the file. For files
that extend into indirect blocks, fsck_ffs checks for a size past
the last allocated block of the file and if that is found, shortens
the file to reference the last allocated block thus avoiding having
it reference a hole at its end.

Submitted by: Chuck Silvers <chs@netflix.com>
Tested by:    Chuck Silvers <chs@netflix.com>
MFC after:    1 week
Sponsored by: Netflix
2019-02-25 21:58:19 +00:00
Kirk McKusick
baba6af702 This bug was introduced with the change to use softdep_bp_to_mp() in
January 2018 changes -r327723 and -r327821. The softdep_bp_to_mp()
function failed to include VFIFO as one of the valid cases.

Although fifo's do not allocate blocks in the filesystem, they will
allocate blocks if they use extended attributes (such as ACLs). Thus,
softdep_bp_to_mp() needs to return a non-NULL mount pointer when
presented with a fifo vnode so that the soft updates write complete
will properly process the soft updates structures associated with the
extended attribute blocks. It was the failure to process these soft
updates structures, thus leaving them hanging off the buffer, which
lead to the "panic: softdep_deallocate_dependencies: dangling deps"
when trying to clean up the buffer after it was written.

PR:           230962
Reported by:  2t8mr7kx9f@protonmail.com
Reviewed by:  kib
Tested by:    Peter Holm
MFC after:    1 week
Sponsored by: Netflix
2019-01-28 21:36:45 +00:00
Kirk McKusick
6967c09c69 Expand DDB's set of printable soft dependency data structures. The
set of known soft dependency data structures now includes: sd_worklist,
sd_inodedep, sd_allocdirect, sd_allocindir, and sd_mkdir. DDB can
also print lists of sd_allinodedeps, sd_mkdir_list, and sd_workhead.
The sd_workhead script is useful for listing all the dependencies
associated with a buffer, e.g. bp->b_dep.

Prefix the soft dependency show names with sd_ so that they sort
together when listed by DDB's "show help" and to distinguish them
from other data structures printable by DDB.

Sponsored by: Netflix
2019-01-26 05:35:24 +00:00
Gleb Smirnoff
756a541279 Allocate pager bufs from UMA instead of 80-ish mutex protected linked list.
o In vm_pager_bufferinit() create pbuf_zone and start accounting on how many
  pbufs are we going to have set.
  In various subsystems that are going to utilize pbufs create private zones
  via call to pbuf_zsecond_create(). The latter calls uma_zsecond_create(),
  and sets a limit on created zone. After startup preallocate pbufs according
  to requirements of all pbuf zones.

  Subsystems that used to have a private limit with old allocator now have
  private pbuf zones: md(4), fusefs, NFS client, smbfs, VFS cluster, FFS,
  swap, vnode pager.

  The following subsystems use shared pbuf zone: cam(4), nvme(4), physio(9),
  aio(4). They should have their private limits, but changing that is out of
  scope of this commit.

o Fetch tunable value of kern.nswbuf from init_param2() and while here move
  NSWBUF_MIN to opt_param.h and eliminate opt_swap.h, that was holding only
  this option.
  Default values aren't touched by this commit, but they probably should be
  reviewed wrt to modern hardware.

This change removes a tight bottleneck from sendfile(2) operation, that
uses pbufs in vnode pager. Other pagers also would benefit from faster
allocation.

Together with:	gallatin
Tested by:	pho
2019-01-15 01:02:16 +00:00
Kirk McKusick
1c521f70d4 For consistency with FFS2's fifoops2 and both versions of FFS's
vnodeops make FFS1's fifoops1 use ffs_lock. Also delete ffs_reallocblks
from fifoops1 which is needed only for fifoops2 because of its
support for extended attributes that need to allocate blocks.

Suggested by: kib
2018-12-30 05:03:41 +00:00
Kirk McKusick
c0029546f8 When loading an inode from disk, verify that its mode is valid.
If invalid, return EINVAL. Note that inode check-hashes greatly
reduce the chance that these errors will go undetected.

Reported by:  Christopher Krah <krah@protonmail.com>
Reported as:  FS-5-UFS-2: Denial Of Service in nmount-3 (ffs_read)
Reviewed by:  kib
MFC after:    1 week
Sponsored by: Netflix

M    sys/fs/ext2fs/ext2_vnops.c
M    sys/kern/vfs_subr.c
M    sys/ufs/ffs/ffs_snapshot.c
M    sys/ufs/ufs/ufs_vnops.c
2018-12-27 07:18:53 +00:00
Konstantin Belousov
8690d4dea3 Allocate v_object for the new snapshot vnode.
The vnode is not opened, so it ends up with the malloced buffers otherwise.

Reported and tested by:	pho
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-12-23 18:54:09 +00:00
Kirk McKusick
c8f55fc4b4 Ensure that the inode check-hash is not left zeroed out in the case where
the check-hash fails. Prior to the fix in -r342133 the inode with the
zeroed out check-hash was written back to disk causing further confusion.

Reported by:  Gary Jennejohn (gj)
Sponsored by: Netflix
2018-12-15 18:49:30 +00:00
Kirk McKusick
72d28f97be Reorder ffs_verify_dinode_ckhash() so that it checks the inode check-hash
before copying in the inode so that the mode and link-count are not set
if the check-hash fails. This change ensures that the vnode will be properly
unwound and recycled rather than being held in the cache.

Initialize the file mode is zero so that if the loading of the inode
fails (for example because of a check-hash failure), the vnode will be
properly unwound and recycled.

Reported by:  Gary Jennejohn (gj)
Sponsored by: Netflix
2018-12-15 18:35:46 +00:00
Kirk McKusick
6fa9bc995a Must set ip->i_effnlink = ip->i_nlink to avoid a soft updates
"panic: softdep_update_inodeblock: bad link count" when releasing
a partially initialized vnode after an inode check-hash failure.

Reported by:  Gary Jennejohn <gljennjohn@gmail.com>
Reported by:  Peter Holm (pho)
Sponsored by: Netflix
2018-12-15 17:58:42 +00:00
Kirk McKusick
8f829a5cf0 Continuing efforts to provide hardening of FFS. This change adds a
check hash to the filesystem inodes. Access attempts to files
associated with an inode with an invalid check hash will fail with
EINVAL (Invalid argument). Access is reestablished after an fsck
is run to find and validate the inodes with invalid check-hashes.
This check avoids a class of filesystem panics related to corrupted
inodes. The hash is done using crc32c.

Note this check-hash is for the inode itself and not any of its
indirect blocks. Check-hash validation may be extended to also
cover indirect block pointers, but that will be a separate (and
more costly) feature.

Check hashes are added only to UFS2 and not to UFS1 as UFS1 is
primarily used in embedded systems with small memories and low-powered
processors which need as light-weight a filesystem as possible.

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-12-11 22:14:37 +00:00
Mateusz Guzik
cc426dd319 Remove unused argument to priv_check_cred.
Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by:	The FreeBSD Foundation
2018-12-11 19:32:16 +00:00
Kirk McKusick
bdd6b77e1f If the vfs.ffs.dotrimcons sysctl option is enabled while a file
deletion is active, specifically after a call to ffs_blkrelease_start()
but before the call to ffs_blkrelease_finish(), ffs_blkrelease_start()
will have handed out SINGLETON_KEY rather than starting a collection
sequence. Thus if we get a SINGLETON_KEY passed to ffs_blkrelease_finish(),
we just return rather than trying to finish the nonexistent sequence.

Reported by:  Warner Losh (imp@)
Sponsored by: Netflix
2018-12-06 01:04:56 +00:00
Kirk McKusick
fb14e73cb4 Normally when an attempt is made to mount a UFS/FFS filesystem whose
superblock has a check-hash error, an error message noting the
superblock check-hash failure is printed and the mount fails. The
administrator then runs fsck to repair the filesystem and when
successful, the filesystem can once again be mounted.

This approach fails if the filesystem in question is a root filesystem
from which you are trying to boot. Here, the loader fails when trying
to access the filesystem to get the kernel to boot. So it is necessary
to allow the loader to ignore the superblock check-hash error and make
a best effort to read the kernel. The filesystem may be suffiently
corrupted that the read attempt fails, but there is no harm in trying
since the loader makes no attempt to write to the filesystem.

Once the kernel is loaded and starts to run, it attempts to mount its
root filesystem. Once again, failure means that it breaks to its prompt
to ask where to get its root filesystem. Unless you have an alternate
root filesystem, you are stuck.

Since the root filesystem is initially mounted read-only, it is
safe to make an attempt to mount the root filesystem with the failed
superblock check-hash. Thus, when asked to mount a root filesystem
with a failed superblock check-hash, the kernel prints a warning
message that the root filesystem superblock check-hash needs repair,
but notes that it is ignoring the error and proceeding. It does
mark the filesystem as needing an fsck which prevents it from being
enabled for writing until fsck has been run on it. The net effect
is that the reboot fails to single user, but at least at that point
the administrator has the tools at hand to fix the problem.

Reported by:    Rick Macklem (rmacklem@)
Discussed with: Warner Losh (imp@)
Sponsored by:   Netflix
2018-12-06 00:09:39 +00:00
Kirk McKusick
a02bd3e38c Move the check for the filesystem having been run on a kernel that
predates metadata check hashes so that it is done before deciding
whether to compute a check-hash of the superblock.

Reported by:  Rick Macklem <rmacklem@uoguelph.ca>
Sponsored by: Netflix
2018-11-26 00:58:07 +00:00
Kirk McKusick
ade67b509c Calculate updated superblock check-hash before writing it into the snapshot.
This corrects a bug that prevented snapshots from being mounted due to a
superblock check-hash failure.

Reported by:  Brennan Vincent <brennan@umanwizard.com>
Tested by:    Peter Holm (pho@)
Sponsored by: Netflix
2018-11-25 18:01:15 +00:00
Kirk McKusick
9fc5d538fc In preparation for adding inode check-hashes, clean up and
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.

Convert the utilities that had been using the undocumented
interface to use the new documented interface.

No functional change (as for now the libufs library does not
do inode check-hashes).

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-11-13 21:40:56 +00:00
Konstantin Belousov
4f77f48884 Implement O_BENEATH and AT_BENEATH.
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory.  Supposedly the interface is similar
to the same proposed Linux flags.

Reviewed by:	jilles (code, previous version of manpages), 0mp (manpages)
Discussed with:	allanjude, emaste, jonathan
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D17547
2018-10-25 22:16:34 +00:00
Kirk McKusick
ec888383cf Continuing efforts to provide hardening of FFS, this change adds a
check hash to the superblock. If a check hash fails when an attempt
is made to mount a filesystem, the mount fails with EINVAL (Invalid
argument). This avoids a class of filesystem panics related to
corrupted superblocks. The hash is done using crc32c.

Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-10-23 21:10:06 +00:00
Konstantin Belousov
b7befdf509 Correct panic messages.
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
Approved by:	re (rgrimes)
MFC after:	1 week
2018-09-22 17:05:49 +00:00
Kirk McKusick
4b6a2c497e The Call For Testing had no reports of operational problems and
found that performance was no worse and usually better when running
with TRIM consolidation. Performance improvement was most noticable
when multiple large files are released in a short period of time.

Thus, TRIM consolidation is being enabled by default. Should
operational problems be found, it can be disabled using the command
`sysctl vfs.ffs.dotrimcons=0'. This variable can also be set as a
tunable if early disabling is necessary.

Approved by:  re (gjb)
Sponsored by: Netflix
2018-09-06 23:28:35 +00:00
Mark Murray
19fa89e938 Remove the Yarrow PRNG algorithm option in accordance with due notice
given in random(4).

This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.

Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.

PR:		230870
Reviewed by:	cem
Approved by:	so(delphij,gtetlow)
Approved by:	re(marius)
Differential Revision:	https://reviews.freebsd.org/D16898
2018-08-26 12:51:46 +00:00
Kirk McKusick
a9c2220f5c Proper spelling of consolidation.
Submitted by: Dimitry Andric
2018-08-23 22:35:14 +00:00
Kirk McKusick
d530fd484b TRIM consolodation is supposed to be off by default 2018-08-20 21:19:21 +00:00
Kirk McKusick
4de0d16b8c For traditional disks, the filesystem attempts to allocate the
blocks of a file as contiguously as possible. Since the filesystem
does not know how large a file will grow when it is first being
written, it initially places the file in a set of blocks in which
it currently fits. As it grows, it is relocated to areas with
larger contiguous blocks.  In this way it saves its large contiguous
sets of blocks for the files that need them and thus avoids
unnecessaily fragmenting its disk space.

We used to skip reallocating the blocks of a file into a contiguous
sequence if the underlying flash device requested BIO_DELETE
notifications, because devices that benefit from BIO_DELETE also
benefit from not moving the data. However, in the algorithm described
above that reallocates the blocks, the destination for the data is
usually moved before the data is written to the initially allocated
location. So we rarely suffer the penalty of extra writes.  With
the addition of the consolodation of contiguous blocks into single
BIO_DELETE operations, having fewer but larger contiguous blocks
reduces the number of (slow and expensive) BIO_DELETE operations.
So when doing BIO_DELETE consolodation, we do block reallocation.

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-08-19 17:19:20 +00:00
Kirk McKusick
fc6e171535 Add consolodation of TRIM / BIO_DELETE commands to the UFS/FFS filesystem.
When deleting files on filesystems that are stored on flash-memory
(solid-state) disk drives, the filesystem notifies the underlying
disk of the blocks that it is no longer using. The notification
allows the drive to avoid saving these blocks when it needs to
flash (zero out) one of its flash pages. These notifications of
no-longer-being-used blocks are referred to as TRIM notifications.
In FreeBSD these TRIM notifications are sent from the filesystem
to the drive using the BIO_DELETE command.

Until now, the filesystem would send a separate message to the drive
for each block of the file that was deleted. Each Gigabyte of file
size resulted in over 3000 TRIM messages being sent to the drive.
This burst of messages can overwhelm the drive's task queue causing
multiple second delays for read and write requests.

This implementation collects runs of contiguous blocks in the file
and then consolodates them into a single BIO_DELETE command to the
drive. The BIO_DELETE command describes the run of blocks as a
single large block being deleted. Each Gigabyte of file size can
result in as few as two BIO_DELETE commands and is typically less
than ten.  Though these larger BIO_DELETE commands take longer to
run, they do not clog the drive task queue, so read and write
commands can intersperse effectively with them.

Though this new feature has been throughly reviewed and tested, it
is being added disabled by default so as to minimize the possibility
of disrupting the upcoming 12.0 release. It can be enabled by running
``sysctl vfs.ffs.dotrimcons=1''. Users are encouraged to test it.
If no problems arise, we will consider requesting that it be enabled
by default for 12.0.

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-08-19 16:56:42 +00:00
Kirk McKusick
7e038bc257 Replace the TRIM consolodation framework originally added in -r337396
driven by problems found with the algorithms being tested for TRIM
consolodation.

Reported by:  Peter Holm
Suggested by: kib
Reviewed by:  kib
Sponsored by: Netflix
2018-08-18 22:21:59 +00:00
Kirk McKusick
cc91864c26 Revert -r337396. It is being replaced with a revised interface that
resulted from testing and further reviews.
2018-08-18 21:21:06 +00:00
Kirk McKusick
68c49bcc40 Put in place the framework for consolodating contiguous blocks into
a smaller number of larger TRIM requests. The hope had been to have
the full TRIM consolodation in place for 12.0, but the algorithms
are still under development and need further testing. With this
framework in place it will be possible to easily add TRIM consolodation
once the optimal strategy has been found.

The only functional change with this patch is the elimination of TRIM
requests for blocks that are freed before they have been likely to
have been written.

Reviewed by: kib
Discussed with: Warner Losh and Chuck Silvers
Sponsored by: Netflix
2018-08-06 21:09:11 +00:00
Konstantin Belousov
61b285ac57 Avoid assertion in /dev/ufssuspend when the suspend ioctl is
(incorrectly) called while another suspension is already active.

PR:	230220
Reported by:	dexuan
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-08-01 19:06:55 +00:00
Alan Somers
6040822c4e Make timespecadd(3) and friends public
The timespecadd(3) family of macros were imported from NetBSD back in
r35029. However, they were initially guarded by #ifdef _KERNEL. In the
meantime, we have grown at least 28 syscalls that use timespecs in some
way, leading many programs both inside and outside of the base system to
redefine those macros. It's better just to make the definitions public.

Our kernel currently defines two-argument versions of timespecadd and
timespecsub.  NetBSD, OpenBSD, and FreeDesktop.org's libbsd, however, define
three-argument versions.  Solaris also defines a three-argument version, but
only in its kernel.  This revision changes our definition to match the
common three-argument version.

Bump _FreeBSD_version due to the breaking KPI change.

Discussed with:	cem, jilles, ian, bde
Differential Revision:	https://reviews.freebsd.org/D14725
2018-07-30 15:46:40 +00:00
Kirk McKusick
15430057b3 Add needed locking for um_flags added in -r335808.
While here document required locking details in ufsmount structure.

Reported by: kib
Reviewed by: kib
2018-07-17 04:43:58 +00:00
Conrad Meyer
9c9c01e43b ffs_syncvnode: Remove unhelpful print
It can occur during ordinary use of softupdates, or perhaps if writes to the
underlying media fail (causing bufs to be redirtied).  Either way, it is not
particularly actionable.

Reviewed by:	imp, kib
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D16258
2018-07-14 15:45:11 +00:00
Kirk McKusick
e1c27cf7d6 Import commit from NetBSD with checkin message:
Avoid Undefined Behavior in ffs_clusteracct()

    Change the type of 'bit' variable from int to unsigned int and use unsigned
    values consistently.

    sys/ufs/ffs/ffs_subr.c:336:10, shift exponent -1 is negative

    Detected with Kernel Undefined Behavior Sanitizer.

    Reported by <Harry Pantazis>

Submitted by: Pedro Giffuni
2018-07-07 19:11:43 +00:00
Kirk McKusick
ab0bcb6032 Create um_flags in the ufsmount structure to hold flags for a UFS filesystem.
Convert integer structure flags to use um_flags:

	int	um_candelete;			/* devvp supports TRIM */
	int	um_writesuspended;		/* suspension in progress */

become:

#define UM_CANDELETE		0x00000001	/* devvp supports TRIM */
#define UM_WRITESUSPENDED	0x00000002	/* suspension in progress */

This is in preparation for adding other flags to indicate forcible
unmount in progress after a disk failure and possibly forcible
downgrade to read-only.

No functional change intended.

Sponsored by:	Netflix
2018-06-29 22:24:41 +00:00
Warner Losh
06753bd3f9 Use buf + strategy rather than bypassing geom_vfs layer
The reference counting that's done in the geom_vfs layer to prevent
delivery of requests to defunct devices only works if all requests go
through that layer. UFS was bypassing that layer for BIO_DELETE requests,
sending them to the geom_consumer directly with g_io_request. Allocate
a buf, fill it in, and call strategy on it instead.

Submitted by: Chuck Silvers
Reviewed by: scottl, imp, kirk
Sponsored by: Netflix
Differential: https://reviews.freebsd.org/D15456
2018-06-26 00:39:38 +00:00
Matt Macy
1ef3a74e6b ufs: remove cgbno variable where unused 2018-05-19 19:30:42 +00:00
Kirk McKusick
8ab507588b Fix warning found by Coverity.
CID 1009353:  Error handling issues  (CHECKED_RETURN)
2018-05-16 23:42:02 +00:00
Konstantin Belousov
2ebc882927 Detect and optimize reads from the hole on UFS.
- Create getblkx(9) variant of getblk(9) which can return error.
- Add GB_NOSPARSE flag for getblk()/getblkx() which requests that BMAP
  was performed before the buffer is created, and EJUSTRETURN returned
  in case the requested block does not exist.
- Make ffs_read() use GB_NOSPARSE to avoid instantiating buffer (and
  allocating the pages for it), copying from zero_region instead.

The end result is less page allocations and buffer recycling when a
hole is read, which is important for some benchmarks.

Requested and reviewed by:	jeff
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D14917
2018-05-13 09:47:28 +00:00
Kirk McKusick
f8ccf17383 Renumber soft-update types starting at 1 instead of 0 to avoid confusion
of zero'ed memory appearing to have a valid soft-update type.

Also correct some comments.

Reviewed by: kib
2018-04-05 00:32:01 +00:00
Ed Maste
d8ba45e213 Revert r313780 (UFS_ prefix) 2018-03-17 12:59:55 +00:00
Ed Maste
1e2b9afca9 Prefix UFS symbols with UFS_ to reduce namespace pollution
Followup to r313780.  Also prefix ext2's and nandfs's versions with
EXT2_ and NANDFS_.

Reported by:	kib
Reviewed by:	kib, mckusick
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D9623
2018-03-17 01:48:27 +00:00
Kirk McKusick
efbf396426 This change is some refactoring of Mark Johnston's changes in r329375
to fix the memory leak that I introduced in r328426. Instead of
trying to clear up the possible memory leak in all the clients, I
ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures
that memory is always freed if it returns an error).

The original change in r328426 was a bit sparse in its description.
So I am expanding on its description here (thanks cem@ and rgrimes@
for your encouragement for my longer commit messages).

In preparation for adding check hashing to superblocks, r328426 is
a refactoring of the code to get the reading/writing of the superblock
into one place. Unlike the cylinder group reading/writing which
ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel
and cgget/cgput in libufs), I have the core superblock functions
just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is
already imported into utilities like fsck_ffs as well as libufs to
implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions
take a function pointer to do the actual I/O for which there are
four variants:

    ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem

    g_use_g_read_data / g_use_g_write_data for kernel geom clients

    ufs_use_sa_read for the standalone code (stand/libsa/ufs.c
	but not stand/libsa/ufsread.c which is size constrained)

    use_pread / use_pwrite for libufs

Uses of these interfaces are in the UFS filesystem, geoms journal &
label, libsa changes, and libufs. They also permeate out into the
filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck,
fsirand, fstyp, and quot. Some of these utilities should probably be
converted to directly use libufs (like dumpfs was for example), but
there does not seem to be much win in doing so.

Tested by: Peter Holm (pho@)
2018-03-02 04:34:53 +00:00
Conrad Meyer
d4e6557bae ffs: softdep_disk_write_complete: Quiesce spurious Coverity warning
Coverity cannot determine that handle_written_indirdep() does not access
uninitialized 'sbp' when flags argument is zero.

So, simply move the initialization slightly sooner to silence the warning.

No functional change.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-01 00:29:52 +00:00
Kirk McKusick
528833fae1 Use a more straight-forward approach to relaxing the location
restraints when validating one of the backup superblocks.
2018-02-26 00:34:56 +00:00
Kirk McKusick
4cbd996a84 Relax the location restraints when validating one of the
backup superblocks.
2018-02-24 03:33:46 +00:00
Kirk McKusick
f686b1710a Refactor fix in r329600 to do its check once in readsuper() rather
than in the two places that call readsuper().

No semantic change intended.

Reviewed by: kib
2018-02-21 19:56:19 +00:00
Konstantin Belousov
9f74642385 Do not free(9) uninitialized pointer.
Reported and tested by:	allanjude
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
2018-02-19 19:08:25 +00:00
Mark Johnston
16759360d4 Fix a memory leak introduced in r328426.
ffs_sbget() may return a superblock buffer even if it fails, so the
caller must be prepared to free it in this case. Moreover, when tasting
alternate superblock locations in a loop, ffs_sbget()'s readfunc
callback must free the previously allocated buffer.

Reported and tested by:	pho
Reviewed by:		kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D14390
2018-02-16 15:41:03 +00:00
Kirk McKusick
068beacf21 The goal of this change is to prevent accidental foot shooting by
folks running filesystems created on check-hash enabled kernels
(which I will call "new") on a non-check-hash enabled kernels (which
I will call "old). The idea here is to detect when a filesystem is
run on an old kernel and flag the filesystem so that when it gets
moved back to a new kernel, it will not start getting a slew of
check-hash errors.

Back when the UFS version 2 filesystem was created, it added a file
flag FS_INDEXDIRS that was to be set on any filesystem that kept
some sort of on-disk indexing for directories. The idea was precisely
to solve the issue we have today. Specifically that a newer kernel
that supported indexing would be able to tell that the filesystem
had been run on an older non-indexing kernel and that the indexes
should not be used until they had been rebuilt. Since we have never
implemented on-disk directory indicies, the FS_INDEXDIRS flag is
cleared every time any UFS version 2 filesystem ever created is
mounted for writing.

This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH
flag. Thus, the FS_METACKHASH is definitively known to have always
been cleared. The FS_INDEXDIRS flag has been moved to a new block
of flags that will always be cleared starting with this commit
(until they get used to implement some future feature which needs
to detect that the filesystem was mounted on a kernel that predates
the new feature).

If a filesystem with check-hashes enabled is mounted on an old
kernel the FS_METACKHASH flag is cleared. When that filesystem is
mounted on a new kernel it will see that the FS_METACKHASH has been
cleared and clears all of the fs_metackhash flags. To get them
re-enabled the user must run fsck (in interactive mode without the
-y flag) which will ask for each supported check hash whether it
should be rebuilt and enabled. When fsck is run in its default preen
mode, it will just ignore the check hashes so they will remain
disabled.

The kernel has always disabled any check hash functions that it
does not support, so as more types of check hashes are added, we
will get a non-surprising result. Specifically if filesystems get
moved to kernels supporting fewer of the check hashes, those that
are not supported will be disabled. If the filesystem is moved back
to a kernel with more of the check-hashes available and fsck is run
interactively to rebuild them, then their checking will resume.
Otherwise just the smaller subset will be checked.

A side effect of this commit is that filesystems running with
cylinder-group check hashes will stop having them checked until
fsck is run to re-enable them (since none of them currently have
the FS_METACKHASH flag set). So, if you want check hashes enabled
on your filesystems after booting a kernel with these changes, you
need to run fsck to enable them. Any newly created filesystems will
have check hashes enabled. If in doubt as to whether you have check
hashes emabled, run dumpfs and look at the list of enabled flags
at the end of the superblock details.
2018-02-08 23:06:58 +00:00
Kirk McKusick
47806d1b93 Occasional cylinder-group check-hash errors were being reported on
systems running with a heavy filesystem load. Tracking down this
bug was elusive because there were actually two problems. Sometimes
the in-memory check hash was wrong and sometimes the check hash
computed when doing the read was wrong. The occurrence of either
error caused a check-hash mismatch to be reported.

The first error was that the check hash in the in-memory cylinder
group was incorrect. This error was caused by the following
sequence of events:

- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory
  check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we
  never mark it dirty, thus do not write it out, and hence never
  update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect
  check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with
  the old check hash is still in the VM cache, but some other pages
  are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather
  from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match causing the
  error to be printed.

The fix for this problem is to only set cg_time and cg_old_time as
the cylinder group is being written to disk. This keeps the in-memory
check-hash valid unless the cylinder group has had other modifications
which will require it to be written with a new check hash calculated.
It also requires that the check hash be recalculated in the in-memory
cylinder group when it is marked clean after doing a background write.

The second problem was that the check hash computed at the end of the
read was incorrect because the calculation of the check hash on
completion of the read was being done too soon.

- When a read completes we had the following sequence:

  - bufdone()
  -- b_ckhashcalc (calculates check hash)
  -- bufdone_finish()
  --- vfs_vmio_iodone() (replaces bogus pages with the cached ones)

- When we are reading a buffer where one or more pages are already
  in memory (but not all pages, or we wouldn't be doing the read),
  the I/O is done with bogus_page mapped in for the pages that exist
  in the VM cache. This mapping is done to avoid corrupting the
  cached pages if there is any I/O overrun. The vfs_vmio_iodone()
  function is responsible for replacing the bogus_page(s) with the
  cached ones. But we were calculating the check hash before the
  bogus_page(s) were replaced. Hence, when we were calculating the
  check hash, we were partly reading from bogus_page, which means
  we calculated a bad check hash (e.g., because multiple pages have
  been mapped to bogus_page, so its contents are indeterminate).

The second fix is to move the check-hash calculation from bufdone()
to bufdone_finish() after the call to vfs_vmio_iodone() so that it
computes the check hash over the correct set of pages.

With these two changes, the occasional cylinder-group check-hash
errors are gone.

Submitted by: David Pfitzner <dpfitzner@netflix.com>
Reviewed by: kib
Tested by: David Pfitzner
2018-02-06 00:19:46 +00:00
Kirk McKusick
0d37a428f0 When reading a cylinder group, break out reporting of check hash errors
from other types of errors so that the error is correctly reported.
2018-01-31 23:13:37 +00:00
Kirk McKusick
dffce2150e Refactoring of reading and writing of the UFS/FFS superblock.
Specifically reading is done if ffs_sbget() and writing is done
in ffs_sbput(). These functions are exported to libufs via the
sbget() and sbput() functions which then used in the various
filesystem utilities. This work is in preparation for adding
subperblock check hashes.

No functional change intended.

Reviewed by: kib
2018-01-26 00:58:32 +00:00
Pedro F. Giffuni
a94a2945be ext2fs|ufs:Unsign some values related to allocation.
When allocating memory through malloc(9), we always expect the amount of
memory requested to be unsigned as a negative value would either stand for
an error or an overflow.
Unsign some values, found when considering the use of mallocarray(9), to
avoid unnecessary casting. Also consider that indexes should be of
at least the same size/type as the upper limit they pretend to index.

MFC after:	2 weeks
2018-01-24 17:58:48 +00:00
Pedro F. Giffuni
f9834d101a Revert r327781, r328093, r328056:
ufs|ext2fs: Revert uses of mallocarray(9).

These aren't really useful: drop them.
Variable unsigning will be brought again later.
2018-01-24 16:44:57 +00:00
Pedro F. Giffuni
90b618f35b ufs: use mallocarray(9).
Basic use of mallocarray to prevent overflows: static analyzers are also
likely to perform additional checks.

Since mallocarray expects unsigned parameters, unsign some
related variables to minimize sign conversions.

Reviewed by:	mckusick
2018-01-17 18:18:33 +00:00
Konstantin Belousov
147b0c1a3e Softlink inodes can own buffers with dependencies.
At least, softlinks longer than 120 bytes have data fragments.

Submitted by:	mckusick
MFC after:	5 days
2018-01-11 13:37:45 +00:00
Konstantin Belousov
c999b43527 Generalize the fix from r322757 and apply it to several more places.
The code accesses bp->b_dep without owning the ufs mount softdep lock,
which makes it possible for the derefenced workitem to be freed in
parallel.  In particular, the deallocate_dependencies(),
softdep_disk_io_initiation() and softdep_disk_write_complete() are
affected.

Move the code to safely calculate ump from the buffer with
dependencies into the helper softdep_bp_to_mp() and use it for all
found cases.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	mckusick (as part of the bigger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-09 10:51:44 +00:00
Konstantin Belousov
e51e3c7e73 When handling write completion, take SU lock around calls to
handle_written_XXX() in case of processing the buffer with an error.

Tested by:	pho (as part of the bigger patch)
Reviewed by:	mckusick (as part of the bigger patch)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-09 10:44:17 +00:00
Konstantin Belousov
377f88fb08 Postpone the disassotiation of the background write buffer with devvp
so that buf_complete() sees fully constructed buffer.

This is a NOP right now, but will be needed by the forthcoming SU change.

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-01-09 10:33:11 +00:00
Pedro F. Giffuni
51268c3852 SPDX: Complete License ID tags for UFS. 2017-12-27 19:13:50 +00:00
Eitan Adler
caa7e52f3f kernel: Fix several typos and minor errors
- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by:	imp, benno
2017-12-27 03:23:21 +00:00
Alexander Kabaev
151ba7933a Do pass removing some write-only variables from the kernel.
This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385
2017-12-25 04:48:39 +00:00
Alexander Kabaev
5f943cca65 Remove dead initialization of the inode pointer.
The pointer gets initialized again later in the code. This also
improves code style(9).
2017-12-23 16:24:02 +00:00
Mark Johnston
b6fbf003e1 Provide a sysctl to force synchronous initialization of inode blocks.
FFS performs asynchronous inode initialization, using a barrier write
to ensure that the inode block is written before the corresponding
cylinder group header update. Some GEOMs do not appear to handle
BIO_ORDERED correctly, meaning that the barrier write may not work as
intended. The sysctl allows one to work around this problem at the
cost of expensive file creation on new filesystems. The default
behaviour is unchanged.

Reviewed by:	kib, mckusick
MFC after:	1 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D13428
2017-12-09 15:44:30 +00:00
Pedro F. Giffuni
fe267a5590 sys: general adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

No functional change intended.
2017-11-27 15:23:17 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Konstantin Belousov
4e13cca54e Improve the message printed when the cylinder group checksum is wrong.
Mention the device path and mount point path, handle snapshots.

Tested by:	imp
Sponsored by:	The FreeBSD Foundation
2017-11-05 13:28:48 +00:00
Mark Johnston
4770655901 Remove a stale and incorrect comment.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-28 02:51:27 +00:00
Mark Johnston
9cf7abcc1d Remove workqueue items after updating the workqueue tail pointer.
When QUEUE_MACRO_DEBUG_TRASH is configured, the queue linkage fields
are trashed upon removal of the item, so be sure to only read them before
removing the item.

No functional change intended.

MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2017-10-28 02:48:37 +00:00
Mark Johnston
4c52a9993b Make drain_output() use bufobj_wwait().
No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D12790
2017-10-25 17:20:18 +00:00
John Baldwin
800c3e80de Don't defer wakeup()s for completed journal workitems.
Normally wakeups() are performed for completed softupdates work items
in workitem_free() before the underlying memory is free()'d.
complete_jseg() was clearing the "wakeup needed" flag in work items to
defer the wakeup until the end of each loop iteration.  However, this
resulted in the item being free'd before it's address was used with
wakeup().  As a result, another part of the kernel could allocate this
memory from malloc() and use it as a wait channel for a different
"event" with a different lock.  This triggered an assertion failure
when the lock passed to sleepq_add() did not match the existing lock
associated with the sleep queue.  Fix this by removing the code to
defer the wakeup in complete_jseg() allowing the wakeup to occur
slightly earlier in workitem_free() before free() is called.

The main reason I can think of for deferring a wakeup() would be to
avoid waking up a waiter while holding a lock that the waiter would
need.  However, no locks are dropped in between the wakeup() in
workitem_free() and the end of the loop in complete_jseg() as far as I
can tell.

In general I think it is not safe to do a wakeup() after free() as one
cannot control how other parts of the kernel that might reuse the
address for a different wait channel will handle spurious wakeups.

Reported by:	pho
Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D12494
2017-09-26 23:24:15 +00:00
Konstantin Belousov
55e5a5c1f4 Fix 32bit build.
Reported by:	emaste
Sponsored by:	The FreeBSD Foundation
2017-09-22 16:42:41 +00:00
Kirk McKusick
75e3597abb Continuing efforts to provide hardening of FFS, this change adds a
check hash to cylinder groups. If a check hash fails when a cylinder
group is read, no further allocations are attempted in that cylinder
group until it has been fixed by fsck. This avoids a class of
filesystem panics related to corrupted cylinder group maps. The
hash is done using crc32c.

Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.

Specifics of the changes:

sys/sys/buf.h:
    Add BX_FSPRIV to reserve a set of eight b_xflags that may be used
    by individual filesystems for their own purpose. Their specific
    definitions are found in the header files for each filesystem
    that uses them. Also add fields to struct buf as noted below.

sys/kern/vfs_bio.c:
    It is only necessary to compute a check hash for a cylinder
    group when it is actually read from disk. When calling bread,
    you do not know whether the buffer was found in the cache or
    read. So a new flag (GB_CKHASH) and a pointer to a function to
    perform the hash has been added to breadn_flags to say that the
    function should be called to calculate a hash if the data has
    been read. The check hash is placed in b_ckhash and the B_CKHASH
    flag is set to indicate that a read was done and a check hash
    calculated. Though a rather elaborate mechanism, it should
    also work for check hashing other metadata in the future. A
    kernel internal API change was to change breada into a static
    fucntion and add flags and a function pointer to a check-hash
    function.

sys/ufs/ffs/fs.h:
    Add flags for types of check hashes; stored in a new word in the
    superblock. Define corresponding BX_ flags for the different types
    of check hashes. Add a check hash word in the cylinder group.

sys/ufs/ffs/ffs_alloc.c:
    In ffs_getcg do the dance with breadn_flags to get a check hash and
    if one is provided, check it.

sys/ufs/ffs/ffs_vfsops.c:
    Copy across the BX_FFSTYPES flags in background writes.
    Update the check hash when writing out buffers that need them.

sys/ufs/ffs/ffs_snapshot.c:
    Recompute check hash when updating snapshot cylinder groups.

sys/libkern/crc32.c:
lib/libufs/Makefile:
lib/libufs/libufs.h:
lib/libufs/cgroup.c:
    Include libkern/crc32.c in libufs and use it to compute check
    hashes when updating cylinder groups.

Four utilities are affected:

sbin/newfs/mkfs.c:
    Add the check hashes when building the cylinder groups.

sbin/fsck_ffs/fsck.h:
sbin/fsck_ffs/fsutil.c:
    Verify and update check hashes when checking and writing cylinder groups.

sbin/fsck_ffs/pass5.c:
    Offer to add check hashes to existing filesystems.
    Precompute check hashes when rebuilding cylinder group
    (although this will be done when it is written in fsutil.c
    it is necessary to do it early before comparing with the old
    cylinder group)

sbin/dumpfs/dumpfs.c
    Print out the new check hash flag(s)

sbin/fsdb/Makefile:
    Needs to add libufs now used by pass5.c imported from fsck_ffs.

Reviewed by: kib
Tested by: Peter Holm (pho)
2017-09-22 12:45:15 +00:00
John Baldwin
4f45713ae2 Add UFS_LINK_MAX for the UFS-specific limit on link counts.
ino64 expanded nlink_t to 64 bits, but the on-disk format for UFS is still
limited to 16 bits.  This is a nop currently but will matter if LINK_MAX is
increased in the future.

Reviewed by:	kib
Sponsored by:	Chelsio Communications
2017-09-18 23:30:39 +00:00
Kirk McKusick
855662c611 The new fsck recovery information to enable it to find backup
superblocks created in revision 322297 only works on disks
with sector sizes up to 4K. This update allows the recovery
information to be created by newfs and used by fsck on disks
with sector sizes up to 64K. Note that FFS currently limits
filesystem to be mounted from disks with up to 8K sectors.
Expanding this limitation will be the subject of another
commit.

Reported by: Peter Holm
Reviewed with: kib
2017-09-04 20:19:36 +00:00
Konstantin Belousov
2f9d88c7ae Protect v_rdev dereference with the vnode interlock instead of the
vnode lock.

Caller of softdep_count_dependencies() may own a buffer lock, which
might conflict with the lock order.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	10 days
2017-08-25 09:51:22 +00:00
Konstantin Belousov
f0d5223230 Avoid dereferencing potentially freed workitem in
softdep_count_dependencies().

Buffer's b_dep list is protected by the SU mount lock.  Owning the
buffer lock is not enough to guarantee the stability of the list.

Calculation of the UFS mount owning the workitems from the buffer must
be much more careful to not dereference the work item which might be
freed meantime.  To get to ump, use the pointers chain which does not
involve workitems at all.

Reported and tested by:	pho
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-08-21 16:23:44 +00:00
Konstantin Belousov
b5f2560d09 Style.
Reviewed by:	mckusick
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2017-08-21 16:16:02 +00:00
Kirk McKusick
77b63aa0fc Since the switch to GPT disk labels, fsck for UFS/FFS has been
unable to automatically find alternate superblocks. This checkin
places the information needed to find alternate superblocks to the
end of the area reserved for the boot block.

Filesystems created with a newfs of this vintage or later will
create the recovery information. If you have a filesystem created
prior to this change and wish to have a recovery block created for
your filesystem, you can do so by running fsck in forground mode
(i.e., do not use the -p or -y options). As it starts, fsck will
ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should
answer yes.

Discussed with: kib, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11589
2017-08-09 05:17:21 +00:00
Kirk McKusick
33bbdde01f Avoid reading a snapshot block when it is already in the cache.
Update the use of the B_CACHE flag (since the May 1999 commit
that made it the correct test here).

Reported by: Andreas Longwitz <longwitz@incore.de>
Reviewed by: kib
Tested by: Peter Holm
MFC after: 1 week
2017-07-31 20:41:45 +00:00
Konstantin Belousov
5cf14660ae Improve publication of the newly allocated snapdata.
For freshly allocated snapdata, Lock sn_lock in advance, so
si_snapdata readers see the locked snapdata and not race.

For existing snapdata, if the thread was put to sleep waiting for
sn_lock, re-read si_snapdata.  This either closes the race or makes
the reliance on LK_DRAIN less important.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-07-21 18:42:35 +00:00
Konstantin Belousov
c536471408 Unlock correct lock in ffs_snapblkfree().
It is possible for ffs_snapblkfree() to race and lock snaplock while
the devvp snapdata is instantiated, but no snapshots exist.  In this
case the loop over snapshots in ffs_snapblkfree() is not executed, and
the local variable vp is left initialized to NULL.

Unlock using &sn->sn_lock and not vp->v_vnlock.  For the inodes on the
snapshot list, the locks are same.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-07-21 18:36:17 +00:00
Konstantin Belousov
f2e6bf5c05 Account for lock recursion when transfering snaplock to the vnode lock
in ffs_snapremove().

Apparently ffs_snapremove() may be called with the snap lock recursed,
at least one trace demonstrated this when snapshot vnode was unlinked
while synced.  It was inactivated from the syncer thread.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2017-07-21 18:28:27 +00:00
Konstantin Belousov
35bf780921 Remove write-only variable.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2017-07-16 07:12:04 +00:00