for limiting disk (actually filesystem) IO.
Note that in some cases these limits are not quite precise. It's ok,
as long as it's within some reasonable bounds.
Testing - and review of the code, in particular the VFS and VM parts - is
very welcome.
MFC after: 1 month
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D5080
This adopts the same change as r291936 for UFS.
Directly clear IN_ACCESS or IN_UPDATE when user supplied the time, and
copy the value into the inode.
This keeps the behaviour cleaner and is consistent with UFS.
Reviewed by: bde
MFC after: 1 month (only 10)
Sync with r84642 from UFS:
The panics are inappropriate because the IN_RENAME flag only fixes a
few of the huge number of race conditions that can result in the
source path becoming invalid even prior to the VOP_RENAME() call.
Found accidentally while checking an issue from PVS Static Analysis.
MFC after: 3 days
This is ongoing work from Damjan Jovanovic to improve ext4 read support
with sparse files:
Keep track of the first and last block in each extent as it descends down
the extent tree, thus being able to work out that some blocks are sparse
earlier. This solves an issue on r293680.
In ext4_bmapext() start supporting the runb parameter, which appears to be
the number of adjacent blocks prior to the block being converted in the
same way that runp is the number of blocks after, speding up random access
to mmaped files.
PR: 206652
ext2fs: passthrough any extra timestamps to the dinode struct.
While it passed the classic testing, the change appears to have
caused some regression and still requires some more precautions.
PR: 206820
MFC after: 3 days
In general we don't trust any of the extended timestamps unless the
EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where
we freshly allocated a new inode the information is valid and it is
better to pass it along instead of leaving the value undefined.
This should have no practical effect but should reduce the amount of
garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the
filesystem is converted from ext3 to ext4.
MFC after: 4 days
Directory index was introduced in ext3. We don't always use the
prefix to denote the ext2 variant they belong to but when we
do we should try to be accurate.
We use i_flag to carry some flags like IN_E4INDEX which newer
ext2fs variants uses internally.
fsck.ext3 rightfully complains after our implementation tags
non-directory inodes with INDEX_FL.
Initializing i_flag during allocation removes the noise factor
and quiets down fsck.
Patch from: Damjan Jovanovic
PR: 206530
The htree dir_index is perhaps one of the most characteristic
features of the linux ext3 implementation. It was removed
in r281670, due to repeated bug reports.
Damjan Jovanic detected and fixed three bugs and did some
stress testing by building Apache OpenOffice on top of it
so it is now in good shape to bring back.
Differential Revision: https://reviews.freebsd.org/D5007
Submitted by: Damjan Jovanovic
Reviewed by: pfg
Tested by: pho
Relnotes: Yes
MFC after: 2 months (only 10.x)
Add support for sparse files in ext4. Also implement read-ahead, which
greatly increases the performance when transferring files from ext4.
Both features implemented by Damjan Jovanovic.
PR: 205816
MFC after: 1 week
Merge the filesystem specific part from r274914 to ext2fs.
I only did regular testing with the change but UFS and our ext2fs
are similar enough that the code should just work with the new
sendfile.
Discussed with: glebius
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:
- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.
Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.
Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4 support but thanks to the
wonders of version control we can always revert this change and bring it
back.
PR: 191895
PR: 198731
PR: 199309
MFC after: 5 days
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
Reviewed by: kib
MFC after: 2 weeks
The e2fs_gd struct was not being initialized and garbage was
being used for hinting the ext2 allocator variant.
Use malloc to clear the values and also initialize e2fs_contigdirs
during allocation to keep consistency.
While here clean up small style issues.
Reported by: Clang static analyser
MFC after: 1 week
After the ext2 variant of the "orlov allocator" was implemented,
the case for a negative or zero dirsize disappeared.
Drop the dead code and unsign dirsize given that it can't be
negative anyways.
CID: 1008669
MFC after: 1 week
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.
Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter(). Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.
Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.
Suggested by: Chris Torek <torek@pi-coral.com>
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Overrunning buffer pointed to by (caddr_t)&oip->i_db[0] of 48 bytes by
passing it to a function which accesses it at byte offset 59 using
argument 60UL.
The issue was inherited from an older FFS implementation and
fixed there with by merging UFS2 in r98542. We follow the
FFS fix.
Discussed with: bde
CID: 1007665
MFC after: 3 days
ext2_print_inode is not really used but it was nice to
have for initial development work. #ifdef it under a
new EXT2FS_DEBUG knob so that we don't spend time
compiling it.
MFC after: 3 days
survives remount in rw, also it is set for vnodes on rootfs before
noatime can be set or clock is adjusted. All conditions result in
wrong atime for accessed vnodes.
Submitted by: bde
MFC after: 1 week
by ffs and ext2fs. Remove duplicated call to vm_page_zero_invalid(),
done by VOP and by vm_pager_getpages(). Use vm_pager_free_nonreq().
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 6 weeks (after r271596)
In linux EXT4_LINK_MAX is now 64000. We can't really do that
since i_nlink and va_nlink are signed so setting higher values
is likely to cause trouble.
This is a system limitation so set the EXT_LINK_MAX to
what the system can handle.
MFC after: 3 days
forcing filesystem VOP_LINK() methods to repeat the code. In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.
Note that NFS server already performs this check before calling
VOP_LINK().
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
ext2fs: minor update to the dirpref policy.
The change in UFS r254996, reverted the change as the
older code seems to work better. This was not visible
in local testing but we can trust UFS is vastly more
exercised in diferent environments.
Bring in a minor change to the dirpref policy based on r248623.
This is pretty minimal change to keep the implementation in
sync with UFS but other parts from the original change are not
directly applicable so don't expect improvements in fsck times.
MFC after: 2 weeks
Consistently use a single tab after a #define as mentioned in style(9).
Use tabs instead of space for indenting.
Fix a typo: "hash_vesion".
No functional change.
MFC after: 3 days
The ext4 developers tend to tag Ext4-specific flags as
"incompatible" even when such features are not relevant for
read-only support. This is a consequence of the process
though which this filesystem is implemented without design
and the fact that some new features are not extensible to
ext2/3.
Organize the features according to what we support and sort
them so that we can now read-only mount filesystems with
some features that may be found in newly formatted ext4 fs.
Submitted by: Zheng Liu
Reviewed by: pfg
MFC after: 5 days
The ext4 inode flags do not have equivalents for chflags (1)
and hold information that is private to the implementation.
The i_flag field in the inode is a better place to hold the Ext4
inode flags as it saves us from masking flags while setting or
getting attributes. It should also make things cleaner if we
implement write support for Ext4.
Suggested by: bde
Tested by: Mike Ma
MFC after: 3 days
The IN_* flags should be set in i_flag instead of corrupting
i_flags [1].
Re-enable HTree dirindex as the last series of bug fixes
seems to have fixed the issues.
Reported by: bde [1]
Tested by: kevlo
MFC after: 1 week
Use the bitwise negation instead of bogus boolean negation and move
the flag manipulation with the assignment.
Fix some grammatical errors introduced in the same change.
Reported by: bde
MFC after: 3 days
r260545 cleared the inode flags to fix corruption problems but
we still need to pass some EXT4 flags for the ext4 read-only
mode. None of these attributes has an equivalent in FreeBSD and
are uninteresting for the system utilities so they should be
innaccessible in ext2_getattrib().
Note: we also use EXT4_HUGE_FILE but we use it directly from the
dinode structure so it is not necessary to translate it,
Suggested by: bde
MFC after: 3 days
After r252890 we are naively attempting to pass through the
inode flags. This is technically incorrect as the ext2
inode flags don't match the UFS/system values used in
FreeBSD and a clean conversion is needed.
Some filtering was left in place so the change didn't cause
significant changes in FreeBSD but some of the garbage passed
is likely to be the cause for warning messages in linux.
Fix the issue by resetting the flags before conversion as was
done previously. This also means we will not pass the EXT4_*
inode flags into FreeBSD's inode.
PR: kern/185448
MFC after: 3 days
According to online documentation [1], Ext4 has two new "special"
inodes so add the new exclude and replica inodes.
Reference:
[1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout
Reported by: Mike Ma
MFC after: 3 weeks
di_extsize is the EA size and as such it should be unsigned.
Adjust related types for consistency.
Reviewed by: mckusick (previous version)
MFC after: 3 weeks
Our code does not consider yet the case of hash collisions. This
is a rather annoying situation where two or more files that
happen to have the same hash value will not appear accessible.
The situation is not difficult to work-around but given that things
will just work without enabling htree we will save possible
embarrassments for the next release.
Reported by: Kevin Lo
Previous bandaid was not appropriate and didn't really work for
all platforms. While here, cleanup the surrounding code to match
ffs_checkoverlap()
Reported by: dim, jmallet and bde
MFC after: 3 weeks
Add definitions for e2fs_daddr_t, e4fs_daddr_t in addition
to the already existing e2fs_lbn_t and adjust them for ext4.
Other than making the code more readable these changes should
fix problems related to big filesystems.
Setting the proper types can be tricky so the process was
helped by looking at UFS. In our implementation, logical block
numbers can be negative and the code depends on it. In ext2,
block numbers are unsigned so it is convenient to keep
e2fs_daddr_t unsigned and use the complete 32 bits. In the
case of e4fs_daddr_t, while the value should be unsigned, for
ext4 we only need to support 48 bits so preserving an extra
bit from the sign is not an issue.
While here also drop the ext2_setblock() prototype that was
never used.
Discussed with: mckusick, bde
MFC after: 3 weeks
Basic support for extents was implemented by Zheng Liu as part
of his Google Summer of Code in 2010. This support is read-only
at this time.
In addition to extents we also support the huge_file extension
for read-only purposes. This works nicely with the additional
support for birthtime/nanosec timestamps and dir_index that
have been added lately.
The implementation may not work for all ext4 filesystems as
it doesn't support some features that are being enabled by
default on recent linux like flex_bg. Nevertheless, the feature
should be very useful for migration or simple access in
filesystems that have been converted from ext2/3 or don't use
incompatible features.
Special thanks to Zheng Liu for his dedication and continued
work to support ext2 in FreeBSD.
Submitted by: Zheng Liu (lz@)
Reviewed by: Mike Ma, Christoph Mallon (previous version)
Sponsored by: Google Inc.
MFC after: 3 weeks
The htree implementation uses code derived from the
RSA Data Security, Inc. MD4 Message-Digest Algorithm.
Add a proper licensing statement for the code and clarify
the corresponding comments.
Approved by: core (hrs)
as in <sys/dirent.h>
ext2_readdir() has always been very fs specific and different
with respect to its ufs_ counterpart. Recent changes from UFS
have made it possible to share more closely the implementation.
MFUFS r252438:
Always start parsing at DIRBLKSIZ aligned offset, skip first entries if
uio_offset is not DIRBLKSIZ aligned. Return EINVAL if buffer is too
small for single entry.
Preallocate buffer for cookies.
Skip entries with zero inode number.
Reviewed by: gleb, Zheng Liu
MFC after: 1 month
Merge from UFS r231313:
This change first attempts the uiomove() to the newly allocated
(and dirty) buffer and only zeros it if the uiomove() fails. The
effect is to eliminate the gratuitous zeroing of the buffer in
the usual case where the uiomove() successfully fills it.
MFC after: 3 days
This is a port of NetBSD's GSoC 2012 Ext3 HTree directory indexing
by Vyacheslav Matyushin. It was cleaned up and enhanced for FreeBSD
by Zheng Liu (lz@).
This is an excellent example of work shared among different projects:
Vyacheslav was able to look at an early prototype from Zheng Liu who
was also able to check the code from Haiku (with permission).
As in linux, the feature is not available by default and must be
enabled explicitly with tune2fs. We still do not support the
workarounds required in readdir for NFS.
Submitted by: Zheng Liu
Tested by: Mike Ma
Sponsored by: Google Inc.
MFC after: 1 week
r156418:
Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems. This could cause deadlocks when creating snapshots.
(We can't do snapshots on ext2fs but it is useful to keep things in sync).
r183079:
- Only set i_offset in the parent directory's i-node during a lookup for
non-LOOKUP operations.
- Relax a VOP assertion for a DELETE lookup.
r187528:
Move the code from ufs_lookup.c used to do dotdot lookup, into
the helper function. It is supposed to be useful for any filesystem
that has to unlock dvp to walk to the ".." entry in lookup routine.
MFC after: 5 days
In line to what is done in UFS, define an internal type
e2fs_lbn_t for the logical block numbers.
This change is basically a no-op as the new type is unchanged
(int32_t) but it may be useful as bumping this may be required
for ext4fs.
Also, as pointed out by Bruce Evans:
-Use daddr_t for daddr in ext2_bmaparray(). This seems to
improve reliability with the reallocblks option.
- Add a cast to the fsbtodb() macro as in UFS.
Reviewed by: bde
MFC after: 3 days
In the ext2fs driver we have a mixture of headers:
- The ext2_ prefixed headers have strong influence from NetBSD
and are carry specific ext2/3/4 information.
- The unprefixed headers are inspired on UFS and carry implementation
specific information.
Do some small adjustments so that the information is easier to
find coming from either UFS or the NetBSD implementation.
MFC after: 3 days
While the changes in r245820 are in line with the ext2 spec,
the code derived from UFS can use negative values so it is
better to relax some types to keep them as they were, and
somewhat more similar to UFS. While here clean some casts.
Some of the original types are still wrong and will require
more work.
Discussed with: bde
MFC after: 3 days
The superblock in ext2fs defines all the fields as unsigned but for
some reason the in-memory superblock was carrying e2fs_bpg and
e2fs_isize as signed.
We should preserve the specified types for consistency.
MFC after: 5 days
Uncover some, previously reserved, fields that are used by Ext4.
These are currently unused but it is good to have them for future
reference.
Reviewed by: bde
MFC after: 3 days
- Use a shared bufobj lock in getblk() and inmem().
- Convert softdep's lk to rwlock to match the bufobj lock.
- Move INFREECNT to b_flags and protect it with the buf lock.
- Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division
Discussed with: mckusick, kib, mdf
- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists.
No consumers need to find them there and it complicates the tree.
These flags are all FFS specific and could be moved out of the buf
cache.
- Use pbgetvp() and pbrelvp() to associate the background and journal
bufs with the vp. Not only is this much cheaper it makes more sense
for these transient bufs.
- Fix the assertions in pbget* and pbrel*. It's not safe to check list
pointers which were never initialized. Use the BX flags instead. We
also check B_PAGING in reassignbuf() so this should cover all cases.
Discussed with: kib, mckusick, attilio
Sponsored by: EMC / Isilon Storage Division
cluster_write() and cluster_wbuild() functions. The flags to be
allowed are a subset of the GB_* flags for getblk().
Sponsored by: The FreeBSD Foundation
Tested by: pho
e2fs_maxcontig was modelled after UFS when bringing the
"Orlov allocator" to ext2. On UFS fs_maxcontig is kept in the
superblock and is used by userland tools (fsck and growfs),
In ext2 this information is volatile so it is not available
for userland tools, so in this case it doesn't have sense
to carry it in the in-memory superblock.
Also remove a pointless check for MAX(1, x) > 0.
Submitted by: Christoph Mallon
MFC after: 2 weeks
- Remove unused extern declarations in fs.h
- Correct comments in ext2_dir.h
- Several panic() messages showed wrong function names.
- Remove commented out stray line in ext2_alloc.c.
- Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then
write-only member e2fs_blocksize_bits from struct m_ext2fs.
- Remove the unused macro EXT2_FIRST_INO() and the then write-only
member e2fs_first_inode from struct m_ext2fs.
- Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from
struct m_ext2fs.
- Remove the unused members e2fs_bmask, e2fs_dbpg and
e2fs_mount_opt from struct m_ext2fs
- Correct harmless off-by-one error for fspath in ext2_vfsops.c.
- Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS()
and EXT2_DESC_PER_BLOCK_BITS().
- Remove the !_KERNEL versions of the EXT2_* macros.
Submitted by: Christoph Mallon
MFC after: 2 weeks
The previous change accidentally left the substraction we
were trying to avoid in case that i_blocks could become
negative.
Reported by: bde
MFC after: 4 days
Ext2fs uses unsigned fields in its dinode struct.
FreeBSD can have negative values in some of those
fields and the inode is meant to interact with the
system so we have never respected the unsigned
nature of most of those fields.
Block numbers and the NFS generation number do
not need to be signed so redefine them as
unsigned to better match the on-disk information.
MFC after: 1 week
Testing with fsx has revealed problems and in order to
hunt the bugs properly we need reduce the complexity.
This seems to help but is not a complete solution.
MFC after: 3 days
It was plagued with style errors and the offsets had been lost.
While here took the time to update the fields according to the
latest ext4 documentation.
Reviewed by: bde
MFC after: 3 days
We also try to make better use of the fs flags instead of
trying adapt the code according to the fs structures. In
the case of subsecond timestamps and birthtime we now
check that the feature is explicitly enabled: previously
we only checked that the reserved space was available and
silently wrote them.
This approach is much safer, especially if the filesystem
happens to use embedded inodes or support EAs.
Discussed with: Zheng Liu
MFC after: 5 days
Bring several definitions required for newer ext4 features.
Rename EXT2F_COMPAT_HTREE to EXT2F_COMPAT_DIRHASHINDEX since it
is not being used yet and the new name is more compatible with
NetBSD and Linux.
This change is purely cosmetic and has no effect on the real
code.
Obtained from: NetBSD
MFC after: 3 days
When a file is first being written, the dynamic block reallocation
(implemented by ext2_reallocblks) relocates the file's blocks
so as to cluster them together into a contiguous set of blocks on
the disk.
When the cluster crosses the boundary into the first indirect block,
the first indirect block is initially allocated in a position
immediately following the last direct block. Block reallocation
would usually destroy locality by moving the indirect block out of
the way to keep the data blocks contiguous.
The issue was diagnosed long ago by Bruce Evans on ffs and surfaced
on ext2fs when block reallocaton was ported. This is only a partial
solution based on the similarities with FFS. We still require more
review of the allocation details that vary in ext2fs.
Reported by: bde
MFC after: 1 week
received granular locking) but the comment present in UFS has been
copied all over other filesystems code incorrectly for several times.
Removes comments that makes no sense now.
Reviewed by: kib
MFC after: 3 days
8.x code:
- If the lock cannot be acquired immediately unlocks 'bar' vnode
and then locks both vnodes in order.
- wrong vnode type panics from cache_enter_time after calls by
ext2_lookup.
The fix merges the fixes from ufs/ufs_lookup.c.
Submitted by: Mateusz Guzik
Approved by: jhb@ (mentor)
Reviewed by: kib@
MFC after: 1 week
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).
To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.
The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).
Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks
Return EPERM from ext2_setattr() when an user without PRIV_VFS_SYSFLAGS
privilege attempts to toggle SF_SETTABLE flags.
Flags are now stored to ip->i_flags in one place after all checks.
Also, remove SF_NOUNLINK from the checks because ext2fs doesn't support
that flag.
Reviewed by: bde
- Use more natural ip->i_flags instead of vap->va_flags in the final
flags check.
- Style improvements.
No functional change intended.
MFC after: 2 weeks
When using big inodes there is sufficient space in ext3 to
keep extra resolution and birthtime (creation) timestamps.
The appropriate fields in the on-disk inode have been approved
for a long time but support for this in ext3 has not been
widely distributed.
In preparation for ext4 most linux distributions have enabled
by default such bigger inodes and some people use nanosecond
timestamps in ext3. We now support those when the inode is big
enough and while we do recognize the EXT4F_ROCOMPAT_EXTRA_ISIZE,
we maintain the extra timestamps even when they are not used.
An additional note by Bruce Evans:
We blindly accept unrepresentable tv_nsec in VOP_SETATTR(), but
all file systems have always done that. When POSIX gets around
to specifying the behaviour, it will probably require certain
rounding to the fs's resolution and not rejecting the request.
This unfortunately means that syscalls that set times can't
really tell if they succeeded without reading back the times
using stat() or similar and checking that they were set close
enough.
Reviewed by: bde
Approved by: jhb (mentor)
MFC after: 2 weeks
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.
Discussed with: bde, das (previous versions)
MFC after: 1 month
ext4 but that can be used in ext3 mode.
Also adjust the internal inode to carry the birthtime,
like in UFS, which is starting to get some use when
big inodes are available.
Right now these are just placeholders for features
to come.
Approved by: jhb (mentor)
MFC after: 2 weeks
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return. Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.
Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.
Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.
Reviewed by: mckusick
Tested by: scottl
MFC after: 2 weeks
While there, remove a useless check from the code. memcchr() always
returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff
can never be equal to zero. Also, the fact that memcchr() returns a
pointer instead of the number of bytes until the end, makes conversion
to an offset far more easy.
Fix a comment from the previous commit.
Use M_ZERO instead of bzero() in ext2_vfsops.c
Add include guards from PR.
PR: 162564
Approved by: jhb (mentor)
MFC after: 2 weeks
The feature has been standard for a while in UFS as a means to reduce
fragmentation, therefore maintaining consistent performance with
filesystem aging. This is also very similar to what ext4 calls
"delayed allocation".
In his 2010 GSoC, Zheng Liu ported and benchmarked the missing
FANCY_REALLOC code to find more consistent performance improvements than
with the preallocation approach.
PR: 159233
Author: Zheng Liu <gnehzuil AT SPAMFREE gmail DOT com>
Sponsored by: Google Inc.
Approved by: jhb (mentor)
MFC after: 2 weeks
This removes the obfuscations mentioned in ext2_readwrite and
places the clustering funtion in a location similar to other
UFS-based implementations.
No performance or functional changeses are expected from
this move.
PR: kern/159232
Suggested by: bde
Approved by: jhb (mentor)
MFC after: 2 weeks
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.
Reviewed by: kib
- 77115: Implement support for O_DIRECT.
- 98425: Fix a performance issue introduced in 70131 that was causing
reads before writes even when writing full blocks.
- 98658: Rename the BALLOC flags from B_* to BA_* to avoid confusion with
the struct buf B_ flags.
- 100344: Merge the BA_ and IO_ flags so so that they may both be used in
the same flags word. This merger is possible by assigning the IO_ flags
to the low sixteen bits and the BA_ flags the high sixteen bits.
- 105422: Fix a file-rewrite performance case.
- 129545: Implement IO_INVAL in VOP_WRITE() by marking the buffer as
"no cache".
- Readd the DOINGASYNC() macro and use it to control asynchronous writes.
Change i-node updates to honor DOINGASYNC() instead of always being
synchronous.
- Use a PRIV_VFS_RETAINSUGID check instead of checking cr_uid against 0
directly when deciding whether or not to clear suid and sgid bits.
Submitted by: Pedro F. Giffuni giffunip at yahoo