Commit Graph

277 Commits

Author SHA1 Message Date
Pedro F. Giffuni
2e621997eb ext2fs: rearrange ext4_bmapext().
While here assign error a bit later.

Reviewed by:	Damjan Jovanovich
Obtained from:	NetBSD
2016-06-07 18:23:22 +00:00
Pedro F. Giffuni
96e9f46789 ext2fs(5): Cosmetic cleanups, mostly to the ext4 code.
Obtained from:	NetBSD
2016-06-07 17:08:34 +00:00
Pedro F. Giffuni
43ce40e891 ext2fs: cleanup generation number management.
Ext2/3/4 manages generation numbers differently than UFS so adopt
some rules that should work well. When allocating a new inode,
make sure we generate a "good" random value specifically avoiding
zero.

Don't interfere with the numbers that are already generated in
the filesystem: ext2fs doesn't have the backwards compatibility
issues  where there were no generation numbers.

Reviewed by:	kevlo
MFC after:	1 week
2016-06-07 14:37:43 +00:00
Kevin Lo
57d2ac2f90 arc4random() returns 0 to (2**32)−1, use an alternative to initialize
i_gen if it's zero rather than a divide by 2.

With inputs from  delphij, mckusick, rmacklem

Reviewed by:	mckusick
2016-05-22 14:31:20 +00:00
Pedro F. Giffuni
91a25a7d6d fs/ext2fs: spelling fixes on comment.
No functional change.
2016-04-29 20:45:50 +00:00
Pedro F. Giffuni
ee7ae58a45 ext2fs: make use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.

MFC after:	2 weeks
2016-04-26 01:41:15 +00:00
Pedro F. Giffuni
4cb92c4cf4 ext2_htree_release(): prevent signed integer overflow in a loop.
h_levels_num, as most data structs in ext2fs, is unsigned so
the index that addresses it has to be unsigned as well.

To get to overflow here we would probably be considering a
degenerate case though.

MFC after:	5 days
2016-04-23 18:28:59 +00:00
Pedro F. Giffuni
d9c9c81c08 sys: use our roundup2/rounddown2() macros when param.h is available.
rounddown2 tends to produce longer lines than the original code
and when the code has a high indentation level it was not really
advantageous to do the replacement.

This tries to strike a balance between readability using the macros
and flexibility of having the expressions, so not everything is
converted.
2016-04-21 19:57:40 +00:00
Pedro F. Giffuni
e45e8680ed ext2fs: replace 0 with NULL for pointers.
While here do late initialization of ebap, similar as was
done in UFS.

Found with devel/coccinelle.

MFC after:	2 weeks
2016-04-11 00:12:24 +00:00
Kevin Lo
2b3506d919 Fix comment. 2016-04-08 04:29:05 +00:00
Edward Tomasz Napierala
ae34b6ff96 Add four new RCTL resources - readbps, readiops, writebps and writeiops,
for limiting disk (actually filesystem) IO.

Note that in some cases these limits are not quite precise. It's ok,
as long as it's within some reasonable bounds.

Testing - and review of the code, in particular the VFS and VM parts - is
very welcome.

MFC after:	1 month
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D5080
2016-04-07 04:23:25 +00:00
Kevin Lo
df04a188af Update comment: Linux does set a randomized generation number of an inode
on ext2/3/4.

While here use arc4random() instead of random().

Reviewed by:	pfg
MFC after:	3 days
2016-04-01 03:21:01 +00:00
Kevin Lo
98a768596e Update superblock and inode structs for ext4.
Reviewed by:	pfg
2016-03-28 07:44:55 +00:00
Pedro F. Giffuni
308c3c240f Ext2: cleanup setting of ctime/mtime/birthtime.
This adopts the same change as r291936 for UFS.
Directly clear IN_ACCESS or IN_UPDATE when user supplied the time, and
copy the value into the inode.

This keeps the behaviour cleaner and is consistent with UFS.

Reviewed by:	bde
MFC after:	1 month (only 10)
2016-02-19 15:53:08 +00:00
Pedro F. Giffuni
00e24e4173 ext2fs: Remove panics for rename() race conditions.
Sync with r84642 from UFS:

The panics are inappropriate because the IN_RENAME flag only fixes a
few of the huge number of race conditions that can result in the
source path becoming invalid even prior to the VOP_RENAME() call.

Found accidentally while checking an issue from PVS Static Analysis.

MFC after:	3 days
2016-02-14 19:52:50 +00:00
Pedro F. Giffuni
a633908d21 Ext4: Use boolean type instead of '0' and '1'
There are precedents of uses of bool in the kernel and
it is incorrect style to use integers as replacement for
a boolean type.
2016-02-11 15:27:14 +00:00
Pedro F. Giffuni
78f6ea5440 Ext4: fix handling of files with sparse blocks before extent's index.
This is ongoing work from Damjan Jovanovic to improve ext4 read support
with sparse files:

Keep track of the first and last block in each extent as it descends down
the extent tree, thus being able to work out that some blocks are sparse
earlier. This solves an issue on r293680.

In ext4_bmapext() start supporting the runb parameter, which appears to be
the number of adjacent blocks prior to the block being converted in the
same way that runp is the number of blocks after, speding up random access
to mmaped files.

PR:	206652
2016-02-11 00:34:11 +00:00
Pedro F. Giffuni
817dd2573e Revert r294695:
ext2fs: passthrough any extra timestamps to the dinode struct.

While it passed the classic testing, the change appears to have
caused some regression and still requires some more precautions.

PR:		206820
MFC after:	3 days
2016-02-03 14:31:23 +00:00
Pedro F. Giffuni
afbad87898 ext2fs: passthrough any extra timestamps to the dinode struct.
In general we don't trust any of the extended timestamps unless the
EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where
we freshly allocated a new inode the information is valid and it is
better to pass it along instead of leaving the value undefined.

This should have no practical effect but should reduce the amount of
garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the
filesystem is converted from ext3 to ext4.

MFC after:	4 days
2016-01-24 23:24:47 +00:00
Pedro F. Giffuni
386b134364 ext2: rename some directory index constants.
Missed from r294653.

Pointyhat:	me
2016-01-24 04:30:30 +00:00
Pedro F. Giffuni
c22ff471b4 Fix comment. 2016-01-24 02:44:00 +00:00
Pedro F. Giffuni
9b58c8019f Rename some directory index constants.
Directory index was introduced in ext3. We don't always use the
prefix to denote the ext2 variant they belong to but when we
do we should try to be accurate.
2016-01-24 02:41:49 +00:00
Pedro F. Giffuni
e08ad8f068 ext2: Initialize i_flag after allocation.
We use i_flag to carry some flags like IN_E4INDEX which newer
ext2fs variants uses internally.

fsck.ext3 rightfully complains after our implementation tags
non-directory inodes with INDEX_FL.

Initializing i_flag during allocation removes the noise factor
and quiets down fsck.

Patch from:	Damjan Jovanovic
PR:		206530
2016-01-24 02:25:41 +00:00
Pedro F. Giffuni
9824e4adbe ext2fs: Bring back the htree dir_index implementation.
The htree dir_index is perhaps one of the most characteristic
features of the linux ext3 implementation. It was removed
in r281670, due to repeated bug reports.

Damjan Jovanic detected and fixed three bugs and did some
stress testing by building Apache OpenOffice on top of it
so it is now in good shape to bring back.

Differential Revision:	https://reviews.freebsd.org/D5007

Submitted by:	Damjan Jovanovic
Reviewed by:	pfg
Tested by:	pho
Relnotes:	Yes
MFC after:	2 months (only 10.x)
2016-01-21 14:50:28 +00:00
Pedro F. Giffuni
daf884fa9f ext4: mount panic from freeing invalid pointers
Initialize the struct with those fields to zeroes on allocation,
preventing the panic.

Patch by:	Damjan Jovanovic.

PR:		206056
MFC after:	3 days
2016-01-11 19:25:43 +00:00
Pedro F. Giffuni
e813d9d7fa ext4: add support for reading sparse files
Add support for sparse files in ext4. Also implement read-ahead, which
greatly increases the performance when transferring files from ext4.

Both features implemented by Damjan Jovanovic.

PR:		205816
MFC after:	1 week
2016-01-11 19:14:55 +00:00
Pedro F. Giffuni
7135ca50c1 ext2fs: reading mmaped file in Ext4 causes panic
Always call brelse(path.ep_bp), fixing reading EXT4 files using mmap().

Patch by Damjan Jovanovic.

PR:		205938
MFC after:	1 week
2016-01-07 21:43:43 +00:00
Pedro F. Giffuni
26069aec57 ext2: recognize ext4 INCOMPAT_RECOVER flag
This is a flag specific for journalling in ext4.
Add it to the list of ext4 features we ignore for
read-only purposes.

PR:		205668
MFC after:	1 week
2015-12-29 15:51:52 +00:00
Jeff Roberson
7d07bfd8a3 - Remove some dead code copied from ffs. 2015-07-29 03:06:08 +00:00
Pedro F. Giffuni
e54a659a26 Provide VOP_GETPAGES_ASYNC() for extfs.
Merge the filesystem specific part from r274914 to ext2fs.

I only did regular testing with the change but UFS and our ext2fs
are similar enough that the code should just work with the new
sendfile.

Discussed with:	glebius
2015-05-28 21:06:59 +00:00
Pedro F. Giffuni
f738ee4825 Drop experimental dir_index support.
The htree directory index is a highly desirable feature for research
purposes and was meant to improve performance in our ext2/3 driver.
Unfortunately our implementation has two problems:

- It never really delivered any performance improvement.
- It appears to corrupt the filesystem in undetermined circumstances.

Strictly speaking dir_index is not required for read/write support in
ext2/3 and our limited ext4 support still works fine without it.

Regain stability in the ext2 driver by removing it. We may need it back
(fixed) if we want to support encrypted ext4 support but thanks to the
wonders of version control we can always revert this change and bring it
back.

PR:	191895
PR:	198731
PR:	199309

MFC after:	5 days
2015-04-17 22:26:01 +00:00
Rick Macklem
dda11d4ab9 File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by:	kib
MFC after:	2 weeks
2015-04-15 20:16:31 +00:00
Pedro F. Giffuni
e5c356b2a2 ext2fs: Plug small memory leak
free() e2fs_contigdirs upon error.
Undo zeroing of e2fs_gd as this was actually a false positive.

X-MFC with:	278790
2015-02-15 14:25:00 +00:00
Pedro F. Giffuni
6be4cf2244 Reuse value of cursize instead of recalculating.
Reported by:	Clang static checker
MFC after:	1 week
2015-02-15 01:34:00 +00:00
Pedro F. Giffuni
f3ee91ed2b Initialize the allocation of variables related to the ext2 allocator.
The e2fs_gd struct was not being initialized and garbage was
being used for hinting the ext2 allocator variant.
Use malloc to clear the values and also initialize e2fs_contigdirs
during allocation to keep consistency.

While here clean up small style issues.

Reported by:	Clang static analyser
MFC after:	1 week
2015-02-15 01:12:15 +00:00
Enji Cooper
ae266f893f Fix the build when INVARIANTS is defined by restoring bo's definition in
ext2_truncate(..) and by putting it under INVARIANTS ifdefs

X-MFC with: r277354
MFC after: 2 weeks
2015-01-19 07:10:08 +00:00
Pedro F. Giffuni
9a53618ab2 ext2: Garbage-collect some unused variables
Reported by:	clang static analysis
MFC after:	2 weeks
2015-01-19 03:30:45 +00:00
Pedro F. Giffuni
7075482d4a ext2: fix for uninitialized pointer read.
path.ep_bp was being used uninitialized in ext4_ext_find_extent().

CID:		1062344
MFC after:	1 week
2015-01-18 21:18:28 +00:00
Pedro F. Giffuni
955ba37baa Remove dead code.
After the ext2 variant of the "orlov allocator" was implemented,
the case for a negative or zero dirsize disappeared.

Drop the dead code and unsign dirsize given that it can't be
negative anyways.

CID:		1008669
MFC after:	1 week
2015-01-18 20:26:27 +00:00
Pedro F. Giffuni
84b170d298 ext2: cosmetical issues
Minor sorting and note when the cases are expected to fall through.

MFC after:	1 week
2015-01-17 15:19:18 +00:00
Konstantin Belousov
789bdfdbc6 Handle MAKEENTRY cnp flag in the VOP_CREATE(). Curiously, some
fs, e.g. smbfs, already did it.

Tested by:	pho (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-21 13:29:33 +00:00
Konstantin Belousov
6c21f6edb8 The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter().  Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by:	Chris Torek <torek@pi-coral.com>
Reviewed by:	mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-18 10:01:12 +00:00
Gleb Kurtsou
dde58752db Adjust printf format specifiers for dev_t and ino_t in kernel.
ino_t and dev_t are about to become uint64_t.

Reviewed by:	kib, mckusick
2014-12-17 07:27:19 +00:00
Pedro F. Giffuni
8f87059b41 ext2fs: Fix old out-of-bounds access.
Overrunning buffer pointed to by (caddr_t)&oip->i_db[0] of 48 bytes by
passing it to a function which accesses it at byte offset 59 using
argument 60UL.

The issue was inherited from an older FFS implementation and
fixed there with by merging UFS2 in r98542. We follow the
FFS fix.

Discussed with:	bde
CID:		1007665
MFC after:	3 days
2014-12-09 14:56:00 +00:00
Pedro F. Giffuni
33587684a6 ifdef ext2_print_inode which is not really used.
ext2_print_inode is not really used but it was nice to
have for initial development work. #ifdef it under a
new EXT2FS_DEBUG knob so that we don't spend time
compiling it.

MFC after:	3 days
2014-11-12 16:23:56 +00:00
Konstantin Belousov
df5c9c0411 Do not set IN_ACCESS flag for read-only mounts. The IN_ACCESS
survives remount in rw, also it is set for vnodes on rootfs before
noatime can be set or clock is adjusted.  All conditions result in
wrong atime for accessed vnodes.

Submitted by:	bde
MFC after:	1 week
2014-10-11 19:09:56 +00:00
Konstantin Belousov
d15b55c554 Provide the unique implementation for the VOP_GETPAGES() method used
by ffs and ext2fs.  Remove duplicated call to vm_page_zero_invalid(),
done by VOP and by vm_pager_getpages().  Use vm_pager_free_nonreq().

Reviewed by:	alc (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	6 weeks (after r271596)
2014-09-15 12:28:29 +00:00
Alan Cox
3e5c84e292 We don't need an exclusive object lock on the expected execution path
through {ext2,ffs}_getpages().

Reviewed by:	kib, pfg
MFC after:	6 weeks
Sponsored by:	EMC / Isilon Storage Division
2014-09-13 18:26:13 +00:00
Pedro F. Giffuni
6e582ca2a5 Extra space from r271467.
MFC after:	2 months
2014-09-12 15:54:18 +00:00
Pedro F. Giffuni
1d0fce9bfe ext2fs: add ext2_getpages().
Literally copy/pasted from ffs_getpages().

Tested with:	fsx
MFC after:	2 months
2014-09-12 15:49:21 +00:00
Pedro F. Giffuni
7c1be3f5c7 Revert r269523:
Providing a higher EXT2_LINK_MAX limit is a bad idea for ext2/3.

Discussed with:	bde
2014-08-05 01:25:14 +00:00
Pedro F. Giffuni
55806cca37 set EXT2_LINK_MAX to LINK_MAX
In linux EXT4_LINK_MAX is now 64000.  We can't really do that
since i_nlink and va_nlink are signed so setting higher values
is likely to cause trouble.

This is a system limitation so set the EXT_LINK_MAX to
what the system can handle.

MFC after:	3 days
2014-08-04 16:41:06 +00:00
Konstantin Belousov
65589a29f4 Check for the cross-device cross-link attempt in the VFS, instead of
forcing filesystem VOP_LINK() methods to repeat the code.  In
tmpfs_link(), remove redundand check for the type of the source,
already done by VFS.

Note that NFS server already performs this check before calling
VOP_LINK().

Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-07-16 14:04:46 +00:00
Pedro F. Giffuni
ca73017a2d Revert r263449;
ext2fs: minor update to the dirpref policy.

The change in UFS r254996, reverted the change as the
older code seems to work better. This was not visible
in local testing but we can trust UFS is vastly more
exercised in diferent environments.
2014-03-21 04:33:38 +00:00
Pedro F. Giffuni
e23c349230 ext2fs: minor update to the dirpref policy.
Bring in a minor change to the dirpref policy based on r248623.

This is pretty minimal change to keep the implementation in
sync with UFS but other parts from the original change are not
directly applicable so don't expect improvements in fsck times.

MFC after:	2 weeks
2014-03-20 21:19:13 +00:00
Pedro F. Giffuni
157b40af8e ext2fs: Fix a bug when sorting htree entries.
This a typo introduced when bringing the original code from NetBSD.

Reported by:	Mike Ma
MFC after:	3 days
2014-03-06 21:02:16 +00:00
Pedro F. Giffuni
c3b76e1345 ext2fs: small formatting fixes.
Remove some redundant spaces.
No functional change.

MFC after:	3 days
2014-03-01 21:22:20 +00:00
Pedro F. Giffuni
3a54024da4 ext2fs: use of tab vs spaces.
Consistently use a single tab after a #define as mentioned in style(9).
Use tabs instead of space for indenting.
Fix a typo: "hash_vesion".

No functional change.

MFC after:	3 days
2014-02-28 21:25:32 +00:00
Pedro F. Giffuni
67da48a15b ext2fs: fully enable ext4 read-only support.
The ext4 developers tend to tag Ext4-specific flags as
"incompatible" even when such features are not relevant for
read-only support.  This is a consequence of the process
though which this filesystem is implemented without design
and the fact that some new features are not extensible to
ext2/3.

Organize the features according to what we support and sort
them so that we can now read-only mount filesystems with
some features that may be found in newly formatted ext4 fs.

Submitted by:	Zheng Liu
Reviewed by:	pfg
MFC after:	5 days
2014-02-22 22:07:16 +00:00
Pedro F. Giffuni
ad3d96a730 ext2fs: Use i_flag instead of i_flags for Ext4 inode flags.
The ext4 inode flags do not have equivalents for chflags (1)
and hold information that is private to the implementation.
The i_flag field in the inode is a better place to hold the Ext4
inode flags as it saves us from masking flags while setting or
getting attributes.  It should also make things cleaner if we
implement write support for Ext4.

Suggested by:	bde
Tested by:	Mike Ma
MFC after:	3 days
2014-01-28 14:39:05 +00:00
Pedro F. Giffuni
99984d229c ext2fs: Re-enable reallocblk.
The major corruption issues affecting this code have been fixed
a while ago.

MFC after:	1 week
2014-01-24 20:26:00 +00:00
Pedro F. Giffuni
1093104cf7 ext2fs: fix a bug in dirindex and re-enable.
The IN_* flags should be set in i_flag instead of corrupting
i_flags [1].

Re-enable HTree dirindex as the last series of bug fixes
seems to have fixed the issues.

Reported by:	bde [1]
Tested by:	kevlo
MFC after:	1 week
2014-01-24 13:51:38 +00:00
Pedro F. Giffuni
b7bbf8b9f3 ext2fs: fix logic error in the previous change.
Use the bitwise negation instead of bogus boolean negation and move
the flag manipulation with the assignment.
Fix some grammatical errors introduced in the same change.

Reported by:	bde
MFC after:	3 days
2014-01-22 19:09:41 +00:00
Pedro F. Giffuni
a7710d51c4 ext2fs: Translate the EXT4_EXTENTS and EXT4_INDEX to the inode flags.
r260545 cleared the inode flags to fix corruption problems but
we still need to pass some EXT4 flags for the ext4 read-only
mode.  None of these attributes has an equivalent in FreeBSD and
are uninteresting for the system utilities so they should be
innaccessible in ext2_getattrib().

Note: we also use EXT4_HUGE_FILE but we use it directly from the
dinode structure so it is not necessary to translate it,

Suggested by:	bde
MFC after:	3 days
2014-01-21 19:06:29 +00:00
Pedro F. Giffuni
c2e2b77b19 ext2fs: fix inode flag conversion.
After r252890 we are naively attempting to pass through the
inode flags.  This is technically incorrect as the ext2
inode flags don't match the UFS/system values used in
FreeBSD and a clean conversion is needed.

Some filtering was left in place so the change didn't cause
significant changes in FreeBSD but some of the garbage passed
is likely to be the cause for warning messages in linux.

Fix the issue by resetting the flags before conversion as was
done previously. This also means we will not pass the EXT4_*
inode flags into FreeBSD's inode.

PR:		kern/185448
MFC after:	3 days
2014-01-11 15:19:04 +00:00
Pedro F. Giffuni
b41f53c43b ext2fs: make the hashing algorithm match the linux code.
There appears to be a hash function compatibility issue.
The code is currently disabled but fix it nevertheless.

PR:		kern/183230
MFC after:	3 days
2013-12-23 19:47:34 +00:00
Pedro F. Giffuni
244f00cc0d ext2fs: add two new reserved inodes.
According to online documentation [1], Ext4 has two new "special"
inodes so add the new exclude and replica inodes.

Reference:
[1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout

Reported by:	Mike Ma
MFC after:	3 weeks
2013-12-04 02:27:52 +00:00
Pedro F. Giffuni
4b367145f7 UFS2: make di_extsize unsigned.
di_extsize is the EA size and as such it should be unsigned.
Adjust related types for consistency.

Reviewed by:	mckusick (previous version)
MFC after:	3 weeks
2013-10-24 00:33:29 +00:00
Pedro F. Giffuni
1f7c9f2bc8 ext2fs: temporarily disable htree directory index.
Our code does not consider yet the case of hash collisions. This
is a rather annoying situation where two or more files that
happen to have the same hash value will not appear accessible.

The situation is not difficult to work-around but given that things
will just work without enabling htree we will save possible
embarrassments for the next release.

Reported by:	Kevin Lo
2013-09-07 02:45:51 +00:00
Pedro F. Giffuni
4a62545173 ext2fs: update format specifiers for ext4 type.
Previous bandaid was not appropriate and didn't really work for
all platforms. While here, cleanup the surrounding code to match
ffs_checkoverlap()

Reported by:	dim, jmallet and bde
MFC after:	3 weeks
2013-08-14 14:22:46 +00:00
Pedro F. Giffuni
88ae190ea0 ext2fs: update format specifiers for ext4 type.
Reported by:	Sam Fourman Jr.
MFC after:	3 weeks
2013-08-13 18:39:36 +00:00
Pedro F. Giffuni
70097aac13 Define ext2fs local types and use them.
Add definitions for e2fs_daddr_t, e4fs_daddr_t in addition
to the already existing e2fs_lbn_t and adjust them for ext4.
Other than making the code more readable these changes should
fix problems related to big filesystems.

Setting the proper types can be tricky so the process was
helped by looking at UFS. In our implementation, logical block
numbers can be negative and the code depends on it. In ext2,
block numbers are unsigned so it is convenient to keep
e2fs_daddr_t unsigned and use the complete 32 bits. In the
case of e4fs_daddr_t, while the value should be unsigned, for
ext4 we only need to support 48 bits so preserving an extra
bit from the sign is not an issue.

While here also drop the ext2_setblock() prototype that was
never used.

Discussed with:	mckusick, bde
MFC after:	3 weeks
2013-08-13 15:40:43 +00:00
Pedro F. Giffuni
d7511a40a7 Add read-only support for extents in ext2fs.
Basic support for extents was implemented by Zheng Liu as part
of his Google Summer of Code in 2010. This support is read-only
at this time.

In addition to extents we also support the huge_file extension
for read-only purposes. This works nicely with the additional
support for birthtime/nanosec timestamps and dir_index that
have been added lately.

The implementation may not work for all ext4 filesystems as
it doesn't support some features that are being enabled by
default on recent linux like flex_bg. Nevertheless, the feature
should be very useful for migration or simple access in
filesystems that have been converted from ext2/3 or don't use
incompatible features.

Special thanks to Zheng Liu for his dedication and continued
work to support ext2 in FreeBSD.

Submitted by:	Zheng Liu (lz@)
Reviewed by:	Mike Ma, Christoph Mallon (previous version)
Sponsored by:	Google Inc.
MFC after:	3 weeks
2013-08-12 21:34:48 +00:00
Pedro F. Giffuni
95f1f8d262 Small typo.
MFC after:	3 days
2013-08-08 22:07:59 +00:00
Pedro F. Giffuni
d192e40f77 Add license for the half MD4 algorithm used in ext2_half_md4().
The htree implementation uses code derived from the
RSA Data Security, Inc. MD4 Message-Digest Algorithm.

Add a proper licensing statement for the code and clarify
the corresponding comments.

Approved by:	core (hrs)
2013-08-01 16:04:48 +00:00
Pedro F. Giffuni
9670f48107 ext2fs: Return EINVAL for negative uio_offset as in UFS.
While here drop old comment that doesn't really apply.

MFC after:	1 month
Discussed with:	gleb
2013-07-25 19:37:49 +00:00
Pedro F. Giffuni
0b54fe540c ext2fs: Drop a check that wan't supposed to be in r253651.
MFC after:	1 month
2013-07-25 16:04:55 +00:00
Pedro F. Giffuni
78d912bbc3 ext2fs: Don't assume that on-disk format of a directory is the same
as in <sys/dirent.h>

ext2_readdir() has always been very fs specific and different
with respect to its ufs_ counterpart. Recent changes from UFS
have made it possible to share more closely the implementation.

MFUFS r252438:
Always start parsing at DIRBLKSIZ aligned offset, skip first entries if
uio_offset is not DIRBLKSIZ aligned. Return EINVAL if buffer is too
small for single entry.

Preallocate buffer for cookies.

Skip entries with zero inode number.

Reviewed by:	gleb, Zheng Liu
MFC after:	1 month
2013-07-25 15:34:20 +00:00
Pedro F. Giffuni
c5249f35b8 Implement 1003.1-2001 pathconf() keys.
This is based on r106058 in UFS.

MFC after:	1 month
2013-07-10 22:03:01 +00:00
Pedro F. Giffuni
db20714a87 Reinstate the assertion from r253045.
UFS r232732 reverted the change as the real problem was to be fixed
at the syscall level.

Reported by:	bde
2013-07-09 14:23:00 +00:00
Pedro F. Giffuni
bf3c9330ba Enhancement when writing an entire block of a file.
Merge from UFS r231313:

This change first attempts the uiomove() to the newly allocated
(and dirty) buffer and only zeros it if the uiomove() fails. The
effect is to eliminate the gratuitous zeroing of the buffer in
the usual case where the uiomove() successfully fills it.

MFC after:	3 days
2013-07-09 01:31:04 +00:00
Pedro F. Giffuni
7ce75e5f1f Avoid a panic and return EINVAL instead.
Merge from UFS r232692:
syscall() fuzzing can trigger this panic.

MFC after:	3 days
2013-07-08 20:21:36 +00:00
Pedro F. Giffuni
bdf1d79884 Implement SEEK_HOLE/SEEK_DATA for ext2fs.
Merged from r236044 on UFS.

MFC after:	3 days
2013-07-07 15:51:28 +00:00
Pedro F. Giffuni
d66aed2e76 Fix some typos.
MFC after:	1 week
2013-07-07 01:32:52 +00:00
Pedro F. Giffuni
91f5a4670f Initial implementation of the HTree directory index.
This is a port of NetBSD's GSoC 2012 Ext3 HTree directory indexing
by Vyacheslav Matyushin.  It was cleaned up and enhanced for FreeBSD
by Zheng Liu (lz@).

This is an excellent example of work shared among different projects:
Vyacheslav was able to look at an early prototype from Zheng Liu who
was also able to check the code from Haiku (with permission).

As in linux, the feature is not available by default and must be
enabled explicitly with tune2fs. We still do not support the
workarounds required in readdir for NFS.

Submitted by:	Zheng Liu
Tested by:	Mike Ma
Sponsored by:	Google Inc.
MFC after:	1 week
2013-07-06 18:28:06 +00:00
Pedro F. Giffuni
e2bc2ccec0 ext2fs: Use the complete random() range in i_gen.
i_gen is unsigned in ext2fs so we can handle the complete
32 bits.

MFC after:	1 week
2013-06-30 00:42:51 +00:00
Pedro F. Giffuni
d849f17dca Bring some updates from ufs_lookup to ext2fs.
r156418:

Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended
file systems.  This could cause deadlocks when creating snapshots.
(We can't do snapshots on ext2fs but it is useful to keep things in sync).

r183079:

- Only set i_offset in the parent directory's i-node during a lookup for
  non-LOOKUP operations.
- Relax a VOP assertion for a DELETE lookup.

r187528:

Move the code from ufs_lookup.c used to do dotdot lookup, into
the helper function. It is supposed to be useful for any filesystem
that has to unlock dvp to walk to the ".." entry in lookup routine.

MFC after:	5 days
2013-06-29 01:35:28 +00:00
Pedro F. Giffuni
fafb835a0b Minor sorting.
MFC after:	3 days
2013-06-26 19:43:22 +00:00
Pedro F. Giffuni
da057ed2d3 Define and use e2fs_lbn_t in ext2fs.
In line to what is done in UFS, define an internal type
e2fs_lbn_t for the logical block numbers.

This change is basically a no-op as the new type is unchanged
(int32_t) but it may be useful as bumping this may be required
for ext4fs.

Also, as pointed out by Bruce Evans:

-Use daddr_t for daddr in ext2_bmaparray(). This seems to
improve reliability with the reallocblks option.
- Add a cast to the fsbtodb() macro as in UFS.

Reviewed by:	bde
MFC after:	3 days
2013-06-23 02:44:42 +00:00
Pedro F. Giffuni
3f5747b69d Rename some prefixes in the Block Group Descriptor fields to ext4bgd_
Change prefix to avoid confusion and denote that these fields
are generally only available starting with ext4.

MFC after:	3 days
2013-06-20 00:00:33 +00:00
Pedro F. Giffuni
9e43acf6c0 More ext2fs header cleanups:
- Set MAXMNTLEN nearer to where it is used.
- Move EXT2_LINK_MAX to ext2_dir.h .

MFC after:	3 days
2013-06-18 15:49:30 +00:00
Pedro F. Giffuni
ebf0f88839 Rename remaining DIAGNOSTIC to INVARIANTS.
MFC after:	3 days
2013-06-17 00:39:23 +00:00
Pedro F. Giffuni
b6113fb31a Re-sort ext2fs headers to make things easier to find.
In the ext2fs driver we have a mixture of headers:

- The ext2_ prefixed headers have strong influence from NetBSD
and are carry specific ext2/3/4 information.
- The unprefixed headers are inspired on UFS and carry implementation
specific information.

Do some small adjustments so that the information is easier to
find coming from either UFS or the NetBSD implementation.

MFC after:	3 days
2013-06-16 16:10:45 +00:00
Pedro F. Giffuni
f744956b4a Relax some unnecessary unsigned type changes in ext2fs.
While the changes in r245820 are in line with the ext2 spec,
the code derived from UFS can use negative values so it is
better to relax some types to keep them as they were, and
somewhat more similar to UFS. While here clean some casts.

Some of the original types are still wrong and will require
more work.

Discussed with:	bde
MFC after:	3 days
2013-06-13 03:23:24 +00:00
Pedro F. Giffuni
77b193c249 Turn DIAGNOSTICs to INVARIANTS in ext2fs.
This is done to be consistent with what other filesystems and
particularly ffs already does (see r173464).

MFC after:	5 days
2013-06-12 15:24:48 +00:00
Pedro F. Giffuni
abe38ac774 s/file system/filesystem/g
Based on r96755 from UFS.

MFC after:	3 days
2013-06-11 02:47:07 +00:00
Pedro F. Giffuni
f7d4b4d3d1 e2fs_bpg and e2fs_isize are always unsigned.
The superblock in ext2fs defines all the fields as unsigned but for
some reason the in-memory superblock was carrying e2fs_bpg and
e2fs_isize as signed.

We should preserve the specified types for consistency.

MFC after:	5 days
2013-06-09 01:38:51 +00:00
Pedro F. Giffuni
532ebe1313 ext2fs: space vs tab.
Obtained from:	Christoph Mallon
MFC after:	3 days
2013-06-03 20:33:05 +00:00
Pedro F. Giffuni
fc3ea958b2 ext2fs: Small cosmetic fixes.
Make a long macro readable and sort a header.

Obtained from:	Christoph Mallon
MFC after:	3 days
2013-06-03 20:02:45 +00:00
Pedro F. Giffuni
4f69a09308 ext2fs: Update Block Group Descriptor struct.
Uncover some, previously reserved, fields that are used by Ext4.
These are currently unused but it is good to have them for future
reference.

Reviewed by:	bde
MFC after:	3 days
2013-06-03 18:52:14 +00:00
Jeff Roberson
22a722605d - Convert the bufobj lock to rwlock.
- Use a shared bufobj lock in getblk() and inmem().
 - Convert softdep's lk to rwlock to match the bufobj lock.
 - Move INFREECNT to b_flags and protect it with the buf lock.
 - Remove unnecessary locking around bremfree() and BKGRDINPROG.

Sponsored by:	EMC / Isilon Storage Division
Discussed with:	mckusick, kib, mdf
2013-05-31 00:43:41 +00:00
Jeff Roberson
26089666b6 Prepare to replace the buf splay with a trie:
- Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists.
   No consumers need to find them there and it complicates the tree.
   These flags are all FFS specific and could be moved out of the buf
   cache.
 - Use pbgetvp() and pbrelvp() to associate the background and journal
   bufs with the vp.  Not only is this much cheaper it makes more sense
   for these transient bufs.
 - Fix the assertions in pbget* and pbrel*.  It's not safe to check list
   pointers which were never initialized.  Use the BX flags instead.  We
   also check B_PAGING in reassignbuf() so this should cover all cases.

Discussed with:	kib, mckusick, attilio
Sponsored by:	EMC / Isilon Storage Division
2013-04-06 22:21:23 +00:00
Konstantin Belousov
c535690b33 Add currently unused flag argument to the cluster_read(),
cluster_write() and cluster_wbuild() functions.  The flags to be
allowed are a subset of the GB_* flags for getblk().

Sponsored by:	The FreeBSD Foundation
Tested by:	pho
2013-03-14 20:28:26 +00:00
Pedro F. Giffuni
a9d1b29995 ext2fs: Use prototype declarations for function definitions
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-10 19:49:37 +00:00
Pedro F. Giffuni
4e1e0e2582 ext2fs: Replace redundant EXT2_MIN_BLOCK with EXT2_MIN_BLOCK_SIZE.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-08 21:09:44 +00:00
Pedro F. Giffuni
1a125d6d85 ext2fs: make e2fs_maxcontig local and remove tautological check.
e2fs_maxcontig was modelled after UFS when bringing the
"Orlov allocator" to ext2. On UFS fs_maxcontig is kept in the
superblock and is used by userland tools (fsck and growfs),

In ext2 this information is volatile so it is not available
for userland tools, so in this case it doesn't have sense
to carry it in the in-memory superblock.

Also remove a pointless check for MAX(1, x) > 0.

Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-08 20:58:00 +00:00
Pedro F. Giffuni
a940ce65cd Remove unused MAXSYMLINKLEN macro.
Reviewed by:	mckusick
PR:		kern/175794
MFC after:	1 week
2013-02-08 20:30:19 +00:00
Pedro F. Giffuni
d427334435 ext2fs: move assignment where it is not dead.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:26:34 +00:00
Pedro F. Giffuni
80b6a61199 ext2fs: Remove unused em_e2fsb definition..
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:23:56 +00:00
Pedro F. Giffuni
555368dcf1 ext2fs: Remove useless rootino local variable.
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:17:41 +00:00
Pedro F. Giffuni
ef024b0da9 ext2fs: Correct off-by-one errors in FFTODT() and DDTOFT().
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:13:05 +00:00
Pedro F. Giffuni
666116e4a3 ext2fs: Use nitems().
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:08:56 +00:00
Pedro F. Giffuni
fdc100e4c2 ext2fs: Use EXT2_LINK_MAX instead of LINK_MAX
Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-05 03:01:04 +00:00
Pedro F. Giffuni
757224cbdb ext2fs: general cleanup.
- Remove unused extern declarations in fs.h
- Correct comments in ext2_dir.h
- Several panic() messages showed wrong function names.
- Remove commented out stray line in ext2_alloc.c.
- Remove the unused macro EXT2_BLOCK_SIZE_BITS() and the then
  write-only member e2fs_blocksize_bits from struct m_ext2fs.
- Remove the unused macro EXT2_FIRST_INO() and the then write-only
  member e2fs_first_inode from struct m_ext2fs.
- Remove EXT2_DESC_PER_BLOCK() and the member e2fs_descpb from
  struct m_ext2fs.
- Remove the unused members e2fs_bmask, e2fs_dbpg and
  e2fs_mount_opt from struct m_ext2fs
- Correct harmless off-by-one error for fspath in ext2_vfsops.c.
- Remove the unused and broken macros EXT2_ADDR_PER_BLOCK_BITS()
  and EXT2_DESC_PER_BLOCK_BITS().
- Remove the !_KERNEL versions of the EXT2_* macros.

Submitted by:	Christoph Mallon
MFC after:	2 weeks
2013-02-02 22:23:45 +00:00
Pedro F. Giffuni
646a7fea0c Clean some 'svn:executable' properties in the tree.
Submitted by:	Christoph Mallon
MFC after:	3 days
2013-01-26 22:08:21 +00:00
Pedro F. Giffuni
879aeda7b6 Cosmetical off-by-one
Technically, the case when all the blocks are released
is not a sanity check.
Move further the comment while here.

Suggested by:	bde
MFC after:	3 days
2013-01-26 21:50:52 +00:00
Pedro F. Giffuni
69017f8d8c ext2fs: fix a check for negative block numbers.
The previous change accidentally left the substraction we
were trying to avoid in case that i_blocks could become
negative.

Reported by:	bde
MFC after:	4 days
2013-01-23 14:29:29 +00:00
Pedro F. Giffuni
1d04725a7a ext2fs: make some inode fields match the ext2 spec.
Ext2fs uses unsigned fields in its dinode struct.
FreeBSD can have negative values in some of those
fields and the inode is meant to interact with the
system so we have never respected the unsigned
nature of most of those fields.

Block numbers and the NFS generation number do
not need to be signed so redefine them as
unsigned to better match the on-disk information.

MFC after:	1 week
2013-01-22 18:54:03 +00:00
Pedro F. Giffuni
4b21c8fda9 ext2fs: temporarily disable the reallocation code.
Testing with fsx has revealed problems and in order to
hunt the bugs properly we need reduce the complexity.

This seems to help but is not a complete solution.

MFC after:	3 days
2013-01-22 18:36:31 +00:00
Pedro F. Giffuni
b187108f66 ext2fs: Add some DOINGASYNC check to match ffs.
This is mostly cosmetical.

Reviewed by:	bde
MFC after:	3 days
2013-01-18 19:11:17 +00:00
Pedro F. Giffuni
98f5c0d41a ext2fs: cleanup de dinode structure.
It was plagued with style errors and the offsets had been lost.
While here took the time to update the fields according to the
latest ext4 documentation.

Reviewed by:	bde
MFC after:	3 days
2013-01-07 03:36:32 +00:00
Pedro F. Giffuni
e28f5d5222 More constant renaming in preparation for newer features.
We also try to make better use of the fs flags instead of
trying adapt the code according to the fs structures. In
the case of subsecond timestamps and birthtime we now
check that the feature is explicitly enabled: previously
we only checked that the reserved space was available and
silently wrote them.

This approach is much safer, especially if the filesystem
happens to use embedded inodes or support EAs.

Discussed with:	Zheng Liu
MFC after:	5 days
2012-12-20 02:22:36 +00:00
Pedro F. Giffuni
371f338bfd Update some definitions or make them match NetBSD's headers.
Bring several definitions required for newer ext4 features.

Rename EXT2F_COMPAT_HTREE to EXT2F_COMPAT_DIRHASHINDEX since it
is not being used yet and the new name is more compatible with
NetBSD and Linux.

This change is purely cosmetic and has no effect on the real
code.

Obtained from:	NetBSD
MFC after:	3 days
2012-11-28 15:48:32 +00:00
Pedro F. Giffuni
7306dea4e8 Partially bring r242520 to ext2fs.
When a file is first being written, the dynamic block reallocation
(implemented by ext2_reallocblks) relocates the file's blocks
so as to cluster them together into a contiguous set of blocks on
the disk.

When the cluster crosses the boundary into the first indirect block,
the first indirect block is initially allocated in a position
immediately following the last direct block.  Block reallocation
would usually destroy locality by moving the indirect block out of
the way to keep the data blocks contiguous.

The issue was diagnosed long ago by Bruce Evans on ffs and surfaced
on ext2fs when block reallocaton was ported. This is only a partial
solution based on the similarities with FFS. We still require more
review of the allocation details that vary in ext2fs.

Reported by:	bde
MFC after:	1 week
2012-11-28 00:36:40 +00:00
Attilio Rao
c6e0355cee r16312 is not any longer real since many years (likely since when VFS
received granular locking) but the comment present in UFS has been
copied all over other filesystems code incorrectly for several times.

Removes comments that makes no sense now.

Reviewed by:	kib
MFC after:	3 days
2012-11-19 22:43:45 +00:00
Attilio Rao
bc2258da88 Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.
2012-11-09 18:02:25 +00:00
Matthew D Fleming
fc8fdae0df Fix up kernel sources to be ready for a 64-bit ino_t.
Original code by:	Gleb Kurtsou
2012-09-27 23:30:49 +00:00
Kevin Lo
8e46bf68d1 Fix style nit 2012-09-11 08:36:41 +00:00
Pedro F. Giffuni
051b0df565 Add some basic definitions for a future htree implementation.
MFC after:	3 days
2012-08-24 01:12:07 +00:00
Kevin Lo
5bb295c408 Fix typo 2012-08-18 16:13:16 +00:00
Mateusz Guzik
1ec9bedabe Remove unused member of struct indir (in_exists) from UFS and EXT2 code.
Reviewed by:	mckusick
Approved by:	trasz (mentor)
MFC after:	1 week
2012-08-17 17:45:27 +00:00
Kevin Lo
f7a3729c91 Use NULL instead of 0 for pointers 2012-07-22 15:40:31 +00:00
Kevin Lo
d6cc34a1ad Fix a typo 2012-07-03 08:03:07 +00:00
Pedro F. Giffuni
553c9b4d08 Fix a couple of issues that appear to be inherited from the old
8.x code:
- If the lock cannot be acquired immediately unlocks 'bar' vnode
and then locks both vnodes in order.
- wrong vnode type panics from cache_enter_time after calls by
ext2_lookup.

The fix merges the fixes from ufs/ufs_lookup.c.

Submitted by:	Mateusz Guzik
Approved by:	jhb@ (mentor)
Reviewed by:	kib@
MFC after:	1 week
2012-05-16 15:53:38 +00:00
Sergey Kandaurov
7d5f5d83f5 Fix mount interlock oversights from the previous change in r234386.
Reported by:	dougb
Submitted by:	Mateusz Guzik <mjguzik at gmail com>
Reviewed by:	Kirk McKusick
Tested by:	pho
2012-05-10 20:28:33 +00:00
Edward Tomasz Napierala
af6e6b87ad Remove unused thread argument to vrecycle().
Reviewed by:	kib
2012-04-23 14:10:34 +00:00
Edward Tomasz Napierala
c52fd858ae Remove unused thread argument from vtruncbuf().
Reviewed by:	kib
2012-04-23 13:21:28 +00:00
Kirk McKusick
71469bb38f Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if its is DOOMED or other possible end of life senarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by:   Peter Holm
MFC after:   2 weeks
2012-04-17 16:28:22 +00:00
Jaakko Heinonen
295a542d96 Apply changes from r234103 to ext2fs:
Return EPERM from ext2_setattr() when an user without PRIV_VFS_SYSFLAGS
privilege attempts to toggle SF_SETTABLE flags.

Flags are now stored to ip->i_flags in one place after all checks.

Also, remove SF_NOUNLINK from the checks because ext2fs doesn't support
that flag.

Reviewed by:	bde
2012-04-13 05:48:31 +00:00
Jaakko Heinonen
e6b8bdf252 Restore the blank line incorrectly removed in r234104.
Pointed out by:	bde
2012-04-11 15:48:50 +00:00
Jaakko Heinonen
034efc61ba Apply changes from r233787 to ext2fs:
- Use more natural ip->i_flags instead of vap->va_flags in the final
  flags check.
- Style improvements.

No functional change intended.

MFC after:	2 weeks
2012-04-10 16:05:52 +00:00
Konstantin Belousov
b80dcb55aa Remove fifo.h. The only used function declaration from the header is
migrated to sys/vnode.h.

Submitted by:	gianni
2012-03-11 12:19:58 +00:00
Pedro F. Giffuni
035e4e0494 Add support for ns timestamps and birthtime to the ext2/3 driver.
When using big inodes there is sufficient space in ext3 to
keep extra resolution and birthtime (creation) timestamps.
The appropriate fields in the on-disk inode have been approved
for a long time but support for this in ext3 has not been
widely  distributed.

In preparation for ext4 most linux distributions have enabled
by default such bigger inodes and some people use nanosecond
timestamps in ext3. We now support those when the inode is big
enough and while we do recognize the EXT4F_ROCOMPAT_EXTRA_ISIZE,
we maintain the extra timestamps even when they are not used.

An additional note by Bruce Evans:
We blindly accept unrepresentable tv_nsec in VOP_SETATTR(), but
all file  systems have always done that.  When POSIX gets around
to  specifying the behaviour, it will probably require certain
rounding to the fs's resolution and not rejecting the request.
This unfortunately means that syscalls that set times can't
really tell if they succeeded without reading back the times
using stat() or similar and checking that they were set close
enough.

Reviewed by:	bde
Approved by:	jhb (mentor)
MFC after:	2 weeks
2012-03-08 21:06:05 +00:00
Konstantin Belousov
526d0bd547 Fix found places where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with:	bde, das (previous versions)
MFC after:	1 month
2012-02-21 01:05:12 +00:00
Pedro F. Giffuni
3cc6ae1f57 Update the data structures with some fields reserved for
ext4 but that can be used in ext3 mode.

Also adjust the internal inode to carry the birthtime,
like in UFS, which is starting to get some use when
big inodes are available.

Right now these are just placeholders for features
to come.

Approved by:	jhb (mentor)
MFC after:	2 weeks
2012-02-07 22:31:28 +00:00
Konstantin Belousov
c480f781ea Current implementations of sync(2) and syncer vnode fsync() VOP uses
mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return.  Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by:	mckusick
Tested by:	scottl
MFC after:	2 weeks
2012-02-06 11:04:36 +00:00
Ed Schouten
8f8d30274a Migrate ufs and ext2fs from skpc() to memcchr().
While there, remove a useless check from the code. memcchr() always
returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff
can never be equal to zero. Also, the fact that memcchr() returns a
pointer instead of the number of bytes until the end, makes conversion
to an offset far more easy.
2012-01-01 20:47:33 +00:00
Pedro F. Giffuni
5ed5554f0a Style cleanups by jh@.
Fix a comment from the previous commit.
Use M_ZERO instead of bzero() in ext2_vfsops.c
Add include guards from PR.

PR:		162564
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-16 15:47:43 +00:00
Pedro F. Giffuni
5b63c1252b Bring in reallocblk to ext2fs.
The feature has been standard for a while in UFS as a means to reduce
fragmentation, therefore maintaining consistent performance with
filesystem aging. This is also very similar to what ext4 calls
"delayed allocation".

In his 2010 GSoC, Zheng Liu ported and benchmarked the missing
FANCY_REALLOC code to find more consistent performance improvements than
with the preallocation approach.

PR:		159233
Author:		Zheng Liu <gnehzuil AT SPAMFREE gmail DOT com>
Sponsored by:	Google Inc.
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-15 20:31:18 +00:00
Pedro F. Giffuni
c14d4ad1c6 Merge ext2_readwrite.c into ext2_vnops.c as done in UFS in r101729.
This removes the obfuscations mentioned in ext2_readwrite and
places the clustering funtion in a location similar to other
UFS-based implementations.

No performance or functional changeses are expected from
this move.

PR:		kern/159232
Suggested by:	bde
Approved by:	jhb (mentor)
MFC after:	2 weeks
2011-12-14 22:04:14 +00:00