1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.
Reviewed by: bde
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
uerror == 0 && lerror == EACCES, lowervp == NULLVP and union_allocvp
doesn't find existing union node and new union node is created.
Sicne it is dificult to cover all the case, union_lookup always
returns when union_lookup1() returns EACCES.
Submitted by: Naofumi Honda <honda@Kururu.math.sci.hokudai.ac.jp>
Obtained from: NetBSD/pc98
in savedvp variable and it is used for the argument of
MOUNTTOUNIONMOUNT(). I didn't realize ap->a_vp is modified before
MOUNTTOUNIONMOUNT(), so the change by revision 1.22 is incorrect.
UN_KLOCK flag.
When UN_KLOCK is set, VOP_UNLOCK should keep uppervp locked and clear
UN_ULOCK flag. To do this, when UN_KLOCK is set, (1) union_unlock
clears UN_ULOCK and does not clear UN_KLOCK, (2) union_lock() does not
access uppervp and does not clear UN_KLOCK, and (3) callers of
vput/VOP_UNLOCK should clear UN_KLOCK. For example, vput becomes:
SETKLOCK(union_node);
vput(vnode);
CLEARKLOCK(union_node);
where SETKLOCK macro sets UN_KLOCK and CLEARKLOCK macro clears
UN_KLOCK.
Our vput calls vm_object_deallocate() --> vm_object_terminate(). The
vm_object_terminate() calls vn_lock(), since UN_LOCKED has been
already cleared in union_unlock(). Then, union_lock locks upper vnode
when UN_ULOCK is not set. The upper vnode is not unlocked when
UN_KLOCK is set in union_unlock(), thus, union_lock tries to lock
locked vnode and we get panic.
UN_ULOCK flag. This shows a locking violation but I couldn't find the
reason UN_ULOCK is not set or upper vnode is not unlocked. I added
the code that detect this case and adjust un_flags. DIAGNOSTIC kernel
doesn't adjust un_flags, but just panic here to help debug by kernel
hackers.
# mount -t union (or null) dir1 dir2
# mount -t union (or null) dir2 dir1
The function namei in union_mount calls union_root. The upper vnode
has been already locked and vn_lock in union_root causes above panic.
Add printf's included in `#ifdef DIAGNOSTIC' for EDEADLK cases.
is NULLVP, union node will have neither uppervp nor lowervp. This
causes page fault trap.
The union_removed_upper just remove union node from cache and it
doesn't set uppervp to NULLVP. Since union node is removed from
cache, it will not be referenced.
The code that remove union node from cache was copied from
union_inactive.
VOP_LINK(). The reason of strange behavior was wrong order of the
argument, that is, the operation
# ln foo bar
in a union fs tried to do
# ln bar foo
in ufs layer.
Now we can make a link in a union fs.
fix!
The ufs_link() assumes that vnode is not unlocked and tries to lock it
in certain case. Because union_link calls VOP_LINK after locking vnode,
vn_lock in ufs_link causes above panic.
Currently, I don't know the real fix for a locking violation in
union_link, but I think it is important to avoid panic.
A vnode is unlocked before calling VOP_LINK and is locked after it if
the vnode is not union fs. Even though panic went away, the process
that access the union fs in which link was made will hang-up.
Hang-up can be easily reproduced by following operation:
mount -t union a b
cd b
ln foo bar
ls
same directory pair.
If we do:
mount -t union a b
mount -t union a b
then, (1) namei tries to lock fs which has been already locked by
first union mount and (2) union_root() tries to lock locked fs. To
avoid first deadlock condition, unlock vnode if lowerrootvp is union
node, and to avoid second case, union_mount returns EDEADLK when multi
union mount is detected.
dolock is not set (that is, targetvp == overlaying vnode object).
Current code use FIXUP macro to do this, and never unlocks overlaying
vnode object in union_fsync. So, the vnode object will be locked
twice and never unlocked.
PR: 3271
Submitted by: kato
relookup() in union_relookup() is succeeded. However, if relookup()
returns non-zero value, that is relookup fails, VOP_MKDIR is never
called (c.f. union_mkshadow). Thus, pathname buffer is never FREEed.
Reviewed by: phk
Submitted by: kato
PR: 3262
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
files are off the vendor branch, so this should not change anything.
A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[two new auxillary files in miscfs/union]
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.
The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.
calls.
Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+
missing prototypes. Now I have 9000+ lines of warnings and errors
about bogus conversions of function pointers.
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.