1. Use the default function to access all the specfs operations.
2. Use the default function to access all the fifofs operations.
3. Use the default function to access all the ufs operations.
4. Fix VCALL usage in vfs_cache.c
5. Use VOCALL to access specfs functions in devfs_vnops.c
6. Staticize most of the spec and fifofs vnops functions.
7. Make UFS panic if it lacks bits of the underlying storage handling.
1. Remove comment stating the blatantly obvious.
2. Align in two columns.
3. Sort all but the default element alphabetically.
4. Remove XXX comments pointing out entries not needed.
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.
Reviewed by: bde
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
uerror == 0 && lerror == EACCES, lowervp == NULLVP and union_allocvp
doesn't find existing union node and new union node is created.
Sicne it is dificult to cover all the case, union_lookup always
returns when union_lookup1() returns EACCES.
Submitted by: Naofumi Honda <honda@Kururu.math.sci.hokudai.ac.jp>
Obtained from: NetBSD/pc98
reading/writing of mem and regs). Also have to check for the requesting
process being group KMEM -- this is a bit of a hack, but ps et al need it.
Reviewed by: davidg
in savedvp variable and it is used for the argument of
MOUNTTOUNIONMOUNT(). I didn't realize ap->a_vp is modified before
MOUNTTOUNIONMOUNT(), so the change by revision 1.22 is incorrect.
UN_KLOCK flag.
When UN_KLOCK is set, VOP_UNLOCK should keep uppervp locked and clear
UN_ULOCK flag. To do this, when UN_KLOCK is set, (1) union_unlock
clears UN_ULOCK and does not clear UN_KLOCK, (2) union_lock() does not
access uppervp and does not clear UN_KLOCK, and (3) callers of
vput/VOP_UNLOCK should clear UN_KLOCK. For example, vput becomes:
SETKLOCK(union_node);
vput(vnode);
CLEARKLOCK(union_node);
where SETKLOCK macro sets UN_KLOCK and CLEARKLOCK macro clears
UN_KLOCK.
Our vput calls vm_object_deallocate() --> vm_object_terminate(). The
vm_object_terminate() calls vn_lock(), since UN_LOCKED has been
already cleared in union_unlock(). Then, union_lock locks upper vnode
when UN_ULOCK is not set. The upper vnode is not unlocked when
UN_KLOCK is set in union_unlock(), thus, union_lock tries to lock
locked vnode and we get panic.
UN_ULOCK flag. This shows a locking violation but I couldn't find the
reason UN_ULOCK is not set or upper vnode is not unlocked. I added
the code that detect this case and adjust un_flags. DIAGNOSTIC kernel
doesn't adjust un_flags, but just panic here to help debug by kernel
hackers.
# mount -t union (or null) dir1 dir2
# mount -t union (or null) dir2 dir1
The function namei in union_mount calls union_root. The upper vnode
has been already locked and vn_lock in union_root causes above panic.
Add printf's included in `#ifdef DIAGNOSTIC' for EDEADLK cases.
is NULLVP, union node will have neither uppervp nor lowervp. This
causes page fault trap.
The union_removed_upper just remove union node from cache and it
doesn't set uppervp to NULLVP. Since union node is removed from
cache, it will not be referenced.
The code that remove union node from cache was copied from
union_inactive.
VOP_LINK(). The reason of strange behavior was wrong order of the
argument, that is, the operation
# ln foo bar
in a union fs tried to do
# ln bar foo
in ufs layer.
Now we can make a link in a union fs.
fix!
The ufs_link() assumes that vnode is not unlocked and tries to lock it
in certain case. Because union_link calls VOP_LINK after locking vnode,
vn_lock in ufs_link causes above panic.
Currently, I don't know the real fix for a locking violation in
union_link, but I think it is important to avoid panic.
A vnode is unlocked before calling VOP_LINK and is locked after it if
the vnode is not union fs. Even though panic went away, the process
that access the union fs in which link was made will hang-up.
Hang-up can be easily reproduced by following operation:
mount -t union a b
cd b
ln foo bar
ls
same directory pair.
If we do:
mount -t union a b
mount -t union a b
then, (1) namei tries to lock fs which has been already locked by
first union mount and (2) union_root() tries to lock locked fs. To
avoid first deadlock condition, unlock vnode if lowerrootvp is union
node, and to avoid second case, union_mount returns EDEADLK when multi
union mount is detected.
dolock is not set (that is, targetvp == overlaying vnode object).
Current code use FIXUP macro to do this, and never unlocks overlaying
vnode object in union_fsync. So, the vnode object will be locked
twice and never unlocked.
PR: 3271
Submitted by: kato
relookup() in union_relookup() is succeeded. However, if relookup()
returns non-zero value, that is relookup fails, VOP_MKDIR is never
called (c.f. union_mkshadow). Thus, pathname buffer is never FREEed.
Reviewed by: phk
Submitted by: kato
PR: 3262
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.
The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.
The code in procfs_mem is now less bogus (but maybe still a little
so.)
in procfs_allocvp(). This fixes at least stat() of /proc/*/mem.
stat() of /proc/*/file already worked. I think procfs_allocvp() isn't
actually called for type Pfile.
partly because the #define's for them were moved to a different
file. At least the null VOP_LOCK() no longer works, since vclean()
expects VOP_LOCK( ..., LK_DRAIN | LK_INTERLOCK, ...) to clear the
interlock. This probably only matters when simple_lock() is not
null, i.e., when there are multiple CPUs or SIMPLELOCK_DEBUG is
defined.
Call vget/VOP_UNLOCK with the correct number of
arguments. Call vn_lock where appropriate.
vfs_goneall is now replaced by VOP_REVOKE.
Submitted by: bde
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.
The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).
fcntl() and EOPNOTSUPP for flock(). POSIX specifies the weaker EINVAL
errno and the man page agrees.
Not fixed:
deadfs: always returns wrong EBADF
devfs, msdosfs: always return sometimes-wrong EINVAL
cd9660, fdesc, kernfs, portal: always return sometimes-wrong EOPNOTSUPP
procfs: always returns wrong EIO
mfs: panic?!
nfs: fudged
NetBSD uses a generic file system genfs to do return the sometimes-wrong
EOPNOTSUPP more consistently :-)(.
Found by: NIST-PCTS
also fixes a bug I've been chasing for a LONG TIME,
due to the fact that spec_bwrite is a NOP and I didn't realise it..
old symptom:
mount -t devfs devfs /mnt
mount /mnt/wd0e /mnt/mnt2
umount /mnt2 <process hangs>
there are some pretty large structural differences internal to devfs
but outwards it should look the same.
I have not yet tested extensively but will do so and fix 3 warnings tomorrow.
The interface into the "VMIO" system has changed to be more consistant
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.
Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.
Reviewed by: davidg
cleaning up some of the vnode usage..
(I'm sure it still needs more..)
where can one find out what each vfs call expects to be locked
on completion, and how can one find out what each layer expects
to be freed on error.?
it only barely works so don't get too carried away..
I noticed that teh symlink is length 0..
I guess I'll fix that tomorrow..
it also sometimes panics with "cleaned vnode isn't" but it's not more
broken than it was before.. I really want to go over it with someone
who understands the lifecycle of a vnode better than I do..
terry?
kirk?
david?
john?
to information from a single process causes hangs. Specifically, this
fixes problems (hangs) with concurrent ps commands, when the system is under
heavy memory load.
Reviewed by: davidg
but not there. The extent of the object lock is expanded to be over the
range that it is needed. Additionally, clean up the code so that it conforms
to better coding style.
with multiple entries as follows:
start address, end address, resident pages in range, private pages
in range, RW/RO, COW or not, (vnode/device/swap/default).
process won't possibly block before filling in the fsnode pointer (v_data)
which might be dereferenced during a sync since the vnode is put on the
mnt_vnodelist by getnewvnode.
Pointed out by Matt Day <mday@artisoft.com>
device have reference count problems. We mark the underlying object
ono-persistent, and account for the reference count that the VM system
maintainsfor the special device close. This should fix the removable
device problem.
files are off the vendor branch, so this should not change anything.
A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[two new auxillary files in miscfs/union]
DEVFS filesystems..
- if ( error = dev_add_name(child->name,parent->dnp
+ if ( error = dev_add_name(child->name,falias->dnp
Ok bruce, this is the one you were seeing..
This is a really ugly bandaid on the problem, but it works well enough for
'ps -u' to start working again. The problem was caused by the user
address space shrinking by a little bit and the UPAGES being "cast off" to
become a seperate entity rather than being at the top of the process's
vmspace. That optimization was part of John's most recent VM speedups.
Now, rather than decoding the VM space, it merely ensures the pages are
in core and accesses them the same way the ptrace(PT_READ_U..) code does,
ie: off the p->p_addr pointer.
Implement a "variable" directory structure. Files that do not make
sense for the given process do not "appear" and cannot be opened.
For example, "system" processes do not have "file", "regs" or "fpregs",
because they do not have a user area.
"attempt" to fill in the user area of a given process when it is being
accessed via /proc/pid/mem (the user struct is just after
VM_MAXUSER_ADDRESS in the process address space.)
Dont do IO to the U area while it's swapped, hold it in place if possible.
Lock off access to the "ctl" file if it's done a setuid like the other
pseudo-files in there.