relookup() in union_relookup() is succeeded. However, if relookup()
returns non-zero value, that is relookup fails, VOP_MKDIR is never
called (c.f. union_mkshadow). Thus, pathname buffer is never FREEed.
Reviewed by: phk
Submitted by: kato
PR: 3262
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.
The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation. I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.
The code in procfs_mem is now less bogus (but maybe still a little
so.)
of setting it (compiled into vfs_conf.c), but we have a dynamic system
in place. This could probably be better done via a runtime configure
flag in the VFS_SET() VFS declaration, perhaps VFCF_LOCAL, and have the
VFS code propagate this down into MNT_LOCAL at mount time. The other FS's
would need to be updated, havinf UFS and MSDOSFS filesystems without
MNT_LOCAL breaks a few things.. the man page rebuild scans for local
filesystems and currently fails, I suspect that other tools like find
and tar with their "local filesystem only" modes might be affected.
in procfs_allocvp(). This fixes at least stat() of /proc/*/mem.
stat() of /proc/*/file already worked. I think procfs_allocvp() isn't
actually called for type Pfile.
partly because the #define's for them were moved to a different
file. At least the null VOP_LOCK() no longer works, since vclean()
expects VOP_LOCK( ..., LK_DRAIN | LK_INTERLOCK, ...) to clear the
interlock. This probably only matters when simple_lock() is not
null, i.e., when there are multiple CPUs or SIMPLELOCK_DEBUG is
defined.
- *fs_init routines now take a "struct vfsconf * vfsp" pointer
as an argument.
- Use the correct type for cookies.
- Update function prototypes.
Submitted by: bde
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
Firstly, now our read-ahead clustering is on a file descriptor basis and not
on a per-vnode basis. This will allow multiple processes reading the
same file to take advantage of read-ahead clustering. Secondly, there
previously was a problem with large reads still using the ramp-up
algorithm. Of course, that was bogus, and now we read the entire
"chunk" off of the disk in one operation. The read-ahead clustering
algorithm should use less CPU than the previous also (I hope :-)).
NOTE: THAT LKMS MUST BE REBUILT!!!
successful write. Only do it for the IO_SYNC case (like ufs). On
one of my systems, this speeds up `iozone 24 512' from 32K/sec
(1/128 as fast as ufs) to 2.8MB/sec (7/10 as fast as ufs).
Obtained from: partly from NetBSD
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.
The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).
fcntl() and EOPNOTSUPP for flock(). POSIX specifies the weaker EINVAL
errno and the man page agrees.
Not fixed:
deadfs: always returns wrong EBADF
devfs, msdosfs: always return sometimes-wrong EINVAL
cd9660, fdesc, kernfs, portal: always return sometimes-wrong EOPNOTSUPP
procfs: always returns wrong EIO
mfs: panic?!
nfs: fudged
NetBSD uses a generic file system genfs to do return the sometimes-wrong
EOPNOTSUPP more consistently :-)(.
Found by: NIST-PCTS
certain error conditions, it is possible for pages to be left allocated
in the object beyond it's end. It is generally bad practice to allocate
pages beyond the end of an object.
/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>
The interface into the "VMIO" system has changed to be more consistant
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.
Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.
Reviewed by: davidg
to information from a single process causes hangs. Specifically, this
fixes problems (hangs) with concurrent ps commands, when the system is under
heavy memory load.
Reviewed by: davidg
but not there. The extent of the object lock is expanded to be over the
range that it is needed. Additionally, clean up the code so that it conforms
to better coding style.
with multiple entries as follows:
start address, end address, resident pages in range, private pages
in range, RW/RO, COW or not, (vnode/device/swap/default).
All new code is "#ifdef PC98"ed so this should make no difference to
PC/AT (and its clones) users.
Ok'd by: core
Submitted by: FreeBSD(98) development team
process won't possibly block before filling in the fsnode pointer (v_data)
which might be dereferenced during a sync since the vnode is put on the
mnt_vnodelist by getnewvnode.
Pointed out by Matt Day <mday@artisoft.com>
process won't possibly block before filling in the fsnode pointer (v_data)
which might be dereferenced during a sync since the vnode is put on the
mnt_vnodelist by getnewvnode.
device have reference count problems. We mark the underlying object
ono-persistent, and account for the reference count that the VM system
maintainsfor the special device close. This should fix the removable
device problem.
files are off the vendor branch, so this should not change anything.
A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[note, new file: cd9660_mount.h]
This is a really ugly bandaid on the problem, but it works well enough for
'ps -u' to start working again. The problem was caused by the user
address space shrinking by a little bit and the UPAGES being "cast off" to
become a seperate entity rather than being at the top of the process's
vmspace. That optimization was part of John's most recent VM speedups.
Now, rather than decoding the VM space, it merely ensures the pages are
in core and accesses them the same way the ptrace(PT_READ_U..) code does,
ie: off the p->p_addr pointer.
Implement a "variable" directory structure. Files that do not make
sense for the given process do not "appear" and cannot be opened.
For example, "system" processes do not have "file", "regs" or "fpregs",
because they do not have a user area.
"attempt" to fill in the user area of a given process when it is being
accessed via /proc/pid/mem (the user struct is just after
VM_MAXUSER_ADDRESS in the process address space.)
Dont do IO to the U area while it's swapped, hold it in place if possible.
Lock off access to the "ctl" file if it's done a setuid like the other
pseudo-files in there.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.
seems to work hre just fine though I can't check every file
that changed due to limmited h/w, however I've checked enught to be petty
happy withe hte code..
WARNING... struct lkm[mumble] has changed
so it might be an idea to recompile any lkm related programs
most devsw referenced functions are now static, as they are
in the same file as their devsw structure. I've also added DEVFS
support for nearly every device in the system, however
many of the devices have 'incorrect' names under DEVFS
because I couldn't quickly work out the correct naming conventions.
(but devfs won't be coming on line for a month or so anyhow so that doesn't
matter)
If you "OWN" a device which would normally have an entry in /dev
then search for the devfs_add_devsw() entries and munge to make them right..
check out similar devices to see what I might have done in them in you
can't see what's going on..
for a laugh compare conf.c conf.h defore and after... :)
I have not doen DEVFS entries for any DISKSLICE devices yet as that will be
a much more complicated job.. (pass 5 :)
pass 4 will be to make the devsw tables of type (cdevsw * )
rather than (cdevsw)
seems to work here..
complaints to the usual places.. :)
cd9660_rrip.c:
Added lots of bogus casts to hide type errors exposed by the prototypes.
(Different structs are assumed to have a common prefix.)
cd9660_vnops.c:
Finished staticizing.
add a few safety checks in specfs because
now it's possible to get entries in [cd]devsw[] which are ALL NULL
so it's better to discover this BEFORE jumping into the d_open() entry..
more check to come later.. this getsthe code to the stage where I
can start testing it, even if I haven't caught every little error case...
I guess I'll find them quick enough..
for certain common combinations of directory sizes, cluster sizes, and i/o
sizes (e.g., 4K, 4K, and 4K). The fix in rev. 1.21 was incomplete.
Reviewed by: dfr
Obtained from: party from NetBSD
The fix for this in Lite-2 is more complete, but these quick hacks of mine
are safer for now. I plan to integrate the additional Lite-2 stuff at some
later time. Should completely fix PR810.
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.
The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.
Should anybody out there wonder about this vendetta against global
variables, it is basically to make it more visible what our interfaces
in the kernel really are.
I'm almost convinced we should have a
#define PUBLIC /* public interface */
and use it in the #includes...
filesystem layer, as was done in lite-2. Merged in some other cosmetic
changes while I was at it. Rewrote most of msdosfs_access() to be more
like ufs_access() and to include the FS read-only check.
Obtained from: partially from 4.4BSD-lite2
calls.
Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+
missing prototypes. Now I have 9000+ lines of warnings and errors
about bogus conversions of function pointers.
SunOS and SCO. You can then even use the pipe as a cheap fifo stack
(yuck!). A semantic change also important (but not limited) to iBCS2
compatibility.
Submitted by: swallace
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.
don't go away when the kernel is compiled with -O.
The functions are backed up by extern versions in cd9660_util.c,
but these versions are disabled by `#ifdef __notanymore__'. They
could have been enabled by using `#if defined(__notanymore__) ||
!defined(__OPTIMIZE__)' but then I would have had to check that
they still work. The correct way to handle all this is to replace
`extern inline' by `EXTERN_INLINE' and define `EXTERN_INLINE' as
`extern inline' in most modules and as empty in one module.
associated files.
Submitted by: leo@dachau.marco.de (Matthias Pfaller)
Not-obtained from: NetBSD. Instead sent directly to me by Matthias.
(Sorry, this is to prevent people from claiming i might have gotten
this from NetBSD. :)
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
regular user could panic the machine with a simple "tail /proc/curproc/mem"
command. The problem was twofold: both kernfs and procfs didn't fill in
the mnt_stat statfs struct (which would later lead to an integer divide
fault in the vnode pager), and kernfs bogusly paniced if a bmap was
attempted.
Reviewed by: John Dyson
These changes solve the problem in a general way by moving the
initialization out of the individual fs_mountroot's and into swaponvp().
Submitted by: Poul-Henning Kamp