loading breakage'). The patch fixes serious issues with the VFS
operations vector array which results in a crash when a filesystem module
adding a new VOP is loaded into the kernel. Basically what was happening
before was that the old operations vector was being freed and a new one
allocated. The original MALLOC code tended to reuse the same address
for the case and so the bug did not rear its ugly head until the new memory
subsystem was emplaced.
This patch replaces the temporary workaround Dave O'Brien comitted in 1.58.
The patch is clean enough that I intend to MFC it to stable at some point.
Submitted by: Alexander Kabaev <ak03@gte.com>
MFC after: 1 week
modules (ie. procfs.ko).
When the kernel loads dynamic filesystem module, it looks for any of the
VOP operations specified by the new filesystem that have not been registered
already by the currently known filesystems. If any of such operations exist,
vfs_add_vnops function calls vfs_opv_recalc function, which rebuilds vop_t
vectors for each filesystem and sets all global pointers like ufs_vnops_p,
devfs_specop_p, etc to the new values and then frees the old pointers. This
behavior is bad because there might be already active vnodes whose v_op fields
will be left pointing to the random garbage, leading to inevitable crash soon.
Submitted by: Alexander Kabaev <ak03@gte.com>
Includes some minor whitespace changes, and re-ordering to be able to document
properly (e.g, grouping of variables and the SYSCTL macro calls for them, where
the documentation has been added.)
Reviewed by: phk (but all errors are mine)
sanity check, but it is too easy to run into, eg: making an ACL syscall
when no filesystems have the ACL implementation enabled.
The original reason for the panic was that the VOP_ vector had not been
assigned and therefor could not be passed down the stack.. and there
was no point passing it down since nothing implemented it anyway.
vop_defaultop entries could not pass it on because it had a zero (unknown)
vector that was indistinguishable from another unknown VOP vector.
Anyway, we can do something reasonable in this case, we shouldn't need
to panic here as there is a reasonable recovery option (return EOPNOTSUPP
and dont pass it down the stack).
Requested by: rwatson
unregister them after sysuninits when unloading.
* Add code to vfs_register() to set the oid number of vfs sysctls to
the type number of the filesystem.
Reviewed by: bde
This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
leaked memory on each unload and were limited to items referenced in
the kernel copy of vnode_if.c. Now a kernel module is free to create
it's own VOP_FOO() routines and the rest of the system will happily
deal with it, including passthrough layers like union/umap/etc.
Have VFS_SET() call a common vfs_modevent() handler rather than
inline duplicating the common code all over the place.
Have VNODEOP_SET() have the vnodeops removed at unload time (assuming a
module) so that the vop_t ** vector is reclaimed.
Slightly adjust the vop_t ** vectors so that calling slot 0 is a panic
rather than a page fault. This could happen if VOP_something() was called
without *any* handlers being present anywhere (including in vfs_default.c).
slot 1 becomes the default vector for the vnodeop table.
TODO: reclaim zones on unload (eg: nfs code)
This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
NFS_ROOT will produce kernel that cannot mount a UFS /.
Vfs type numbers must be distinct from VFS_GENERIC (and VFS_VFSCONF, but
that has the same value and should go away).
The problem happens because NFS is the first vfs (in sys/conf order) so it
gets type number 0 and conflicts harmfully with VFS_GENERIC which is also 0.
The conflict is apparently harmless in the usual case when another vfs
gets type number 0, because nfs is the only vfs that has sysctls.
Inital fix by: Dima <dima@tejblum.dnttm.rssi.ru>
Reason why it worked by: bde
type numbers in vfs attach order (modulo incomplete reuse of old
numbers after vfs LKMs are unloaded). This requires reinitializing
the sysctl tree (or at least the vfs subtree) for vfs's that support
sysctls (currently only nfs). sysctl_order() already handled
reinitialization reasonably except it checked for annulled self
references in the wrong place.
Fixed sysctls for vfs LKMs.
1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.
2. Change VOP_BLKATOFF to a normal function in cd9660.
3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.
4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There are still some cruft left, but
the bulk of it is done.
5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
the old VFS_VFSCONF sysctl is enabled by default.
Initialize the vfc_vfsops field to non-NULL in sysctl_ovfs_conf()
so that the old VFS_VFSCONF sysctl actually works. The old (still
current) getvfsent.c uses this "kernel-only" field to decide which
vfs's are configured (the old implementation returned null entries
for unconfigured vfs's).
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
Unstaticize a function in scsi/scsi_base that was used, with an undocumented
option.
My last count on the LINT kernel shows:
Total symbols: 3647
unref symbols: 463
undef symbols: 4
1 ref symbols: 1751
2 ref symbols: 485
Approaching the pain threshold now.
Move a lot of variables home to their own code (In good time before xmas :-)
Introduce the string descrition of format.
Add a couple more functions to poke into these marvels, while I try to
decide what the correct interface should look like.
Next is adding vars on the fly, and sysctl looking at them too.
Removed a tine bit of defunct and #ifdefed notused code in swapgeneric.
Convert the remaining sysctl stuff to the new way of doing things.
the devconf stuff is the reason for the large number of files.
Cleaned up some compiler warnings while I were there.
it 1138 times (:-() in casts and a few more times in declarations.
This change is null for the i386.
The type has to be `typedef int vop_t(void *)' and not `typedef
int vop_t()' because `gcc -Wstrict-prototypes' warns about the
latter. Since vnode op functions are called with args of different
(struct pointer) types, neither of these function types is any use
for type checking of the arg, so it would be preferable not to use
the complete function type, especially since using the complete
type requires adding 1138 casts to avoid compiler warnings and
another 40+ casts to reverse the function pointer conversions before
calling the functions.