plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
This unifies several times in theory indentical 50 lines of code.
The filesystems have a new method: vop_cachedlookup, which is the
meat of the lookup, and use vfs_cache_lookup() for their vop_lookup
method. vfs_cache_lookup() will check the namecache and pass on
to the vop_cachedlookup method in case of a miss.
It's still the task of the individual filesystems to populate the
namecache with cache_enter().
Filesystems that do not use the namecache will just provide the
vop_lookup method as usual.
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
code that says this:
nfsm_request(vp, NFSPROC_FSSTAT, p, cred);
if (v3)
nfsm_postop_attr(vp, retattr);
if (!error)
nfsm_dissect(sfp, struct nfs_statfs *, NFSX_STATFS(v3));
The problem here is that if error != 0, nfsm_dissect() will not be
called, which leaves sfp == NULL. But nfs_statfs() does not bail out
at this point: it continues processing until it tries to dereference
sfp, which causes a panic. I was able to generate this crash under
the following conditions:
1) Set up a machine as an NFS server and NFS client, with amd running
(using NIS maps). /usr/local is exported, though any exported fs
can can be used to trigger the bug.
2) Log in as normal user, with home directory mounted from a SunOS 4.1.3
NFS server via amd (along with a few other NFS filesystems from same
machine).
3) Su to root and type the following:
# mount localhost:/usr/local /mnt
# df
To fix the panic, I changed the code to read:
if (!error) {
nfsm_dissect(sfp, struct nfs_statfs *, NFSX_STATFS(v3));
} else
goto nfsmout;
This is a bit kludgy in that nfsmout is a label defined by the nfsm_subs.h
macros, but these macros are themselves more than a little kludgy. This
stops the machine from crashing, but does not fix the overall bug: 'error'
somehow becomes 5 (EIO) when a statfs() is performed on the locally mounted
NFS filesystem. This seems to only happen the first time the filesystem
is accesed: on subsequent accesses, it seems to work fine again.
Now, I know there's no practical use in mounting a local filesystem
via NFS, but doing it shouldn't cause the system to melt down.
writes sent to the server were synchronous and therefore no commits are
needed. This is the same as the vfs.nfs.async variable on the server but
allows each client to choose whether to work this way.
Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e.
pretend that the write was synchronous.
attached to the vnode, some of them could be re-written synchronously
(if they overflowed the fixed size array nfs_flush had for them). The
fix involves mallocing an array if there are more than its limited
size stack buffer.
Reviewed by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully
can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly)
Reviewed by: dyson
compatible with boot proms made from the 2.2 source.
Convert the nfs arguments when copying to the new diskless structure.
Copy the gateway field in the diskless structure.
to fill in the nfs_diskless structure, at the cost of some kernel
bloat. The advantage is that this code works on a wider range of
network adapters than netboot. Several new kernel options are
documented in LINT.
Obtained from: parts of the code comes from NetBSD.
accessing files which it shouldn't be able to. This required a better
approximation of VOP_ACCESS for NFSv2 (NFSv3 already has an ACCESS rpc
which is a better solution) and adding a call to VOP_ACCESS from VOP_LOOKUP.
PR: kern/876, kern/2635
Submitted by: David Malone <dwmalone@maths.tcd.ie> (for kern/2635)
".." vnode. This is cheaper storagewise than keeping it in the
namecache, and it makes more sense since it's a 1:1 mapping.
2. Also handle the case of "." more intelligently rather than stuff
the namecache with pointless entries.
3. Add two lists to the vnode and hang namecache entries which go from
or to this vnode. When cleaning a vnode, delete all namecache
entries it invalidates.
4. Never reuse namecache enties, malloc new ones when we need it, free
old ones when they die. No longer a hard limit on how many we can
have.
5. Remove the upper limit on namelength of namecache entries.
6. Make a global list for negative namecache entries, limit their number
to a sysctl'able (debug.ncnegfactor) fraction of the total namecache.
Currently the default fraction is 1/16th. (Suggestions for better
default wanted!)
7. Assign v_id correctly in the face of 32bit rollover.
8. Remove the LRU list for namecache entries, not needed. Remove the
#ifdef NCH_STATISTICS stuff, it's not needed either.
9. Use the vnode freelist as a true LRU list, also for namecache accesses.
10. Reuse vnodes more aggresively but also more selectively, if we can't
reuse, malloc a new one. There is no longer a hard limit on their
number, they grow to the point where we don't reuse potentially
usable vnodes. A vnode will not get recycled if still has pages in
core or if it is the source of namecache entries (Yes, this does
indeed work :-) "." and ".." are not namecache entries any longer...)
11. Do not overload the v_id field in namecache entries with whiteout
information, use a char sized flags field instead, so we can get
rid of the vpid and v_id fields from the namecache struct. Since
we're linked to the vnodes and purged when they're cleaned, we don't
have to check the v_id any more.
12. NFS knew about the limitation on name length in the namecache, it
shouldn't and doesn't now.
Bugs:
The namecache statistics no longer includes the hits for ".."
and "." hits.
Performance impact:
Generally in the +/- 0.5% for "normal" workstations, but
I hope this will allow the system to be selftuning over a
bigger range of "special" applications. The case where
RAM is available but unused for cache because we don't have
any vnodes should be gone.
Future work:
Straighten out the namecache statistics.
"desiredvnodes" is still used to (bogusly ?) size hash
tables in the filesystems.
I have still to find a way to safely free unused vnodes
back so their number can shrink when not needed.
There is a few uses of the v_id field left in the filesystems,
scheduled for demolition at a later time.
Maybe a one slot cache for unused namecache entries should
be implemented to decrease the malloc/free frequency.
This commit includes the following changes:
1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility
glue for them is deleted, and the kernel will panic on boot if any are compiled
in.
2) Certain protocol entry points are modified to take a process structure,
so they they can easily tell whether or not it is possible to sleep, and
also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt()
call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new
style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket
in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker
with a bit more time will either make the broken bits work (should be
easy for netipx) or dike them out.
wind up writing zeros instead of real data when the file is on an NFSv2
mounted directory.
While tracking this bug down, I noticed that nfs_asyncio was waking *all*
the iods when a block was written instead of just one per block. Fixing this
gives a 25% performance improvment for writes on v2 (less for v3).
Both are 2.2 candidates.
PR: kern/2774
Zero the b_dirty{off,end} after cluster-comitting a group of buffers.
With these fixes, I was able to complete a 'make world' with remote src
and obj directories.