1. Add defaults for more VOPs
VOP_LOCK vop_nolock
VOP_ISLOCKED vop_noislocked
VOP_UNLOCK vop_nounlock
and remove direct reference in filesystems.
2. Rename the nfsv2 vnop tables to improve sorting order.
1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
intereface function, and now lives in the ufsmount structure.
2. Remove VOP_SEEK, it was unused.
3. Add mode default vops:
VOP_ADVLOCK vop_einval
VOP_CLOSE vop_null
VOP_FSYNC vop_null
VOP_IOCTL vop_enotty
VOP_MMAP vop_einval
VOP_OPEN vop_null
VOP_PATHCONF vop_einval
VOP_READLINK vop_einval
VOP_REALLOCBLKS vop_eopnotsupp
And remove identical functionality from filesystems
4. Add vop_stdpathconf, which returns the canonical stuff. Use
it in the filesystems. (XXX: It's probably wrong that specfs
and fifofs sets this vop, shouldn't it come from the "host"
filesystem, for instance ufs or cd9660 ?)
5. Try to make system wide VOP functions have vop_* names.
6. Initialize the um_* vectors in LFS.
(Recompile your LKMS!!!)
1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.
2. Change VOP_BLKATOFF to a normal function in cd9660.
3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.
4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There are still some cruft left, but
the bulk of it is done.
5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
at moment of delivery. Restoring the signal mask after the tsleep()
is next to useless since the signal is still queued.. This was interacting
with usleep(3) on receipt of a SIGALRM causing it to near busy loop.
Now, we set the new signal mask "permanently" for signanosleep().
Problem noted by: bde
1. Use the default function to access all the specfs operations.
2. Use the default function to access all the fifofs operations.
3. Use the default function to access all the ufs operations.
4. Fix VCALL usage in vfs_cache.c
5. Use VOCALL to access specfs functions in devfs_vnops.c
6. Staticize most of the spec and fifofs vnops functions.
7. Make UFS panic if it lacks bits of the underlying storage handling.
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
the struct kmemstats that describes the type.
This allows subsystems to declare their malloc types locally
and <sys/malloc.h> doesn't need tweaked everytime somebody
gets an idea. You can even have a type local to a lkm...
I don't know if we really need the longdesc, comments welcome.
TODO: There is a single nit in ext2fs, that will be fixed later,
and I intend to remove all unused malloc types and distribute
the rest closer to their use.
now corrected. New tunables/instrumentation added. The code is now
likely "good enough to use." I will add the userland support soon.
The "high performance" mode for raw devices is still missing, and will
be added next. POSIX system calls that now appear to work:
aio_cancel, aio_error, aio_read, aio_return, aio_suspend, aio_write,
lio_listio. Missing, but to be added soon: aio_fsync.
pointy hat last? :-]
When one is selecting (or polling) for write, it helps if we use the
write side of the pipe when requesting wakeups instead of the read side.
This broke ghostview (at least) - I'm suprised it wasn't noticed for
so long.
Reviewed by: Greg Lehey <grog@lemis.com>
in a P6 SMP system. Some MB bios'es don't set the registers up correctly
for the AP's. Additionally, set the memory between 0xa0000 and 0xbffff
as write combining.
1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.
Reviewed by: bde
(usually a couple of thousand) to 25. The measured impact on cache-hits
doesn't justify spending memory this way:
Target number of free vnodes versus namecache hit rate in % during a
make world:
10 98.5316
200 98.5479
500 98.5546
1000 98.5709
3000 98.6006
4000 98.6126
hash chain traversal isn't needed. This also allows untimeout to recompute
the hash to find the bucket that the entry to remove is stored in so
that each callout entry no longer needs to store that information.
Reviewed by: Nate Williams <nate@mt.sri.com>
Add support for "interrupt driven configuration hooks".
A component of the kernel can register a hook, most likely
during auto-configuration, and receive a callback once
interrupt services are available. This callback will occur before
the root and dump devices are configured, so the configuration
task can affect the selection of those two devices or complete
any tasks that need to be performed prior to launching init.
System boot is posponed so long as a hook is registered. The
hook owner is responsible for removing the hook once their task
is complete or the system boot can continue.
kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c:
Change the interface and implementation for the kernel callout
service. The new implemntaion is based on the work of
Adam M. Costello and George Varghese, published in a technical
report entitled "Redesigning the BSD Callout and Timer Facilities".
The interface used in FreeBSD is a little different than the one
outlined in the paper. The new function prototypes are:
struct callout_handle timeout(void (*func)(void *),
void *arg, int ticks);
void untimeout(void (*func)(void *), void *arg,
struct callout_handle handle);
If a client wishes to remove a timeout, it must store the
callout_handle returned by timeout and pass it to untimeout.
The new implementation gives 0(1) insert and removal of callouts
making this interface scale well even for applications that
keep 100s of callouts outstanding.
See the updated timeout.9 man page for more details.
Add cpu_rootconf and cpu_dumpconf so that configuring these
two devices can be better controlled by the MI configuration
code.
machdep.c:
MD initialization code for the new callout interface.
trap.c:
Add support for printing out whether cam interrupts are masked
during a panic.
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
extern in <sys/malloc.h> and it should not have been staticized for
the !(KMEMSTATS || DIAGNOSTIC) case.
Fixed the !(KMEMSTATS || DIAGNOSTIC) case. The MALLOC() and FREE()
macros are evil, but code generally doesn't allow for this and some code
involving else clauses did not compile.
Finished staticization.
or a partition is larger than the slice.
Now `disklabel -Brw sdX auto' should fail properly on sliced disks
without partition of type 165, e.g., on zip disks with the factory
default formatting. Previously it set a bogus in-core label for
the compatibility slice and used this to corrupt the MBR (the slice
has offset 0 and size 0, but setting the label in effect corrupted
its size to nonzero).
`disklabel -Brw sdX auto' already failed properly on normally (not
dangerously dedicated) sliced disks _with_ partition of type 165,
because the compatibility slice has a nonzero offset so the MBR
remained inaccessible when the size was corrupted.
This bug only affected in-core labels. On-disk labels are checked
carefully when they read and written.
A couple of stylistic nits from Bruce.
If your libc contains version 1.11 or 1.12 of getcwd.c, (ie: if
you recompiled libc one of the last couple of days):
>>> Recompile LIBC before you boot a new kernel <<<
A new libc will deal with both old and new kernels.
adapted from NetBSD.. However, there are some differences in the tty
system that are big enough to cause their code to not fit comfortably.
Obtained from: NetBSD (I think)
detail is passed back and forwards). This mostly came from NetBSD, except
that our interfaces have changed a lot and this funciton is in a different
part of the kernel.
Obtained from: NetBSD
The implementation is done (unlike what i've originally been
contemplating) by reparenting kids of processes that have the
appropriate bit set to PID 1, and let PID 1 handle the zombie. This
is far less problematical than what would seem to be ``doing it
right'', for a number of reasons.
Of our currently shipping PID-1-intended programs, 50 % fail the above
assumption. ;-) (Read this: sysinstall doesn't do it right. This is
no problem as long as no program called by sysinstall actually uses
SA_NOCLDWAIT.)
ToDo: . clarify the correct SA_* flag inheritance, compared
to other systems,
. decide whether the compat cruft (osigvec(9)) should
deal with new system additions or not,
. merge OpenBSD's SA_SIGINFO implementation. ;)
Reviewed by: bde
local filesystem metadata at the first brelse call when the
block device vnode has v_tag set to VT_NFS.
Reviewed by: phk
Submitted by: Tor Egge <tegge@idi.ntnu.no>
declaration macros so that a semicolon can be added when the macros
are invoked without giving a (pedantic) syntax error. Invocations
need to be followed by a semicolon so that programs like indent and
gtags don't get confused.
Fixed the one invocation that wasn't followed by a trailing semicolon.
since that might cause in_pcballoc to call MALLOC with M_WAITOK during
a software interrupt.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Introduce VFREE which indicates that vnode is on freelist.
Rename vholdrele() to vdrop().
Create vfree() and vbusy() to add/delete vnode from freelist.
Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0)
vnodes off the freelist.
Generalize vhold()/v_holdcnt to mean "do not recycle".
Fix reassignbuf()s lack of use of vhold().
Use vhold() instead of checking v_cache_src list.
Remove vtouch(), the vnodes are always vget'ed soon enough
after for it to have any measuable effect.
Add sysctl debug.freevnodes to keep track of things.
Move cache_purge() up in getnewvnodes to avoid race.
Decrement v_usecount after VOP_INACTIVE(), put a vhold() on
it during VOP_INACTIVE()
Unmacroize vhold()/vdrop()
Print out VDOOMED and VFREE flags (XXX: should use %b)
Reviewed by: dyson
holding CPU along with the lock. When a CPU fails to get the lock
it compares its own id to the holder id. If they are the same it
panic()s, as simple locks are binary, and this would cause a deadlock.
Controlled by smptests.h: SL_DEBUG, ON by default.
Some minor cleanup.
Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel.
UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.
Help from: Bruce Evans <bde@zeta.org.au>
smp_active = 1 used to indicate that the system had frozen previously
started AP's, while smp_active = 0 was "AP's not yet started". I have split
this into smp_started (which is set when the AP's come online), and
smp_active is left for turning on/off AP scheduling.
- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)
More cleanup to follow, this is more of a checkpoint than a
'finished' thing.
This unifies several times in theory indentical 50 lines of code.
The filesystems have a new method: vop_cachedlookup, which is the
meat of the lookup, and use vfs_cache_lookup() for their vop_lookup
method. vfs_cache_lookup() will check the namecache and pass on
to the vop_cachedlookup method in case of a miss.
It's still the task of the individual filesystems to populate the
namecache with cache_enter().
Filesystems that do not use the namecache will just provide the
vop_lookup method as usual.