Commit Graph

304 Commits

Author SHA1 Message Date
phk
36c3965ff9 Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by:    peter
2000-05-05 09:59:14 +00:00
phk
5df766a0f8 Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)

substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

This patch is machine generated except for the ccd.c and buf.h parts.
2000-03-20 11:29:10 +00:00
chris
dd595b28f0 In vn_isdisk(), check whether vp->v_rdev is NULL. If it is, then
return ENXIO (Device not configured).  Without this, vn_isdisk()
could (and did in the case of lstat() under fdesc) pass a NULL pointer
to devsw(), which caused a page fault.

Reviewed by:	alfred
2000-03-18 01:27:44 +00:00
phk
6b3385b773 Eliminate the undocumented, experimental, non-delivering and highly
dangerous MAX_PERF option.
2000-03-16 08:51:55 +00:00
bde
33e9ba0226 Don't try so hard to make the lower 16 bits of fsids unique. It tended
to recycle full fsids after only 16 mount/unmount's.  This is probably
too often for exported fsids.  Now we recycle the full fsids only
after 2^16 mount/ umount's and only ensure uniqueness in the lower 16
bits if there have been <= 256 calls to vfs_getnewfsid() since the
system started.
2000-03-14 14:19:49 +00:00
bde
2af0bce593 Try harder to make the lower 16 bits of fsids unique. The vfs type
number was packed very wastefully, giving perfect non-uniqeness in
the lower 16 bits of fsids for filesystems with the same vfs type.
This made linux_stat() return perfectly non-unique (broken) 16-bit
st_dev's for nfs mount points, and effectively reduced mntid_base to
8 bits so that the vfs_getnewfsid() looped endlessly when there are
already 256 mounted filesystems with the required vfs type.

Approved by:	jkh
2000-03-12 14:23:21 +00:00
sos
dc230127da Do refcounting of open devices (more) correctly.
count_dev funtion by phk.
2000-02-07 23:05:40 +00:00
rwatson
7cc7ecf5e1 Remove static qualifier from vgonel, as it is needed by the Arla folk
outside of vfs_subr.c.

Submitted by:	Assar Westerlund <assar@sics.se>
Reviewed by:	rwatson
Approved by:	jkh
2000-02-02 07:07:17 +00:00
rwatson
2aada2e694 This patch fixes a locking bug that can result in deadlock if
the codepath is followed.

From the PR:

  vclean calls vrele leading to deadlock (if usecount > 0)

  vclean() calls vrele() if v_usecount of the node was higher than one.
  But before calling it, it sets the VXLOCK flag, which will make
  vn_lock called from vrele dead-lock.

PR:		kern/15117
Submitted by:	Assar Westerlund <assar@stacken.kth.se>
Reviewed by:	rwatson
Obtained from:	NetBSD
2000-01-29 15:22:58 +00:00
phk
ae0c1ec8f7 Give vn_isdisk() a second argument where it can return a suitable errno.
Suggested by:	bde
2000-01-10 12:04:27 +00:00
mckusick
a44e140976 Remove the P_BUFEXHAUST flag from the syncer process (leaving
it only on the buf_daemon process). The problem is that when the
syncer process starts running the worklist, it wants to delete
lots of files. It does this by VFS_VGET'ing the vnodes, clearing
the blocks in them and bdwrite'ing the buffer. It can process close
to a thousand files per second which generates a large number of
dirty buffers. So, giving it special priviledge at the buffer trough
leads to trouble as the buf_daemon does occationally need a free
buffer to proceed and if the syncer has used every last one up,
we are toast.
2000-01-10 00:07:24 +00:00
eivind
767bad2cc1 Change NDFREE() from a macro to a function for the time being; the macro
version caused intolerable bloat (30k).  I'm likely to revisit this with an
attempt at a smarter macro.

Bloat noticed by:       bde
2000-01-08 16:20:06 +00:00
luoqi
e100d44d55 Introduce a mechanism to suspend/resume system processes. Suspend syncer
and bufdaemon prior to disk sync during system shutdown.
2000-01-07 08:36:44 +00:00
dillon
c6689c797d Enhance reassignbuf(). When a buffer cannot be time-optimally inserted
into vnode dirtyblkhd we append it to the list instead of prepend it to
    the list in order to maintain a 'forward' locality of reference, which
    is arguably better then 'reverse'.  The original algorithm did things this
    way to but at a huge time cost.

    Enhance the append interlock for NFS writes to handle intr/soft mounts
    better.

    Fix the hysteresis for NFS async daemon I/O requests to reduce the
    number of unnecessary context switches.

    Modify handling of NFS mount options.  Any given user option that is
    too high now defaults to the kernel maximum for that option rather then
    the kernel default for that option.

Reviewed by:	 Alfred Perlstein <bright@wintelcom.net>
2000-01-05 05:11:37 +00:00
mckusick
0529f59c27 Prettyness police: Identify flags in b_xflags with BX_ to distinguish
them from flags in b_flags which are prefixed with B_
1999-12-22 03:11:04 +00:00
dillon
b66fb2c648 Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

    This feature prevents the update daemon from gratuitously flushing
    dirty pages associated with a mapped file-backed region of memory.  The
    system pager will still page the memory as necessary and the VM system
    will still be fully coherent with the filesystem.  Modifications made
    by other means to the same area of memory, for example by write(), are
    unaffected.  The feature works on a page-granularity basis.

    MAP_NOSYNC allows one to use mmap() to share memory between processes
    without incuring any significant filesystem overhead, putting it in
    the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg
1999-12-12 03:19:33 +00:00
eivind
287836faea Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
  return value: LK_EXCLOTHER, when the lock is held exclusively by another
  process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
  locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with:	grog, mch, peter, phk
Reviewed by:	peter
1999-12-11 16:13:02 +00:00
dillon
881e17e778 Remove vfs_getrootfsid() function (a temporary hack added a few months
ago to make BOOTP work again).  It is no longer required by BOOTP and
    no longer used.
1999-11-29 22:25:36 +00:00
phk
1848d96439 Convert various pieces of code to use vn_isdisk() rather than checking
for vp->v_type == VBLK.

In ccd: we don't need to call VOP_GETATTR to find the type of a vnode.

Reviewed by:    sos
1999-11-22 10:33:55 +00:00
phk
1adcecffd9 struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ.  Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly  mp != (void*)&mountlist  comparisons.

Requested by:   phk
Submitted by:   Jake Burkholder jake@checker.org
PR:             14967
1999-11-20 10:00:46 +00:00
phk
ec4e24bd52 Commit the remaining part of PR14914:
Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
   structures for list operations.  This patch makes all list operations
   in sys/kern use the queue(3) macros, rather than directly accessing the
   *Q_{HEAD,ENTRY} structures.

Reviewed by:    phk
Submitted by:   Jake Burkholder <jake@checker.org>
PR:     14914
1999-11-16 16:28:58 +00:00
phk
8c9bc6b146 Next step in the device cleanup process.
Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code.

Unify spec_open() for bdev and cdev cases.

Remove the disabled bdev specific read/write code.
1999-11-09 14:15:33 +00:00
phk
8e3c3eafed useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments)
of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>.  This puts
the #defines for the vm_inherit_t and vm_prot_t types next to their
typedefs.

This paves the road for the commit to follow shortly: change
useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE}
as argument.
1999-10-29 18:09:36 +00:00
peter
787140aa42 Trim unused options (or #ifdef for undoc options).
Submitted by:	phk
1999-10-11 15:19:12 +00:00
phk
8b06d6a2fb Move the buffered read/write code out of spec_{read|write} and into
two new functions spec_buf{read|write}.

Add sysctl vfs.bdev_buffered which defaults to 1 == true.  This
sysctl can be used to experimentally turn buffered behaviour for
bdevs off.  I should not be changed while any blockdevices are
open.  Remove the misplaced sysctl vfs.enable_userblk_io.

No other changes in behaviour.
1999-10-04 11:23:10 +00:00
phk
073b941095 Remove v_maxio from struct vnode.
Replace it with mnt_iosize_max in struct mount.

Nits from:	bde
1999-09-29 20:05:33 +00:00
dillon
493548f6b6 Final commit to remove vnode->v_lastr. vm_fault now handles read
clustering issues (replacing code that used to be in
    ufs/ufs/ufs_readwrite.c).  vm_fault also now uses the new VM page counter
    inlines.

    This completes the changeover from vnode->v_lastr to vm_entry_t->v_lastr
    for VM, and fp->f_nextread and fp->f_seqcount (which have been in the
    tree for a while).  Determination of the I/O strategy (sequential, random,
    and so forth) is now handled on a descriptor-by-descriptor basis for
    base I/O calls, and on a memory-region-by-memory-region and
    process-by-process basis for VM faults.

Reviewed by:	David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
1999-09-21 00:36:16 +00:00
phk
c0818069f7 Initialize vp->v_maxio to its default in getnetvnode() rather than
four different places in vfs_cluster.c
1999-09-20 19:53:23 +00:00
dillon
beba2c930c Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse
addaliasu() into addalias() (no operational change) and clarify comments
    relating to a trick that vclean() uses.

    The fix to BOOTP is yet another hack.  Actually, rootfsid handling
    is already a major hack.  The whole thing needs to be cleaned up.

Reviewed by:	David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
1999-09-19 06:24:21 +00:00
dillon
2c39c80e00 Add vfs.enable_userblk_io sysctl to control whether user reads and writes
to buffered block devices are allowed.  The default is to be backwards
    compatible, i.e. reads and writes are allowed.

    The idea is for a larger crowd to start running with this disabled and
    see what problems, if any, crop up, and then to change the default to
    off and see if any problems crop up in the next 6 months prior to
    potentially removing support entirely.  There are still a few people,
    Julian and myself included, who believe the buffered block device
    access from usermode to be useful.

    Remove use of vnode->v_lastr from buffered block device I/O in
    preparation for removal of vnode->v_lastr field, replacing it with
    the already existing seqcount metric to detect sequential operation.

Reviewed by:	Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
1999-09-17 06:10:27 +00:00
phk
244faf2e1b Add dev_t freeing code. Controlled by sysctl debug.free_devt, default
is off.
1999-08-29 09:09:12 +00:00
phk
d311a0563b remove unused variables. 1999-08-28 19:21:03 +00:00
peter
3b842d34e8 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
phk
591c94d4c6 Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.

        Drop the "fast recycling" optimization of vnodes (including
        the returning of a prexisting but stale vnode from checkalias).
        It doesn't buy us anything now that we don't hardlimit
        vnodes anymore.

        Rename checkalias2() and checkalias() to addalias() and
        addaliasu() - which takes dev_t and udev_t arg respectively.

        Make the revoke syscalls use vcount() instead of VALIASED.

        Remove VALIASED flag, we don't need it now and it is faster
        to traverse the much shorter lists than to maintain the
        flag.

        vfs_mountedon() can check the dev_t directly, all the vnodes
        point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED;
1999-08-26 14:53:31 +00:00
phk
ea55d63475 Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness. 1999-08-25 12:24:39 +00:00
julian
7ceff14f18 Make DEVFS use PHK's specinfo struct as the source of dev_t and devsw.
In lookup() however it's the other way around as we need to supply the
dev_t for the vnode, so devfs still has a copy of it stashed away.

Sourcing it from the vnode in the vnops however is useful as it makes
a lot of the code almost the same as that in specfs.
1999-08-25 04:55:20 +00:00
jdp
9f71d680aa Support full-precision file timestamps. Until now, only the seconds
have been maintained, and that is still the default.  A new sysctl
variable "vfs.timestamp_precision" can be used to enable higher
levels of precision:

      0 = seconds only; nanoseconds zeroed (default).
      1 = seconds and nanoseconds, accurate within 1/HZ.
      2 = seconds and nanoseconds, truncated to microseconds.
    >=3 = seconds and nanoseconds, maximum precision.

Level 1 uses getnanotime(), which is fast but can be wrong by up
to 1/HZ.  Level 2 uses microtime().  It might be desirable for
consistency with utimes() and friends, which take timeval structures
rather than timespecs.  Level 3 uses nanotime() for the higest
precision.

I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with
"cpio -pdu".  There was almost negligible difference in the system
times -- much less than 1%, and less than the variation among
multiple runs at the same level.  Bruce Evans dreamed up a torture
test involving 1-byte reads with intervening fstat() calls, but
the cpio test seems more realistic to me.

This feature is currently implemented only for the UFS (FFS and
MFS) filesystems.  But I think it should be easy to support it in
the others as well.

An earlier version of this was reviewed by Bruce.  He's not to
blame for any breakage I've introduced since then.

Reviewed by:	bde (an earlier version of the code)
1999-08-22 00:15:16 +00:00
phk
7b7ae40370 The bdevsw() and cdevsw() are now identical, so kill the former. 1999-08-13 10:29:38 +00:00
phk
683c2698ff s/v_specinfo/v_rdev/ 1999-08-13 10:10:12 +00:00
phk
e938d317d5 Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.
1999-08-08 18:43:05 +00:00
alc
743eeab638 Add sysctl and support code to allow directories to be VMIO'd. The default
setting for the sysctl is OFF, which is the historical operation.

Submitted by:	dillon
1999-07-26 06:25:53 +00:00
phk
cacc73aa18 Now a dev_t is a pointer to struct specinfo which is shared by all specdev
vnodes referencing this device.

Details:
        cdevsw->d_parms has been removed, the specinfo is available
        now (== dev_t) and the driver should modify it directly
        when applicable, and the only driver doing so, does so:
        vn.c.  I am not sure the logic in checking for "<" was right
        before, and it looks even less so now.

        An intial pool of 50 struct specinfo are depleted during
        early boot, after that malloc had better work.  It is
        likely that fewer than 50 would do.

        Hashing is done from udev_t to dev_t with a prime number
        remainder hash, experiments show no better hash available
        for decent cost (MD5 is only marginally better)  The prime
        number used should not be close to a power of two, we use
        83 for now.

        Add new checkalias2() to get around the loss of info from
        dev2udev() in bdevvp();

        The aliased vnodes are hung on a list straight of the dev_t,
        and speclisth[SPECSZ] is unused.  The sharing of struct
        specinfo means that the v_specnext moves into the vnode
        which grows by 4 bytes.

        Don't use a VBLK dev_t which doesn't make sense in MFS, now
        we hang a dummy cdevsw on B/Cmaj 253 so that things look sane.

	Storage overhead from all of this is O(50k).

        Bump __FreeBSD_version to 400009

The next step will add the stuff needed so device-drivers can start to
hang things from struct specinfo
1999-07-20 09:47:55 +00:00
phk
f3c07181e3 [click] Now all dev_t's in the kernel have their char device major.
Only know casualy of this is swapinfo/pstat which should be fixes
the right way:  Store the actual pathname in the kernel like mount
does.  [Volounteers sought for this task]

The road map from here is roughly:  expand struct specinfo into struct
based dev_t.  Add dev_t registration facilities for device drivers and
start to use them.
1999-07-19 09:37:59 +00:00
phk
63c9fe9157 Introduce the vn_todev(struct vnode*) function, which returns the dev_t
corresponding to a VBLK or VCHR node, or NODEV.
1999-07-18 14:30:37 +00:00
phk
fb0144f561 Fix 2nd arg to udev2dev(). 1999-07-17 19:38:00 +00:00
phk
6c373ff516 I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name.  s/umakedev/makeudev/g
1999-07-17 18:43:50 +00:00
kris
d5babdb93c Correct a couple of spelling errors in comments. 1999-07-12 15:02:51 +00:00
mckusick
52ea4270f3 These changes appear to give us benefits with both small (32MB) and
large (1G) memory machine configurations.  I was able to run 'dbench 32'
on a 32MB system without bring the machine to a grinding halt.

    * buffer cache hash table now dynamically allocated.  This will
      have no effect on memory consumption for smaller systems and
      will help scale the buffer cache for larger systems.

    * minor enhancement to pmap_clearbit().  I noticed that
      all the calls to it used constant arguments.  Making
      it an inline allows the constants to propogate to
      deeper inlines and should produce better code.

    * removal of inherent vfs_ioopt support through the emplacement
      of appropriate #ifdef's, with John's permission.  If we do not
      find a use for it by the end of the year we will remove it entirely.

    * removal of getnewbufloops* counters & sysctl's - no longer
      necessary for debugging, getnewbuf() is now optimal.

    * buffer hash table functions removed from sys/buf.h and localized
      to vfs_bio.c

    * VFS_BIO_NEED_DIRTYFLUSH flag and support code added
      ( bwillwrite() ), allowing processes to block when too many dirty
      buffers are present in the system.

    * removal of a softdep test in bdwrite() that is no longer necessary
      now that bdwrite() no longer attempts to flush dirty buffers.

    * slight optimization added to bqrelse() - there is no reason
      to test for available buffer space on B_DELWRI buffers.

    * addition of reverse-scanning code to vfs_bio_awrite().
      vfs_bio_awrite() will attempt to locate clusterable areas
      in both the forward and reverse direction relative to the
      offset of the buffer passed to it.  This will probably not
      make much of a difference now, but I believe we will start
      to rely on it heavily in the future if we decide to shift
      some of the burden of the clustering closer to the actual
      I/O initiation.

    * Removal of the newbufcnt and lastnewbuf counters that Kirk
      added.  They do not fix any race conditions that haven't already
      been fixed by the gbincore() test done after the only call
      to getnewbuf().  getnewbuf() is a static, so there is no chance
      of it being misused by other modules.  ( Unless Kirk can think
      of a specific thing that this code fixes.  I went through it
      very carefully and didn't see anything ).

    * removal of VOP_ISLOCKED() check in flushbufqueues().  I do not
      think this check is necessary, the buffer should flush properly
      whether the vnode is locked or not. ( yes? ).

    * removal of extra arguments passed to getnewbuf() that are not
      necessary.

    * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in
      vfs_cluster.c

    * vn_write() now calls bwillwrite() *PRIOR* to locking the vnode,
      which should greatly aid flushing operations in heavy load
      situations - both the pageout and update daemons will be able
      to operate more efficiently.

    * removal of b_usecount.  We may add it back in later but for now
      it is useless.  Prior implementations of the buffer cache never
      had enough buffers for it to be useful, and current implementations
      which make more buffers available might not benefit relative to
      the amount of sophistication required to implement a b_usecount.
      Straight LRU should work just as well, especially when most things
      are VMIO backed.  I expect that (even though John will not like
      this assumption) directories will become VMIO backed some point soon.

Submitted by:	Matthew Dillon <dillon@backplane.com>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-08 06:06:00 +00:00
mckusick
9d4f0d78fa The buffer queue mechanism has been reformulated. Instead of having
QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN,
QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA.  With this patch clean
and dirty buffers have been separated.  Empty buffers with KVM
assignments have been separated from truely empty buffers.  getnewbuf()
has been rewritten and now operates in a 100% optimal fashion.  That is,
it is able to find precisely the right kind of buffer it needs to
allocate a new buffer, defragment KVM, or to free-up an existing buffer
when the buffer cache is full (which is a steady-state situation for
the buffer cache).

Buffer flushing has been reorganized.  Previously buffers were flushed
in the context of whatever process hit the conditions forcing buffer
flushing to occur.  This resulted in processes blocking on conditions
unrelated to what they were doing.  This also resulted in inappropriate
VFS stacking chains due to multiple processes getting stuck trying to
flush dirty buffers or due to a single process getting into a situation
where it might attempt to flush buffers recursively - a situation that
was only partially fixed in prior commits.  We have added a new daemon
called the buf_daemon which is responsible for flushing dirty buffers
when the number of dirty buffers exceeds the vfs.hidirtybuffers limit.
This daemon attempts to dynamically adjust the rate at which dirty buffers
are flushed such that getnewbuf() calls (almost) never block.

The number of nbufs and amount of buffer space is now scaled past the
8MB limit that was previously imposed for systems with over 64MB of
memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed
somewhat.  The number of physical buffers has been increased with the
intention that we will manage physical I/O differently in the future.

reassignbuf previously attempted to keep the dirtyblkhd list sorted which
could result in non-deterministic operation under certain conditions,
such as when a large number of dirty buffers are being managed.  This
algorithm has been changed.  reassignbuf now keeps buffers locally sorted
if it can do so cheaply, and otherwise gives up and adds buffers to
the head of the dirtyblkhd list.  The new algorithm is deterministic but
not perfect.  The new algorithm greatly reduces problems that previously
occured when write_behind was turned off in the system.

The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive
P_BUFEXHAUST bit.  This bit allows processes working with filesystem
buffers to use available emergency reserves.  Normal processes do not set
this bit and are not allowed to dig into emergency reserves.  The purpose
of this bit is to avoid low-memory deadlocks.

A small race condition was fixed in getpbuf() in vm/vm_pager.c.

Submitted by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Kirk McKusick <mckusick@mckusick.com>
1999-07-04 00:25:38 +00:00
phk
b6d067fe70 Make sure that stat(2) and friends always return a valid st_dev field.
Pseudo-FS need not fill in the va_fsid anymore, the syscall code
will use the first half of the fsid, which now looks like a udev_t
with major 255.
1999-07-02 16:29:47 +00:00
peter
6cb5fe6c6b Slight reorganization of kernel thread/process creation. Instead of using
SYSINIT_KT() etc (which is a static, compile-time procedure), use a
NetBSD-style kthread_create() interface.  kproc_start is still available
as a SYSINIT() hook.  This allowed simplification of chunks of the
sysinit code in the process.  This kthread_create() is our old kproc_start
internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work
the same as the NetBSD one.

One thing I'd like to do shortly is get rid of nfsiod as a user initiated
process.  It makes sense for the nfs client code to create them on the
fly as needed up to a user settable limit.  This means that nfsiod
doesn't need to be in /sbin and is always "available".  This is a fair bit
easier to do outside of the SYSINIT_KT() framework.
1999-07-01 13:21:46 +00:00
mckusick
5b58f2f951 Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.
1999-06-26 02:47:16 +00:00
mckusick
88e39a63db Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.
1999-06-16 23:27:55 +00:00
mckusick
02e5fe8035 Get rid of the global variable rushjob and replace it with a function in
kern/vfs_subr.c named speedup_syncer() which handles the speedup request.
Change the various clients of rushjob to use the new function.
1999-06-15 23:37:29 +00:00
phk
6a5dc97620 Simplify cdevsw registration.
The cdevsw_add() function now finds the major number(s) in the
struct cdevsw passed to it.  cdevsw_add_generic() is no longer
needed, cdevsw_add() does the same thing.

cdevsw_add() will print an message if the d_maj field looks bogus.

Remove nblkdev and nchrdev variables.  Most places they were used
bogusly.  Instead check a dev_t for validity by seeing if devsw()
or bdevsw() returns NULL.

Move bdevsw() and devsw() functions to kern/kern_conf.c

Bump __FreeBSD_version to 400006

This commit removes:
        72 bogus makedev() calls
        26 bogus SYSINIT functions

if_xe.c bogusly accessed cdevsw[], author/maintainer please fix.

I4b and vinum not changed.  Patches emailed to authors.  LINT
probably broken until they catch up.
1999-05-31 11:29:30 +00:00
jb
5dc5dd6c49 Remove the test for bdevsw(dev) == NULL from bdevvp() because it fails
if there is no character device associated with the block device. In this
case that doesn't matter because bdevvp() doesn't use the character
device structure.

I can use the pointy bit of the axe too.
1999-05-24 00:34:10 +00:00
luoqi
6f6fbfa99e Legally acquire a major number for mfs. 1999-05-14 20:40:23 +00:00
mckusick
48fc2a0eac Previously directories were sync'ed every 10 seconds while bitmaps &
inodes were synced every 15 seconds. This is now reversed as during
directory create, we cannot commit the directory entry until its
inode has been written. With this switch, the inodes will be more
likely to be written by the time that the directory is written thus
reducing the number of directory rollbacks that are needed.
1999-05-14 01:29:21 +00:00
peter
64dd9c44eb Fix (?) SPECHASH dev_t/major/minor/etc args 1999-05-12 19:06:40 +00:00
phk
23c70ba4d7 Don't peek into dev_t 1999-05-12 11:06:07 +00:00
phk
7e26ca1d1a Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
        major()         umajor()
        minor()         uminor()
        makedev()       umakedev()
        dev2udev()      udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits.  This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference.  If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).
1999-05-11 19:55:07 +00:00
phk
bac74fbd54 Fix some of the places where too much inside knowledge about major/minor
layout and dev_t structure is being (ab)used.
1999-05-08 07:02:41 +00:00
phk
500e41bd71 I got tired of seeing all the cdevsw[major(foo)] all over the place.
Made a new (inline) function devsw(dev_t dev) and substituted it.

Changed to the BDEV variant to this format as well: bdevsw(dev_t dev)

DEVFS will eventually benefit from this change too.
1999-05-08 06:40:31 +00:00
phk
693dd58bb3 Continue where Julian left off in July 1998:
Virtualize bdevsw[] from cdevsw.  bdevsw() is now an (inline)
        function.

        Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention
        to the order of the cmaj/bmaj arguments!)

        Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE
        (ditto!)

(Next step will be to convert all bdev dev_t's to cdev dev_t's
before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
1999-05-07 10:11:40 +00:00
billf
dd35516544 Add sysctl descriptions to many SYSCTL_XXXs
PR:		kern/11197
Submitted by:	Adrian Chadd <adrian@FreeBSD.org>
Reviewed by:	billf(spelling/style/minor nits)
Looked at by:	bde(style)
1999-05-03 23:57:32 +00:00
julian
f27c95753f Reviewed by: Many at differnt times in differnt parts,
including alan, john, me, luoqi, and kirk
Submitted by:	Matt Dillon <dillon@frebsd.org>

This change implements a relatively sophisticated fix to getnewbuf().
There were two problems with getnewbuf(). First, the writerecursion
can lead to a system stack overflow when you have NFS and/or VN
devices in the system. Second, the free/dirty buffer accounting was
completely broken. Not only did the nfs routines blow it trying to
manually account for the buffer state, but the accounting that was
done did not work well with the purpose of their existance: figuring
out when getnewbuf() needs to sleep.

The meat of the change is to kern/vfs_bio.c. The remaining diffs are
all minor except for NFS, which includes both the fixes for bp
interaction AND fixes for a 'biodone(): buffer already done' lockup.
Sys/buf.h also contains a chaining structure which is not used by
this patchset but is used by other patches that are coming soon.
This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF.
(sorry for the missing T matt)
1999-03-12 02:24:58 +00:00
dillon
9b8aaaa737 Reviewed by: Julian Elischer <julian@whistle.com>
Add d_parms() to {c,b}devsw[].  If non-NULL this function points to
    a device routine that will properly fill in the specinfo structure.
    vfs_subr.c's checkalias() supplies appropriate defaults.  This change
    should be fully backwards compatible with existing devices.
1999-02-25 05:22:30 +00:00
dillon
17b6679e7e Protect vn worklist and vn->v_{clean,dirty}blkhd at splbio().
Get rid of extra LIST_REMOVE()

Reviewed by:	hsu@FreeBSD.ORG (Jeffrey Hsu), mckusick@McKusick.COM
Submitted by:	hsu@FreeBSD.ORG (Jeffrey Hsu), dillon@backplane.com ( Matthew Dillon )
1999-02-19 17:36:58 +00:00
dillon
ecfbb44ba1 vp->v_object must be valid after normal flow of vfs_object_create()
completes, change if() to KASSERT().  This is not a bug, we are
    simplify clarifying and optimizing the code.

    In if/else in vfs_object_create(), the failure of both conditionals
    will lead to a NULL object.  Exit gracefully if this case occurs.
    ( this case does not normally occur, but needed to be handled ).

Obtained from: Eivind Eklund <eivind@FreeBSD.org>
1999-02-04 18:25:39 +00:00
dillon
a784c6e33d More const fixes for -Wall, -Wcast-qual 1999-01-29 23:18:50 +00:00
dillon
975fba8a24 Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile
1999-01-28 00:57:57 +00:00
dillon
df24433bbe This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
    fixes, several VM optimizations, and some additional revamping of the
    VM code.  The specific bug fixes will be documented with additional
    forced commits.  This commit is somewhat rough in regards to code
    cleanup issues.

Reviewed by:	"John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
1999-01-21 08:29:12 +00:00
eivind
89e1199534 KNFize, by bde. 1999-01-10 01:58:29 +00:00
eivind
a8dc66f457 Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as
discussed on -hackers.

Introduce 'KASSERT(assertion, ("panic message", args))' for simple
check + panic.

Reviewed by:	msmith
1999-01-08 17:31:30 +00:00
eivind
ffaaca5874 Remove the 'waslocked' parameter to vfs_object_create(). 1999-01-05 18:50:03 +00:00
eivind
2b3e3223c1 Finish staticization. 1999-01-05 18:12:29 +00:00
bde
734d13314e Ifdefed conditionally used simplock variables. 1999-01-02 11:34:57 +00:00
bde
85c558753b Restored rev.1.31 which was clobbered by rev.1.69 (the big Lite2
merge).  This fixes at least hanging in revoke(2) when a somewhat
active slave pty is revoked.  The hang made the window for the
null pointer bug in ufsspec_{read,write} much larger.

There are many other bugs in this area (revoke of an active fifo
at best leaks memory...).
1998-12-24 12:07:16 +00:00
eivind
a4213663c9 Check return value of tsleep(). I've checked of all call points -
there does not seem to be a problem with this.

PR:		kern/8732
Analysis by:	David G Andersen <danderse@cs.utah.edu>
Tested by:	Alfred Perlstein <bright@hotjobs.com>
1998-12-22 00:44:11 +00:00
eivind
a0317115f8 Staticize. 1998-12-21 23:38:33 +00:00
archie
982e80577d Examine all occurrences of sprintf(), strcat(), and str[n]cpy()
for possible buffer overflow problems. Replaced most sprintf()'s
with snprintf(); for others cases, added terminating NUL bytes where
appropriate, replaced constants like "16" with sizeof(), etc.

These changes include several bug fixes, but most changes are for
maintainability's sake. Any instance where it wasn't "immediately
obvious" that a buffer overflow could not occur was made safer.

Reviewed by:	Bruce Evans <bde@zeta.org.au>
Reviewed by:	Matthew Dillon <dillon@apollo.backplane.com>
Reviewed by:	Mike Spengler <mks@networkcs.com>
1998-12-04 22:54:57 +00:00
peter
ca67637f92 Convert lists for bufs attached to vnodes from a LIST to a TAILQ.
- Use TAILQ_* macros extensively instead of internal names
- use b_xflags instead of the NOLIST magic number hack in the next pointer
- clean bufs are inserted at the tail rather than the head.
- redo dirty buffer insert so that metadata (negative lbn) goes to the
  tail directly rather than at the HEAD.  This makes a difference when
  inserting dirty data blocks in lbn sorted order since data block
  insertion will not have to bypass all the metadata cruft.  data is
  lbn sorted since it makes sense for clustering and writeback ordering,
  while metadata sorting doesn't help much since the lbn's are
  meaningless when walking the list for writebacks.

Small systems will not notice much (if any) benefit from this, but really
busy systems with large dirty block lists should get a lot more.

I've tested this with softdep, and it doesn't seem to mind the change of
queueing of metadata.

Reviewed (in princible) by: dg
Obtained from: partly from John Dyson's work-in-progress patches in June.
1998-10-31 14:20:39 +00:00
peter
c9ebff45d0 The last argument to vm_object_page_clean() are now bit flags, rather than
the old true/false.

While here, have vfs_msync() only call vm_object_page_clean() with
OBJPC_SYNC if called with MNT_WAIT flags.  vfs_msync() is called at unmount
time (with MNT_WAIT) and from the syncer process (formerly update).
This should make dirty mmap writebacks a little less nasty.

I have tested this a little with SOFTUPDATES enabled, but I don't normally
use it since I've been badly burned too many times.
1998-10-31 07:42:04 +00:00
bde
9a84781068 Oops, rev.1.167 made the device number checking in bdevvp() too strict
for mfs root mounts.  Don't require major 255 to be in bdevsw[].
1998-10-29 11:50:32 +00:00
peter
fe9b0651f9 Remove the V_SAVEMETA flag, nothing uses it any more now that msdosfs and
ext2fs call vtruncbuf() directly.  This simplifies and cleans up
vinvalbuf() a little.
1998-10-29 09:51:28 +00:00
bde
285a46e7e4 Updated the major number check in vfs_object_create(). It's not
clear if the check is necessary, but vfs_object_create() is called
for all vnodes and it was silly to create objects for VBLK vnodes
that don't even have a driver.
1998-10-26 08:07:00 +00:00
phk
13c66194f4 Nitpicking and dusting performed on a train. Removes trivial warnings
about unused variables, labels and other lint.
1998-10-25 17:44:59 +00:00
bde
b17bde009a Fixed device number checking in bdevvp():
- dev != NODEV was checked for, but 0 was returned on failure.  This was
  fixed in Lite2 (except the return code was still slightly wrong (ENODEV
  instead of ENXIO)) but the changes were not merged.  This case probably
  doesn't actually occur under FreeBSD.
- major(dev) was not checked to have a valid non-NULL bdevsw entry.  This
  caused panics when the driver for the root device didn't exist.

Fixed minor misformattings in bdevvp().  Rev.1.14 consisted mainly of
gratuitous reformattings that seem to have caused many Lite2 merge
errors.

PR:			8417
1998-10-25 16:11:49 +00:00
dt
1f69c742ab Backed out rev. 1.164. It caused problems on SMP.
PR:		8309
1998-10-14 15:05:52 +00:00
dg
3defb6d13f Fixed two potentially serious classes of bugs:
1) The vnode pager wasn't properly tracking the file size due to
   "size" being page rounded in some cases and not in others.
   This sometimes resulted in corrupted files. First noticed by
   Terry Lambert.
   Fixed by changing the "size" pager_alloc parameter to be a 64bit
   byte value (as opposed to a 32bit page index) and changing the
   pagers and their callers to deal with this properly.
2) Fixed a bogus type cast in round_page() and trunc_page() that
   caused some 64bit offsets and sizes to be scrambled. Removing
   the cast required adding casts at a few dozen callers.
   There may be problems with other bogus casts in close-by
   macros. A quick check seemed to indicate that those were okay,
   however.
1998-10-13 08:24:45 +00:00
dt
016509e247 UnVMIO vnodes of block devices when they are no longer in use. (Some
things, like msdosfs, do not work (panic) on devices with VMIO enabled.
FFS enable VMIO on mounted devices, and nothing previously disabled it, so,
after you mounted FFS floppy, you could not mount msdosfs floppy anymore...)

This is mostly a quick before-release fix.

Reviewed by:	bde
1998-10-12 20:14:09 +00:00
sos
8397655514 Remove the SLICE code.
This clearly needs alot more thought, and we dont need this to hunt
us down in 3.0-RELEASE.
1998-09-14 19:56:42 +00:00
bde
a84a2dedfc Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM.  Declare it in a header file.  Don't forget to use
it in non-Lite2 code.  Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.
1998-09-05 15:17:34 +00:00
bde
d3f8ad1875 Oops, the previous revision unconfigured too much pre-Lite2 compatibilty
cruft.  At least lsvfs(1) was broken.
1998-08-29 13:13:10 +00:00
bde
92b68e1a4b Don't configure compatibility code for pre-Lite2 mount() calls by
default.  This code should go away soon.
1998-08-12 20:17:42 +00:00
dfr
bdb4d90548 Initialise all the fields separately in vattr_null since on the alpha
they are not all the same width.
1998-07-12 16:45:39 +00:00
bde
f0b863f4b5 Fixed printf format errors. 1998-07-11 07:46:16 +00:00
bde
9e868cbb1a Removed unused includes. 1998-06-21 18:02:50 +00:00
julian
da325152ee Replace 'sleep()' with 'tsleep()'
Accidentally imported from Kirk's codebase.

Pointed out by: various.
1998-06-10 22:02:14 +00:00
julian
acd840b15c Submitted by: Kirk McKusick <mckusick@McKusick.COM>
Fix for potential hang when trying to reboot the system or
to forcibly unmount a soft update enabled filesystem.
FreeBSD already handled the reboot case differently, this is however a better
fix.
1998-06-10 18:13:19 +00:00