1994-05-24 10:09:53 +00:00
|
|
|
/*
|
1997-02-10 02:22:35 +00:00
|
|
|
* Copyright (c) 1989, 1993, 1995
|
1994-05-24 10:09:53 +00:00
|
|
|
* The Regents of the University of California. All rights reserved.
|
|
|
|
*
|
|
|
|
* Redistribution and use in source and binary forms, with or without
|
|
|
|
* modification, are permitted provided that the following conditions
|
|
|
|
* are met:
|
|
|
|
* 1. Redistributions of source code must retain the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer.
|
|
|
|
* 2. Redistributions in binary form must reproduce the above copyright
|
|
|
|
* notice, this list of conditions and the following disclaimer in the
|
|
|
|
* documentation and/or other materials provided with the distribution.
|
|
|
|
* 3. All advertising materials mentioning features or use of this software
|
|
|
|
* must display the following acknowledgement:
|
|
|
|
* This product includes software developed by the University of
|
|
|
|
* California, Berkeley and its contributors.
|
|
|
|
* 4. Neither the name of the University nor the names of its contributors
|
|
|
|
* may be used to endorse or promote products derived from this software
|
|
|
|
* without specific prior written permission.
|
|
|
|
*
|
|
|
|
* THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
|
|
|
* ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
|
|
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
|
|
* ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
|
|
|
* FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
|
|
|
* DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
|
|
|
* OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
|
|
|
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
|
|
|
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
|
|
|
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
|
|
|
* SUCH DAMAGE.
|
|
|
|
*
|
1997-02-10 02:22:35 +00:00
|
|
|
* @(#)spec_vnops.c 8.14 (Berkeley) 5/21/95
|
1999-08-28 01:08:13 +00:00
|
|
|
* $FreeBSD$
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
|
|
|
|
|
|
|
#include <sys/param.h>
|
2002-02-23 11:12:57 +00:00
|
|
|
#include <sys/lock.h>
|
|
|
|
#include <sys/sx.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/proc.h>
|
|
|
|
#include <sys/systm.h>
|
|
|
|
#include <sys/kernel.h>
|
This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):
----------------------------
revision 1.65
date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0
Comment out Kirk's io-request priority hack until we can do this in a
civilized way which doesn't cause grief.
The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.
Also, curthread may or may not have anything to do with the I/O request
at hand.
The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.
Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.
The sleep also needs to be in constant time units; 1/hz can be practically
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------
As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.
On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>
2002-10-22 00:59:49 +00:00
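The HZ trade-off debated in the log message above can be made concrete with a small user-space sketch. It assumes the delay is one clock tick per unit of positive nice value, which this excerpt does not confirm; `nice_delay_usec` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/*
 * Hypothetical helper: wall-clock length, in microseconds, of a sleep
 * lasting `nice` clock ticks on a kernel whose clock runs at `hz`
 * ticks per second.  Assumption: one tick of delay per unit of
 * positive nice; zero or negative nice is never delayed.
 */
static long
nice_delay_usec(int nice, int hz)
{

	assert(hz > 0);
	if (nice <= 0)
		return (0);
	return ((long)nice * 1000000L / hz);
}
```

Under that assumption, a process at nice 10 would sleep 100 ms per request at HZ=100 but only 10 ms at HZ=1000. This is the shrinking delay that revision 1.65 objects to and that this checkin treats as a feature.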
|
|
|
#include <sys/mutex.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/conf.h>
|
2000-05-05 09:59:14 +00:00
|
|
|
#include <sys/bio.h>
|
1994-05-24 10:09:53 +00:00
|
|
|
#include <sys/buf.h>
|
|
|
|
#include <sys/mount.h>
|
|
|
|
#include <sys/vnode.h>
|
|
|
|
#include <sys/stat.h>
|
1996-09-03 14:25:27 +00:00
|
|
|
#include <sys/fcntl.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <sys/vmmeter.h>
|
2002-11-04 07:29:20 +00:00
|
|
|
#include <sys/sysctl.h>
|
1999-09-25 16:21:39 +00:00
|
|
|
#include <sys/tty.h>
|
1995-11-18 12:49:14 +00:00
|
|
|
#include <vm/vm.h>
|
1995-12-07 12:48:31 +00:00
|
|
|
#include <vm/vm_object.h>
|
1995-12-05 21:51:45 +00:00
|
|
|
#include <vm/vm_page.h>
|
1995-11-18 12:49:14 +00:00
|
|
|
#include <vm/vm_pager.h>
|
|
|
|
|
2002-09-28 17:15:38 +00:00
|
|
|
|
2002-03-19 22:20:14 +00:00
|
|
|
static int spec_advlock(struct vop_advlock_args *);
|
|
|
|
static int spec_bmap(struct vop_bmap_args *);
|
|
|
|
static int spec_close(struct vop_close_args *);
|
|
|
|
static int spec_freeblks(struct vop_freeblks_args *);
|
|
|
|
static int spec_fsync(struct vop_fsync_args *);
|
|
|
|
static int spec_getpages(struct vop_getpages_args *);
|
|
|
|
static int spec_ioctl(struct vop_ioctl_args *);
|
|
|
|
static int spec_kqfilter(struct vop_kqfilter_args *);
|
|
|
|
static int spec_open(struct vop_open_args *);
|
|
|
|
static int spec_poll(struct vop_poll_args *);
|
|
|
|
static int spec_print(struct vop_print_args *);
|
|
|
|
static int spec_read(struct vop_read_args *);
|
|
|
|
static int spec_strategy(struct vop_strategy_args *);
|
|
|
|
static int spec_write(struct vop_write_args *);
|
1995-12-14 09:55:16 +00:00
|
|
|
|
1995-11-09 08:17:23 +00:00
|
|
|
vop_t **spec_vnodeop_p;
|
1995-12-11 09:24:58 +00:00
|
|
|
static struct vnodeopv_entry_desc spec_vnodeop_entries[] = {
|
1997-10-26 20:55:39 +00:00
|
|
|
{ &vop_default_desc, (vop_t *) vop_defaultop },
|
|
|
|
{ &vop_access_desc, (vop_t *) vop_ebadf },
|
1997-10-15 09:22:02 +00:00
|
|
|
{ &vop_advlock_desc, (vop_t *) spec_advlock },
|
2001-04-30 14:35:35 +00:00
|
|
|
{ &vop_bmap_desc, (vop_t *) spec_bmap },
|
1997-10-15 09:22:02 +00:00
|
|
|
{ &vop_close_desc, (vop_t *) spec_close },
|
1999-11-07 15:09:59 +00:00
|
|
|
{ &vop_create_desc, (vop_t *) vop_panic },
|
1998-09-12 20:21:54 +00:00
|
|
|
{ &vop_freeblks_desc, (vop_t *) spec_freeblks },
|
1997-10-15 09:22:02 +00:00
|
|
|
{ &vop_fsync_desc, (vop_t *) spec_fsync },
|
1997-10-15 10:05:29 +00:00
|
|
|
{ &vop_getpages_desc, (vop_t *) spec_getpages },
|
2000-08-18 10:01:02 +00:00
|
|
|
{ &vop_getwritemount_desc, (vop_t *) vop_stdgetwritemount },
|
1997-10-15 09:22:02 +00:00
|
|
|
{ &vop_ioctl_desc, (vop_t *) spec_ioctl },
|
2002-02-10 22:00:20 +00:00
|
|
|
{ &vop_kqfilter_desc, (vop_t *) spec_kqfilter },
|
1997-10-26 20:55:39 +00:00
|
|
|
{ &vop_lease_desc, (vop_t *) vop_null },
|
1999-11-07 15:09:59 +00:00
|
|
|
{ &vop_link_desc, (vop_t *) vop_panic },
|
|
|
|
{ &vop_mkdir_desc, (vop_t *) vop_panic },
|
|
|
|
{ &vop_mknod_desc, (vop_t *) vop_panic },
|
1997-10-15 09:22:02 +00:00
|
|
|
{ &vop_open_desc, (vop_t *) spec_open },
|
1997-10-16 20:32:40 +00:00
|
|
|
{ &vop_pathconf_desc, (vop_t *) vop_stdpathconf },
|
1997-10-15 09:22:02 +00:00
|
|
|
{ &vop_poll_desc, (vop_t *) spec_poll },
|
|
|
|
{ &vop_print_desc, (vop_t *) spec_print },
|
|
|
|
{ &vop_read_desc, (vop_t *) spec_read },
|
1999-11-07 15:09:59 +00:00
|
|
|
{ &vop_readdir_desc, (vop_t *) vop_panic },
|
|
|
|
{ &vop_readlink_desc, (vop_t *) vop_panic },
|
|
|
|
{ &vop_reallocblks_desc, (vop_t *) vop_panic },
|
1997-10-26 20:55:39 +00:00
|
|
|
{ &vop_reclaim_desc, (vop_t *) vop_null },
|
1999-11-07 15:09:59 +00:00
|
|
|
{ &vop_remove_desc, (vop_t *) vop_panic },
|
|
|
|
{ &vop_rename_desc, (vop_t *) vop_panic },
|
|
|
|
{ &vop_rmdir_desc, (vop_t *) vop_panic },
|
1997-10-26 20:55:39 +00:00
|
|
|
{ &vop_setattr_desc, (vop_t *) vop_ebadf },
|
1997-10-15 10:05:29 +00:00
|
|
|
{ &vop_strategy_desc, (vop_t *) spec_strategy },
|
1999-11-07 15:09:59 +00:00
|
|
|
{ &vop_symlink_desc, (vop_t *) vop_panic },
|
1997-10-15 09:22:02 +00:00
|
|
|
{ &vop_write_desc, (vop_t *) spec_write },
|
2002-07-27 05:14:59 +00:00
|
|
|
{ &vop_lock_desc, (vop_t *) vop_nolock },
|
|
|
|
{ &vop_unlock_desc, (vop_t *) vop_nounlock },
|
|
|
|
{ &vop_islocked_desc, (vop_t *) vop_noislocked },
|
1995-11-09 08:17:23 +00:00
|
|
|
{ NULL, NULL }
|
1994-05-24 10:09:53 +00:00
|
|
|
};
|
1995-12-11 09:24:58 +00:00
|
|
|
static struct vnodeopv_desc spec_vnodeop_opv_desc =
|
1994-05-24 10:09:53 +00:00
|
|
|
{ &spec_vnodeop_p, spec_vnodeop_entries };
|
|
|
|
|
1994-09-21 03:47:43 +00:00
|
|
|
VNODEOP_SET(spec_vnodeop_opv_desc);
|
|
|
|
|
1997-10-15 13:24:07 +00:00
|
|
|
int
|
|
|
|
spec_vnoperate(ap)
|
|
|
|
struct vop_generic_args /* {
|
|
|
|
struct vnodeop_desc *a_desc;
|
|
|
|
<other random data follows, presumably>
|
|
|
|
} */ *ap;
|
|
|
|
{
|
|
|
|
return (VOCALL(spec_vnodeop_p, ap->a_desc->vdesc_offset, ap));
|
|
|
|
}
|
|
|
|
|
2002-03-19 22:20:14 +00:00
|
|
|
static void spec_getpages_iodone(struct buf *bp);
|
1995-10-23 02:23:29 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Open a special file.
|
|
|
|
*/
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_open(ap)
|
|
|
|
struct vop_open_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
int a_mode;
|
|
|
|
struct ucred *a_cred;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *a_td;
|
1994-05-24 10:09:53 +00:00
|
|
|
} */ *ap;
|
|
|
|
{
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td = ap->a_td;
|
1999-11-09 14:15:33 +00:00
|
|
|
struct vnode *vp = ap->a_vp;
|
|
|
|
dev_t dev = vp->v_rdev;
|
1999-10-04 11:23:10 +00:00
|
|
|
int error;
|
1999-05-08 06:40:31 +00:00
|
|
|
struct cdevsw *dsw;
|
1999-10-04 12:33:05 +00:00
|
|
|
const char *cp;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2002-02-10 22:00:20 +00:00
|
|
|
if (vp->v_type == VBLK)
|
|
|
|
return (ENXIO);
|
|
|
|
|
|
|
|
/* Don't allow open if fs is mounted -nodev. */
|
1994-05-24 10:09:53 +00:00
|
|
|
if (vp->v_mount && (vp->v_mount->mnt_flag & MNT_NODEV))
|
|
|
|
return (ENXIO);
|
|
|
|
|
1999-09-25 16:21:39 +00:00
|
|
|
dsw = devsw(dev);
|
2002-02-10 22:00:20 +00:00
|
|
|
if (dsw == NULL || dsw->d_open == NULL)
|
|
|
|
return (ENXIO);
|
1999-09-25 16:21:39 +00:00
|
|
|
|
2002-02-10 22:00:20 +00:00
|
|
|
/* Make this field valid before any I/O in d_open. */
|
|
|
|
if (dev->si_iosize_max == 0)
|
1999-09-22 19:56:14 +00:00
|
|
|
dev->si_iosize_max = DFLTPHYS;
|
|
|
|
|
1999-11-09 14:15:33 +00:00
|
|
|
/*
|
|
|
|
* XXX: Disks get special billing here, but it is mostly wrong.
|
2002-02-10 22:00:20 +00:00
|
|
|
* XXX: Disk partitions can overlap and the real checks should
|
1999-11-09 14:15:33 +00:00
|
|
|
* XXX: take this into account, and consequently they need to
|
2002-02-10 22:00:20 +00:00
|
|
|
* XXX: live in the disk slice code. Some checks do.
|
1999-11-09 14:15:33 +00:00
|
|
|
*/
|
2002-02-10 22:00:20 +00:00
|
|
|
if (vn_isdisk(vp, NULL) && ap->a_cred != FSCRED &&
|
2000-01-10 12:04:27 +00:00
|
|
|
(ap->a_mode & FWRITE)) {
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
2002-02-10 22:00:20 +00:00
|
|
|
* Never allow opens for write if the disk is mounted R/W.
|
1999-11-09 14:15:33 +00:00
|
|
|
*/
|
2000-10-09 17:31:39 +00:00
|
|
|
if (vp->v_rdev->si_mountpoint != NULL &&
|
|
|
|
!(vp->v_rdev->si_mountpoint->mnt_flag & MNT_RDONLY))
|
2002-02-10 22:00:20 +00:00
|
|
|
return (EBUSY);
|
1999-11-09 14:15:33 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* When running in secure mode, do not allow opens
|
2002-02-10 22:00:20 +00:00
|
|
|
* for writing if the disk is mounted.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2002-02-27 18:32:23 +00:00
|
|
|
error = securelevel_ge(td->td_ucred, 1);
|
2002-02-10 22:00:20 +00:00
|
|
|
if (error && vfs_mountedon(vp))
|
|
|
|
return (error);
|
Make our v_usecount vnode reference count work identically to the
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.
When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.
Vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon as reasonably possible.
1998-01-06 05:26:17 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
1999-11-09 14:15:33 +00:00
|
|
|
* When running in very secure mode, do not allow
|
2002-02-10 22:00:20 +00:00
|
|
|
* opens for writing of any disks.
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
2002-02-27 18:32:23 +00:00
|
|
|
error = securelevel_ge(td->td_ucred, 2);
|
2001-09-26 20:18:26 +00:00
|
|
|
if (error)
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
1999-09-03 08:26:46 +00:00
|
|
|
|
2002-02-10 22:00:20 +00:00
|
|
|
/* XXX: Special casing of ttys for deadfs. Probably redundant. */
|
1999-11-09 14:15:33 +00:00
|
|
|
if (dsw->d_flags & D_TTY)
|
2002-08-04 10:29:36 +00:00
|
|
|
vp->v_vflag |= VV_ISTTY;
|
1999-11-09 14:15:33 +00:00
|
|
|
|
2001-09-12 08:38:13 +00:00
|
|
|
VOP_UNLOCK(vp, 0, td);
|
2002-09-27 19:47:59 +00:00
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
|
|
|
error = dsw->d_open(dev, ap->a_mode, S_IFCHR, td);
|
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
|
|
|
error = dsw->d_open(dev, ap->a_mode, S_IFCHR, td);
|
2001-09-12 08:38:13 +00:00
|
|
|
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
|
1999-11-09 14:15:33 +00:00
|
|
|
|
1999-09-25 16:21:39 +00:00
|
|
|
if (error)
|
|
|
|
return (error);
|
|
|
|
|
|
|
|
if (dsw->d_flags & D_TTY) {
|
1999-09-25 18:52:03 +00:00
|
|
|
if (dev->si_tty) {
|
1999-09-25 16:21:39 +00:00
|
|
|
struct tty *tp;
|
|
|
|
tp = dev->si_tty;
|
|
|
|
if (!tp->t_stop) {
|
1999-09-25 18:24:47 +00:00
|
|
|
printf("Warning:%s: no t_stop, using nottystop\n", devtoname(dev));
|
|
|
|
tp->t_stop = nottystop;
|
1999-09-25 16:21:39 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2000-01-10 12:04:27 +00:00
|
|
|
if (vn_isdisk(vp, NULL)) {
|
1999-09-03 08:26:46 +00:00
|
|
|
if (!dev->si_bsize_phys)
|
|
|
|
dev->si_bsize_phys = DEV_BSIZE;
|
|
|
|
}
|
1999-10-04 12:33:05 +00:00
|
|
|
if ((dsw->d_flags & D_DISK) == 0) {
|
|
|
|
cp = devtoname(dev);
|
|
|
|
if (*cp == '#' && (dsw->d_flags & D_NAGGED) == 0) {
|
|
|
|
printf("WARNING: driver %s should register devices with make_dev() (dev_t = \"%s\")\n",
|
|
|
|
dsw->d_name, cp);
|
2002-02-10 22:00:20 +00:00
|
|
|
dsw->d_flags |= D_NAGGED;
|
1999-10-04 12:33:05 +00:00
|
|
|
}
|
|
|
|
}
|
1999-09-03 08:26:46 +00:00
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
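The write-open checks that spec_open() applies to disks can be summarised as a small decision function. This is a user-space sketch, not kernel code: `struct disk_state` and `disk_write_open_check` are hypothetical names, the securelevel is passed in as a plain integer, and EPERM is assumed to be the error that securelevel_ge() reports. The real function also skips these checks for FSCRED and for opens without FWRITE, which the sketch omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of the state spec_open() consults. */
struct disk_state {
	bool mounted;		/* a filesystem is mounted on the device */
	bool mounted_rdonly;	/* ... and that mount is read-only */
};

/*
 * Returns 0 if a write-open of the disk may proceed, else an errno.
 * The checks run in the same order as in spec_open().
 */
static int
disk_write_open_check(const struct disk_state *ds, int securelevel)
{

	assert(ds != NULL);
	/* Never allow opens for write if the disk is mounted R/W. */
	if (ds->mounted && !ds->mounted_rdonly)
		return (EBUSY);
	/* Secure mode: no write-opens of mounted disks. */
	if (securelevel >= 1 && ds->mounted)
		return (EPERM);
	/* Very secure mode: no write-opens of any disk. */
	if (securelevel >= 2)
		return (EPERM);
	return (0);
}
```

Because the read/write mount test comes first, a disk mounted read/write yields EBUSY at every securelevel. The securelevel only decides the outcome for read-only mounts and unmounted disks.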
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Vnode op for read
|
|
|
|
*/
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_read(ap)
|
|
|
|
struct vop_read_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
struct uio *a_uio;
|
1999-11-09 14:15:33 +00:00
|
|
|
int a_ioflag;
|
1999-10-04 11:23:10 +00:00
|
|
|
struct ucred *a_cred;
|
|
|
|
} */ *ap;
|
|
|
|
{
|
1999-11-09 14:15:33 +00:00
|
|
|
struct vnode *vp;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td;
|
1999-11-09 14:15:33 +00:00
|
|
|
struct uio *uio;
|
1999-10-04 11:23:10 +00:00
|
|
|
dev_t dev;
|
2000-08-24 15:36:55 +00:00
|
|
|
int error, resid;
|
2002-09-28 13:42:04 +00:00
|
|
|
struct cdevsw *dsw;
|
1999-09-03 05:16:59 +00:00
|
|
|
|
1999-11-09 14:15:33 +00:00
|
|
|
vp = ap->a_vp;
|
1999-10-04 11:23:10 +00:00
|
|
|
dev = vp->v_rdev;
|
1999-11-09 14:15:33 +00:00
|
|
|
uio = ap->a_uio;
|
2001-09-12 08:38:13 +00:00
|
|
|
td = uio->uio_td;
|
2000-08-24 15:36:55 +00:00
|
|
|
resid = uio->uio_resid;
|
1999-02-25 05:22:30 +00:00
|
|
|
|
2000-08-24 15:36:55 +00:00
|
|
|
if (resid == 0)
|
1999-11-09 14:15:33 +00:00
|
|
|
return (0);
|
1999-10-04 11:23:10 +00:00
|
|
|
|
2002-09-27 19:47:59 +00:00
|
|
|
dsw = devsw(dev);
|
2001-09-12 08:38:13 +00:00
|
|
|
VOP_UNLOCK(vp, 0, td);
|
2002-09-27 19:47:59 +00:00
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
|
|
|
error = dsw->d_read(dev, uio, ap->a_ioflag);
|
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
|
|
|
error = dsw->d_read(dev, uio, ap->a_ioflag);
|
2001-09-12 08:38:13 +00:00
|
|
|
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
|
2000-08-24 15:36:55 +00:00
|
|
|
if (uio->uio_resid != resid || (error == 0 && resid != 0))
|
2002-02-10 22:00:20 +00:00
|
|
|
vfs_timestamp(&dev->si_atime);
|
1999-10-04 11:23:10 +00:00
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Vnode op for write
|
|
|
|
*/
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_write(ap)
|
|
|
|
struct vop_write_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
struct uio *a_uio;
|
1999-11-09 14:15:33 +00:00
|
|
|
int a_ioflag;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct ucred *a_cred;
|
|
|
|
} */ *ap;
|
|
|
|
{
|
1999-11-09 14:15:33 +00:00
|
|
|
struct vnode *vp;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td;
|
1999-11-09 14:15:33 +00:00
|
|
|
struct uio *uio;
|
|
|
|
dev_t dev;
|
2000-08-24 15:36:55 +00:00
|
|
|
int error, resid;
|
2002-09-27 19:47:59 +00:00
|
|
|
struct cdevsw *dsw;
|
1999-02-25 05:22:30 +00:00
|
|
|
|
1999-11-09 14:15:33 +00:00
|
|
|
vp = ap->a_vp;
|
|
|
|
dev = vp->v_rdev;
|
2002-09-27 19:47:59 +00:00
|
|
|
dsw = devsw(dev);
|
1999-11-09 14:15:33 +00:00
|
|
|
uio = ap->a_uio;
|
2001-09-12 08:38:13 +00:00
|
|
|
td = uio->uio_td;
|
2000-08-24 15:36:55 +00:00
|
|
|
resid = uio->uio_resid;
|
1999-09-20 23:17:47 +00:00
|
|
|
|
2001-09-12 08:38:13 +00:00
|
|
|
VOP_UNLOCK(vp, 0, td);
|
2002-09-27 19:47:59 +00:00
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
|
|
|
error = dsw->d_write(dev, uio, ap->a_ioflag);
|
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
|
|
|
error = dsw->d_write(dev, uio, ap->a_ioflag);
|
2001-09-12 08:38:13 +00:00
|
|
|
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
|
2000-08-24 15:36:55 +00:00
|
|
|
if (uio->uio_resid != resid || (error == 0 && resid != 0)) {
|
2002-02-10 22:00:20 +00:00
|
|
|
vfs_timestamp(&dev->si_ctime);
|
2000-08-24 15:36:55 +00:00
|
|
|
dev->si_mtime = dev->si_ctime;
|
|
|
|
}
|
1999-10-04 11:23:10 +00:00
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
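spec_read() and spec_write() above share one condition for deciding whether to update the device timestamps. A minimal user-space sketch of that predicate follows; `should_update_times` is a hypothetical name, and the residual counts are passed as plain integers rather than read from a struct uio.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Timestamps are updated if any data moved (the residual changed), or
 * if a non-empty request completed without error even though the
 * residual is unchanged.
 */
static bool
should_update_times(int resid_before, int resid_after, int error)
{

	assert(resid_before >= 0 && resid_after >= 0);
	return (resid_after != resid_before ||
	    (error == 0 && resid_before != 0));
}
```

A request that fails before transferring anything leaves the times alone, and so does a zero-length write. spec_read() never evaluates the condition for a zero-length request because it returns early when the residual is 0.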
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Device ioctl operation.
|
|
|
|
*/
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_ioctl(ap)
|
|
|
|
struct vop_ioctl_args /* {
|
|
|
|
struct vnode *a_vp;
|
2002-10-16 08:04:11 +00:00
|
|
|
u_long a_command;
|
1994-05-24 10:09:53 +00:00
|
|
|
caddr_t a_data;
|
|
|
|
int a_fflag;
|
|
|
|
struct ucred *a_cred;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *a_td;
|
1994-05-24 10:09:53 +00:00
|
|
|
} */ *ap;
|
|
|
|
{
|
1999-11-09 14:15:33 +00:00
|
|
|
dev_t dev;
|
2002-09-26 17:25:22 +00:00
|
|
|
int error;
|
2002-09-27 19:47:59 +00:00
|
|
|
struct cdevsw *dsw;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1999-11-09 14:15:33 +00:00
|
|
|
dev = ap->a_vp->v_rdev;
|
2002-09-27 19:47:59 +00:00
|
|
|
dsw = devsw(dev);
|
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
|
|
|
error = dsw->d_ioctl(dev, ap->a_command,
|
|
|
|
ap->a_data, ap->a_fflag, ap->a_td);
|
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
|
|
|
error = dsw->d_ioctl(dev, ap->a_command,
|
|
|
|
ap->a_data, ap->a_fflag, ap->a_td);
|
2002-09-26 14:11:49 +00:00
|
|
|
if (error == ENOIOCTL)
|
|
|
|
error = ENOTTY;
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1997-09-14 02:58:12 +00:00
|
|
|
spec_poll(ap)
|
|
|
|
struct vop_poll_args /* {
|
1994-05-24 10:09:53 +00:00
|
|
|
struct vnode *a_vp;
|
1997-09-14 02:58:12 +00:00
|
|
|
int a_events;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct ucred *a_cred;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *a_td;
|
1994-05-24 10:09:53 +00:00
|
|
|
} */ *ap;
|
|
|
|
{
|
1999-11-08 09:59:34 +00:00
|
|
|
dev_t dev;
|
2002-09-27 19:47:59 +00:00
|
|
|
struct cdevsw *dsw;
|
|
|
|
int error;
|
1997-09-14 02:58:12 +00:00
|
|
|
|
1999-11-08 09:59:34 +00:00
|
|
|
dev = ap->a_vp->v_rdev;
|
2002-09-27 19:47:59 +00:00
|
|
|
dsw = devsw(dev);
|
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
|
|
|
error = dsw->d_poll(dev, ap->a_events, ap->a_td);
|
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
|
|
|
error = dsw->d_poll(dev, ap->a_events, ap->a_td);
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2001-02-15 16:34:11 +00:00
|
|
|
|
|
|
|
/* ARGSUSED */
|
|
|
|
static int
|
|
|
|
spec_kqfilter(ap)
|
|
|
|
struct vop_kqfilter_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
struct knote *a_kn;
|
|
|
|
} */ *ap;
|
|
|
|
{
|
|
|
|
dev_t dev;
|
2002-09-27 19:47:59 +00:00
|
|
|
struct cdevsw *dsw;
|
|
|
|
int error;
|
2001-02-15 16:34:11 +00:00
|
|
|
|
|
|
|
dev = ap->a_vp->v_rdev;
|
2002-09-27 19:47:59 +00:00
|
|
|
dsw = devsw(dev);
|
|
|
|
if (!(dsw->d_flags & D_KQFILTER))
|
|
|
|
return (1);
|
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
|
|
|
error = dsw->d_kqfilter(dev, ap->a_kn);
|
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
|
|
|
error = dsw->d_kqfilter(dev, ap->a_kn);
|
|
|
|
return (error);
|
2001-02-15 16:34:11 +00:00
|
|
|
}
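The open, read, write, ioctl, poll and kqfilter entry points above all repeat the same dispatch shape: if the driver sets D_NOGIANT, Giant is dropped around the driver call and picked up again afterwards. The sketch below models that shape in user space with a boolean standing in for the lock. Every name in it (`D_NOGIANT_SKETCH`, `giant_held`, `giant_seen`, `driver_entry`, `dispatch`) is hypothetical, and it does not reproduce the real DROP_GIANT()/PICKUP_GIANT() macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	D_NOGIANT_SKETCH	0x1	/* hypothetical stand-in for D_NOGIANT */

static bool giant_held = true;	/* models "the caller holds Giant" */
static bool giant_seen;		/* lock state the driver entry observed */

/* Hypothetical driver entry point: records whether Giant was held. */
static int
driver_entry(void)
{

	giant_seen = giant_held;
	return (0);
}

/* Call the driver, releasing the modelled lock first if it asks for that. */
static int
dispatch(int d_flags, int (*entry)(void))
{
	int error;

	assert(entry != NULL);
	if (d_flags & D_NOGIANT_SKETCH) {
		giant_held = false;	/* stands in for DROP_GIANT() */
		error = entry();
		giant_held = true;	/* stands in for PICKUP_GIANT() */
	} else
		error = entry();
	return (error);
}
```

A driver flagged with `D_NOGIANT_SKETCH` runs with the modelled lock released, any other driver runs with it held, and the caller sees the lock held again on return in both cases.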
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Synch buffers associated with a block device
|
|
|
|
*/
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_fsync(ap)
|
|
|
|
struct vop_fsync_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
struct ucred *a_cred;
|
|
|
|
int a_waitfor;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *a_td;
|
1994-05-24 10:09:53 +00:00
|
|
|
} */ *ap;
|
|
|
|
{
|
1999-11-08 09:59:34 +00:00
|
|
|
struct vnode *vp = ap->a_vp;
|
|
|
|
struct buf *bp;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct buf *nbp;
|
2002-10-25 00:20:37 +00:00
|
|
|
int s, error = 0;
|
|
|
|
int maxretry = 100; /* large, arbitrarily chosen */
|
1994-05-24 10:09:53 +00:00
|
|
|
|
2000-01-10 12:04:27 +00:00
|
|
|
if (!vn_isdisk(vp, NULL))
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
1999-11-08 09:59:34 +00:00
|
|
|
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_LOCK(vp);
|
2001-01-29 08:19:28 +00:00
|
|
|
loop1:
|
2000-12-30 23:32:24 +00:00
|
|
|
/*
|
2002-02-10 22:00:20 +00:00
|
|
|
* MARK/SCAN initialization to avoid infinite loops.
|
2000-12-30 23:32:24 +00:00
|
|
|
*/
|
|
|
|
s = splbio();
|
2001-02-04 16:08:18 +00:00
|
|
|
TAILQ_FOREACH(bp, &vp->v_dirtyblkhd, b_vnbufs) {
|
2000-12-30 23:32:24 +00:00
|
|
|
bp->b_flags &= ~B_SCANNED;
|
2002-10-25 00:20:37 +00:00
|
|
|
bp->b_error = 0;
|
2000-12-30 23:32:24 +00:00
|
|
|
}
|
|
|
|
splx(s);
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Flush all dirty buffers associated with a block device.
|
|
|
|
*/
|
2001-01-29 08:19:28 +00:00
|
|
|
loop2:
|
1994-05-24 10:09:53 +00:00
|
|
|
s = splbio();
|
2002-02-10 22:00:20 +00:00
|
|
|
for (bp = TAILQ_FIRST(&vp->v_dirtyblkhd); bp != NULL; bp = nbp) {
|
1998-10-31 15:31:29 +00:00
|
|
|
nbp = TAILQ_NEXT(bp, b_vnbufs);
|
2000-12-30 23:32:24 +00:00
|
|
|
if ((bp->b_flags & B_SCANNED) != 0)
|
|
|
|
continue;
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_UNLOCK(vp);
|
2000-12-30 23:32:24 +00:00
|
|
|
bp->b_flags |= B_SCANNED;
|
2002-09-25 02:29:49 +00:00
|
|
|
if (BUF_LOCK(bp, LK_EXCLUSIVE | LK_NOWAIT)) {
|
|
|
|
VI_LOCK(vp);
|
1994-05-24 10:09:53 +00:00
|
|
|
continue;
|
2002-09-25 02:29:49 +00:00
|
|
|
}
|
1994-05-24 10:09:53 +00:00
|
|
|
if ((bp->b_flags & B_DELWRI) == 0)
|
|
|
|
panic("spec_fsync: not dirty");
|
2002-08-04 10:29:36 +00:00
|
|
|
if ((vp->v_vflag & VV_OBJBUF) && (bp->b_flags & B_CLUSTEROK)) {
|
1999-06-26 02:47:16 +00:00
|
|
|
BUF_UNLOCK(bp);
|
1998-01-06 05:26:17 +00:00
|
|
|
vfs_bio_awrite(bp);
|
|
|
|
splx(s);
|
|
|
|
} else {
|
|
|
|
bremfree(bp);
|
|
|
|
splx(s);
|
|
|
|
bawrite(bp);
|
|
|
|
}
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_LOCK(vp);
|
2001-01-29 08:19:28 +00:00
|
|
|
goto loop2;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2001-01-29 08:19:28 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If synchronous the caller expects us to completely resolve all
|
|
|
|
* dirty buffers in the system. Wait for in-progress I/O to
|
|
|
|
* complete (which could include background bitmap writes), then
|
|
|
|
* retry if dirty blocks still exist.
|
|
|
|
*/
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ap->a_waitfor == MNT_WAIT) {
|
|
|
|
while (vp->v_numoutput) {
|
2002-08-04 10:29:36 +00:00
|
|
|
vp->v_iflag |= VI_BWAIT;
|
|
|
|
msleep((caddr_t)&vp->v_numoutput, VI_MTX(vp),
|
|
|
|
PRIBIO + 1, "spfsyn", 0);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
1998-10-31 15:31:29 +00:00
|
|
|
if (!TAILQ_EMPTY(&vp->v_dirtyblkhd)) {
|
2002-10-25 00:20:37 +00:00
|
|
|
/*
|
|
|
|
* If we are unable to write any of these buffers
|
|
|
|
* then we fail now rather than trying endlessly
|
|
|
|
* to write them out.
|
|
|
|
*/
|
|
|
|
TAILQ_FOREACH(bp, &vp->v_dirtyblkhd, b_vnbufs)
|
|
|
|
if ((error = bp->b_error) == 0)
|
|
|
|
continue;
|
|
|
|
if (error == 0 && --maxretry >= 0) {
|
2001-01-29 08:19:28 +00:00
|
|
|
splx(s);
|
|
|
|
goto loop1;
|
|
|
|
}
|
|
|
|
vprint("spec_fsync: giving up on dirty", vp);
|
2002-10-25 00:20:37 +00:00
|
|
|
error = EAGAIN;
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
}
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_UNLOCK(vp);
|
1994-05-24 10:09:53 +00:00
|
|
|
splx(s);
|
2002-10-25 00:20:37 +00:00
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
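The MARK/SCAN scheme in spec_fsync() above restarts its scan from the head of the dirty list after every write, and relies on a per-buffer "scanned" flag so that a buffer which stays dirty cannot cause an infinite loop. Below is a user-space sketch of one such pass over an array instead of a TAILQ. `struct sketch_buf` and `flush_pass` are hypothetical, and a write is modelled as simply clearing the dirty flag unless `stays_dirty` is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few struct buf fields that matter here. */
struct sketch_buf {
	bool dirty;		/* buffer is on the dirty list */
	bool busy;		/* lock attempt would fail (LK_NOWAIT) */
	bool stays_dirty;	/* modelled write leaves the buffer dirty */
	bool scanned;		/* B_SCANNED equivalent */
};

/* One mark/scan pass; returns the number of write attempts issued. */
static int
flush_pass(struct sketch_buf *bufs, int n)
{
	int i, attempts = 0;

	assert(bufs != NULL && n >= 0);
	/* MARK: clear the scanned flag on every buffer (loop1). */
	for (i = 0; i < n; i++)
		bufs[i].scanned = false;
restart:
	/* SCAN: visit each dirty buffer at most once (loop2). */
	for (i = 0; i < n; i++) {
		if (!bufs[i].dirty || bufs[i].scanned)
			continue;
		bufs[i].scanned = true;
		if (bufs[i].busy)
			continue;	/* could not lock it, skip */
		attempts++;
		if (!bufs[i].stays_dirty)
			bufs[i].dirty = false;
		goto restart;	/* the list may have changed */
	}
	return (attempts);
}
```

Each buffer is marked scanned before any write is attempted, so the pass issues at most `n` writes even when some buffers never become clean. The real function then decides separately, bounded by `maxretry`, whether to run another full pass.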
|
|
|
|
|
2002-10-22 00:59:49 +00:00
|
|
|
/*
|
|
|
|
* Mutex to use when delaying niced I/O bound processes in spec_strategy().
|
|
|
|
*/
|
|
|
|
static struct mtx strategy_mtx;
|
|
|
|
static void
|
|
|
|
strategy_init(void)
|
|
|
|
{
|
|
|
|
|
|
|
|
mtx_init(&strategy_mtx, "strategy", NULL, MTX_DEF);
|
|
|
|
}
|
|
|
|
SYSINIT(strategy, SI_SUB_DRIVERS, SI_ORDER_MIDDLE, strategy_init, NULL)
|
|
|
|
|
2002-11-04 07:29:20 +00:00
|
|
|
static int doslowdown = 0;
|
|
|
|
SYSCTL_INT(_debug, OID_AUTO, doslowdown, CTLFLAG_RW, &doslowdown, 0, "");
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
|
|
|
* Just call the device strategy routine
|
|
|
|
*/
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_strategy(ap)
|
|
|
|
struct vop_strategy_args /* {
|
1999-08-25 00:26:34 +00:00
|
|
|
struct vnode *a_vp;
|
1994-05-24 10:09:53 +00:00
|
|
|
struct buf *a_bp;
|
|
|
|
} */ *ap;
|
|
|
|
{
|
1998-03-08 09:59:44 +00:00
|
|
|
struct buf *bp;
|
1999-12-01 02:09:30 +00:00
|
|
|
struct vnode *vp;
|
|
|
|
struct mount *mp;
|
2000-07-11 22:07:57 +00:00
|
|
|
int error;
|
2002-09-27 19:47:59 +00:00
|
|
|
struct cdevsw *dsw;
|
This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):
----------------------------
revision 1.65
date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0
Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.
The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.
Also, curthread may or may not have anything to do with the I/O request
at hand.
The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.
Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.
The sleep also needs to be in constant timeunits, 1/hz can be practically
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------
As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.
On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>
2002-10-22 00:59:49 +00:00
|
|
|
struct thread *td = curthread;
|
|
|
|
|
2002-11-01 15:32:12 +00:00
|
|
|
bp = ap->a_bp;
|
|
|
|
vp = ap->a_vp;
|
|
|
|
|
|
|
|
KASSERT(bp->b_iocmd == BIO_READ ||
|
|
|
|
bp->b_iocmd == BIO_WRITE ||
|
|
|
|
bp->b_iocmd == BIO_DELETE,
|
|
|
|
("Wrong b_iocmd buf=%p cmd=%d", bp, bp->b_iocmd));
|
|
|
|
|
2002-10-22 00:59:49 +00:00
|
|
|
/*
|
|
|
|
* Slow down disk requests for niced processes.
|
|
|
|
*/
|
2002-11-04 07:29:20 +00:00
|
|
|
if (doslowdown && td && td->td_ksegrp->kg_nice > 0) {
|
2002-10-22 00:59:49 +00:00
|
|
|
mtx_lock(&strategy_mtx);
|
|
|
|
msleep(&strategy_mtx, &strategy_mtx,
|
|
|
|
PPAUSE | PCATCH | PDROP, "ioslow",
|
|
|
|
td->td_ksegrp->kg_nice);
|
|
|
|
}
|
2002-02-10 22:00:20 +00:00
|
|
|
if (bp->b_iocmd == BIO_WRITE) {
|
2000-07-24 05:28:33 +00:00
|
|
|
if ((bp->b_flags & B_VALIDSUSPWRT) == 0 &&
|
|
|
|
bp->b_vp != NULL && bp->b_vp->v_mount != NULL &&
|
|
|
|
(bp->b_vp->v_mount->mnt_kern_flag & MNTK_SUSPENDED) != 0)
|
2000-07-11 22:07:57 +00:00
|
|
|
panic("spec_strategy: bad I/O");
|
2000-07-24 05:28:33 +00:00
|
|
|
bp->b_flags &= ~B_VALIDSUSPWRT;
|
2000-07-11 22:07:57 +00:00
|
|
|
if (LIST_FIRST(&bp->b_dep) != NULL)
|
|
|
|
buf_start(bp);
|
2002-08-04 10:29:36 +00:00
|
|
|
mp_fixme("This should require the vnode lock.");
|
|
|
|
if ((vp->v_vflag & VV_COPYONWRITE) &&
|
|
|
|
vp->v_rdev->si_copyonwrite &&
|
2001-03-07 07:09:55 +00:00
|
|
|
(error = (*vp->v_rdev->si_copyonwrite)(vp, bp)) != 0 &&
|
2000-07-11 22:07:57 +00:00
|
|
|
error != EOPNOTSUPP) {
|
|
|
|
bp->b_io.bio_error = error;
|
|
|
|
bp->b_io.bio_flags |= BIO_ERROR;
|
|
|
|
biodone(&bp->b_io);
|
|
|
|
return (0);
|
|
|
|
}
|
|
|
|
}
|
1999-12-01 02:09:30 +00:00
|
|
|
/*
|
|
|
|
* Collect statistics on synchronous and asynchronous read
|
|
|
|
* and write counts for disks that have associated filesystems.
|
|
|
|
*/
|
2000-10-09 17:31:39 +00:00
|
|
|
if (vn_isdisk(vp, NULL) && (mp = vp->v_rdev->si_mountpoint) != NULL) {
|
2000-03-20 10:44:49 +00:00
|
|
|
if (bp->b_iocmd == BIO_WRITE) {
|
1999-12-01 02:09:30 +00:00
|
|
|
if (bp->b_lock.lk_lockholder == LK_KERNPROC)
|
|
|
|
mp->mnt_stat.f_asyncwrites++;
|
|
|
|
else
|
|
|
|
mp->mnt_stat.f_syncwrites++;
|
|
|
|
} else {
|
|
|
|
if (bp->b_lock.lk_lockholder == LK_KERNPROC)
|
|
|
|
mp->mnt_stat.f_asyncreads++;
|
|
|
|
else
|
|
|
|
mp->mnt_stat.f_syncreads++;
|
|
|
|
}
|
|
|
|
}
|
2002-03-05 13:25:57 +00:00
|
|
|
if (devsw(bp->b_dev) == NULL) {
|
|
|
|
bp->b_io.bio_error = ENXIO;
|
|
|
|
bp->b_io.bio_flags |= BIO_ERROR;
|
|
|
|
biodone(&bp->b_io);
|
|
|
|
return (0);
|
|
|
|
}
|
2002-09-27 19:47:59 +00:00
|
|
|
dsw = devsw(bp->b_dev);
|
|
|
|
KASSERT(dsw->d_strategy != NULL,
|
2002-02-10 22:00:20 +00:00
|
|
|
("No strategy on dev %s responsible for buffer %p\n",
|
1999-10-08 19:07:23 +00:00
|
|
|
devtoname(bp->b_dev), bp));
|
2002-09-27 19:47:59 +00:00
|
|
|
|
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
2003-01-03 05:57:35 +00:00
|
|
|
DEV_STRATEGY(bp);
|
2002-09-27 19:47:59 +00:00
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
2003-01-03 05:57:35 +00:00
|
|
|
DEV_STRATEGY(bp);
|
2002-09-27 19:47:59 +00:00
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
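The slowdown in spec_strategy() above sleeps for kg_nice clock ticks, so the wall-clock delay depends on the tick rate. The following is a minimal user-space sketch of that arithmetic, not kernel code. The helper names slowdown_usec() and print_slowdown_table() and the sample tick rates are assumptions made for illustration only.

```c
#include <stdio.h>

/*
 * Hypothetical helper: convert a sleep of `nice` clock ticks into
 * microseconds for a tick rate of `hz` ticks per second. Only
 * positively niced processes are delayed, so anything else yields 0.
 */
long
slowdown_usec(int nice, int hz)
{
	if (nice <= 0 || hz <= 0)
		return (0);
	return ((long)nice * 1000000L / hz);
}

/* Print the per-request delay for one nice value at a few tick rates. */
void
print_slowdown_table(int nice)
{
	static const int rates[] = { 100, 1000, 10000 };	/* assumed sample rates */
	size_t i;

	for (i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
		printf("nice %d at hz=%d: %ld us\n", nice, rates[i],
		    slowdown_usec(nice, rates[i]));
}
```

Calling print_slowdown_table(20) shows the trade-off discussed in the commit message: the delay per request shrinks as the tick rate rises, which the author treats as a feature and the quoted revision 1.65 log treats as a drawback.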
|
|
|
|
|
1998-09-12 20:21:54 +00:00
|
|
|
static int
|
1998-09-05 14:13:12 +00:00
|
|
|
spec_freeblks(ap)
|
|
|
|
struct vop_freeblks_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
daddr_t a_addr;
|
|
|
|
daddr_t a_length;
|
|
|
|
} */ *ap;
|
|
|
|
{
|
|
|
|
struct cdevsw *bsw;
|
|
|
|
struct buf *bp;
|
|
|
|
|
1999-11-09 14:15:33 +00:00
|
|
|
/*
|
|
|
|
* XXX: This assumes that strategy does the deed right away.
|
|
|
|
* XXX: this may not be TRTTD.
|
|
|
|
*/
|
1999-08-13 10:29:38 +00:00
|
|
|
bsw = devsw(ap->a_vp->v_rdev);
|
1998-09-05 14:13:12 +00:00
|
|
|
if ((bsw->d_flags & D_CANFREE) == 0)
|
1998-09-12 20:21:54 +00:00
|
|
|
return (0);
|
1998-09-05 14:13:12 +00:00
|
|
|
bp = geteblk(ap->a_length);
|
2000-03-20 10:44:49 +00:00
|
|
|
bp->b_iocmd = BIO_DELETE;
|
1998-09-05 14:13:12 +00:00
|
|
|
bp->b_dev = ap->a_vp->v_rdev;
|
|
|
|
bp->b_blkno = ap->a_addr;
|
|
|
|
bp->b_offset = dbtob(ap->a_addr);
|
|
|
|
bp->b_bcount = ap->a_length;
|
2001-01-30 10:06:08 +00:00
|
|
|
BUF_KERNPROC(bp);
|
2003-01-03 05:57:35 +00:00
|
|
|
DEV_STRATEGY(bp);
|
1998-09-12 20:21:54 +00:00
|
|
|
return (0);
|
1998-09-05 14:13:12 +00:00
|
|
|
}
|
|
|
|
|
1994-05-24 10:09:53 +00:00
|
|
|
/*
|
1999-09-20 23:17:47 +00:00
|
|
|
* Implement degenerate case where the block requested is the block
|
|
|
|
* returned, and assume that the entire device is contiguous in regards
|
|
|
|
* to the contiguous block range (runp and runb).
|
1994-05-24 10:09:53 +00:00
|
|
|
*/
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_bmap(ap)
|
|
|
|
struct vop_bmap_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
daddr_t a_bn;
|
|
|
|
struct vnode **a_vpp;
|
|
|
|
daddr_t *a_bnp;
|
1995-09-04 00:21:16 +00:00
|
|
|
int *a_runp;
|
|
|
|
int *a_runb;
|
1994-05-24 10:09:53 +00:00
|
|
|
} */ *ap;
|
|
|
|
{
|
1999-09-20 23:17:47 +00:00
|
|
|
struct vnode *vp = ap->a_vp;
|
|
|
|
int runp = 0;
|
|
|
|
int runb = 0;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
|
|
|
if (ap->a_vpp != NULL)
|
1999-09-20 23:17:47 +00:00
|
|
|
*ap->a_vpp = vp;
|
1994-05-24 10:09:53 +00:00
|
|
|
if (ap->a_bnp != NULL)
|
|
|
|
*ap->a_bnp = ap->a_bn;
|
1999-11-09 14:15:33 +00:00
|
|
|
if (vp->v_mount != NULL)
|
1999-09-20 23:17:47 +00:00
|
|
|
runp = runb = MAXBSIZE / vp->v_mount->mnt_stat.f_iosize;
|
1995-02-03 06:46:28 +00:00
|
|
|
if (ap->a_runp != NULL)
|
1999-09-20 23:17:47 +00:00
|
|
|
*ap->a_runp = runp;
|
1995-09-04 00:21:16 +00:00
|
|
|
if (ap->a_runb != NULL)
|
1999-09-20 23:17:47 +00:00
|
|
|
*ap->a_runb = runb;
|
1994-05-24 10:09:53 +00:00
|
|
|
return (0);
|
|
|
|
}
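The degenerate mapping that spec_bmap() performs can be sketched outside the kernel as an identity function with optional outputs. This is an illustration under assumptions: identity_bmap() and SKETCH_MAXBSIZE are hypothetical names, plain long values stand in for daddr_t, and the iosize argument stands in for vp->v_mount->mnt_stat.f_iosize (0 meaning "no mount").

```c
#include <stddef.h>

#define SKETCH_MAXBSIZE 65536	/* assumed stand-in for the kernel's MAXBSIZE */

/*
 * Identity block mapping: logical block bn maps to physical block bn.
 * When an I/O size is known (iosize > 0) the contiguous run in both
 * directions is SKETCH_MAXBSIZE / iosize blocks, otherwise it is zero.
 * Every output pointer is optional and is skipped when NULL.
 */
int
identity_bmap(long bn, long iosize, long *bnp, int *runp, int *runb)
{
	int run = 0;

	if (bnp != NULL)
		*bnp = bn;
	if (iosize > 0)
		run = (int)(SKETCH_MAXBSIZE / iosize);
	if (runp != NULL)
		*runp = run;
	if (runb != NULL)
		*runb = run;
	return (0);
}
```

Treating each output as optional mirrors the NULL checks in the function above, so a caller that only wants the run lengths can pass NULL for the block pointer.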
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Device close routine
|
|
|
|
*/
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_close(ap)
|
|
|
|
struct vop_close_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
int a_fflag;
|
|
|
|
struct ucred *a_cred;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *a_td;
|
1994-05-24 10:09:53 +00:00
|
|
|
} */ *ap;
|
|
|
|
{
|
2002-02-23 11:12:57 +00:00
|
|
|
struct vnode *vp = ap->a_vp, *oldvp;
|
2001-09-12 08:38:13 +00:00
|
|
|
struct thread *td = ap->a_td;
|
1994-05-24 10:09:53 +00:00
|
|
|
dev_t dev = vp->v_rdev;
|
2002-09-27 19:47:59 +00:00
|
|
|
struct cdevsw *dsw;
|
|
|
|
int error;
|
1994-05-24 10:09:53 +00:00
|
|
|
|
1999-11-09 14:15:33 +00:00
|
|
|
/*
|
|
|
|
* Hack: a tty device that is a controlling terminal
|
|
|
|
* has a reference from the session structure.
|
|
|
|
* We cannot easily tell that a character device is
|
|
|
|
* a controlling terminal, unless it is the closing
|
|
|
|
* process' controlling terminal. In that case,
|
|
|
|
* if the reference count is 2 (this last descriptor
|
|
|
|
* plus the session), release the reference from the session.
|
|
|
|
*/
|
2002-08-04 10:29:36 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* This needs to be rewritten to take the vp interlock into
|
|
|
|
* consideration.
|
|
|
|
*/
|
|
|
|
|
2002-09-27 19:47:59 +00:00
|
|
|
dsw = devsw(dev);
|
2002-02-23 11:12:57 +00:00
|
|
|
oldvp = NULL;
|
2002-04-16 17:11:34 +00:00
|
|
|
sx_xlock(&proctree_lock);
|
2002-09-25 02:29:49 +00:00
|
|
|
if (td && vp == td->td_proc->p_session->s_ttyvp) {
|
2002-02-23 11:12:57 +00:00
|
|
|
SESS_LOCK(td->td_proc->p_session);
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_LOCK(vp);
|
2002-09-26 02:54:30 +00:00
|
|
|
if (vcount(vp) == 2 && (vp->v_iflag & VI_XLOCK) == 0) {
|
2002-09-25 02:29:49 +00:00
|
|
|
td->td_proc->p_session->s_ttyvp = NULL;
|
2002-09-26 02:54:30 +00:00
|
|
|
oldvp = vp;
|
|
|
|
}
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_UNLOCK(vp);
|
2002-02-23 11:12:57 +00:00
|
|
|
SESS_UNLOCK(td->td_proc->p_session);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
2002-04-16 17:11:34 +00:00
|
|
|
sx_xunlock(&proctree_lock);
|
2002-02-23 11:12:57 +00:00
|
|
|
if (oldvp != NULL)
|
|
|
|
vrele(oldvp);
|
1999-08-13 16:29:27 +00:00
|
|
|
/*
|
|
|
|
* We do not want to really close the device if it
|
|
|
|
* is still in use unless we are trying to close it
|
|
|
|
* forcibly. Since every use (buffer, vnode, swap, cmap)
|
|
|
|
* holds a reference to the vnode, and because we mark
|
|
|
|
* any other vnodes that alias this device, when the
|
|
|
|
* sum of the reference counts on all the aliased
|
|
|
|
* vnodes descends to one, we are on last close.
|
|
|
|
*/
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_LOCK(vp);
|
2002-08-04 10:29:36 +00:00
|
|
|
if (vp->v_iflag & VI_XLOCK) {
|
2002-02-10 22:00:20 +00:00
|
|
|
/* Forced close. */
|
2002-09-27 19:47:59 +00:00
|
|
|
} else if (dsw->d_flags & D_TRACKCLOSE) {
|
2002-02-10 22:00:20 +00:00
|
|
|
/* Keep device updated on status. */
|
1999-08-13 16:29:27 +00:00
|
|
|
} else if (vcount(vp) > 1) {
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_UNLOCK(vp);
|
1999-08-13 16:29:27 +00:00
|
|
|
return (0);
|
|
|
|
}
|
2002-09-25 02:29:49 +00:00
|
|
|
VI_UNLOCK(vp);
|
2002-09-27 19:47:59 +00:00
|
|
|
if (dsw->d_flags & D_NOGIANT) {
|
|
|
|
DROP_GIANT();
|
|
|
|
error = dsw->d_close(dev, ap->a_fflag, S_IFCHR, td);
|
|
|
|
PICKUP_GIANT();
|
|
|
|
} else
|
|
|
|
error = dsw->d_close(dev, ap->a_fflag, S_IFCHR, td);
|
|
|
|
return (error);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
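The last-close decision in spec_close() above reduces to three checks made in order. The sketch below restates them as a pure function. It is an assumption-laden simplification: should_call_driver_close() is a hypothetical name, the booleans stand in for the VI_XLOCK and D_TRACKCLOSE flag tests, and refcount stands in for the value returned by vcount(vp). Locking is left out entirely.

```c
#include <stdbool.h>

/*
 * Decide whether the driver's close routine should run.
 *   forced      - the vnode is being forcibly closed
 *   track_close - the driver asked to be told about every close
 *   refcount    - sum of references on all aliases of the device
 * With neither flag set, only the last reference triggers a close.
 */
bool
should_call_driver_close(bool forced, bool track_close, int refcount)
{
	if (forced)
		return (true);		/* forced close */
	if (track_close)
		return (true);		/* keep device updated on status */
	return (refcount <= 1);		/* still in use when more than one */
}
```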
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Print out the contents of a special device vnode.
|
|
|
|
*/
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_print(ap)
|
|
|
|
struct vop_print_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
} */ *ap;
|
|
|
|
{
|
|
|
|
|
2002-09-14 09:02:28 +00:00
|
|
|
printf("tag %s, dev %s\n", ap->a_vp->v_tag,
|
|
|
|
devtoname(ap->a_vp->v_rdev));
|
1994-05-25 09:21:21 +00:00
|
|
|
return (0);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Special device advisory byte-level locks.
|
|
|
|
*/
|
|
|
|
/* ARGSUSED */
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1994-05-24 10:09:53 +00:00
|
|
|
spec_advlock(ap)
|
|
|
|
struct vop_advlock_args /* {
|
|
|
|
struct vnode *a_vp;
|
|
|
|
caddr_t a_id;
|
|
|
|
int a_op;
|
|
|
|
struct flock *a_fl;
|
|
|
|
int a_flags;
|
|
|
|
} */ *ap;
|
|
|
|
{
|
|
|
|
|
1996-12-19 18:16:33 +00:00
|
|
|
return (ap->a_flags & F_FLOCK ? EOPNOTSUPP : EINVAL);
|
1994-05-24 10:09:53 +00:00
|
|
|
}
|
|
|
|
|
1995-11-18 12:49:14 +00:00
|
|
|
static void
|
|
|
|
spec_getpages_iodone(bp)
|
|
|
|
struct buf *bp;
|
|
|
|
{
|
|
|
|
|
1995-10-23 02:23:29 +00:00
|
|
|
bp->b_flags |= B_DONE;
|
|
|
|
wakeup(bp);
|
|
|
|
}
|
|
|
|
|
1997-10-15 13:24:07 +00:00
|
|
|
static int
|
1995-10-23 02:23:29 +00:00
|
|
|
spec_getpages(ap)
|
|
|
|
struct vop_getpages_args *ap;
|
|
|
|
{
|
|
|
|
vm_offset_t kva;
|
1995-11-18 12:49:14 +00:00
|
|
|
int error;
|
|
|
|
int i, pcount, size, s;
|
1995-10-23 02:23:29 +00:00
|
|
|
daddr_t blkno;
|
|
|
|
struct buf *bp;
|
1998-03-08 08:46:18 +00:00
|
|
|
vm_page_t m;
|
1996-10-06 21:19:33 +00:00
|
|
|
vm_ooffset_t offset;
|
This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifested themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistency, and as a side-effect broke
the vfs.ioopt code. This code might have been committed separately, but
almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
|
|
|
int toff, nextoff, nread;
|
1997-05-01 19:12:22 +00:00
|
|
|
struct vnode *vp = ap->a_vp;
|
|
|
|
int blksiz;
|
1998-03-07 21:37:31 +00:00
|
|
|
int gotreqpage;
|
1995-10-23 02:23:29 +00:00
|
|
|
|
2001-07-04 16:20:28 +00:00
|
|
|
GIANT_REQUIRED;
|
|
|
|
|
1995-11-18 12:49:14 +00:00
|
|
|
error = 0;
|
1995-10-23 02:23:29 +00:00
|
|
|
pcount = round_page(ap->a_count) / PAGE_SIZE;
|
1995-11-18 12:49:14 +00:00
|
|
|
|
1995-10-23 02:23:29 +00:00
|
|
|
/*
|
2002-02-10 22:00:20 +00:00
|
|
|
* Calculate the offset of the transfer and do a sanity check.
|
1999-02-25 05:22:30 +00:00
|
|
|
* FreeBSD currently only supports an 8 TB range due to b_blkno
|
|
|
|
* being in DEV_BSIZE ( usually 512 ) byte chunks on call to
|
|
|
|
* VOP_STRATEGY. XXX
|
1995-10-23 02:23:29 +00:00
|
|
|
*/
|
1996-10-06 21:19:33 +00:00
|
|
|
offset = IDX_TO_OFF(ap->a_m[0]->pindex) + ap->a_offset;
|
|
|
|
blkno = btodb(offset);
|
|
|
|
|
1995-10-23 02:23:29 +00:00
|
|
|
/*
|
1999-02-25 05:22:30 +00:00
|
|
|
* Round up physical size for real devices. We cannot round using
|
|
|
|
* v_mount's block size data because v_mount has nothing to do with
|
|
|
|
* the device. i.e. it's usually '/dev'. We need the physical block
|
|
|
|
* size for the device itself.
|
|
|
|
*
|
2000-10-09 17:31:39 +00:00
|
|
|
* We can't use v_rdev->si_mountpoint because it only exists when the
|
1999-08-13 10:10:12 +00:00
|
|
|
* block device is mounted. However, we can use v_rdev.
|
1995-10-23 02:23:29 +00:00
|
|
|
*/
|
1999-02-25 05:22:30 +00:00
|
|
|
|
2000-01-10 12:04:27 +00:00
|
|
|
if (vn_isdisk(vp, NULL))
|
1999-08-13 10:10:12 +00:00
|
|
|
blksiz = vp->v_rdev->si_bsize_phys;
|
1997-05-01 19:12:22 +00:00
|
|
|
else
|
|
|
|
blksiz = DEV_BSIZE;
|
1999-02-25 05:22:30 +00:00
|
|
|
|
1997-05-01 19:12:22 +00:00
|
|
|
size = (ap->a_count + blksiz - 1) & ~(blksiz - 1);
|
1995-10-23 02:23:29 +00:00
|
|
|
|
1999-01-21 08:29:12 +00:00
|
|
|
bp = getpbuf(NULL);
|
1995-11-18 12:49:14 +00:00
|
|
|
kva = (vm_offset_t)bp->b_data;
|
1995-10-23 02:23:29 +00:00
|
|
|
|
|
|
|
/*
|
1995-11-18 12:49:14 +00:00
|
|
|
* Map the pages to be read into the kva.
|
1995-10-23 02:23:29 +00:00
|
|
|
*/
|
|
|
|
pmap_qenter(kva, ap->a_m, pcount);
|
|
|
|
|
1995-11-18 12:49:14 +00:00
|
|
|
/* Build a minimal buffer header. */
|
2000-03-20 10:44:49 +00:00
|
|
|
bp->b_iocmd = BIO_READ;
|
1995-10-23 02:23:29 +00:00
|
|
|
bp->b_iodone = spec_getpages_iodone;
|
1995-11-18 12:49:14 +00:00
|
|
|
|
|
|
|
/* B_PHYS is not set, but it is nice to fill this in. */
|
2001-10-11 23:38:17 +00:00
|
|
|
KASSERT(bp->b_rcred == NOCRED, ("leaking read ucred"));
|
|
|
|
KASSERT(bp->b_wcred == NOCRED, ("leaking write ucred"));
|
2002-02-27 18:32:23 +00:00
|
|
|
bp->b_rcred = crhold(curthread->td_ucred);
|
|
|
|
bp->b_wcred = crhold(curthread->td_ucred);
|
1995-10-23 02:23:29 +00:00
|
|
|
bp->b_blkno = blkno;
|
|
|
|
bp->b_lblkno = blkno;
|
|
|
|
pbgetvp(ap->a_vp, bp);
|
|
|
|
bp->b_bcount = size;
|
|
|
|
bp->b_bufsize = size;
|
1998-03-08 08:46:18 +00:00
|
|
|
bp->b_resid = 0;
|
2000-12-26 19:41:38 +00:00
|
|
|
bp->b_runningbufspace = bp->b_bufsize;
|
|
|
|
runningbufspace += bp->b_runningbufspace;
|
1995-10-23 02:23:29 +00:00
|
|
|
|
|
|
|
cnt.v_vnodein++;
|
|
|
|
cnt.v_vnodepgsin += pcount;
|
|
|
|
|
1995-11-18 12:49:14 +00:00
|
|
|
/* Do the input. */
|
2003-01-03 06:32:15 +00:00
|
|
|
VOP_STRATEGY(bp->b_vp, bp);
|
1995-10-23 02:23:29 +00:00
|
|
|
|
|
|
|
s = splbio();
|
|
|
|
|
1995-11-18 12:49:14 +00:00
|
|
|
/* We definitely need to be at splbio here. */
|
|
|
|
while ((bp->b_flags & B_DONE) == 0)
|
1997-12-29 00:25:11 +00:00
|
|
|
tsleep(bp, PVM, "spread", 0);
|
1995-11-18 12:49:14 +00:00
|
|
|
|
1995-10-23 02:23:29 +00:00
|
|
|
splx(s);
|
1995-11-18 12:49:14 +00:00
|
|
|
|
2000-04-02 15:24:56 +00:00
|
|
|
if ((bp->b_ioflags & BIO_ERROR) != 0) {
|
1998-03-07 21:37:31 +00:00
|
|
|
if (bp->b_error)
|
|
|
|
error = bp->b_error;
|
|
|
|
else
|
|
|
|
error = EIO;
|
|
|
|
}
|
|
|
|
|
|
|
|
nread = size - bp->b_resid;
|
1995-10-23 02:23:29 +00:00
|
|
|
|
This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will soon be needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistency problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to look the page up again (if the object generation count changed), and
verify its status (always).
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.
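Item 20 above replaces a boolean argument with an int flag word so that an invalidate request can travel down together with the synchronous-write request. The following is only a minimal userland sketch of that idea. The names PUT_SYNC, PUT_INVAL, build_put_flags and put_pages_sketch are hypothetical and chosen for illustration; they are not the kernel's identifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, for illustration only. */
#define PUT_SYNC  0x1   /* caller wants to wait for the write */
#define PUT_INVAL 0x2   /* caller wants the pages invalidated afterwards */

/* What a lower layer decided to do with a request. */
struct put_result {
	bool waited;
	bool invalidated;
};

/* Upper layer: fold two independent requests into one flag word. */
static int
build_put_flags(bool sync, bool invalidate)
{
	int flags = 0;

	if (sync)
		flags |= PUT_SYNC;
	if (invalidate)
		flags |= PUT_INVAL;
	return (flags);
}

/* Lower layer: each bit is tested on its own, so neither request is lost. */
static struct put_result
put_pages_sketch(int flags)
{
	struct put_result r;

	/* Reject bits this sketch does not define. */
	assert((flags & ~(PUT_SYNC | PUT_INVAL)) == 0);
	r.waited = (flags & PUT_SYNC) != 0;
	r.invalidated = (flags & PUT_INVAL) != 0;
	return (r);
}
```

With a single boolean, only one of the two requests could be expressed at each call boundary. A flag word leaves room for both, and for further bits later.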
1998-03-07 21:37:31 +00:00
|
|
|
if (nread < ap->a_count) {
|
|
|
|
bzero((caddr_t)kva + nread,
|
|
|
|
ap->a_count - nread);
|
|
|
|
}
|
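The hunk above zeroes whatever part of the mapped buffer a short read did not fill, so stale bytes never become visible. A minimal userland sketch of the same step, assuming a plain byte buffer in place of the kernel mapping; zero_unread_tail is a hypothetical helper name, and memset stands in for bzero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero the bytes of buf between nread and requested.
 * Returns the number of bytes that were zeroed.
 */
static size_t
zero_unread_tail(unsigned char *buf, size_t requested, size_t nread)
{
	assert(buf != NULL);
	if (nread >= requested)
		return (0);	/* the read filled the whole request */
	memset(buf + nread, 0, requested - nread);
	return (requested - nread);
}
```

For an 8-byte request of which 5 bytes arrived, the call zeroes bytes 5 through 7 and leaves the first 5 untouched.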
1995-10-23 02:23:29 +00:00
|
|
|
pmap_qremove(kva, pcount);
|
|
|
|
|
1998-03-07 21:37:31 +00:00
|
|
|
gotreqpage = 0;
|
2002-07-27 05:08:49 +00:00
|
|
|
vm_page_lock_queues();
|
1998-03-07 21:37:31 +00:00
|
|
|
for (i = 0, toff = 0; i < pcount; i++, toff = nextoff) {
|
|
|
|
nextoff = toff + PAGE_SIZE;
|
|
|
|
m = ap->a_m[i];
|
|
|
|
|
|
|
|
m->flags &= ~PG_ZERO;
|
1995-10-23 02:23:29 +00:00
|
|
|
|
1998-03-07 21:37:31 +00:00
|
|
|
if (nextoff <= nread) {
|
|
|
|
m->valid = VM_PAGE_BITS_ALL;
|
1999-08-17 04:02:34 +00:00
|
|
|
vm_page_undirty(m);
|
1998-03-07 21:37:31 +00:00
|
|
|
} else if (toff < nread) {
|
1999-04-05 19:38:30 +00:00
|
|
|
/*
|
|
|
|
* Since this is a VM request, we have to supply the
|
|
|
|
* unaligned offset to allow vm_page_set_validclean()
|
|
|
|
* to zero sub-DEV_BSIZE'd portions of the page.
|
|
|
|
*/
|
|
|
|
vm_page_set_validclean(m, 0, nread - toff);
|
1998-03-07 21:37:31 +00:00
|
|
|
} else {
|
|
|
|
m->valid = 0;
|
1999-08-17 04:02:34 +00:00
|
|
|
vm_page_undirty(m);
|
1998-03-07 21:37:31 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if (i != ap->a_reqpage) {
|
1995-10-23 02:23:29 +00:00
|
|
|
/*
|
1995-11-18 12:49:14 +00:00
|
|
|
* Just in case someone was asking for this page we
|
|
|
|
* now tell them that it is ok to use.
|
1995-10-23 02:23:29 +00:00
|
|
|
*/
|
1998-03-07 21:37:31 +00:00
|
|
|
if (!error || (m->valid == VM_PAGE_BITS_ALL)) {
|
|
|
|
if (m->valid) {
|
|
|
|
if (m->flags & PG_WANTED) {
|
|
|
|
vm_page_activate(m);
|
|
|
|
} else {
|
|
|
|
vm_page_deactivate(m);
|
|
|
|
}
|
1998-09-04 08:06:57 +00:00
|
|
|
vm_page_wakeup(m);
|
1998-03-07 21:37:31 +00:00
|
|
|
} else {
|
|
|
|
vm_page_free(m);
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
vm_page_free(m);
|
|
|
|
}
|
|
|
|
} else if (m->valid) {
|
|
|
|
gotreqpage = 1;
|
1999-04-05 19:38:30 +00:00
|
|
|
/*
|
|
|
|
* Since this is a VM request, we need to make the
|
|
|
|
* entire page presentable by zeroing invalid sections.
|
|
|
|
*/
|
|
|
|
if (m->valid != VM_PAGE_BITS_ALL)
|
2002-02-10 22:00:20 +00:00
|
|
|
vm_page_zero_invalid(m, FALSE);
|
1995-10-23 02:23:29 +00:00
|
|
|
}
|
|
|
|
}
|
2002-07-27 05:08:49 +00:00
|
|
|
vm_page_unlock_queues();
|
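The loop that ends above walks the pages of the request and decides, from the byte count actually read, whether each page is fully valid, partially valid, or invalid. A minimal userland sketch of that three-way test follows. SKETCH_PAGE_SIZE, enum page_state and classify_page are hypothetical names, the page size of 4096 is an assumption, and the kernel's page flags and validity bitmaps are not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size for the sketch */

enum page_state { PAGE_INVALID, PAGE_PARTIAL, PAGE_FULL };

/*
 * Classify page number `index` of a request after `nread` bytes arrived.
 * *valid_bytes receives how many leading bytes of the page hold read data.
 */
static enum page_state
classify_page(size_t index, size_t nread, size_t *valid_bytes)
{
	size_t toff = index * SKETCH_PAGE_SIZE;		/* start of this page */
	size_t nextoff = toff + SKETCH_PAGE_SIZE;	/* start of the next page */

	assert(valid_bytes != NULL);
	if (nextoff <= nread) {
		/* The read covered the page completely. */
		*valid_bytes = SKETCH_PAGE_SIZE;
		return (PAGE_FULL);
	}
	if (toff < nread) {
		/* The read ended inside this page. */
		*valid_bytes = nread - toff;
		return (PAGE_PARTIAL);
	}
	/* The read ended before this page began. */
	*valid_bytes = 0;
	return (PAGE_INVALID);
}
```

For a read of 6000 bytes, page 0 is fully valid, page 1 holds 1904 valid bytes, and page 2 holds none.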
This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.
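The rule in item 17 can be illustrated with a small userland sketch. It is not the kernel code: struct page, struct object, page_sleep, get_usable_page and the two callbacks are hypothetical stand-ins, and the "sleep" is simulated by running a callback that represents whatever other threads did in the meantime. It only shows the shape of the pattern: after a sleep that blocked, look the page up again if the generation count moved, and check the page state in every case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's page and object. */
struct page {
	int busy;		/* nonzero while I/O is in flight */
	int valid;		/* nonzero when the contents are usable */
};

struct object {
	unsigned generation;	/* bumped whenever the page list changes */
	struct page *slot;	/* one-entry "page list" for illustration */
};

/* Callback type: what other threads did while we were asleep. */
typedef void (*while_asleep_fn)(struct object *);

static struct page fresh_page = { 0, 1 };	/* not busy, valid */

/* Other thread replaced the page: the list changed, so bump generation. */
static void
replace_page(struct object *obj)
{
	obj->slot = &fresh_page;
	obj->generation++;
}

/* Other thread finished the I/O on the same page: the list is unchanged. */
static void
finish_io(struct object *obj)
{
	obj->slot->busy = 0;
}

/* Simulated sleep: returns true only if the page was busy and we "blocked". */
static bool
page_sleep(struct object *obj, struct page *p, while_asleep_fn other)
{
	if (!p->busy)
		return false;
	other(obj);
	return true;
}

static struct page *
get_usable_page(struct object *obj, while_asleep_fn other)
{
	struct page *p = obj->slot;

	while (p != NULL) {
		unsigned gen = obj->generation;

		if (!page_sleep(obj, p, other))
			break;		/* never blocked: p is still current */
		if (obj->generation != gen)
			p = obj->slot;	/* list changed: old pointer is stale */
		/* Either way, loop to re-check the busy state. */
	}
	/* Always verify the status before handing the page back. */
	if (p != NULL && !p->valid)
		return NULL;
	return p;
}
```

Calling get_usable_page with replace_page on an object whose page is busy yields the replacement page, while calling it with finish_io yields the original page once its busy flag is cleared.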
1998-03-07 21:37:31 +00:00
|
|
|
if (!gotreqpage) {
|
1998-03-08 08:46:18 +00:00
|
|
|
m = ap->a_m[ap->a_reqpage];
|
1998-07-11 07:46:16 +00:00
|
|
|
printf(
|
1999-09-03 09:14:36 +00:00
|
|
|
"spec_getpages:(%s) I/O read failure: (error=%d) bp %p vp %p\n",
|
|
|
|
devtoname(bp->b_dev), error, bp, bp->b_vp);
|
1998-07-11 07:46:16 +00:00
|
|
|
printf(
|
|
|
|
" size: %d, resid: %ld, a_count: %d, valid: 0x%x\n",
|
|
|
|
size, bp->b_resid, ap->a_count, m->valid);
|
|
|
|
printf(
|
|
|
|
" nread: %d, reqpage: %d, pindex: %lu, pcount: %d\n",
|
|
|
|
nread, ap->a_reqpage, (u_long)m->pindex, pcount);
|
1998-03-08 08:46:18 +00:00
|
|
|
/*
|
|
|
|
* Free the buffer header back to the swap buffer pool.
|
|
|
|
*/
|
1999-01-21 08:29:12 +00:00
|
|
|
relpbuf(bp, NULL);
|
1998-03-07 21:37:31 +00:00
|
|
|
return VM_PAGER_ERROR;
|
|
|
|
}
|
1998-03-08 08:46:18 +00:00
|
|
|
/*
|
|
|
|
* Free the buffer header back to the swap buffer pool.
|
|
|
|
*/
|
1999-01-21 08:29:12 +00:00
|
|
|
relpbuf(bp, NULL);
|
1998-03-07 21:37:31 +00:00
|
|
|
return VM_PAGER_OK;
|
1995-10-23 02:23:29 +00:00
|
|
|
}
|