Commit Graph

93 Commits

Author SHA1 Message Date
gibbs
6833acab2d Correct bioq_disksort so that bioq_insert_tail() offers barrier semantic.
Add the BIO_ORDERED flag for struct bio and update bio clients to use it.

The barrier semantics of bioq_insert_tail() were broken in two ways:

 o In bioq_disksort(), an added bio could be inserted at the head of
   the queue, even when a barrier was present, if the sort key for
   the new entry was less than that of the last queued barrier bio.

 o The last_offset used to generate the sort key for newly queued bios
   did not stay at the position of the barrier until either the
   barrier was de-queued, or a new barrier (which updates last_offset)
   was queued.  When a barrier is in effect, we know that the disk
   will pass through the barrier position just before the
   "blocked bios" are released, so using the barrier's offset for
   last_offset is the optimal choice.

sys/geom/sched/subr_disk.c:
sys/kern/subr_disk.c:
	o Update last_offset in bioq_insert_tail().

	o Only update last_offset in bioq_remove() if the removed bio is
	  at the head of the queue (typically due to a call via
	  bioq_takefirst()) and no barrier is active.

	o In bioq_disksort(), if we have a barrier (insert_point is non-NULL),
	  set prev to the barrier and cur to it's next element.  Now that
	  last_offset is kept at the barrier position, this change isn't
	  strictly necessary, but since we have to take a decision branch
	  anyway, it does avoid one, no-op, loop iteration in the while
	  loop that immediately follows.

	o In bioq_disksort(), bypass the normal sort for bios with the
	  BIO_ORDERED attribute and instead insert them into the queue
	  with bioq_insert_tail().  bioq_insert_tail() not only gives
	  the desired command order during insertion, but also provides
	  barrier semantics so that commands disksorted in the future
	  cannot pass the just enqueued transaction.

sys/sys/bio.h:
	Add BIO_ORDERED as bit 4 of the bio_flags field in struct bio.

sys/cam/ata/ata_da.c:
sys/cam/scsi/scsi_da.c
	Use an ordered command for SCSI/ATA-NCQ commands issued in
	response to bios with the BIO_ORDERED flag set.

sys/cam/scsi/scsi_da.c
	Use an ordered tag when issuing a synchronize cache command.

	Wrap some lines to 80 columns.

sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
sys/geom/geom_io.c
	Mark bios with the BIO_FLUSH command as BIO_ORDERED.

Sponsored by:	Spectra Logic Corporation
MFC after:	1 month
2010-09-02 19:40:28 +00:00
trasz
3e54021797 Revert r210225 - turns out I was wrong; the "/*-" is not license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by:	cperciva@
2010-07-18 20:57:53 +00:00
trasz
935237a66a The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurences from sys/sys/ and sys/kern/.
2010-07-18 20:23:10 +00:00
luigi
8faaf7bcde Clarify and reimplement the bioq API so that bioq_disksort() has
the correct behaviour (sorting by distance from the current head position
in the scan direction) and bioq_insert_head() and bioq_insert_tail()
have a well defined (and useful) behaviour, especially when intermixed
with calls to bioq_disksort().

In particular:
- fix a bug in the existing bioq_disksort() that did not use the
  current head position correctly;
- redefine semantics of bioq_insert_head() and bioq_insert_tail().
  bioq_insert_tail() can now be used as a barrier
  between previous and subsequent calls to bioq_disksort().

The code is heavily documented in the source code so please refer
to that for the details.

Much of this code comes from Fabio Checconi. Also thanks to Kirk
for feedback on the (re)definition of bioq_insert_tail().

NOTE: in the current tree there is only a handful of files which
intermix calls to bioq_disksort() with bioq_insert_head() and
bioq_insert_tail(). The ordering of the queue in these situation
was not specified (nor easy to figure out) before, so I doubt any
of that code could be affected by the specification of the API.

Also note that the current implementation is significantly simpler
than the previous one (also used in ata_sort_queue()).
It would be useful to reimplement ata_sort_queue() using
the same code used in bioq_disksort().

MFC after:	1 week
2009-02-13 11:36:32 +00:00
imp
02f34b7ded Make bioq_disksort have a ANSI-C definition rather than a K&R definition. 2009-02-03 07:53:51 +00:00
pjd
d5cc909451 Add a new I/O request - BIO_FLUSH, which basically tells providers below to
flush their caches. For now will mostly be used by disks to flush their
write cache.

Sponsored by:	home.pl
2006-10-31 21:11:21 +00:00
delphij
5835e6264a Unexpand TAILQ_FIRST(foo) == NULL to TAILQ_EMPTY(foo). 2006-05-29 05:43:26 +00:00
rwatson
bcba87c522 When calling bioq_first() to see if a queue is empty in bioq_disksort(),
don't save the return value as we won't use it.

Noticed by:	Coverity Prevent analysis tool
MFC after:	3 days
2006-01-13 23:27:12 +00:00
jeff
dec8b83b9d - Fix insertions of bios which represent data earlier than anything else
in the queue.  The insertion sort assumed this had already been taken
   care of.

Spotted by:	Antoine Brodin
Approved by:	re (scottl)
2005-06-15 23:32:07 +00:00
jeff
2599199edf - Dramatically simplify bioqdisksort(). We no longer do ordered bios so
most of the code to deal with them has been dead for sometime.  Simplify
   the code by doing an insert sort hinted by the current head position.

Met with apathy by:	arch@
2005-06-12 22:32:29 +00:00
imp
20280f1431 /* -> /*- for copyright notices, minor format tweaks as necessary 2005-01-06 23:35:40 +00:00
pjd
77ebddb33c Add bioq_insert_head() function.
OK'd by:	phk
2004-12-13 12:57:21 +00:00
phk
59d327838d Add bioq_takefirst().
If the bioq is empty, NULL is returned.  Otherwise the front element
is removed and returned.

This can simplify locking in many drivers from:

	lock()
	bp = bioq_first(bq);
	if (bp == NULL) {
		unlock()
		return
	}
	bioq_remove(bp, bq)
	unlock
to:
	lock()
	bp = bioq_takefirst(bq);
	unlock()
	if (bp == NULL)
		return;
2004-08-19 19:51:51 +00:00
phk
adc5cdd07a Report bio_pblkbo instead of bio_blkno. 2003-10-18 17:27:10 +00:00
phk
b60969a35e Make bioq_disksort() sort on the bio_offset field instead of bio_pblkno. 2003-10-18 15:50:56 +00:00
phk
637477251d Made use of 'error' argument, which was unused (by mistake) before.
Submitted by:    Pawel Jakub Dawidek <nick@garage.freebsd.pl>
2003-10-14 08:09:43 +00:00
obrien
3b8fff9e4c Use __FBSDID(). 2003-06-11 00:56:59 +00:00
phk
6dd4776ecc Don't include <sys/disklabel.h> 2003-04-16 20:57:35 +00:00
phk
8207e9e353 Remove BIO_SETATTR from non-GEOM part of kernel as well. 2003-04-03 19:22:32 +00:00
phk
1becd36845 #include <geom/geom_disk.h> 2003-04-01 19:00:38 +00:00
phk
78929984b3 Introduce bioq_flush() function. 2003-04-01 12:49:40 +00:00
phk
bb8188c895 retire the "busy" field in bioqueues, it's served it's purpose. 2003-03-30 10:16:31 +00:00
phk
a0fbf93755 Preparation commit before I start on the bioqueue lockdown:
Collect all the bits of bioqueue handing in subr_disk.c, vfs_bio.c is big
enough as it is and disksort already lives in subr_disk.c.
2003-03-30 08:51:23 +00:00
phk
e059b79437 Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
phk
31cc9c5b3f Don't pick up a name from the dev_t if it is not there. 2003-03-03 11:14:36 +00:00
phk
406e155335 NO_GEOM cleanup: remove #ifdef 2003-01-30 12:36:30 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
phk
1879cc94a8 Only include <sys/diskslice.h> ifdef NO_GEOM 2003-01-20 11:28:37 +00:00
mckusick
305e5868f3 This checkin reimplements the io-request priority hack in a way
that works in the new threaded kernel. It was commented out of
the disksort routine earlier this year for the reasons given in
kern/subr_disklabel.c (which is where this code used to reside
before it moved to kern/subr_disk.c):

----------------------------
revision 1.65
date: 2002/04/22 06:53:20;  author: phk;  state: Exp;  lines: +5 -0
Comment out Kirks io-request priority hack until we can do this in a
civilized way which doesn't cause grief.

The problem is that it is not generally safe to cast a "struct bio
*" to a "struct buf *".  Things like ccd, vinum, ata-raid and GEOM
constructs bio's which are not entrails of a struct buf.

Also, curthread may or may not have anything to do with the I/O request
at hand.

The correct solution can either be to tag struct bio's with a
priority derived from the requesting threads nice and have disksort
act on this field, this wouldn't address the "silly-seek syndrome"
where two equal processes bang the diskheads from one edge to the
other of the disk repeatedly.

Alternatively, and probably better: a sleep should be introduced
either at the time the I/O is requested or at the time it is completed
where we can be sure to sleep in the right thread.

The sleep also needs to be in constant timeunits, 1/hz can be practicaly
any sub-second size, at high HZ the current code practically doesn't
do anything.
----------------------------

As suggested in this comment, it is no longer located in the disk sort
routine, but rather now resides in spec_strategy where the disk operations
are being queued by the thread that is associated with the process that
is really requesting the I/O. At that point, the disk queues are not
visible, so the I/O for positively niced processes is always slowed
down whether or not there is other activity on the disk.

On the issue of scaling HZ, I believe that the current scheme is
better than using a fixed quantum of time. As machines and I/O
subsystems get faster, the resolution on the clock also rises.
So, ten years from now we will be slowing things down for shorter
periods of time, but the proportional effect on the system will
be about the same as it is today. So, I view this as a feature
rather than a drawback. Hence this patch sticks with using HZ.

Sponsored by:	DARPA & NAI Labs.
Reviewed by:	Poul-Henning Kamp <phk@critter.freebsd.dk>
2002-10-22 00:59:49 +00:00
cognet
54f5e2ef60 One #include <sys/sysctl.h> should be enough.
Approved by:	mux (mentor)
2002-10-21 18:40:40 +00:00
sobomax
e6dad384fb Separate fiels reported by disk_err() with spaces, so that output doesn't
look cryptic.

MFC after:	1 week
2002-10-17 23:48:29 +00:00
phk
8762cef261 Populate more fields of the disklabel for PC98.
Submitted by:	Kawanobe Koh <kawanobe@st.rim.or.jp>
2002-10-14 14:22:29 +00:00
phk
951c3e53b2 NB: This commit does *NOT* make GEOM the default in FreeBSD
NB: But it will enable it in all kernels not having options "NO_GEOM"

Put the GEOM related options into the intended order.

Add "options NO_GEOM" to all kernel configs apart from NOTES.

In some order of controlled fashion, the NO_GEOM options will be
removed, architecture by architecture in the coming days.

There are currently three known issues which may force people to
need the NO_GEOM option:

boot0cfg/fdisk:
        Tries to update the MBR while it is being used to control
        slices.  GEOM does not allow this as a direct operation.

SCSI floppy drives:
        Appearantly the scsi-da driver return "EBUSY" if no media
        is inserted.  This is wrong, it should return ENXIO.

PC98:
        It is unclear if GEOM correctly recognizes all variants of
        PC98 disklabels.  (Help Wanted!  I have neither docs nor HW)

These issues are all being worked.

Sponsored by:	DARPA & NAI Labs.
2002-10-05 16:35:33 +00:00
brian
89d8005f01 If dsgetlabel() returns a label with a size of zero in diskdumpconf(),
treat it as an invalid partition.

This fixes a bug where ``dumpon <device>'' will configure the dump
device at a random offset on the disk if <device> isn't a valid
partition.

Reviewed by: phk
2002-10-05 11:24:21 +00:00
phk
57a346a213 (This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer).  This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulus any mistakes) the series of disklabel related
commits.

I belive it all amounts to a NOP for all the rest of you :-)

Sponsored by:   DARPA & NAI Labs.
2002-09-20 19:36:05 +00:00
phk
7fd38535ea Make FreeBSD "struct disklabel" agnostic, step 312 of 723:
Rename bioqdisksort() to bioq_disksort().
Keep a #define around to avoid changing all diskdrivers right now.

Move it from subr_disklabel.c to subr_disk.c.
Move prototype from <sys/disklabel.h> to <sys/bio.h>

Sponsored by:   DARPA and NAI Labs.
2002-09-20 14:14:37 +00:00
phk
1919170e90 Make FreeBSD "struct disklabel" agnostic, step 311 of 723:
Rename diskerr() to disk_err() for naming consistency.

Drop the by now entirely useless struct disklabel argument.

Add a flag argument for new-line termination.

Fix a couple of printf-format-casts to %j instead of %l.

Correctly print the name of all bio commands.

Move the function from subr_disklabel.c to subr_disk.c,
and from <sys/disklabel.h> to <sys/disk.h>.

Use the new disk_err() throughout, #include <sys/disk.h> as needed.

Bump __FreeBSD_version for the sake of the aac disk drivers #ifdefs.

Remove unused disklabel members of softc for aac, amr and mlx, which seem
to originally have been intended for diskerr() use, but which only rotted
and got Copy&Pasted at least two times to many.

Sponsored by:   DARPA & NAI Labs.
2002-09-20 12:52:03 +00:00
archie
5ea3052c0e Don't use "NULL" when "0" is really meant. 2002-08-21 23:39:52 +00:00
phk
a90e28ebbb Implement DIOCGFRONTSTUFF ioctl which reports how many bytes from the start
of the device magic stuff might occupy.

Sponsored by: DARPA & NAI Labs.
2002-04-09 15:43:32 +00:00
phk
5b960672bf Rename DIOCGKERNELDUMP to DIOCSKERNELDUMP as it strictly speaking
is a "set" not a "get" operation.

Sponsored by:	DARPA & NAI Labs.
2002-04-09 10:04:09 +00:00
phk
ef82a51634 Here follows the new kernel dumping infrastructure.
Caveats:

The new savecore program is not complete in the sense that it emulates
enough of the old savecores features to do the job, but implements none
of the options yet.

I would appreciate if a userland hacker could help me out getting savecore
to do what we want it to do from a users point of view, compression,
email-notification, space reservation etc etc.  (send me email if
you are interested).

Currently, savecore will scan all devices marked as "swap" or "dump" in
/etc/fstab _or_ any devices specified on the command-line.

All architectures but i386 lack an implementation of dumpsys(), but
looking at the i386 version it should be trivial for anybody familiar
with the platform(s) to provide this function.

Documentation is quite sparse at this time, more to come.

Details:

ATA and SCSI drivers should work as the dump formatting code has been
removed.  The IDA, TWE and AAC have not yet been converted.

Dumpon now opens the device and uses ioctl(DIOCGKERNELDUMP) to set
the device as dumpdev.  To implement the "off" argument, /dev/null
is used as the device.

Savecore will fail if handed any options since they are not (yet)
implemented.  All devices marked "dump" or "swap" in /etc/fstab
will be scanned and dumps found will be saved to diskfiles
named from the MD5 hash of the header record.  The header record
is dumped in readable format in the .info file.  The kernel
is not saved.  Only complete dumps will be saved.

All maintainer rights for this code are disclaimed: feel free to
improve and extend.

Sponsored by:   DARPA, NAI Labs
2002-03-31 22:37:00 +00:00
phk
ad998ff108 Make the disk_clone() routine more robust for abuse.
Sneak in a trivial bit of the GEOM stuff while we're here anyway.
2002-03-11 08:08:02 +00:00
robert
9df5bc6dbf Fix a warning. 2002-03-05 15:19:33 +00:00
phk
b102b404f9 Don't call cdevsw_add(). 2001-11-04 11:56:22 +00:00
phk
c665837dfd Rename the top 7 bits if disk minors to spare bits, rather than type bits. 2001-11-04 09:01:07 +00:00
phk
076014359d Don't choke on old sd%d.ctl devices.
Tripped over by:	Jos Backus <josb@cncdsl.com>
2001-11-03 23:21:00 +00:00
phk
e3965fc866 Turn the symlinks around, instead of ad0s1 -> ad0s1c, make it ad0s1c -> ad0s1.
Requested by:	peter
2001-11-02 09:16:25 +00:00
phk
8c752a11c0 Fix a problem in the disk related hack where device nodes for a physically
non-existent disk in a legacy /dev on a DEVFS system would panic the system
if stat(2)'ed.

Do not whine about anonymous device nodes not having a si_devsw, they're
not supposed to.
2001-10-28 09:39:28 +00:00
phk
a298958ff1 Nudge the axe a bit closer to cdevsw[]:
Make it a panic to repeat make_dev() or destroy_dev(), this check
   should maybe be neutered when -current goes -stable.

   Whine if devsw() is called on anon dev_t's in a devfs system.

   Make a hack to avoid our lazy-eval disk code triggering the above whine.

   Fix the multiple make_dev() in disk code by making ${disk}${unit}s${slice}
   an alias/symlink to ${disk}${unit}s${slice}c
2001-10-27 17:44:21 +00:00
phk
211e34739b disk_clone() was a bit too eager to please: "md0s1ec" is not a valid
device.

Noticed by:	Chad David <davidc@acns.ab.ca>
2001-10-22 10:18:45 +00:00