167 Commits

Author SHA1 Message Date
alc
6961e315f8 Utilize sf_buf_alloc() and sf_buf_free() to implement the ephemeral
mappings required by mdstart_swap().  On i386, if the ephemeral mapping
is already in the sf_buf mapping cache, a swap-backed md performs
similarly to a malloc-backed md.  Even if the ephemeral mapping is not
cached, this implementation is still faster.  On 64-bit platforms, this
change has the effect of using the direct virtual-to-physical mapping,
avoiding ephemeral mapping overheads, such as TLB shootdowns on SMPs.

On a 2.4GHz, 400MHz FSB P4 Xeon configured with 64K sf_bufs and
"mdmfs -S -o async -s 128m md /mnt"

before:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.430923 secs (311465697 bytes/sec)

after with cold sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.367948 secs (364773576 bytes/sec)

after with warm sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.252826 secs (530870010 bytes/sec)

malloc-backed md:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.253126 secs (530240978 bytes/sec)
2004-03-18 18:23:37 +00:00
alc
10b8b45873 Allow swap-backed devices to run without Giant. 2004-03-14 00:24:30 +00:00
phk
0f56e66e2f Fix a long-standing deadlock issue with vnode backed md(4) devices:
On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.

This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.

The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.

In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.

Unfortunately performance takes a hit.

Add "async" option to give people who know what they are doing the
old behaviour.
2004-03-10 20:41:09 +00:00
jhb
2642ed4029 kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by:	many
2004-03-05 22:42:17 +00:00
phk
4c53114daa Make swapbacked md(4) devices respect the -x and -y emulation arguments. 2004-03-02 20:13:23 +00:00
cperciva
3b41514956 Use DEV_BSIZE byte sectors instead of PAGE_SIZE byte sectors for
swap-backed memory disks.  This reduces filesystem allocation overhead
and makes swap-backed memory disks compatible with broken code (dd,
for example) which expects to see 512 byte sectors.  The size of a
swap-backed memory disk must still be a multiple of the page size.

When performing page-aligned operations, this change has zero
performance impact.

Reviewed by:	phk
Approved by:	rwatson (mentor)
2004-02-29 15:58:54 +00:00
phk
ad925439e0 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
phk
85e17886eb Allow specification of a geometry for vnode backed devices as well as
for malloc backed devices.
2004-01-12 10:52:00 +00:00
phk
1d294b5b46 Fix a locking problem with MD_ROOT_SIZE.
Retire md(4)'s static major number.
2003-12-13 18:12:58 +00:00
phk
89aeb7d1df Use the class->init() to hitch up preload devices, rather than rely on
the "old" SYSINIT.  This makes sure things happen in the right order.

XXX: md(4) needs to be fully geom-ified and in particluar /dev/md.ctl
should be abandonded for the GEOM OaM api.

Approved by:	re@
2003-11-18 18:19:26 +00:00
phk
d985a757f9 Don't initialize unused bio_blkno field. 2003-10-18 11:25:42 +00:00
phk
7099deadda The present defaults for the open and close for device drivers which
provide no methods does not make any sense, and is not used by any
driver.

It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.

Change the defaults to be nullopen() and nullclose() which simply
does nothing.

Remove explicit initializations to these from the drivers which
already used them.
2003-09-27 12:01:01 +00:00
jhb
37641f86f1 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
phk
5997dd2e8b Change the implementation of swap backing to use the VM system in normal
ways, and drop the need for vm_pager_strategy().
2003-08-05 06:54:44 +00:00
phk
d4d7ca154a Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
phk
5ad6b62509 Remove 256 unit limit, there is no evil minor number encoding to
deal with any more.

Spotted by:	"Darren Freestone" <df@cops.org>
2003-06-22 11:31:38 +00:00
phk
e2298826ec Remove the G_CLASS_INITIALIZER, we do not need it anymore. 2003-05-31 16:59:27 +00:00
phk
0129a20107 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
alc
760e456c35 Use vm_object_deallocate(), not vm_pager_deallocate(), to destroy a
vm object.  (vm_pager_deallocate() does not, in fact, destroy a vm object.)

Approved by:	re (scottl)
Reviewed by:	phk
2003-05-16 07:28:27 +00:00
phk
887418fd11 Call g_wither_geom(), instead of just setting the flag. 2003-05-02 06:18:58 +00:00
phk
f3a79aea41 Add a couple of undocumented test options to MD(4) to aid in regression
testting of GEOM.
2003-04-09 11:59:29 +00:00
phk
cbe207a30e Remove all references to BIO_SETATTR. We will not be using it. 2003-04-03 19:19:36 +00:00
phk
c235e25328 Use bioq_flush() to drain a bio queue with a specific error code.
Retain the mistake of not updating the devstat API for now.

Spell bioq_disksort() consistently with the remaining bioq_*().

#include <geom/geom_disk.h> where this is more appropriate.
2003-04-01 15:06:26 +00:00
phk
78452f5f28 Don't include <sys/disk.h>. 2003-04-01 13:33:28 +00:00
phk
7ae16f68d8 remove a blank line. 2003-03-29 22:13:32 +00:00
phk
8db28f0543 Allocate the toplevel indir with M_WAITOK to avoid complicating things
needlessly.

Detected by:	rwatsons EvilMalloc(9)
2003-03-27 10:14:36 +00:00
phk
1016075ea8 Change g_class initialization to sparse format. 2003-03-24 19:46:26 +00:00
phk
e059b79437 Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
phk
e01fc931cf Centralize the devstat handling for all GEOM disk device drivers
in geom_disk.c.

As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.
2003-03-08 08:01:31 +00:00
phk
7499df5457 Add a "-S sectorsize" option to enable Kirk to find a bug :-) 2003-03-03 13:05:00 +00:00
phk
0ae911eb0e Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
phk
6fa71cfe58 Mark our provider with G_PF_CANDELETE in the cases where this is actually
the case.
2003-02-11 12:35:44 +00:00
phk
98d64b505b NO_GEOM cleanup: unifdef 2003-01-30 13:12:31 +00:00
phk
c822610d8f Implement MDIOCLIST which returns the unit numbers of configured md(4)
devices.

We use the md_pad[] array and if there are more units than its size the
last returned unit number will be -1, but the number of units returned
is correct.
2003-01-27 07:58:18 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
phk
141eb3ac36 OK Ok, so I didn't check the NO_GEOM case for the final version...
Stumbled on by:	bde
2003-01-13 20:19:04 +00:00
phk
a8a456d886 Enable the new h0h0magic code which on GEOM kernels make the md(4)
driver a _real_ GEOM driver.
2003-01-13 17:31:46 +00:00
phk
8d09710e31 Add a mutex around the per unit bioqueue.
Only grab giant in the per unit kthread for SWAP and VNODE backed devices.

Initialize the bioq before the kthread gets a chance to study it.

Don't lock Giant in mddone_swap, we shouldn't need it.
2003-01-13 08:50:23 +00:00
phk
5e19774d31 Remove the printf which announces the creation of malloc disks: it is
inconsistent when we do not do it for swap or vnode.

We still printf for preloaded disks because of the weak debugging
options people have in embedded/tiny environments where this is
usually used.
2003-01-13 08:01:09 +00:00
phk
b9c52945f7 Add code to make md(4) a GEOM device driver instead of relying in
the disk mini-layer.

This is currently not enabled.
2003-01-12 21:16:49 +00:00
phk
11b1d67a02 Shift things around a bit in preparation for future evilness. 2003-01-12 17:39:29 +00:00
iedowse
947ff5e0c7 Move the check for the MD_SHUTDOWN flag to before the tsleep() call
in the per-device kthread. This ensures that synchronisation with
mddestroy() succeeds even if the kthread was not waiting in tsleep()
at the time of the wakeup(). Among other things, this fixes the
problem of mdconfig getting stuck when an attempt is made to use a
zero-length file as a vnode-type backing store.

Approved by:	re
2002-11-30 22:03:53 +00:00
phk
9cf59043a1 We want /dev/md0 for ramdisk roots, not /dev/md0c.
Sponsored by:	DARPA & NAI Labs
2002-10-21 20:08:28 +00:00
phk
d9cf8d5478 Use ENOSPC error return, not ENOMEM.
Use %jd rather than %lld.
2002-10-20 20:50:31 +00:00
jake
99f37adfdd MODINFO_SIZE metadata has type size_t, not unsigned. This makes preloaded
md root work on sparc64.
2002-10-13 18:19:22 +00:00
scottl
3a150bca9c Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create.  Passing the
value 0 prevents the alternate kstack from being created.  Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by:	jake, peter, jhb
2002-10-02 07:44:29 +00:00
phk
e274a128a9 Put the casts on the right hand side of =. 2002-09-28 14:17:24 +00:00
grehan
0429db7240 Initialize fwsectors/fwheads to allow the DIOCGFWSECTORS and
DIOCGFWHEADS ioctls to return meaningful values to disklabel/newfs

Approved by: phk
2002-09-22 10:07:18 +00:00
phk
57a346a213 (This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer).  This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulus any mistakes) the series of disklabel related
commits.

I belive it all amounts to a NOP for all the rest of you :-)

Sponsored by:   DARPA & NAI Labs.
2002-09-20 19:36:05 +00:00