On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.
This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.
The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.
In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.
Unfortunately performance takes a hit.
Add "async" option to give people who know what they are doing the
old behaviour.
swap-backed memory disks. This reduces filesystem allocation overhead
and makes swap-backed memory disks compatible with broken code (dd,
for example) which expects to see 512 byte sectors. The size of a
swap-backed memory disk must still be a multiple of the page size.
When performing page-aligned operations, this change has zero
performance impact.
Reviewed by: phk
Approved by: rwatson (mentor)
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.
Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
the "old" SYSINIT. This makes sure things happen in the right order.
XXX: md(4) needs to be fully geom-ified and in particluar /dev/md.ctl
should be abandonded for the GEOM OaM api.
Approved by: re@
provide no methods does not make any sense, and is not used by any
driver.
It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.
Change the defaults to be nullopen() and nullclose() which simply
does nothing.
Remove explicit initializations to these from the drivers which
already used them.
Retain the mistake of not updating the devstat API for now.
Spell bioq_disksort() consistently with the remaining bioq_*().
#include <geom/geom_disk.h> where this is more appropriate.
in geom_disk.c.
As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.
branches:
Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.
This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.
Approved by: re(scottl)
devices.
We use the md_pad[] array and if there are more units than its size the
last returned unit number will be -1, but the number of units returned
is correct.
Only grab giant in the per unit kthread for SWAP and VNODE backed devices.
Initialize the bioq before the kthread gets a chance to study it.
Don't lock Giant in mddone_swap, we shouldn't need it.
inconsistent when we do not do it for swap or vnode.
We still printf for preloaded disks because of the weak debugging
options people have in embedded/tiny environments where this is
usually used.
in the per-device kthread. This ensures that synchronisation with
mddestroy() succeeds even if the kthread was not waiting in tsleep()
at the time of the wakeup(). Among other things, this fixes the
problem of mdconfig getting stuck when an attempt is made to use a
zero-length file as a vnode-type backing store.
Approved by: re
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create. Passing the
value 0 prevents the alternate kstack from being created. Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.
Reviewed by: jake, peter, jhb
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)
If struct disklabel is the messenger: kill the messenger.
Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer). This commit changes this communication to use four
explicit fields instead.
Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.
Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.
The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.
This concludes (modulus any mistakes) the series of disklabel related
commits.
I belive it all amounts to a NOP for all the rest of you :-)
Sponsored by: DARPA & NAI Labs.