Commit Graph

201 Commits

Author SHA1 Message Date
pjd
5b525d8ac1 - Don't destroy UMA zone on error in mdcreate_malloc(), because we need it
in mddestroy() to properly free already allocated memory.
  This fixes a panic when we want to create too big memory backed device
  with preallocate memory (-o reserve).
- Remove redundant { }.

MFC after:	1 week
2005-01-22 19:56:03 +00:00
phk
d2f418bf2c Add a couple of mtx_asserts() to try to narrow down the window on
a bug repeatedly reported.
2005-01-22 19:08:50 +00:00
imp
4b319958e7 Start each of the license/copyright comments with /*-, minor shuffle of lines 2005-01-06 01:43:34 +00:00
alc
ca27c29006 Add needed synchronization to the error handling code that was introduced
in revision 1.141.

Lock assertion failures reported by: Kris Kennaway
2005-01-05 05:32:52 +00:00
jhb
7b611b0cb2 Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.
2004-12-30 20:29:58 +00:00
pjd
b19cea505c Rewrite piece of code which I committed some time ago that allows to
show file name for 'mdconfig -l -u <x>' command.
This allows to preserve API/ABI compatibility with version 0 (that's why
I changed version number back to 0) and will allow to merge this change
to RELENG_5.

MFC after:	5 days
2004-12-27 17:20:06 +00:00
marcel
6961594095 Fix the MDIOCDETACH ioctl() for md(4). Now that the md_file field in
the mdio structure is an array and not a pointer, we cannot test for
it to be NULL. It never is. Instead, test for md_file[0] to be '\0'.
2004-11-13 05:00:12 +00:00
pjd
13567c29de Be consistent and use 'if (error != 0)' instead of 'if (error)' everywhere. 2004-11-06 13:16:35 +00:00
pjd
d18ec52c75 For file backed md(4) devices output their source file via
'mdconfig -l -u <unit>'.
Bump version number, as this change breaks ABI/API.
2004-11-06 13:07:02 +00:00
phk
c93e5e1f96 Don't explicitly call g_waitidle(), it happens automagically now. 2004-10-23 20:50:06 +00:00
green
a3341cc87b Account for failure in vm_pager_allocate() or vm_pager_get_pages() in
md(8).  The former is generally not going to fail, but the latter can
fail when the underlying swap device returns an error.

There are still plenty of other places where vm_pager_get_pages() failing
will lead directly to crashes, so it's a good idea to put your swap on
RAID if you care enough to put any of your disks on RAID....
2004-10-12 04:47:16 +00:00
pjd
06ac376bea Actually this order (unlock, wakeup) in this case is race-safe and can
save us 2 context switches.

Explained by:	njl
2004-09-18 09:16:19 +00:00
pjd
67f6d06709 - Make md(4) 64-bit clean.
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
2004-09-16 21:32:13 +00:00
pjd
aea0843869 There is no need to keep 'npage' value inside our softc structure,
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
2004-09-16 20:38:11 +00:00
pjd
7009758c01 - Do not use bio_pblkno as it is going away anyway.
- Prefer bio_length than bio_bcount.
2004-09-16 19:42:17 +00:00
pjd
34730e1e31 First wakeup, then unlock. 2004-09-16 18:59:19 +00:00
pjd
fd329e76c3 Type 'int' is too small for 'i' and 'lastp' variables. Use proper type,
which is vm_pindex_t (unsigned 64bit on i386).
2004-09-16 18:56:20 +00:00
pjd
e32c315aa7 Deallocate VM object on failure. 2004-09-14 19:55:07 +00:00
pjd
4e7ebd879c One more missing NDFREE(9). 2004-09-14 19:27:59 +00:00
pjd
64433e6598 - Don't forget about NDFREE() in case of vn_open() failure.
- Don't forget about vn_close() in case of failure.
2004-09-14 18:43:24 +00:00
pjd
f129890dbf Fix UMA zone leak. 2004-09-14 18:32:05 +00:00
phk
fb5fe8618b Use bioq_takefirst() 2004-09-07 07:54:45 +00:00
cperciva
94c7897b9f Don't g_waitidle() when initializing a preloaded md. This fixes a
deadlock which otherwise occurs during the boot process.

Reported by:	kensmith
MFC after:	3 days
		(assuming that re@ approves)
2004-08-30 08:38:30 +00:00
cperciva
763aa6bdef When creating a new md, wait for geom's event queue to become empty
before returning.  Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.

This may be a MFC candidate for 5.3.

Suggested by:	phk
2004-08-22 19:44:24 +00:00
phk
d8d2b01380 Tag all geom classes in the tree with a version number. 2004-08-08 07:57:53 +00:00
phk
cb84366718 Use a ->fini() from the geom class to destroy the control device.
Use default initialization of geom methods.
2004-08-08 06:47:43 +00:00
phk
5c95d686a1 Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
phk
dfd1f7fd50 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
pjd
6f95f8aca4 Fix panic which occurs when given sector size for memory-backed device
is less than DEV_BSIZE (512) bytes.

Reported by:	Mike Bristow <mike@urgle.com>
Approved by:	phk
2004-05-18 07:30:04 +00:00
imp
34d7e07971 Ooops, removed this acknowledgement bogusly.
Eagle Eyes: bde
2004-04-09 05:12:47 +00:00
imp
b49b7fe799 Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
alc
1ec4d75266 In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not.  Add a new parameter so that the caller can specify which is
the case.

Reported by:	dillon
2004-04-03 09:16:27 +00:00
luigi
f1e67ce243 Fix a bug with preloaded image -- for some reason [that i don't
completely understand], md_takeroot() runs before md_preloaded(),
rendering both useless.
As a fix, move the body (effectively one line!) of md_takeroot()
into md_preloaded(), and get rid of the stuff that has become useless.

Bug and fix reported 10 days ago on -current, no reply.
2004-03-31 21:48:02 +00:00
alc
edf0b18239 - Remove some unused #includes.
- Apply some style fixes to mdstart_swap().
2004-03-19 21:19:15 +00:00
alc
6961e315f8 Utilize sf_buf_alloc() and sf_buf_free() to implement the ephemeral
mappings required by mdstart_swap().  On i386, if the ephemeral mapping
is already in the sf_buf mapping cache, a swap-backed md performs
similarly to a malloc-backed md.  Even if the ephemeral mapping is not
cached, this implementation is still faster.  On 64-bit platforms, this
change has the effect of using the direct virtual-to-physical mapping,
avoiding ephemeral mapping overheads, such as TLB shootdowns on SMPs.

On a 2.4GHz, 400MHz FSB P4 Xeon configured with 64K sf_bufs and
"mdmfs -S -o async -s 128m md /mnt"

before:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.430923 secs (311465697 bytes/sec)

after with cold sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.367948 secs (364773576 bytes/sec)

after with warm sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.252826 secs (530870010 bytes/sec)

malloc-backed md:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.253126 secs (530240978 bytes/sec)
2004-03-18 18:23:37 +00:00
alc
10b8b45873 Allow swap-backed devices to run without Giant. 2004-03-14 00:24:30 +00:00
phk
0f56e66e2f Fix a long-standing deadlock issue with vnode backed md(4) devices:
On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.

This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.

The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.

In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.

Unfortunately performance takes a hit.

Add "async" option to give people who know what they are doing the
old behaviour.
2004-03-10 20:41:09 +00:00
jhb
2642ed4029 kthread_exit() no longer requires Giant, so don't force callers to acquire
Giant just to call kthread_exit().

Requested by:	many
2004-03-05 22:42:17 +00:00
phk
4c53114daa Make swapbacked md(4) devices respect the -x and -y emulation arguments. 2004-03-02 20:13:23 +00:00
cperciva
3b41514956 Use DEV_BSIZE byte sectors instead of PAGE_SIZE byte sectors for
swap-backed memory disks.  This reduces filesystem allocation overhead
and makes swap-backed memory disks compatible with broken code (dd,
for example) which expects to see 512 byte sectors.  The size of a
swap-backed memory disk must still be a multiple of the page size.

When performing page-aligned operations, this change has zero
performance impact.

Reviewed by:	phk
Approved by:	rwatson (mentor)
2004-02-29 15:58:54 +00:00
phk
ad925439e0 Device megapatch 4/6:
Introduce d_version field in struct cdevsw, this must always be
initialized to D_VERSION.

Flip sense of D_NOGIANT flag to D_NEEDGIANT, this involves removing
four D_NOGIANT flags and adding 145 D_NEEDGIANT flags.
2004-02-21 21:10:55 +00:00
phk
85e17886eb Allow specification of a geometry for vnode backed devices as well as
for malloc backed devices.
2004-01-12 10:52:00 +00:00
phk
1d294b5b46 Fix a locking problem with MD_ROOT_SIZE.
Retire md(4)'s static major number.
2003-12-13 18:12:58 +00:00
phk
89aeb7d1df Use the class->init() to hitch up preload devices, rather than rely on
the "old" SYSINIT.  This makes sure things happen in the right order.

XXX: md(4) needs to be fully geom-ified and in particluar /dev/md.ctl
should be abandonded for the GEOM OaM api.

Approved by:	re@
2003-11-18 18:19:26 +00:00
phk
d985a757f9 Don't initialize unused bio_blkno field. 2003-10-18 11:25:42 +00:00
phk
7099deadda The present defaults for the open and close for device drivers which
provide no methods does not make any sense, and is not used by any
driver.

It is a pretty hard to come up with even a theoretical concept of
a device driver which would always fail open and close with ENODEV.

Change the defaults to be nullopen() and nullclose() which simply
does nothing.

Remove explicit initializations to these from the drivers which
already used them.
2003-09-27 12:01:01 +00:00
jhb
37641f86f1 Consistently use the BSD u_int and u_short instead of the SYSV uint and
ushort.  In most of these files, there was a mixture of both styles and
this change just makes them self-consistent.

Requested by:	bde (kern_ktrace.c)
2003-08-07 15:04:27 +00:00
phk
5997dd2e8b Change the implementation of swap backing to use the VM system in normal
ways, and drop the need for vm_pager_strategy().
2003-08-05 06:54:44 +00:00
phk
d4d7ca154a Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout. 2003-07-27 17:04:56 +00:00
phk
5ad6b62509 Remove 256 unit limit, there is no evil minor number encoding to
deal with any more.

Spotted by:	"Darren Freestone" <df@cops.org>
2003-06-22 11:31:38 +00:00
phk
e2298826ec Remove the G_CLASS_INITIALIZER, we do not need it anymore. 2003-05-31 16:59:27 +00:00
phk
0129a20107 The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent
deadlocks with vnode backed md(4) devices because md now uses a
kthread to run the bio requests instead of doing it directly from
the bio down path.
2003-05-31 16:42:45 +00:00
alc
760e456c35 Use vm_object_deallocate(), not vm_pager_deallocate(), to destroy a
vm object.  (vm_pager_deallocate() does not, in fact, destroy a vm object.)

Approved by:	re (scottl)
Reviewed by:	phk
2003-05-16 07:28:27 +00:00
phk
887418fd11 Call g_wither_geom(), instead of just setting the flag. 2003-05-02 06:18:58 +00:00
phk
f3a79aea41 Add a couple of undocumented test options to MD(4) to aid in regression
testting of GEOM.
2003-04-09 11:59:29 +00:00
phk
cbe207a30e Remove all references to BIO_SETATTR. We will not be using it. 2003-04-03 19:19:36 +00:00
phk
c235e25328 Use bioq_flush() to drain a bio queue with a specific error code.
Retain the mistake of not updating the devstat API for now.

Spell bioq_disksort() consistently with the remaining bioq_*().

#include <geom/geom_disk.h> where this is more appropriate.
2003-04-01 15:06:26 +00:00
phk
78452f5f28 Don't include <sys/disk.h>. 2003-04-01 13:33:28 +00:00
phk
7ae16f68d8 remove a blank line. 2003-03-29 22:13:32 +00:00
phk
8db28f0543 Allocate the toplevel indir with M_WAITOK to avoid complicating things
needlessly.

Detected by:	rwatsons EvilMalloc(9)
2003-03-27 10:14:36 +00:00
phk
1016075ea8 Change g_class initialization to sparse format. 2003-03-24 19:46:26 +00:00
phk
e059b79437 Including <sys/stdint.h> is (almost?) universally only to be able to use
%j in printfs, so put a newsted include in <sys/systm.h> where the printf
prototype lives and save everybody else the trouble.
2003-03-18 08:45:25 +00:00
phk
e01fc931cf Centralize the devstat handling for all GEOM disk device drivers
in geom_disk.c.

As a side effect this makes a lot of #include <sys/devicestat.h>
lines not needed and some biofinish() calls can be reduced to
biodone() again.
2003-03-08 08:01:31 +00:00
phk
7499df5457 Add a "-S sectorsize" option to enable Kirk to find a bug :-) 2003-03-03 13:05:00 +00:00
phk
0ae911eb0e Gigacommit to improve device-driver source compatibility between
branches:

Initialize struct cdevsw using C99 sparse initializtion and remove
all initializations to default values.

This patch is automatically generated and has been tested by compiling
LINT with all the fields in struct cdevsw in reverse order on alpha,
sparc64 and i386.

Approved by:    re(scottl)
2003-03-03 12:15:54 +00:00
imp
cf874b345d Back out M_* changes, per decision of the TRB.
Approved by: trb
2003-02-19 05:47:46 +00:00
phk
6fa71cfe58 Mark our provider with G_PF_CANDELETE in the cases where this is actually
the case.
2003-02-11 12:35:44 +00:00
phk
98d64b505b NO_GEOM cleanup: unifdef 2003-01-30 13:12:31 +00:00
phk
c822610d8f Implement MDIOCLIST which returns the unit numbers of configured md(4)
devices.

We use the md_pad[] array and if there are more units than its size the
last returned unit number will be -1, but the number of units returned
is correct.
2003-01-27 07:58:18 +00:00
alfred
bf8e8a6e8f Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2003-01-21 08:56:16 +00:00
phk
141eb3ac36 OK Ok, so I didn't check the NO_GEOM case for the final version...
Stumbled on by:	bde
2003-01-13 20:19:04 +00:00
phk
a8a456d886 Enable the new h0h0magic code which on GEOM kernels make the md(4)
driver a _real_ GEOM driver.
2003-01-13 17:31:46 +00:00
phk
8d09710e31 Add a mutex around the per unit bioqueue.
Only grab giant in the per unit kthread for SWAP and VNODE backed devices.

Initialize the bioq before the kthread gets a chance to study it.

Don't lock Giant in mddone_swap, we shouldn't need it.
2003-01-13 08:50:23 +00:00
phk
5e19774d31 Remove the printf which announces the creation of malloc disks: it is
inconsistent when we do not do it for swap or vnode.

We still printf for preloaded disks because of the weak debugging
options people have in embedded/tiny environments where this is
usually used.
2003-01-13 08:01:09 +00:00
phk
b9c52945f7 Add code to make md(4) a GEOM device driver instead of relying in
the disk mini-layer.

This is currently not enabled.
2003-01-12 21:16:49 +00:00
phk
11b1d67a02 Shift things around a bit in preparation for future evilness. 2003-01-12 17:39:29 +00:00
iedowse
947ff5e0c7 Move the check for the MD_SHUTDOWN flag to before the tsleep() call
in the per-device kthread. This ensures that synchronisation with
mddestroy() succeeds even if the kthread was not waiting in tsleep()
at the time of the wakeup(). Among other things, this fixes the
problem of mdconfig getting stuck when an attempt is made to use a
zero-length file as a vnode-type backing store.

Approved by:	re
2002-11-30 22:03:53 +00:00
phk
9cf59043a1 We want /dev/md0 for ramdisk roots, not /dev/md0c.
Sponsored by:	DARPA & NAI Labs
2002-10-21 20:08:28 +00:00
phk
d9cf8d5478 Use ENOSPC error return, not ENOMEM.
Use %jd rather than %lld.
2002-10-20 20:50:31 +00:00
jake
99f37adfdd MODINFO_SIZE metadata has type size_t, not unsigned. This makes preloaded
md root work on sparc64.
2002-10-13 18:19:22 +00:00
scottl
3a150bca9c Some kernel threads try to do significant work, and the default KSTACK_PAGES
doesn't give them enough stack to do much before blowing away the pcb.
This adds MI and MD code to allow the allocation of an alternate kstack
who's size can be speficied when calling kthread_create.  Passing the
value 0 prevents the alternate kstack from being created.  Note that the
ia64 MD code is missing for now, and PowerPC was only partially written
due to the pmap.c being incomplete there.
Though this patch does not modify anything to make use of the alternate
kstack, acpi and usb are good candidates.

Reviewed by:	jake, peter, jhb
2002-10-02 07:44:29 +00:00
phk
e274a128a9 Put the casts on the right hand side of =. 2002-09-28 14:17:24 +00:00
grehan
0429db7240 Initialize fwsectors/fwheads to allow the DIOCGFWSECTORS and
DIOCGFWHEADS ioctls to return meaningful values to disklabel/newfs

Approved by: phk
2002-09-22 10:07:18 +00:00
phk
57a346a213 (This commit touches about 15 disk device drivers in a very consistent
and predictable way, and I apologize if I have gotten it wrong anywhere,
getting prior review on a patch like this is not feasible, considering
the number of people involved and hardware availability etc.)

If struct disklabel is the messenger: kill the messenger.

Inside struct disk we had a struct disklabel which disk drivers used to
communicate certain metrics to the disklayer above (GEOM or the disk
mini-layer).  This commit changes this communication to use four
explicit fields instead.

Amongst the benefits is that the fields do not get overwritten by
wrong or bogus on-disk disklabels.

Once that is clear, <sys/disk.h> which is included in the drivers
no longer need to pull <sys/disklabel.h> and <sys/diskslice.h> in,
the few places that needs them, have gotten explicit #includes for
them.

The disklabel inside struct disk is now only for internal use in
the disk mini-layer, so instead of embedding it, we malloc it as
we need it.

This concludes (modulus any mistakes) the series of disklabel related
commits.

I belive it all amounts to a NOP for all the rest of you :-)

Sponsored by:   DARPA & NAI Labs.
2002-09-20 19:36:05 +00:00
archie
7a233d4c9f Replace (ab)uses of "NULL" where "0" is really meant. 2002-08-22 21:24:01 +00:00
mux
28c66edd3e Yet another warning fix for 64 bits platforms.
Reviewed by:	phk
2002-06-24 12:07:02 +00:00
phk
06a5147771 mdcreate_vnode() isn't correctly clearing things out of the linked
list if the file is of 0 size or mdsetcred() fails.

Submitted by:	Martin Faxer <gmh003532@brfmasthugget.se>
2002-06-15 19:18:43 +00:00
sobomax
920300b834 - Whitespace only: use return statement consistentlt (return (foo), not
return(foo)), kill extra blank names between function names;
- fix format string in printf(): devtoname() returns string, not pointer.
2002-06-10 19:25:21 +00:00
iedowse
60f926334d Use a per-device worker thread to avoid blocking in mdstrategy()
until the I/O completes. This fixes some easily reproducable deadlocks
that occur when using md(4) with GEOM.

Reviewed by:	phk
2002-06-03 22:09:04 +00:00
phk
cd78b1f556 Mis-edit in last commit. 2002-05-26 09:57:59 +00:00
phk
3a79eba71d Be a bit smarter about rewriting data so we don't loose too much performance.
Sponsored by: DARPA & NAI Labs.
2002-05-26 09:38:51 +00:00
phk
bf00330394 Use an umazone per unit for allocating the sectors for malloc backing.
Clean up things properly when we unconfigure malloc backed units.

Sponsored by:	DARPA & NAI Labs.
2002-05-26 06:48:55 +00:00
phk
45b9277876 Give the "malloc" backing of md(4) an adaptive multilevel index tree to
remove the need for a contiguous array with pointers to all the sectors.

Try to make failure to malloc(9) memory a non-hang situation.

Eventually this will allow us to test the 64bit cleanness of the disk
I/O patch, but more work is outstanding here and elsewhere.

Sponsored by:	DARPA & NAI Labs.
2002-05-25 20:44:20 +00:00
phk
569efacf66 Fix a memory-leak when configuring a vnode backed md(4) device fails.
Submitted by:	Martin Faxér <gmh003532@brfmasthugget.se>
MFC after:	4 weeks
2002-05-03 17:55:10 +00:00
jeff
f552448b8a Remove unused include. 2002-03-20 09:55:07 +00:00
bde
c75458a004 The previous commit missed fixing 2 old printf format errors and
introduced a format printf error.
2002-03-19 04:07:29 +00:00
gallatin
56e86a0a0b Fix printf warning caused by recent changes in bio_pblkno's type. 2002-03-19 01:45:04 +00:00
mckusick
e929f2e4f0 Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.
2002-03-15 18:49:47 +00:00
jhb
3706cd3509 Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.
2002-02-27 18:32:23 +00:00
phk
ef612b7ce4 Staticize the malloc definitions.
Obtained from:	~bde/sys.dif.gz
2002-02-10 21:42:44 +00:00