Commit Graph

164 Commits

Author SHA1 Message Date
John Baldwin
a08d2e7fe1 - Conditionally acquire Giant in mdstart_vnode(), mdcreate_vnode(), and
mddestroy() only if the file is from a non-MPSAFE VFS.
- No longer unconditionally hold Giant in the md kthread for vnode-backed
  kthreads.
- Improve the handling of the thread exit race when destroying an md
  device.
2006-03-28 21:25:11 +00:00
Wojciech A. Koszek
c27a895433 Teach md(4) and mdconfig(8) how to understand XML. Right now there won't be
a problem with listing large number of md(4) devices. Either 'list' or
'query' mode uses XML.

Additionally, new functionality was introduced. It's possible to pass
multiple devices to -u:

	# ./mdconfig -l -u md0,md1

Approved by:	cognet (mentor)
2006-03-26 23:21:11 +00:00
Luigi Rizzo
de64f22aa4 make sure that the start and end preloaded MFS markers are
in contiguous strings, and that the compiler does not optimize them
away because it thinks they are unused.
2006-01-31 13:35:30 +00:00
Pawel Jakub Dawidek
b322d85d53 Call NDFREE() only when vn_open() succeeded.
MFC after:	3 days
2006-01-27 11:27:55 +00:00
Maxim Konovalov
6c3cd0e2f6 o Fix typos in the comments.
Submitted by:	Wojciech A. Koszek
2005-12-28 15:18:18 +00:00
Robert Watson
5bb84bc84b Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in
  memory monitoring tools such as vmstat.

- Remove punctuation that is incompatible with using memory type names
  as file names, such as '/' characters.

- Disambiguate some collisions by adding subsystem prefixes to some
  memory types.

- Generally prefer lower case to upper case.

- If the same type is defined in multiple architecture directories,
  attempt to use the same name in additional cases.

Not all instances were caught in this change, so more work is required to
finish this conversion.  Similar changes are required for UMA zone names.
2005-10-31 15:41:29 +00:00
Poul-Henning Kamp
947fc8de03 Make sure that the worker thread knows the type early enough to
grab Giant for vnode backing.

Found by:	pho & tegge
2005-10-06 19:47:04 +00:00
Poul-Henning Kamp
9b00ca1961 Fix configuration locking in MD.
Remove  md_mtx.

Remove GIANT from the mdctl device driver and avoid DROP_GIANT,
PICKUP_GIANT and geom events since we can call into GEOM directly
now.

Pick up Giant around vn_close().

Apply an exclusive sx around mdctls ioctl and preloading to protect
lists etc..

Don't initialize our lock (md_mtx or md_sx) from a
SYSINIT when there is a perfectly good pair of _fini/_init
functions to do it from.

Prune any final fractional sector from the mediasize to
keep GEOM happy.

Cleanups:

Unify MDIOVERSION check in (x)mdctlioctl()

Add pointer to start() routine to softc to eliminate a switch{}

Inline guts of mddetach().

Always pass error pointer to mdnew(), simplify implementation.
2005-09-19 06:55:27 +00:00
Poul-Henning Kamp
9fbea3e365 Do not destroy the queue mutex until the thread is done with it. 2005-09-11 12:35:32 +00:00
Pawel Jakub Dawidek
7ee3c044d0 - Add md_mtx lock to protect ID number and list of devices.
- Always check mdnew() return value, as even in !autounit case
  kthread_create() can fail.

Those two changes fix serval panics provked by simple stress test.

Tested by:	Kris The BugMagnet
MFC after:	3 days
2005-08-31 19:45:11 +00:00
Christian S.J. Peron
8677689134 Ensure that file flags such as schg, sappnd (and others) are honored
by md(4). Before this change, it was possible to by-pass these flags
by creating memory disks which used a file as a backing store and
writing to the device.

This was discussed by the security team, and although this is problematic,
it was decided that it was not critical as we never guarantee that root will
be restricted.

This change implements the following behavior changes:

-If the user specifies the readonly flag, unset write operations before
 opening the file. If the FWRITE mask is unset, the device will be
 created with the MD_READONLY mask set. (readonly)
-Add a check in g_md_access which checks to see if the MD_READONLY mask
 is set, if so return EROFS
-Do not gracefully downgrade access modes without telling the user. Instead
 make the user specify their intentions for the device (assuming the file is
 read only). This seems like the more correct way to handle things.

This is a RELENG_6 candidate.

PR:		kern/84635
Reviewed by:	phk
2005-08-17 01:24:55 +00:00
Alan Cox
e340fc602b Request a CPU private mapping from sf_buf_alloc(). If the swap-backed
memory disk is larger than the number of available sf_bufs, this improves
performance on SMPs by eliminating interprocessor TLB shootdowns.  For
example, with 6656 sf_bufs, the default on my test machine, and a 256MB
swap-backed memory disk, I see the command
"dd if=/dev/md0 of=/dev/null bs=64k" achieve ~489MB/sec with the default,
shared mappings, and ~587MB/sec with CPU private mappings.
2005-02-13 21:51:50 +00:00
Poul-Henning Kamp
d9aaa28f63 Use MAXMINOR 2005-01-29 16:50:04 +00:00
Pawel Jakub Dawidek
1db17c6db2 - Don't destroy UMA zone on error in mdcreate_malloc(), because we need it
in mddestroy() to properly free already allocated memory.
  This fixes a panic when we want to create too big memory backed device
  with preallocate memory (-o reserve).
- Remove redundant { }.

MFC after:	1 week
2005-01-22 19:56:03 +00:00
Poul-Henning Kamp
9d3a77c463 Add a couple of mtx_asserts() to try to narrow down the window on
a bug repeatedly reported.
2005-01-22 19:08:50 +00:00
Warner Losh
098ca2bda9 Start each of the license/copyright comments with /*-, minor shuffle of lines 2005-01-06 01:43:34 +00:00
Alan Cox
c935314fae Add needed synchronization to the error handling code that was introduced
in revision 1.141.

Lock assertion failures reported by: Kris Kennaway
2005-01-05 05:32:52 +00:00
John Baldwin
63710c4d35 Stop explicitly touching td_base_pri outside of the scheduler and simply
set a thread's priority via sched_prio() when that is the desired action.
The schedulers will start managing td_base_pri internally shortly.
2004-12-30 20:29:58 +00:00
Pawel Jakub Dawidek
88b5b78d59 Rewrite piece of code which I committed some time ago that allows to
show file name for 'mdconfig -l -u <x>' command.
This allows to preserve API/ABI compatibility with version 0 (that's why
I changed version number back to 0) and will allow to merge this change
to RELENG_5.

MFC after:	5 days
2004-12-27 17:20:06 +00:00
Marcel Moolenaar
8b6fc67a49 Fix the MDIOCDETACH ioctl() for md(4). Now that the md_file field in
the mdio structure is an array and not a pointer, we cannot test for
it to be NULL. It never is. Instead, test for md_file[0] to be '\0'.
2004-11-13 05:00:12 +00:00
Pawel Jakub Dawidek
e3ed29a739 Be consistent and use 'if (error != 0)' instead of 'if (error)' everywhere. 2004-11-06 13:16:35 +00:00
Pawel Jakub Dawidek
61a6eb62ec For file backed md(4) devices output their source file via
'mdconfig -l -u <unit>'.
Bump version number, as this change breaks ABI/API.
2004-11-06 13:07:02 +00:00
Poul-Henning Kamp
3b66ad07db Don't explicitly call g_waitidle(), it happens automagically now. 2004-10-23 20:50:06 +00:00
Brian Feldman
812851b6c9 Account for failure in vm_pager_allocate() or vm_pager_get_pages() in
md(8).  The former is generally not going to fail, but the latter can
fail when the underlying swap device returns an error.

There are still plenty of other places where vm_pager_get_pages() failing
will lead directly to crashes, so it's a good idea to put your swap on
RAID if you care enough to put any of your disks on RAID....
2004-10-12 04:47:16 +00:00
Pawel Jakub Dawidek
e4cdd0d4b5 Actually this order (unlock, wakeup) in this case is race-safe and can
save us 2 context switches.

Explained by:	njl
2004-09-18 09:16:19 +00:00
Pawel Jakub Dawidek
b830359bc5 - Make md(4) 64-bit clean.
After this change it should be possible to use very big md(4) devices.
- Clean up and simplify the code a bit.
- Use humanize_number(3) to print size of md(4) devices.
- Add 't' suffix which stands for terabyte.
- Make '-S' to really work with all types of devices.
- Other minor changes.
2004-09-16 21:32:13 +00:00
Pawel Jakub Dawidek
fcd57fbe6f There is no need to keep 'npage' value inside our softc structure,
it is only used in one function. While doing so, change its type to
vm_ooffset_t.
We are still limited for swap-backed devices to 16TB on 32-bit architectures
where PAGE_SIZE is 4096 bytes.
2004-09-16 20:38:11 +00:00
Pawel Jakub Dawidek
a8a58d03f6 - Do not use bio_pblkno as it is going away anyway.
- Prefer bio_length than bio_bcount.
2004-09-16 19:42:17 +00:00
Pawel Jakub Dawidek
4b07ede4a7 First wakeup, then unlock. 2004-09-16 18:59:19 +00:00
Pawel Jakub Dawidek
6ab0a0aefe Type 'int' is too small for 'i' and 'lastp' variables. Use proper type,
which is vm_pindex_t (unsigned 64bit on i386).
2004-09-16 18:56:20 +00:00
Pawel Jakub Dawidek
2eafd8b126 Deallocate VM object on failure. 2004-09-14 19:55:07 +00:00
Pawel Jakub Dawidek
7a0970111f One more missing NDFREE(9). 2004-09-14 19:27:59 +00:00
Pawel Jakub Dawidek
52c6716fee - Don't forget about NDFREE() in case of vn_open() failure.
- Don't forget about vn_close() in case of failure.
2004-09-14 18:43:24 +00:00
Pawel Jakub Dawidek
f9963bbc08 Fix UMA zone leak. 2004-09-14 18:32:05 +00:00
Poul-Henning Kamp
affa470653 Use bioq_takefirst() 2004-09-07 07:54:45 +00:00
Colin Percival
972be79add Don't g_waitidle() when initializing a preloaded md. This fixes a
deadlock which otherwise occurs during the boot process.

Reported by:	kensmith
MFC after:	3 days
		(assuming that re@ approves)
2004-08-30 08:38:30 +00:00
Colin Percival
2b004a226d When creating a new md, wait for geom's event queue to become empty
before returning.  Device nodes are created via the "taste" mechanism,
so this is necessary in order to make sure that devfs entries are
created before mdconfig(8) returns.

This may be a MFC candidate for 5.3.

Suggested by:	phk
2004-08-22 19:44:24 +00:00
Poul-Henning Kamp
5721c9c76a Tag all geom classes in the tree with a version number. 2004-08-08 07:57:53 +00:00
Poul-Henning Kamp
199456973f Use a ->fini() from the geom class to destroy the control device.
Use default initialization of geom methods.
2004-08-08 06:47:43 +00:00
Poul-Henning Kamp
3e019deaed Do a pass over all modules in the kernel and make them return EOPNOTSUPP
for unknown events.

A number of modules return EINVAL in this instance, and I have left
those alone for now and instead taught MOD_QUIESCE to accept this
as "didn't do anything".
2004-07-15 08:26:07 +00:00
Poul-Henning Kamp
89c9c53da0 Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
2004-06-16 09:47:26 +00:00
Pawel Jakub Dawidek
6a40892929 Fix panic which occurs when given sector size for memory-backed device
is less than DEV_BSIZE (512) bytes.

Reported by:	Mike Bristow <mike@urgle.com>
Approved by:	phk
2004-05-18 07:30:04 +00:00
Warner Losh
ed010cdfa2 Ooops, removed this acknowledgement bogusly.
Eagle Eyes: bde
2004-04-09 05:12:47 +00:00
Warner Losh
f36cfd49ad Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
2004-04-07 20:46:16 +00:00
Alan Cox
121230a40d In some cases, sf_buf_alloc() should sleep with pri PCATCH; in others, it
should not.  Add a new parameter so that the caller can specify which is
the case.

Reported by:	dillon
2004-04-03 09:16:27 +00:00
Luigi Rizzo
5d4ca75e56 Fix a bug with preloaded image -- for some reason [that i don't
completely understand], md_takeroot() runs before md_preloaded(),
rendering both useless.
As a fix, move the body (effectively one line!) of md_takeroot()
into md_preloaded(), and get rid of the stuff that has become useless.

Bug and fix reported 10 days ago on -current, no reply.
2004-03-31 21:48:02 +00:00
Alan Cox
07be617f09 - Remove some unused #includes.
- Apply some style fixes to mdstart_swap().
2004-03-19 21:19:15 +00:00
Alan Cox
7cd53fdda8 Utilize sf_buf_alloc() and sf_buf_free() to implement the ephemeral
mappings required by mdstart_swap().  On i386, if the ephemeral mapping
is already in the sf_buf mapping cache, a swap-backed md performs
similarly to a malloc-backed md.  Even if the ephemeral mapping is not
cached, this implementation is still faster.  On 64-bit platforms, this
change has the effect of using the direct virtual-to-physical mapping,
avoiding ephemeral mapping overheads, such as TLB shootdowns on SMPs.

On a 2.4GHz, 400MHz FSB P4 Xeon configured with 64K sf_bufs and
"mdmfs -S -o async -s 128m md /mnt"

before:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.430923 secs (311465697 bytes/sec)

after with cold sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.367948 secs (364773576 bytes/sec)

after with warm sf_buf cache:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.252826 secs (530870010 bytes/sec)

malloc-backed md:
dd if=/dev/md0 of=/dev/null bs=64k
134217728 bytes transferred in 0.253126 secs (530240978 bytes/sec)
2004-03-18 18:23:37 +00:00
Alan Cox
33651381b7 Allow swap-backed devices to run without Giant. 2004-03-14 00:24:30 +00:00
Poul-Henning Kamp
7a6b2b6429 Fix a long-standing deadlock issue with vnode backed md(4) devices:
On vnode backed md(4) devices over a certain, currently undetermined
size relative to the buffer cache our "lemming-syncer" can provoke
a buffer starvation which puts the md thread to sleep on wdrain.

This generally tends to grind the entire system to a stop because the
event that is supposed to wake up the thread will not happen until a fair
bit of the piled up I/O requests in the system finish, and since a lot
of those are on a md(4) vnode backed device which is currently waiting
on wdrain until a fair amount of the piled up ... you get the picture.

The cure is to issue all VOP_WRITES on the vnode backing the device
with IO_SYNC.

In addition to more closely emulating a real disk device with a
non-lying write-cache, this makes the writes exempt from rate-limited
(there to avoid starving the buffer cache) and consequently prevents
the deadlock.

Unfortunately performance takes a hit.

Add "async" option to give people who know what they are doing the
old behaviour.
2004-03-10 20:41:09 +00:00