Commit Graph

1427 Commits

Author SHA1 Message Date
Luigi Rizzo
509fc65cb0 MFC r206551 (forgotten in previous commit): fix builds with ktr 2010-04-20 21:33:14 +00:00
Luigi Rizzo
31a3c43a0f MFC geom_sched code, a geom-based disk scheduling framework. 2010-04-20 15:23:12 +00:00
Pawel Jakub Dawidek
48ed64d027 MFC r206665:
Use lower priority for GELI worker threads. This improves system
responsiveness under heavy GELI load.
2010-04-18 21:26:59 +00:00
Pawel Jakub Dawidek
2b98f8400d MFC r204076,r204077,r204083,r205279:
r204076:

Please welcome HAST - Highly Avalable Storage.

HAST allows to transparently store data on two physically separated machines
connected over the TCP/IP network. HAST works in Primary-Secondary
(Master-Backup, Master-Slave) configuration, which means that only one of the
cluster nodes can be active at any given time. Only Primary node is able to
handle I/O requests to HAST-managed devices. Currently HAST is limited to two
cluster nodes in total.

HAST operates on block level - it provides disk-like devices in /dev/hast/
directory for use by file systems and/or applications. Working on block level
makes it transparent for file systems and applications. There in no difference
between using HAST-provided device and raw disk, partition, etc. All of them
are just regular GEOM providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and hast.conf(5)
manual pages, as well as http://wiki.FreeBSD.org/HAST.

Sponsored by:	FreeBSD Foundation
Sponsored by:	OMCnet Internet Service GmbH
Sponsored by:	TransIP BV

r204077:

Remove some lines left over by accident.

r204083:

Add missing KEYWORD line.

Pointed out by:	dougb

r205279 sys:

Simplify loops.
2010-04-18 21:14:49 +00:00
Andriy Gapon
7ca64047c2 MFC r206130: g_vfs_open: allow only one mount per device vnode 2010-04-17 11:57:41 +00:00
Jaakko Heinonen
e3359a8c1a MFC r205385:
Escape characters unsafe for XML output in GEOM class, instance and
provider names.

- Characters in range 0x01-0x1f except '\t', '\n', and '\r' are replaced
  with '?'. Those characters are disallowed in XML.
- '&', '<', '>', '\'', '"' and characters in range 0x7f-0xff are
  replaced with XML numeric character reference.

If the kern.geom.confxml sysctl provides invalid XML, libgeom
geom_xml2tree() fails and utilities using it do not work. Unsafe
characters are common in msdosfs and cd9660 labels.

PR:		kern/104389
2010-04-10 14:28:58 +00:00
Edward Tomasz Napierala
3fbc03cc99 MFC r199875:
Provide a set of sysctls and tunables to disable device node creation
for specific "kinds" of disk labels - for example, GPT UUIDs.  Reason
for this is that sometimes, other GEOM classes attach to these device
nodes instead of the proper ones - e.g. they attach to /dev/gptid/XXX
instead of /dev/ada0p2, which is annoying.

Reviewed by:	pjd (earlier version)
2010-03-27 18:04:33 +00:00
Xin LI
d1b9afd28a MFC r203408:
Prevent NULL deference by checking return value of
gctl_get_asciiparam.
2010-02-16 06:34:44 +00:00
Alexander Motin
981b11104a MFC r201566, r201567:
Move wakeup() out of mutex to reduce contention.
2010-02-05 11:56:12 +00:00
Alexander Motin
9185a70204 MFC r201545:
Slightly optimize XOR calculation.
2010-02-05 11:53:41 +00:00
Alexander Motin
9ab219c83f MFC r201264:
Call wakeup() only for the first request on the queue.
2010-02-05 11:52:28 +00:00
Antoine Brodin
e2b36efde5 MFC r201145 to stable/8:
(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
  Fix some wrong usages.
  Note: this does not affect generated binaries as this argument is not used.

  PR:		137213
  Submitted by:	Eygene Ryabinkin (initial version)
2010-01-30 12:11:21 +00:00
Alexander Motin
92d8dadcf8 MFC r201139:
Add BIO_DELETE support to ada(4):
- For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by
ACS-2 specification working draft.
- For CompactFlash use CFA ERASE command, same as ad(4) does.

With this patch, `newfs -E /dev/ada1` was able to restore write speed of
my heavily weared OCZ Vertex SSD (firmware 1.4) up to the initial level
for the most part of it's capacity.

I have no idea whether it is normal, but for some reason it takes 200ms
to handle any TRIM command on this drive, that was making delete extremely
slow. But TRIM command is able to accept long list of LBAs and the length of
that list seems doesn't affect it's execution time. Implemented request
clusting algorithm allowed me to rise delete rate up to reasonable numbers,
when many parallel DELETE requests running.
2010-01-19 12:58:29 +00:00
Alexander Motin
8e7a844b9b MFC r201645:
Change the way in which zero stripesize is handled. Instead of reporting
zero stripeoffset in such case (as if device has no stripes), report offset
from the beginning of the media (as if device has single infinite stripe).

This gives partitioning tools information, required to guess better
partition alignment, in case if hardware doesn't report it's stripe size.
For example, it should give disklabel info about odd offset made by fdisk.
2010-01-15 23:56:19 +00:00
Alexander Motin
2aa244f295 MFC r200934:
Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.
2010-01-05 13:51:23 +00:00
Alexander Motin
ae07f94f6f MFC r200942:
Make geom_concat to passthrough stripe parameters of the first component,
hoping that rest will fit.
2010-01-05 13:50:14 +00:00
Alexander Motin
3bda9adcd9 MFC r200940:
As soon as geom_raid3 reports it's own stripe as sector size, report largest
underlying provider's stripe, multiplied by number of data disks in array,
due to transformation done, as array stripe.
2010-01-05 13:49:18 +00:00
Alexander Motin
c730493bbf MFC r200935:
As soon as mirror has no own stripes, report largest stripe of unrerlying
components, hoping others fit, if they are not equal.
2010-01-05 13:47:55 +00:00
Alexander Motin
48eae87f12 MFC r200933:
Make geom_stripe report it's stripe size to upper layers.
2010-01-05 13:46:39 +00:00
Alexander Motin
1a7528a780 MFC r200821:
Make graid3 fallback to malloc() when component request size is bigger
then maximal prepared UMA zone size. This fixes crash with MAXPHYS > 128K.
2009-12-29 21:23:18 +00:00
Alexander Motin
417f972601 MFC r200086:
Change 'load' balancing mode algorithm:
- Instead of measuring last request execution time for each drive and
choosing one with smallest time, use averaged number of requests, running
on each drive. This information is more accurate and timely. It allows to
distribute load between drives in more even and predictable way.
- For each drive track offset of the last submitted request. If new request
offset matches previous one or close for some drive, prefer that drive.
It allows to significantly speedup simultaneous sequential reads.

PR:             kern/113885
2009-12-08 23:23:45 +00:00
Alexander Motin
e99e819fed MFC r196879:
Add support for changing providers priority.
2009-12-08 23:15:48 +00:00
Robert Noland
4cb9493f3f MFC r199017,199228
Fix handling of GPT headers when size is > 92 bytes.

This should address reading GPT headers written by opensolaris.
2009-11-21 14:53:08 +00:00
Rui Paulo
bbe1ae3552 MFC 199232:
Add a missing check for Apple HFS partitions.
2009-11-19 15:28:08 +00:00
Alexander Motin
c4511bef96 MFC r196964:
Do not check proper request alignment here in geom_dev in production.
It will be checked any way later by g_io_check() in g_io_schedule_down().
It is only needed here to not trigger panic from additional check, when
INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two
64bit divisions per request.
2009-11-17 21:45:28 +00:00
Alexander Motin
9c4a609c2d MFC r196904:
Remove msleep() timeout from g_io_schedule_up/down(). It works fine
without it, saving few percents of CPU on high request rates without
need to rearm callout twice per request.
2009-11-17 21:43:42 +00:00
Alexander Motin
8c184a2d26 MFC r196837:
Remove artificial MAX_IO_SIZE constant, equal to DFLTPHYS * 2. Use MAXPHYS
instead. It is NULL change for GENERIC kernel, but allows 'fast' mode to
work on systems with increased MAXPHYS.
2009-11-17 21:42:11 +00:00
Robert Noland
713444a957 MFC r198097
Set the active flag in the PMBR when we install bootcode on a GPT
partitioned disk.  Some BIOS require this to be set before they will
boot the device.
2009-10-30 15:45:00 +00:00
Pawel Jakub Dawidek
69882ff11d MFC r197898:
If provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This was we won't attach to providers that are used
by other classes. For example we don't want to configure partitions on da0 if
it is part of gmirror, what we really want is partitions on mirror/foo.

During regular work it works like this: if provider is open for writing a class
receives the spoiled event from GEOM and detaches, once provider is closed the
taste event is send again and class can rediscover its metadata if it is still
there.  This doesn't work that way when new class arrives, because GEOM gives
all existing providers for it to taste, also those open for writing. Classes
have to decided on their own if they want to deal with such providers (eg.
geom_dev) or not (classes modified by this commit).

Reported by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with:	phk, marcel
Reviewed by:	marcel
Approved by:	re (kib)
2009-10-12 21:08:06 +00:00
Marcel Moolenaar
3b9790003e MFC rev 197449:
Don't create more partitions than can fit in the table by checking
that the index is within bounds.

Approved by:	re (kib)
2009-09-25 17:48:30 +00:00
Pawel Jakub Dawidek
6bd6f55621 MFC r196822, r196823, r196824:
Remove 'ad:' prefix from disk serial number. We don't want serial number
to change when we reconnect the disk in a way that it is accessible through
CAM for example.

Discussed with:	trasz

Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.

Discussed with:	trasz
Obtained from:	Wheel Sp. z o.o. (http://www.wheel.pl)

Make serial numbers of daX disks visible by GEOM.

No objections from:	scottl
Obtained from:	Wheel Sp. z o.o. (http://www.wheel.pl)

Approved by:	re (kib)
2009-09-15 11:23:59 +00:00
Pawel Jakub Dawidek
264a8db4b0 MFC r196579:
Fix an obvious topology lock leak.

Approved by:	re (kib)
2009-09-07 16:25:09 +00:00
Marcel Moolenaar
87b51e539d MFC rev 196333:
The start of the EFI GPT partition in the PMBR can always be represented
by CHS addressing. Don't define these fields as 0xff, but rather define
them correctly. This prevents boot problems on PCs where GPT is being
used.

PR:             115406
Submitted by:   Kent Hauser <kent@khauser.net>
Approved by:    re (kib)
2009-08-17 16:24:50 +00:00
Ulf Lilleengen
b79cac0f92 - Fix the issue with read access count modification on RAID-5 plexes properly.
If the access counts were not increased and decreased in equal numbers by
  gvinum consumers, the read access count would be inconsistent with the write
  access count. Instead, modify the read access count with the write access
  count directly to prevent any inconsistencies.

Approved by:	re (kib)
2009-07-18 11:12:48 +00:00
Marcel Moolenaar
f43b57e32a Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.c
is invalid because the ioctl happens without prior open. The ioctl
got introduced to provide backward compatibility for extended
partitions, but it ended up not being used because it didn't work
as expected. Since there are no consumers of the ioctl and the
implementation is broken, the best fix is to remove the code
entirely.

Spotted by:	phk
Approved by:	re (kensmith)
2009-07-08 05:56:14 +00:00
Edward Tomasz Napierala
8edfe76ab5 Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.

Approved by:	re (kib)
Reported by:	pho
2009-07-01 20:16:29 +00:00
Edward Tomasz Napierala
fb231f3627 Make gjournal work with kernel compiled with "options DIAGNOSTIC".
Previously, it would panic immediately.

Reviewed by:	pjd
Approved by:	re (kib)
2009-06-30 14:34:06 +00:00
Ulf Lilleengen
ac2a008e69 - Apply the same naming rules of LVM names as done in the LVM code itself.
PR:		kern/135874
2009-06-24 22:09:30 +00:00
John Hay
65a4957806 Do not stop the loop when an empty or deleted directory entry is found.
Rather just skip over it.
2009-06-24 06:42:13 +00:00
Ivan Voras
63f4d880e0 Fix tabs, slightly improve comments.
Approved by:	gnn (mentor) (original)
Noticed by:	stas
2009-06-18 11:12:11 +00:00
Ivan Voras
452f657cb9 Add support for labels derived from GPT metadata.
Approved by:	gnn (mentor)
Reviewed by:	pjd
PR:		128398
Submitted by:	Marius Nuennerich < marius at nuenneri.ch >
2009-06-13 00:27:03 +00:00
Luigi Rizzo
6231f75bcf As discussed in the devsummit, introduce two fields in the
struct bio to store classification information, and a hook
for classifier functions that can be called by g_io_request().

This code is from Fabio Checconi as part of his GSOC work.
2009-06-11 09:55:26 +00:00
Pawel Jakub Dawidek
cb9b72ce4a Simplify. 2009-06-05 23:35:43 +00:00
Doug Barton
8b3bfb0509 Crank the debug level necessary to display the "Label foo is removed"
and "Label for provider ..." messages up from 0 to 1.
2009-05-30 22:31:52 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Ulf Lilleengen
4147dd02cd - Unbreak 64 bit platforms by casting off_t to intmax. 2009-05-26 14:15:06 +00:00
Ulf Lilleengen
6d66da20b7 - Fix wrong print on BIO_DONE.
- Use db_printf instead of printf. While here, apply this to other ddb commands
  as well.

Pointed out by:		pjd
2009-05-26 10:03:44 +00:00
Ulf Lilleengen
bf7d2c1797 - Add 'show bio' DDB command.
MFC after:	3 weeks
2009-05-26 07:29:17 +00:00
Edward Tomasz Napierala
916cd41c47 Check return value of gctl_get_asciiparam().
Found with:	Coverity Prevent(tm)
CID:		1118
2009-05-12 16:59:50 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00