1429 Commits

Author SHA1 Message Date
mav
aa7d598791 Change the way in which zero stripesize is handled. Instead of reporting
zero stripeoffset in such case (as if device has no stripes), report offset
from the beginning of the media (as if device has single infinite stripe).

This gives partitioning tools information, required to guess better
partition alignment, in case if hardware doesn't report it's stripe size.
For example, it should give disklabel info about odd offset made by fdisk.
2010-01-06 13:14:37 +00:00
mav
a615da72de Move wakeup() out of mutex to reduce contention. 2010-01-05 10:52:21 +00:00
mav
1073c59bbd Move wakeup() out of mutex to reduce contention. 2010-01-05 10:30:56 +00:00
mav
6c3ad0385c Slightly optimize XOR calculation. 2010-01-05 02:06:05 +00:00
marcel
5390d15d43 Properly return the UUID represented by the alias.
PR:		142174
Submitted by:	Przemyslaw Laczynski <torindel@gmail.com>
Pointy hat to:	rpaulo
2010-01-02 01:02:59 +00:00
mav
b5e1bf6b39 Call wakeup() only for the first request on the queue. 2009-12-30 17:23:27 +00:00
antoine
bfd388c026 (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
mav
f60568ce2e Add BIO_DELETE support to ada(4):
- For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by
ACS-2 specification working draft.
- For CompactFlash use CFA ERASE command, same as ad(4) does.

With this patch, `newfs -E /dev/ada1` was able to restore write speed of
my heavily weared OCZ Vertex SSD (firmware 1.4) up to the initial level
for the most part of it's capacity. Previous 1.3 firmware, even reportiong
TRIM capabilty bit set, was not working, reporting ABORT error for every
DSM command.

I have no idea whether it is normal, but for some reason it takes 200ms
to handle any TRIM command on this drive, that was making delete extremely
slow. But TRIM command is able to accept long list of LBAs and the length of
that list seems doesn't affect it's execution time. Implemented request
clusting algorithm allowed me to rise delete rate up to reasonable numbers,
when many parallel DELETE requests running.
2009-12-28 20:08:01 +00:00
mav
ab355c0d66 Make geom_concat to passthrough stripe parameters of the first component,
hoping that rest will fit.
2009-12-24 14:32:21 +00:00
mav
cf5b790219 As soon as geom_raid3 reports it's own stripe as sector size, report largest
underlying provider's stripe, multiplied by number of data disks in array,
due to transformation done, as array stripe.
2009-12-24 13:38:02 +00:00
mav
e6038335e8 As soon as mirror has no own stripes, report largest stripe of unrerlying
components, hoping others fit, if they are not equal.
2009-12-24 12:17:22 +00:00
mav
f90d832436 Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.
2009-12-24 11:05:23 +00:00
mav
5908bf1b9f Make geom_stripe report it's stripe size to upper layers. 2009-12-24 10:43:44 +00:00
mav
3edd147826 Make graid3 fallback to malloc() when component request size is bigger
then maximal prepared UMA zone size. This fixes crash with MAXPHYS > 128K.
2009-12-21 23:31:03 +00:00
rpaulo
75979e374b Add Microsoft and NetBSD partition types handling. 2009-12-14 20:26:27 +00:00
rpaulo
6d6ee0c2dc Simplify partition type parsing by using a data-oriented model.
While there add more Apple and Linux partition types.
2009-12-14 20:04:06 +00:00
mav
f018f4f599 Change 'load' balancing mode algorithm:
- Instead of measuring last request execution time for each drive and
choosing one with smallest time, use averaged number of requests, running
on each drive. This information is more accurate and timely. It allows to
distribute load between drives in more even and predictable way.
- For each drive track offset of the last submitted request. If new request
offset matches previous one or close for some drive, prefer that drive.
It allows to significantly speedup simultaneous sequential reads.

PR:		kern/113885
Reviewed by:	sobomax
2009-12-03 21:47:51 +00:00
trasz
59762a5f5b Provide a set of sysctls and tunables to disable device node creation
for specific "kinds" of disk labels - for example, GPT UUIDs.  Reason
for this is that sometimes, other GEOM classes attach to these device
nodes instead of the proper ones - e.g. they attach to /dev/gptid/XXX
instead of /dev/ada0p2, which is annoying.

Reviewed by:	pjd (earlier version)
MFC after:	1 month
2009-11-28 11:57:43 +00:00
rpaulo
e51ec1f083 Add a missing check for Apple HFS partitions.
MFC after:	1 week
2009-11-12 19:30:49 +00:00
rnoland
ab54ccdf3e We need to allocate space for the header in the create path also.
This fixes a null pointer dereference with "gpart create -s GPT" after
the previous commit.

Reported by:	Yuri Pankov
Pointyhat to:	me
MFC after:	1 week
2009-11-12 16:28:39 +00:00
rnoland
ebaba37e4e Fix handling of GPT headers when size is > 92 bytes.
It is valid for an on-disk GPT header to report a header size which is
greater than 92 bytes.  Previously, we would read in the sector and copy
only the 92 bytes that we know how to deal with before calculating the
checksum for comparison.  This meant that when we did the checksum, we
overshot the buffer and took in random memory, so the checksum would fail.

We now determine the size of the header and allocate enough space to
preserve the entire on-disk contents.  This allows us to be correctly
calculate the checksum and be able to modify and write the header back
to the disk, while preserving data that we might not understand.

Reported by:	Kris Weston
Approved by:	marcel@
MFC after:	2 weeks
2009-11-07 17:29:03 +00:00
rnoland
8dda941da3 Set the active flag in the PMBR when we install bootcode on a GPT
partitioned disk.  Some BIOS require this to be set before they will
boot the device.

Approved by:	marcel
MFC after:	2 weeks
2009-10-14 19:24:01 +00:00
pjd
f5413acb70 If provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This was we won't attach to providers that are used
by other classes. For example we don't want to configure partitions on da0 if
it is part of gmirror, what we really want is partitions on mirror/foo.

During regular work it works like this: if provider is open for writing a class
receives the spoiled event from GEOM and detaches, once provider is closed the
taste event is send again and class can rediscover its metadata if it is still
there.  This doesn't work that way when new class arrives, because GEOM gives
all existing providers for it to taste, also those open for writing. Classes
have to decided on their own if they want to deal with such providers (eg.
geom_dev) or not (classes modified by this commit).

Reported by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with:	phk, marcel
Reviewed by:	marcel
MFC after:	3 days
2009-10-09 09:42:22 +00:00
lulf
b3a1193d86 - Improve error message consistency and wording. 2009-10-05 08:44:31 +00:00
marcel
063c1246e2 The first 96 bytes may not be zeroes. It can contain trivial boot
code that merely emits an error and waits for a key press before
rebooting. The error being that extended partitions are not
bootable. The origin is presumed to be Windows 2000; Windows XP
does not do this...

For now, ignore the first 96 bytes when checking that the EBR is
(for the most part) all zeroes.

Tested by:	Mario Lobo <mlobo@digiart.art.br>
MFC after:	1 week
2009-09-28 23:52:47 +00:00
marcel
e9f89a2ebc Don't create more partitions than can fit in the table by checking
that the index is within bounds.
2009-09-24 06:00:49 +00:00
trasz
a0489928cd Remove unused variable. 2009-09-08 17:20:17 +00:00
mav
676cac231c Do not check proper request alignment here in geom_dev in production.
It will be checked any way later by g_io_check() in g_io_schedule_down().
It is only needed here to not trigger panic from additional check, when
INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two
64bit divisions per request.
2009-09-08 05:46:38 +00:00
mav
9980b82769 MFp4:
Remove msleep() timeout from g_io_schedule_up/down(). It works fine
without it, saving few percents of CPU on high request rates without
need to rearm callout twice per request.
2009-09-06 19:33:13 +00:00
pjd
e816f77286 Add support for changing providers priority.
Submitted by:	Mel Flynn
2009-09-06 06:52:06 +00:00
mav
ff8e63fcdf Remove artificial MAX_IO_SIZE constant, equal to DFLTPHYS * 2. Use MAXPHYS
instead. It is NULL change for GENERIC kernel, but allows 'fast' mode to
work on systems with increased MAXPHYS.
2009-09-04 19:20:46 +00:00
pjd
39353b796a Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.

Discussed with:	trasz
Obtained from:	Wheel Sp. z o.o. (http://www.wheel.pl)
2009-09-04 09:39:06 +00:00
pjd
728f7b16d7 There's no need for checking result of M_WAITOK allocation. 2009-08-27 08:40:51 +00:00
pjd
d461caca3f Fix an obvious topology lock leak.
MFC after:	3 days
2009-08-27 08:28:34 +00:00
marcel
61d6cdbea1 The start of the EFI GPT partition in the PMBR can always be represented
by CHS addressing. Don't define these fields as 0xff, but rather define
them correctly. This prevents boot problems on PCs where GPT is being
used.

PR:		115406
Submitted by:	Kent Hauser <kent@khauser.net>
Approved by:	re (kib)
2009-08-17 16:16:46 +00:00
lulf
08a0ba34aa - Fix the issue with read access count modification on RAID-5 plexes properly.
If the access counts were not increased and decreased in equal numbers by
  gvinum consumers, the read access count would be inconsistent with the write
  access count. Instead, modify the read access count with the write access
  count directly to prevent any inconsistencies.

Approved by:	re (kib)
2009-07-18 11:12:48 +00:00
marcel
8fa709769a Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.c
is invalid because the ioctl happens without prior open. The ioctl
got introduced to provide backward compatibility for extended
partitions, but it ended up not being used because it didn't work
as expected. Since there are no consumers of the ioctl and the
implementation is broken, the best fix is to remove the code
entirely.

Spotted by:	phk
Approved by:	re (kensmith)
2009-07-08 05:56:14 +00:00
trasz
bde8460ebe Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.

Approved by:	re (kib)
Reported by:	pho
2009-07-01 20:16:29 +00:00
trasz
669f7d9712 Make gjournal work with kernel compiled with "options DIAGNOSTIC".
Previously, it would panic immediately.

Reviewed by:	pjd
Approved by:	re (kib)
2009-06-30 14:34:06 +00:00
lulf
380cb58248 - Apply the same naming rules of LVM names as done in the LVM code itself.
PR:		kern/135874
2009-06-24 22:09:30 +00:00
jhay
4067f2d3af Do not stop the loop when an empty or deleted directory entry is found.
Rather just skip over it.
2009-06-24 06:42:13 +00:00
ivoras
23d60df09c Fix tabs, slightly improve comments.
Approved by:	gnn (mentor) (original)
Noticed by:	stas
2009-06-18 11:12:11 +00:00
ivoras
79583448b4 Add support for labels derived from GPT metadata.
Approved by:	gnn (mentor)
Reviewed by:	pjd
PR:		128398
Submitted by:	Marius Nuennerich < marius at nuenneri.ch >
2009-06-13 00:27:03 +00:00
luigi
39676e4ab4 As discussed in the devsummit, introduce two fields in the
struct bio to store classification information, and a hook
for classifier functions that can be called by g_io_request().

This code is from Fabio Checconi as part of his GSOC work.
2009-06-11 09:55:26 +00:00
pjd
6796cdb325 Simplify. 2009-06-05 23:35:43 +00:00
dougb
e2703a8f9f Crank the debug level necessary to display the "Label foo is removed"
and "Label for provider ..." messages up from 0 to 1.
2009-05-30 22:31:52 +00:00
jamie
572db1408a Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
lulf
8765020747 - Unbreak 64 bit platforms by casting off_t to intmax. 2009-05-26 14:15:06 +00:00
lulf
66e14dfc33 - Fix wrong print on BIO_DONE.
- Use db_printf instead of printf. While here, apply this to other ddb commands
  as well.

Pointed out by:		pjd
2009-05-26 10:03:44 +00:00
lulf
6ffe643641 - Add 'show bio' DDB command.
MFC after:	3 weeks
2009-05-26 07:29:17 +00:00