Commit Graph

1523 Commits

Author SHA1 Message Date
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Alexander Motin
1c80ec0a6b Add BIO_DELETE support to ada(4):
- For SSDs use TRIM feature of DATA SET MANAGEMENT command, as defined by
ACS-2 specification working draft.
- For CompactFlash use CFA ERASE command, same as ad(4) does.

With this patch, `newfs -E /dev/ada1` was able to restore write speed of
my heavily weared OCZ Vertex SSD (firmware 1.4) up to the initial level
for the most part of it's capacity. Previous 1.3 firmware, even reportiong
TRIM capabilty bit set, was not working, reporting ABORT error for every
DSM command.

I have no idea whether it is normal, but for some reason it takes 200ms
to handle any TRIM command on this drive, that was making delete extremely
slow. But TRIM command is able to accept long list of LBAs and the length of
that list seems doesn't affect it's execution time. Implemented request
clusting algorithm allowed me to rise delete rate up to reasonable numbers,
when many parallel DELETE requests running.
2009-12-28 20:08:01 +00:00
Alexander Motin
5f9b1143ac Make geom_concat to passthrough stripe parameters of the first component,
hoping that rest will fit.
2009-12-24 14:32:21 +00:00
Alexander Motin
113d8e5046 As soon as geom_raid3 reports it's own stripe as sector size, report largest
underlying provider's stripe, multiplied by number of data disks in array,
due to transformation done, as array stripe.
2009-12-24 13:38:02 +00:00
Alexander Motin
92f60381d9 As soon as mirror has no own stripes, report largest stripe of unrerlying
components, hoping others fit, if they are not equal.
2009-12-24 12:17:22 +00:00
Alexander Motin
8b30323843 Add two disk ioctls, giving user-level tools information about disk/array
stripe (optimal access block) size and offset.
2009-12-24 11:05:23 +00:00
Alexander Motin
f00919d2fc Make geom_stripe report it's stripe size to upper layers. 2009-12-24 10:43:44 +00:00
Alexander Motin
d4060fa67d Make graid3 fallback to malloc() when component request size is bigger
then maximal prepared UMA zone size. This fixes crash with MAXPHYS > 128K.
2009-12-21 23:31:03 +00:00
Rui Paulo
33f7a4124d Add Microsoft and NetBSD partition types handling. 2009-12-14 20:26:27 +00:00
Rui Paulo
f13174303d Simplify partition type parsing by using a data-oriented model.
While there add more Apple and Linux partition types.
2009-12-14 20:04:06 +00:00
Alexander Motin
891852cc12 Change 'load' balancing mode algorithm:
- Instead of measuring last request execution time for each drive and
choosing one with smallest time, use averaged number of requests, running
on each drive. This information is more accurate and timely. It allows to
distribute load between drives in more even and predictable way.
- For each drive track offset of the last submitted request. If new request
offset matches previous one or close for some drive, prefer that drive.
It allows to significantly speedup simultaneous sequential reads.

PR:		kern/113885
Reviewed by:	sobomax
2009-12-03 21:47:51 +00:00
Edward Tomasz Napierala
3ce9ca8947 Provide a set of sysctls and tunables to disable device node creation
for specific "kinds" of disk labels - for example, GPT UUIDs.  Reason
for this is that sometimes, other GEOM classes attach to these device
nodes instead of the proper ones - e.g. they attach to /dev/gptid/XXX
instead of /dev/ada0p2, which is annoying.

Reviewed by:	pjd (earlier version)
MFC after:	1 month
2009-11-28 11:57:43 +00:00
Rui Paulo
f9d551f7df Add a missing check for Apple HFS partitions.
MFC after:	1 week
2009-11-12 19:30:49 +00:00
Robert Noland
a59a131093 We need to allocate space for the header in the create path also.
This fixes a null pointer dereference with "gpart create -s GPT" after
the previous commit.

Reported by:	Yuri Pankov
Pointyhat to:	me
MFC after:	1 week
2009-11-12 16:28:39 +00:00
Robert Noland
1c2dee3cc9 Fix handling of GPT headers when size is > 92 bytes.
It is valid for an on-disk GPT header to report a header size which is
greater than 92 bytes.  Previously, we would read in the sector and copy
only the 92 bytes that we know how to deal with before calculating the
checksum for comparison.  This meant that when we did the checksum, we
overshot the buffer and took in random memory, so the checksum would fail.

We now determine the size of the header and allocate enough space to
preserve the entire on-disk contents.  This allows us to be correctly
calculate the checksum and be able to modify and write the header back
to the disk, while preserving data that we might not understand.

Reported by:	Kris Weston
Approved by:	marcel@
MFC after:	2 weeks
2009-11-07 17:29:03 +00:00
Robert Noland
e80d42dda2 Set the active flag in the PMBR when we install bootcode on a GPT
partitioned disk.  Some BIOS require this to be set before they will
boot the device.

Approved by:	marcel
MFC after:	2 weeks
2009-10-14 19:24:01 +00:00
Pawel Jakub Dawidek
f8727e71d7 If provider is open for writing when we taste it, skip it for classes that
depend on on-disk metadata. This was we won't attach to providers that are used
by other classes. For example we don't want to configure partitions on da0 if
it is part of gmirror, what we really want is partitions on mirror/foo.

During regular work it works like this: if provider is open for writing a class
receives the spoiled event from GEOM and detaches, once provider is closed the
taste event is send again and class can rediscover its metadata if it is still
there.  This doesn't work that way when new class arrives, because GEOM gives
all existing providers for it to taste, also those open for writing. Classes
have to decided on their own if they want to deal with such providers (eg.
geom_dev) or not (classes modified by this commit).

Reported by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Tested by:	des, Oliver Lehmann <lehmann@ans-netz.de>
Discussed with:	phk, marcel
Reviewed by:	marcel
MFC after:	3 days
2009-10-09 09:42:22 +00:00
Ulf Lilleengen
a8a3cd7d9d - Improve error message consistency and wording. 2009-10-05 08:44:31 +00:00
Marcel Moolenaar
b61808630d The first 96 bytes may not be zeroes. It can contain trivial boot
code that merely emits an error and waits for a key press before
rebooting. The error being that extended partitions are not
bootable. The origin is presumed to be Windows 2000; Windows XP
does not do this...

For now, ignore the first 96 bytes when checking that the EBR is
(for the most part) all zeroes.

Tested by:	Mario Lobo <mlobo@digiart.art.br>
MFC after:	1 week
2009-09-28 23:52:47 +00:00
Marcel Moolenaar
87f4470620 Don't create more partitions than can fit in the table by checking
that the index is within bounds.
2009-09-24 06:00:49 +00:00
Edward Tomasz Napierala
bb3fd7ff4f Remove unused variable. 2009-09-08 17:20:17 +00:00
Alexander Motin
18e42503ed Do not check proper request alignment here in geom_dev in production.
It will be checked any way later by g_io_check() in g_io_schedule_down().
It is only needed here to not trigger panic from additional check, when
INVARIANTS enabled. So cover it with #ifdef INVARIANTS. It saves two
64bit divisions per request.
2009-09-08 05:46:38 +00:00
Alexander Motin
7fc019af65 MFp4:
Remove msleep() timeout from g_io_schedule_up/down(). It works fine
without it, saving few percents of CPU on high request rates without
need to rearm callout twice per request.
2009-09-06 19:33:13 +00:00
Pawel Jakub Dawidek
b740e905a4 Add support for changing providers priority.
Submitted by:	Mel Flynn
2009-09-06 06:52:06 +00:00
Alexander Motin
af582ea7af Remove artificial MAX_IO_SIZE constant, equal to DFLTPHYS * 2. Use MAXPHYS
instead. It is NULL change for GENERIC kernel, but allows 'fast' mode to
work on systems with increased MAXPHYS.
2009-09-04 19:20:46 +00:00
Pawel Jakub Dawidek
e93f5e4d25 Simplify g_disk_ident_adjust() function and allow any printable character
in serial number.

Discussed with:	trasz
Obtained from:	Wheel Sp. z o.o. (http://www.wheel.pl)
2009-09-04 09:39:06 +00:00
Pawel Jakub Dawidek
07a93e6b3c There's no need for checking result of M_WAITOK allocation. 2009-08-27 08:40:51 +00:00
Pawel Jakub Dawidek
c16ce31b31 Fix an obvious topology lock leak.
MFC after:	3 days
2009-08-27 08:28:34 +00:00
Marcel Moolenaar
8530137252 The start of the EFI GPT partition in the PMBR can always be represented
by CHS addressing. Don't define these fields as 0xff, but rather define
them correctly. This prevents boot problems on PCs where GPT is being
used.

PR:		115406
Submitted by:	Kent Hauser <kent@khauser.net>
Approved by:	re (kib)
2009-08-17 16:16:46 +00:00
Ulf Lilleengen
b79cac0f92 - Fix the issue with read access count modification on RAID-5 plexes properly.
If the access counts were not increased and decreased in equal numbers by
  gvinum consumers, the read access count would be inconsistent with the write
  access count. Instead, modify the read access count with the write access
  count directly to prevent any inconsistencies.

Approved by:	re (kib)
2009-07-18 11:12:48 +00:00
Marcel Moolenaar
f43b57e32a Revert revisions 188839 and 188868. Use of the ioctl in geom_dev.c
is invalid because the ioctl happens without prior open. The ioctl
got introduced to provide backward compatibility for extended
partitions, but it ended up not being used because it didn't work
as expected. Since there are no consumers of the ioctl and the
implementation is broken, the best fix is to remove the code
entirely.

Spotted by:	phk
Approved by:	re (kensmith)
2009-07-08 05:56:14 +00:00
Edward Tomasz Napierala
8edfe76ab5 Fix a panic which (reportedly) can happen when unmounting a filesystem
with I/O requests in flight on kernels compiled with "options INVARIANTS".
Also, make it obvious it's not right to call g_valid_obj() (and macros
using it, e.g. G_VALID_CONSUMER()) without topology lock held.

Approved by:	re (kib)
Reported by:	pho
2009-07-01 20:16:29 +00:00
Edward Tomasz Napierala
fb231f3627 Make gjournal work with kernel compiled with "options DIAGNOSTIC".
Previously, it would panic immediately.

Reviewed by:	pjd
Approved by:	re (kib)
2009-06-30 14:34:06 +00:00
Ulf Lilleengen
ac2a008e69 - Apply the same naming rules of LVM names as done in the LVM code itself.
PR:		kern/135874
2009-06-24 22:09:30 +00:00
John Hay
65a4957806 Do not stop the loop when an empty or deleted directory entry is found.
Rather just skip over it.
2009-06-24 06:42:13 +00:00
Ivan Voras
63f4d880e0 Fix tabs, slightly improve comments.
Approved by:	gnn (mentor) (original)
Noticed by:	stas
2009-06-18 11:12:11 +00:00
Ivan Voras
452f657cb9 Add support for labels derived from GPT metadata.
Approved by:	gnn (mentor)
Reviewed by:	pjd
PR:		128398
Submitted by:	Marius Nuennerich < marius at nuenneri.ch >
2009-06-13 00:27:03 +00:00
Luigi Rizzo
6231f75bcf As discussed in the devsummit, introduce two fields in the
struct bio to store classification information, and a hook
for classifier functions that can be called by g_io_request().

This code is from Fabio Checconi as part of his GSOC work.
2009-06-11 09:55:26 +00:00
Pawel Jakub Dawidek
cb9b72ce4a Simplify. 2009-06-05 23:35:43 +00:00
Doug Barton
8b3bfb0509 Crank the debug level necessary to display the "Label foo is removed"
and "Label for provider ..." messages up from 0 to 1.
2009-05-30 22:31:52 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Ulf Lilleengen
4147dd02cd - Unbreak 64 bit platforms by casting off_t to intmax. 2009-05-26 14:15:06 +00:00
Ulf Lilleengen
6d66da20b7 - Fix wrong print on BIO_DONE.
- Use db_printf instead of printf. While here, apply this to other ddb commands
  as well.

Pointed out by:		pjd
2009-05-26 10:03:44 +00:00
Ulf Lilleengen
bf7d2c1797 - Add 'show bio' DDB command.
MFC after:	3 weeks
2009-05-26 07:29:17 +00:00
Edward Tomasz Napierala
916cd41c47 Check return value of gctl_get_asciiparam().
Found with:	Coverity Prevent(tm)
CID:		1118
2009-05-12 16:59:50 +00:00
Attilio Rao
dfd233edd5 Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS.  Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled.  Bump __FreeBSD_version in order to signal such
situation.
2009-05-11 15:33:26 +00:00
Ulf Lilleengen
d8d015cddc - Split up the BIO queue into a queue for new and one for completed requests.
This is necessary for two reasons:
  1) In order to avoid collisions with the use of a BIOs flags set by a consumer
     or a provider
  2) Because GV_BIO_DONE was used to mark a BIO as done, not enough flags was
     available, so the consumer flags of a BIO had to be misused in order to
     support enough flags. The new queue makes it possible to recycle the
     GV_BIO_DONE flag into GV_BIO_GROW.
  As a consequence, gvinum will now work with any other GEOM class under it or
  on top of it.

- Use bio_pflags for storing internal flags on downgoing BIOs, as the requests
  appear to come from a consumer of a gvinum volume. Use bio_cflags only for
  cloned BIOs.
- Move gv_post_bio to be used internally for maintenance requests.
- Remove some cases where flags where set without need.

PR:		kern/133604
2009-05-06 19:34:32 +00:00
Ulf Lilleengen
41944888fe - Fix a case where a RAID5 volume would think that it is supposed to grow a new
subdisk after a parity rebuild.
2009-05-06 19:18:19 +00:00
Ulf Lilleengen
11c4adc49e - Check if any plexes are doing internal maintenance before removing them. 2009-05-06 19:06:28 +00:00
Ulf Lilleengen
5a0fa8531c - Add forgotten KASSERT. 2009-05-06 18:37:32 +00:00
Ulf Lilleengen
1d8dfc60f4 - Fix a bug where the bio_data field of the wrong BIO is freed if an error
occurs when doing a RAID5 request.
2009-05-06 18:27:28 +00:00
Ulf Lilleengen
451b95f489 - GV_BIO_RETRY is not used, and it is actually impossible with more than 8
values for bio_cflags/bio_pflags.
2009-05-06 18:24:56 +00:00
Ulf Lilleengen
040272465d - Split the queue mutex into one for the event queue and one for the BIO queue,
as they do not really relate and to prepare for an additional queue to be
  covered by the BIO queue mutex.
- Implement wrappers for fetching the next element from the event queue as well
  as for putting a new element into the BIO queue.
2009-05-06 18:21:48 +00:00
Ulf Lilleengen
ad75dd77e0 - Make the gvinum softc invisible to userland, as it is not needed. 2009-05-04 17:30:20 +00:00
Ulf Lilleengen
697ab8be86 - Remove assertion of topology lock remaining from 7.x gvinum. It is not needed,
as the renaming only changes internal gvinum names and will not alter the geom
  topology.
- The topology lock was not held when calling g_wither_geom after renaming.
2009-04-18 16:36:27 +00:00
Marcel Moolenaar
cce94b6583 Precision '*' expects an int and strlen() returns a size_t.
Compensate.
2009-04-16 05:52:47 +00:00
Marcel Moolenaar
6ad9a99f21 Add a compat option to the EBR scheme that controls the
naming of the partitions (GEOM_PART_EBR_COMPAT).  When
compatibility is enabled, changes to the partitioning are
disallowed.

Remove the device name aliasing added previously to provide
backward compatibility, but which in practice doesn't give
us anything.

Enable compatibility on amd64 and i386.
2009-04-15 22:38:22 +00:00
Ulf Lilleengen
1de45ea74d - Move out allocation part of different gvinum objects into its own routine and
make use of it in the gvinum userland code.
2009-04-10 08:50:14 +00:00
Andrew Thompson
853a10a581 Revert r190676,190677
The geom and CAM changes for root_hold are the wrong solution for USB design
quirks.

Requested by:	scottl
2009-04-10 04:08:34 +00:00
Marcel Moolenaar
94fe30d0ca Don't use hexadecimal in the EBR partition names, because 'a'..'f'
are more commonly known as BSD partition names.

Discussed with: ivoras@
2009-04-08 16:18:16 +00:00
Andrew Thompson
31da42bab2 Add interleaving root hold tokens from the CAM probe to disk_create and geom
provider tasting. This is needed for disk attachments that happen after threads
are running in the boot process.

Tested by:	rnoland
2009-04-03 19:49:33 +00:00
Andrew Thompson
626fc9fe3d Add a how argument to root_mount_hold() so it can be passed NOWAIT and be called
in situations where sleeping isnt allowed.
2009-04-03 19:46:12 +00:00
Marcel Moolenaar
daca55549f The 9 bytes immediately prior to the partition table can contain
signatures or disk serial numbers. Don't assume those to be zero
in all cases. This fixes a false negative.

Tested by: avatar@mmlab.cse.yzu.edu.tw
2009-04-03 05:54:49 +00:00
Marcel Moolenaar
c146965cd1 Sharpen the saw:
o  PC98 uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger. The 32-bit block numbers
   are implicit (16-bit cylinder * 8-bit head * 8-bit sector).
2009-03-30 01:03:58 +00:00
Marcel Moolenaar
6154e492ec Sharpen the saw:
o  MBR uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
2009-03-30 00:53:46 +00:00
Marcel Moolenaar
f5f875ed84 Sharpen the saw:
o  EBR uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
o  Calculate the number of entries based on the rounded media
   size, rather than the raw media size.
2009-03-30 00:48:42 +00:00
Marcel Moolenaar
2a1c00ff2f Sharpen the saw:
o  Don't create a GPT scheme underneath another scheme when
   the probe doesn't allow it.
2009-03-30 00:33:43 +00:00
Ulf Lilleengen
bd9337ce80 - Add files that should have been added in r190507. 2009-03-28 21:06:59 +00:00
Ulf Lilleengen
c0b9797aa8 Import the gvinum work that have been done during and after Summer of Code 2007.
The work have been under testing and fixing since then, and it is mature enough
to be put into HEAD for further testing.

A lot have changed in this time, and here are the most important:
- Gvinum now uses one single workerthread instead of one thread for each
  volume and each plex. The reason for this is that the previous scheme was
  very complex, and was the cause of many of the bugs discovered in gvinum.
  Instead, gvinum now uses one worker thread with an event queue, quite
  similar to what used in gmirror.
- The rebuild/grow/initialize/parity check routines no longer runs in
  separate threads, but are run as regular I/O requests with special flags.
  This made it easier to support mounted growing and parity rebuild.
- Support for growing striped and raid5-plexes, meaning that one can extend the
  volumes for these plex types in addition to the concat type. Also works while
  the volume is mounted.
- Implementation of many of the missing commands from the old vinum:
  attach/detach, start (was partially implemented), stop (was partially
  implemented), concat, mirror, stripe, raid5 (shortcuts for creating volumes
  with one plex of these organizations).
- The parity check and rebuild no longer goes between userland/kernel, meaning
  that the gvinum command will not stay and wait forever for the rebuild to
  finish. You can instead watch the status with the list command.
- Many problems with gvinum have been reported since 5.x, and some has been hard
  to fix due to the complicated architecture. Hopefully, it should be more
  stable and better handle edge cases that previously made gvinum crash.
- Failed drives no longer disappears entirely, but now leave behind a dummy
  drive that makes sure the original state is not forgotten in case the system
  is rebooted between drive failures/swaps.
- Update manpage to reflect new commands and extend it with some examples.

Sponsored by:   Google Summer of Code 2007
Mentored by:    le
Tested by:      Rick C. Petty <rick-freebsd2008 -at- kiwi-computer.com>
2009-03-28 17:20:08 +00:00
Marcel Moolenaar
ee94c7ef01 Sharpen the saw:
o  BSD uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
2009-03-27 05:48:42 +00:00
Marcel Moolenaar
d01a198be7 Sharpen the saw:
o  Don't create an APM scheme underneath another scheme when
   the probe doesn't allow it.
o  APM uses 32-bit block numbers. Limit the scheme to 2^32-1
   blocks when the media is larger.
2009-03-27 05:35:12 +00:00
Marcel Moolenaar
f3548c023e Change the priority from high to normal. This makes sure that
the BSD or GPT schemes can take precedence as appropriate.
2009-03-26 16:42:24 +00:00
Ivan Voras
f7b16839ba Create GEOM labels from UFS IDs, e.g. /dev/ufsid/49c97b1faa2adc43. UFS IDs
are always present and can be used to identify file systems (useful if
hardware devices move often).

Actually-by:	pjd
Approved by:	gnn (mentor)
2009-03-25 20:38:57 +00:00
Ivan Voras
15c48b9a20 Be more explicit and complain if kernel dumps are perfomed on unsupported
partition types. This is to help users used to the old behaviour.

Reviewed by:	marcel
Approved by:	gnn (mentor)
2009-03-22 00:29:48 +00:00
Ivan Voras
94dd506d54 Make GEOM provider names starting with "/dev/" acceptable as well as their
"raw" names. While there, change the formatting of extended MSDOS partitions
so that the dot (".") is not used to separate two numbers (which kind of
looks like the whole is a decimal number). Use "+" instead, which also
hints that the second part of the name is the offset from the start of
the partition in the first part of the name. Also change the offset from
decimal to hexadecimal notation, simply for aesthetic reasons and future
compatibility.

GEOM_PART is the default in 8-CURRENT but not yet in 7-STABLE so this
changeset can be MFC-ed without causing major problems from the second
part.

Reviewed by:	marcel
Approved by:	gnn (mentor)
MFC after:	2 weeks
2009-03-19 14:23:17 +00:00
Pawel Jakub Dawidek
c5d387d010 Detach GELI providers on shutdown/reboot, which will allow providers underneath
to close properly.

Reported, reviewed and tested by:	guido
MFC after:	1 week
2009-03-16 19:31:08 +00:00
Guido van Rooij
921eec2694 Backout this commit whil a better solution is developed 2009-03-13 08:13:51 +00:00
Yoshihiro Takahashi
8753b93d95 Move the PC98_[MS]ID_* defines from g_part_pc98.c to diskpc98.h.
Reviewed by:	marcel
2009-03-11 13:15:42 +00:00
Sam Leffler
664c0b48bf o disallow write to RedBoot and FIS directory partitions; these are painful
to resurrect (maybe honor foot shooting bit in kern.geom_debugflags)
o fix match macro so we now recognize we want to merge FIS dir with RedBoot
  config parameters even if we don't actually do it
2009-03-11 01:12:52 +00:00
Guido van Rooij
c5f79858ff When attaching a geli on boot make sure that it is detached
upon last close. (needed for a gmirror to properly shutdown
upon reboot when a geli is on top the gmirror)
2009-03-10 15:23:43 +00:00
Yoshihiro Takahashi
9541ada9a0 Restore the return statement. It was accidentally removed by rev 188429. 2009-03-10 11:14:03 +00:00
Sam Leffler
443f1e7991 add geom_redboot, a geom module that exports RedBoot FIS partitions as named
slices in dev/redboot/*
2009-03-09 23:18:36 +00:00
Marcel Moolenaar
232c8bf888 o When creating the EBR scheme, set the number of entries
properly. Otherwise the minimum of 1 is used and you can
   only insert a single partition/slice and only at sector
   0 (index 1).
o  When adding a partition/slice, recalculate the index after
   the start and size of the partition/slice are adjusted to
   make them a multiple of the track size. Since the precheck
   method sets the index based on the start of the partition
   as provided by the user, we know that we're off by at most
   1 and adjusting the index is safe.
2009-02-21 19:25:13 +00:00
Marcel Moolenaar
4dedfc44e7 Add bootcode handling. 2009-02-21 07:01:21 +00:00
Marcel Moolenaar
59c532c500 Provide compatibility symlink for logical partitions:
1.  Extend geom_dev by having it create the symlink (i.e. call
    make_dev_alias) based on the DIOCGPROVIDERALIAS ioctl.
    In this way the functionaility is generic and thus usable
    by any geom/provider.
2.  Have g_part handle said ioctl through the devalias method,
    so that it's under control of the scheme itself. By design
    the alias will not be created for newly added partitions.
2009-02-20 04:48:40 +00:00
Marcel Moolenaar
507a0d4a6c Fix an infinite loop created when the last logical partition is
removed.
2009-02-20 04:10:31 +00:00
Marcel Moolenaar
09a278e14d Add a default implementation for pre-check. It should
always succeed if not implemented.

Pointy hat: marcel
2009-02-17 18:24:58 +00:00
Marcel Moolenaar
832cdc2ca7 Remove gpt_offset and related code. It was introduced for use
by the BSD scheme, ended up not to be needed. Remove to avoid
abuse and to keep the bloat to a minimum.
2009-02-17 04:12:10 +00:00
Marcel Moolenaar
e24c8a3fb8 Add support to add, delete and modify logical partitions, as well
as to create and destroy the extended partitioning scheme. In
other words: full support.
2009-02-16 03:54:28 +00:00
Marcel Moolenaar
7ca4fa83ec Add method precheck to the g_part interface. The precheck
method allows schemes to reject the ctl request, pre-check
the parameters and/or modify/set parameters. There are 2
use cases that triggered the addition:
1.  When implementing a R/O scheme, deletes will still
    happen to the in-memory representation. The scheme is
    not involved in that operation. The pre-check method
    can be used to fail the delete up-front. Without this
    the write to disk will typically fail, but at that
    time the delete already happened.
2.  The EBR scheme uses a linked list to record slices.
    There's no index. The EBR scheme defines the index
    as a function of the start LBA of the partition. The
    add verb picks an index for the range and then invokes
    the add method of the scheme to fill in the blanks. It
    is too late for the add method to change the index.
    The pre-check is used to set the index up-front. This
    also (silently) overrides/nullifies any (pointless)
    user-specified index value.
2009-02-15 22:18:16 +00:00
Ulf Lilleengen
af2c6a1332 - Use the correct argument when determining the buffer size.
PR:		kern/131575
MFC after:	2 days
2009-02-11 18:13:20 +00:00
Warner Losh
51f53a08e0 Fix g_part_dumpconf and g_part_name prototpyes.
Submitted by:	marcel@
2009-02-10 02:43:07 +00:00
Marcel Moolenaar
5d68db5bc8 Add the EBR scheme. The EBR scheme supports the Extended Boot Records
found inside extended partitions and used to create logical partitions.
At this time write/modify support is not (yet) present.
The EBR and MBR schemes both check the parent scheme. The MBR will
back-off when nested under another MBR, whereas the EBR only nests
under a MBR.
2009-02-08 23:51:44 +00:00
Marcel Moolenaar
665557a4ec Allow gpe_offset to be set by the scheme. When gpe_offset is zero,
or invalid, initialize it to the start of the partition. Adjust
the mediasize when the offset lies somewhere inside the partition.
2009-02-08 23:39:30 +00:00
Marcel Moolenaar
165651a553 o Add the "PART::scheme" attribute that returns the name of the
underlying partitioning scheme.
o  Put the start and end of the partition in the XML configuration.
   The start and end are the LBAs of the first and last sector
   (resp.) of the partition. They are currently identical to the
   offset and size attributes, which describe the partition as an
   offset and size in bytes, but may not in the future. The start
   and end will be used for the logical partition boundaries and
   may include metadata. The offset and size will always represent
   the useful storage space within the partition. Typically these
   two notions are the same, but for logical partitions in an
   extended partition, the EBR is more naturally treated as being
   part of the partition.
2009-02-08 20:15:08 +00:00
Warner Losh
f4fddf53c7 Fix g_part_*dumpconf to return void to match kobj definition.
Fix g_part_*name to return a const char * rather than a char *.
2009-02-08 07:05:23 +00:00
Marcel Moolenaar
da3ee90988 In g_handleattr(), set bp->bio_completed also for the case
where len is 0. Otherwise g_getattr() will never succeed
when it is handled by g_handleattr_str().
2009-02-03 07:07:13 +00:00
Marcel Moolenaar
709a626613 Constify val in g_handleattr() and str in g_handleattr_str().
This allows passing string constants to g_handleattr_str().
2009-02-01 01:50:09 +00:00
Ed Schouten
739b705c7e Remove unused unrhdr from GEOM character device module.
Now that make_dev() doesn't require unit numbers to be unique, there is
no need to use an unrhdr here to generate the numbers. Remove the entire
init-routine, because it is optional.
2009-01-24 18:23:19 +00:00
Edward Tomasz Napierala
38153e80f7 Prevent a panic that happens on SMP machines when removing a disk with
many writes queued up.

Reviewed by:	phk, scottl
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2009-01-11 13:51:04 +00:00
Marius Strobl
c825397790 - Don't enforce an upper-bound to the number of sectors or heads,
allowing the full 16-bit width of the corresponding fields in the
  VTOC8 label to be used. The removed limits basically only held
  true for providers labeled using the synthetic geometry provided
  by cam_calc_geometry(9) but neither SCSI disks labeled with Solaris
  nor sufficiently large ATA disks.
- Given that providers (originally) labeled with Solaris typically
  use the native geometry as reported by the target while FreeBSD
  typically uses a synthetic one put the message complaining about
  mismatching geometries between what the label indicates and what
  GEOM thinks the provider has, which we generally can't help,
  under bootverbose in order to not unnecessarily scare users.
- For informational purposes add the non-matching values to the
  message complaining about them, similar to what r186501 did for
  g_part_bsd_read() except also indicating the origin of the
  values.
- Make it clear that the messages emitted by this code refer to
  the VTOC8 support rather than to another existing scheme or to
  VTOC32.
2009-01-06 14:10:10 +00:00
Marcel Moolenaar
3d556594df Don't enforce an upper-bound to the number of sectors or heads
that that the provider has. The limits we imposed were PC BIOS
specific and not always applicable.
2009-01-06 06:47:53 +00:00
Marcel Moolenaar
90886ebcc2 Improve probing.
o   Don't check the dummy fields.
o   The entry is unused if either dp_mid is 0 or dp_sid is 0.
o   The start or end cylinder cannot be 0.
o   The start CHS cannot be equal to the end CHS.

Submitted by:	nyan
2009-01-04 07:32:06 +00:00
Ulf Lilleengen
2155338ac1 - Fix an issue with access permissions to underlying disks used by a gvinum
plex. If the plex is a raid5 plex, and is being written to, parity data might
  have to be read from the underlying disks, requiring them to be opened for
  reading as well as writing.

MFC after:	1 week
2008-12-27 14:32:39 +00:00
David E. O'Brien
62a353c0bd When the geometry does not match the label, print out the values. 2008-12-26 20:27:32 +00:00
Edward Tomasz Napierala
ce8be7b8b0 Implement g_vfs_orphan(). Without it, the filesystem never closes
the device, which means refcount on periph drivers never drops,
which means cam_sim_free() never returns, which results in umass
sleeping there ad infinitum.

Submitted by:	pjd
Reviewed by:	scottl, pjd
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-16 17:04:52 +00:00
Ulf Lilleengen
fa13e9bb0b - Add missing word in comment. 2008-12-08 17:09:02 +00:00
Edward Tomasz Napierala
d27a975f72 Make it possible to use gjournal for the root filesystem. Previously,
an unclean shutdown would make it impossible to mount rootfs at boot.

PR:		kern/128529
Reviewed by:	pjd
Approved by:	rwatson (mentor)
Sponsored by:	FreeBSD Foundation
2008-12-06 11:33:10 +00:00
Ivan Voras
499c86ebd5 Trivial patch to show on which geom has the error been detected.
Submitted by:	Rick C. Petty
Approved by:	gnn (mentor)
MFC after:	1 month
2008-12-01 15:02:00 +00:00
Marcel Moolenaar
6647711279 Allow boot code to be smaller than what the scheme expects.
This effectively changes the boot code size to be an upper
bound and makes the interface more flexible.
2008-12-01 00:07:17 +00:00
Marcel Moolenaar
95fc269897 Allow dumpon to a partition of type FS_UNUSED as well. 2008-11-26 05:18:27 +00:00
Ulf Lilleengen
251048a1ab - Fix a potential NULL pointer reference. Note that this should not happen in
practice, but it is a good programming practice and allows the kernel to not
  depend on userland correctness.
- While there, make sizeof usage match the rest of the code.

Found with:	Coverity Prevent(tm)
CID:		660, 662
2008-11-25 20:28:33 +00:00
Ulf Lilleengen
7e11b694f4 - Fix a potential NULL pointer reference. Note that this cannot happen in
practice, but it is a good programming practice nontheless and it allows the
  kernel to not depend on userland correctness.

Found with:   Coverity Prevent(tm)
CID:          655-659, 664-667
2008-11-25 19:13:58 +00:00
Marcel Moolenaar
1a696559de Partition type FS_UNUSED does not mean the partition entry
is unused. Unused partition entries have a partition size
of zero. Therefore, partitions can have type FS_UNUSED.

MFC after:	3 days
2008-11-18 05:55:58 +00:00
Marcel Moolenaar
dd0db05da2 Fix a panic caused by a corrupted table when the header is
still valid. We were checking the state of the header and
not the table.

PR:		119868
Based on a patch from:	Jaakko Heinonen <jh@saunalahti.fi>
MFC after: 1 week
2008-11-06 16:51:33 +00:00
Attilio Rao
83b3bdbc8a Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
  mnt_lock and using instead a refcount in order to keep track of lock
  requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
  useless.
- Due to the change above, vfs_busy() is no more linked to a lockmgr.
  Change so its KPI by removing the interlock argument and defining 2 new
  flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
  old version (which was unlinked from the lockmgr alredy) and
  MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
  once the mnt interlock is held (ability still desired by most consumers).
- The stub used into vfs_mount_destroy(), that allows to override the
  mnt_ref if running for more than 3 seconds, make it totally useless.
  Remove it as it was thought to work into older versions.
  If a problem of "refcount held never going away" should appear, we will
  need to fix properly instead than trust on such hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
  leaving the MNTK_MWAIT flag on even if it the waiters were actually
  woken up. Just a place in vfs_mount_destroy() is left because it is
  going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with:	kib
Tested by:	pho
2008-11-02 10:15:42 +00:00
Warner Losh
0952268ecd Add support for reading Tivo Series 1 partitioning. This likely needs
a little refinement, but is good enough to commit as is.

# Should look to see if I should move swab(3) into the kernel or just
# provide the unoptimized routine here.

Reviewed by:	marcel@
2008-11-02 03:02:56 +00:00
Konstantin Belousov
f5dfdb519f Revert r184136. Instead, push the check for crashdumpmap overflow into the
MD i386 and amd64 dump code.

Requested by:	jhb
Retested by:	pho
MFC after:	3 days (+ 176304 + 184136)
2008-10-31 10:11:35 +00:00
Ulf Lilleengen
86b3c6f5bc - Import macros used in gmirror for printing gvinum debug messages and making
the output more standardized.
- Add a sysctl to set the verbosity of the debug messages.
- While there, fixup typos and wording in the messages.
2008-10-26 17:20:37 +00:00
Marcel Moolenaar
38a2db2eb0 Invalid BSD disklabels have been created by sysinstall and
are possibly still being created. The d_secperunit field
contains the number of sectors of the disk and not of the
slice/partition to which the disklabel applies.
Rather than reject the disklabel, we now silently adjust
the field. Existing code, like bslabel(8), does not seem
to check the label that extensively and seems to adjust
fields as a side-effect as well.
In other words, it's not that important apparently, so
gpart should not be too strict about it.

Reported by: nyan@
Reported by: Andriy Gapon <avg@icyb.net.ua>
2008-10-25 17:21:46 +00:00
Marcel Moolenaar
66fedfca7b Allow dumps to partitions with a tag of 0. The legacy
sunlabel implementation in FreeBSD does not use VTOC
information and as such as no partition types.
2008-10-22 02:08:54 +00:00
Konstantin Belousov
7a882637fe Do not overflow crashdumpmap.
Reported and tested by:	pho
Reviewed by:	jhb
MFC after:	1 week
2008-10-21 18:52:38 +00:00
Marcel Moolenaar
e1d7111a53 The active and bootable flags are not part of the type.
Export the active and bootable flags as attributes in
the configuration XML and allow them to be manipulated
with the set/unset commands.

Since libdisk treats the flags as part of the partition
type, preserve behavior by keeping them included in the
configuration text.
2008-10-20 04:50:47 +00:00
Attilio Rao
0d7935fd01 Remove the struct thread unuseful argument from bufobj interface.
In particular following functions KPI results modified:
- bufobj_invalbuf()
- bufsync()

and BO_SYNC() "virtual method" of the buffer objects set.
Main consumers of bufobj functions are affected by this change too and,
in particular, functions which changed their KPI are:
- vinvalbuf()
- g_vfs_close()

Due to the KPI breakage, __FreeBSD_version will be bumped in a later
commit.

As a side note, please consider just temporary the 'curthread' argument
passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP

Reviewed by:	kib
Tested by:	Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
2008-10-10 21:23:50 +00:00
Ulf Lilleengen
64d62b289e - Use the new gv_write_header function to write out the header when removing a
drive to make sure that the header is in the correct format.
2008-10-02 10:01:05 +00:00
Ulf Lilleengen
d4b28d5b0f - Remove unneeded macro since the config_length field in the header was changed
to 64 bit in the new format.
2008-10-02 09:35:47 +00:00
Ulf Lilleengen
46ceb66ad3 - Make gvinum header on-disk structure consistent on all platforms by storing
the gvinum header in fields of fixed size and in a big endian byte order
  rather than the size and byte order of the actual platform.

Note that the change is backwards compatible with the old gvinum configuration
format, but will save the configuration in the new format when the 'saveconfig'
command is executed.

Submitted by:	Rick C. Petty <rick-freebsd -at- kiwi-computer.com>
2008-10-01 14:50:36 +00:00
Marcel Moolenaar
a87faebb8c Return G_PART_PROBE_PRI_HIGH instead of G_PART_PROBE_PRI_NORM
if the probe succeeds. This guarantees that the BSD scheme
wins over the MBR scheme when MBR gets to probe first. Build-
or link-time conditions can cause schemes to end up in the
linker set in a different order. Normally BSD is before MBR
in the linker set and as such get to probe first. But typically
when the kernel gets rebuild or relinked, this can change.
2008-09-29 02:48:22 +00:00
Marcel Moolenaar
1e2fbcfa44 Insert the null scheme at the head. This does not change any
functionality, but creates an invariant: the first element
on the list is always the null scheme.
2008-09-29 02:39:02 +00:00
Marcel Moolenaar
4bf0267894 Export the partition name in the conftxt and confxml output.
The conftxt output is used by libdisk, and the confxml
output is used by gpart itself (gpart show -l).

Submitted by:	nyan@
2008-09-27 19:58:11 +00:00
Marcel Moolenaar
a455de093b Hold the root mount while we're tasting. It is possible
that a nested partition (typically the BSD disklabel)
is not done tasting while the root file system is being
mounted. While this is rare, it's still possible.
2008-09-27 19:29:52 +00:00
Marcel Moolenaar
404cfb5e20 Allow 255 sectors/track for the BSD disklabel. The previous limit
of 63 sectors/track is too PC BIOS specific. On pc98, where the
BSD disklabel is used as well, 255 sectors/track is not uncommon.

Submitted by:	nyan@
2008-09-27 15:28:15 +00:00
Ed Schouten
d3ce832719 Remove unit2minor() use from kernel code.
When I changed kern_conf.c three months ago I made device unit numbers
equal to (unneeded) device minor numbers. We used to require
bitshifting, because there were eight bits in the middle that were
reserved for a device major number. Not very long after I turned
dev2unit(), minor(), unit2minor() and minor2unit() into macro's.
The unit2minor() and minor2unit() macro's were no-ops.

We'd better not remove these four macro's from the kernel, because there
is a lot of (external) code that may still depend on them. For now it's
harmless to remove all invocations of unit2minor() and minor2unit().

Reviewed by:	kib
2008-09-26 14:19:52 +00:00
Sean Bruno
c4901b6798 Just a fixup for a KTRACE message I stumbled upon many moons ago.
Reviewed by:	Scott Long
MFC after:	2 days
2008-09-18 15:02:19 +00:00
Ulf Lilleengen
f805f204b6 - Add a new ioctl for getting the provider name of a geom provider.
- Add a routine for looking up a device and checking if it is a valid geom
  provider given a partial or full path to its device node.

Reviewed by:	phk
Approved by:	pjd (mentor)
2008-09-07 13:54:57 +00:00
Rui Paulo
89ea496520 Fix build. 2008-09-05 18:11:18 +00:00
Rui Paulo
87662ab391 Keep entries sorted. 2008-09-05 18:09:49 +00:00
Rui Paulo
35fa2f1bcd Include the vendor in the partition name. 2008-09-05 16:54:07 +00:00
Rui Paulo
d7255ff42e Detect Apple HFS GPT slices. 2008-09-05 12:49:14 +00:00
Attilio Rao
59d4932531 Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.

Tested by:	Diego Sardina <siarodx at gmail dot com>
2008-08-31 14:26:08 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Pawel Jakub Dawidek
ed6c3e478f Style(9). 2008-08-12 20:19:08 +00:00
Dag-Erling Smørgrav
2616144e43 Add sbuf_new_auto as a shortcut for the very common case of creating a
completely dynamic sbuf.

Obtained from:	Varnish
MFC after:	2 weeks
2008-08-09 11:14:05 +00:00
Peter Wemm
fbbc785240 Trivial commit to attempt to diagnose a svn problem. Add
comment that Tivo disks are APM, but do not have a DDR record.
2008-07-22 18:05:50 +00:00
Pawel Jakub Dawidek
5527ecd9a5 Clear passphrase buffer after use.
Submitted by:	Fabian Keil <fk@fabiankeil.de> (a bit different version)
2008-07-20 19:56:13 +00:00
Ulf Lilleengen
14e96b45e8 - When renaming a drive, also set the drive name in the gvinum header.
PR:		kern/125632
Approved by:	pjd (mentor)
MFC after:	3 days
2008-07-19 13:53:11 +00:00
Ulf Lilleengen
56af4c6141 - Fix a logic error when updating plex configuration.
Approved by:	pjd (mentor)
2008-07-11 16:46:29 +00:00
Robert Watson
4f7d1876d5 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
Xin LI
6c97c325ff Avoid NULL deference.
Reviewed by:	ivoras
2008-06-30 15:21:42 +00:00
Ulf Lilleengen
7e7a4e1d18 - Fix spelling errors.
Approved by:    kib (mentor)
PR:             kern/124788
Submitted by:   Hywel Mallett <Hywel -at- hmallett.co.uk>
2008-06-20 19:48:18 +00:00