to problems when the geli device is used with file system or as a swap.
Hopefully will prevent problems like kern/98742 in the future.
MFC after: 1 week
arrangement that has no intrinsic internal knowledge of whether devices
it is given are truly multipath devices. As such, this is a simplistic
approach, but still a useful one.
The basic approach is to (at present- this will change soon) use camcontrol
to find likely identical devices and and label the trailing sector of the
first one. This label contains both a full UUID and a name. The name is
what is presented in /dev/multipath, but the UUID is used as a true
distinguishor at g_taste time, thus making sure we don't have chaos
on a shared SAN where everyone names their data multipath as "Fred".
The first of N identical devices (and N *may* be 1!) becomes the active
path until a BIO request is failed with EIO or ENXIO. When this occurs,
the active disk is ripped away and the next in a list is picked to
(retry and) continue with.
During g_taste events new disks that meet the match criteria for existing
multipath geoms get added to the tail end of the list.
Thus, this active/passive setup actually does work for devices which
go away and come back, as do (now) mpt(4) and isp(4) SAN based disks.
There is still a lot to do to improve this- like about 5 of the 12
recommendations I've received about it, but it's been functional enough
for a while that it deserves a broader test base.
Reviewed by: pjd
Sponsored by: IronPort Systems
MFC: 2 months
flash card reader.
Also remove an 'Opened da0 -> <random number>' which is not needed on a daily
basis (available through bootverbose).
Reviewed by: phk, ken
MFC after: 1 week
partitioning class that supports multiple schemes. Current
schemes supported are APM (Apple Partition Map) and GPT.
Change all GEOM_APPLE anf GEOM_GPT options into GEOM_PART_APM
and GEOM_PART_GPT (resp).
The ctlreq interface supports verbs to create and destroy
partitioning schemes on a disk; to add, delete and modify
partitions; and to commit or undo changes made.
We can't bind to a CPU which is not yet on-line, so add code that wait for
CPUs to go on-line before binding to them.
Reported by: Alin-Adrian Anton <aanton@spintech.ro>
MFC after: 2 weeks
file are after snaplock, while other ffs device buffers are before
snaplock in global lock order. By itself, this could cause deadlock
when bdwrite() tries to flush dirty buffers on snapshotted ffs. If,
during the flush, COW activity for snapshot needs to allocate block
and ffs_alloccg() selects the cylinder group that is being written
by bdwrite(), then kernel would panic due to recursive buffer lock
acquision.
Avoid dealing with buffers in bdwrite() that are from other side of
snaplock divisor in the lock order then the buffer being written. Add
new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in
the bdwrite(). Default implementation, bufbdflush(), refactors the code
from bdwrite(). For ffs device buffers, specialized implementation is
used.
Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
Tested by: Peter Holm
X-MFC after: 3 weeks (if ever: it changes ABI)
gmirror and graid3 in a way that it is not resynchronized after a
power failure or system crash.
It is safe when gjournal is running on top of gmirror/graid3.
we won't be able to exit from the thread.
Function g_eli_cpu_is_disabled() stoled from kern_pmc.c.
PR: 104669
Reported by: Nikolay Mirin <nik@optim.com.ru>
MFC after: 1 week
- Do not modify mnt_flag without mount interlock held.
- Do not touch MNT_ASYNC flag, as this can lead to a race with nmount(2).
Pointed out by: tegge
Reviewed by: tegge
journaling and can be tought about marking file system as clean before
doing journal switch, which easly allows to add journaling to file
systems that don't have this feature.
Sponsored by: home.pl
read requests to its consumer. It has been developed to address
the problem of a horrible read performance of a 64k blocksize FS
residing on a RAID3 array with 8 data components, where a single
disk component would only get 8k read requests, thus effectively
killing disk performance under high load. Documentation will be
provided later. I'd like to thank Vsevolod Lobko for his bright
ideas, and Pawel Jakub Dawidek for helping me fix the nasty bug.
request can still have bio_to set to sc_provider (this is READ part of a
synchronization request) and in this case g_{mirror,raid3}_sync() wasn't
called as it should be.
MFC after: 1 week
This way GEOM classes can safely detach from provider when an orphan
event is received. This fixes 'detach with active requests' panic for
gstripe/gconcat under load.
PR: kern/102766
Submitted by: mjacob
OK'ed by: phk
MFC after: 1 week
add count of active and total components to the launched line so you can
see at a glance if your mirror/raid3 is complete...
now:
GEOM_MIRROR: Device mirror/sam launched (2/2).
Reviewed by: pjd
- hold/release device in start/done routines, this will probably slow
down things a bit, but previous code was racy;
- only release device if g_gate_destroy() failed - if it succeeded device
is dead and there is nothing to release;
- various other changes which makes forcible destruction reliable.
MFC after: 3 days
created on Windows XP (and others maybe) were not detected.
We detected only those created with newfs_msdos(8).
Submitted by: Tobias Reifenberger <treif@mayn.de>
style(9)ified by: pjd
This way one will be able to use provider encrypted on eg. i386 on
eg. sparc64. This doesn't really buy us much today, because UFS isn't
endian agnostic.
We retain backward compatibility by setting G_ELI_FLAG_NATIVE_BYTE_ORDER
flag on devices with version number less than 2 and not converting the
offset.
o PMBR partitions count to the number of partitions on the disk, which
means that if a PMBR entry is invalid we will not treat the MBR as a
PMBR by virtue of it not describing any partitions.
Previously the checks were inconsistent in that an invalid PMBR entry
would be harmless when no other partitions exist (we would treat the
MBR as a PMBR by virtue of it being empty), but it would be fatal when
there is at least one other partition.
o The partition size of a PMBR partition is one less than the media size
because the GPT starts at the second sector (LBA 1) and extends to
the end of the media. For backward bug-compatibility we accept a size
that's exactly the media size (FreeBSD bug).
Also, when the partition size can not be represented in a 32-bit
integral, the partition size in the MBR is to be set to 0xFFFFFFFF.
Accept this as a valid size, even if the size can be represented.
we obtained access. It is possible that GPT gets to taste a disk
first, which means the disk has not been opened before and it will
not get opened until after we checked the mediasize and sectorsize.
However, since the mediasize and sectorsize are determined at open
and that happens when access is optained, checking the mediasize
and sectorsize before obtaining access may result in GPT rejecting
the disk.
uma(9) will be used for memory allocation.
In case of problems or tracking bugs, there are more useful tools for malloc(9)
debugging than for uma(9) debugging, like memguard(9) and redzone(9).
MFC after: 1 week
MBR should have only one entry of type 0xEE, consider protective MBR
to be one, that has at least one entry of type 0xEE covering the whole
unit. This makes GEOM_GPT compatible with disks partitioned by the
Apple's BootCamp.
Approved in principle by: marcel
MFC After: 1 month
offset or request size which is not a multiple of the sector size, make
sure that the bio is set to indicate that no data has actually been
transferred.
The result of this is that the file offset is no longer incremented for
these requests. The fact that the file offset was incremented broke
fdisk(8)'s probing of sector size for non-512 byte sector sizes.
Reviewed by: phk, cperciva
Submitted by: mdodd
MFC after: 2 weeks
By using a pointer to struct dos_partition, we implicitly tell the
compiler that the pointer is 4-bytes aligned, even though we know
that's not the case. The fact that we only dereference the pointer
to access a byte-wide field (field dp_ptyp) is not a guarantee that
the compiler will in fact use a byte-wide load. On some platforms
it's more efficient to use long word or quad word loads and use
bit-shifting and bit-masking to get the intended byte. On those
platforms an misaligned load will be the result.
The fix is to use byte-wide pointer arithmetic based on sizeof() and
offsetof() to avoid invalid casts which avoids that the compiler
makes invalid assumptions.
Backtrace provided by: wilko@
MFC after: 1 week
two places where g_io_request() is called. g_io_request() can free bio
structure so we can't reference it after and G_RAID3_FOREACH_BIO() macro
was doing this.
Found by: Coverity Prevent analysis tool (with my new models)
MFC after: 1 day
- Prevent possible live-lock in case of memory problems by freeing
already completed requests first.
Reported and tested by: markus, Bradley W. Dutton <brad-fbsd-stable@duttonbros.com>
MFC after: 1 day
- Comment possible event miss, which isn't critical, but probably can be
fixed by replacing the event lock usage with the queue lock.
MFC after: 2 weeks
stored in metadata instead of an offset in single disk.
After reboot/crash synchronization process started from a wrong offset
skipping (not synchronizing) part of the component which can lead to data
corrutpion (when synchronization process was interrupted on initial
synchronization) or other strange situations like 'graid3 status' showing
value more than 100%.
Reported, reviewed and tested by: ru
Reported by: Dmitry Morozovsky <marck@rinet.ru>
MFC after: 1 day
which means that devices will be destroyed on last close.
This fixes destruction order problems when, eg. RAID3 array is build on
top of RAID1 arrays.
Requested, reviewed and tested by: ru
MFC after: 2 weeks
o Implement the remove verb to remove a partition entry.
o Improve error reporting by first checking that the verb is valid.
o Add an entry parameter to the add verb. this parameter can be
both read-only as welll as read-write and specifies the entry
number of the newly added partition.
o Make sure that the provider is alive when passed to us. It may
be withering away.
o When adding a new partition entry, test for overlaps with existing
partitions.
particular provider. Use this function where g_orphan_provider()
is being called so that the flags are updated correctly and
g_orphan_provider() is called only when allowed.
error on the request. Add a wrapper, gctl_set_param_err(), that
sets the error on the request from the error returned by
gctl_set_param() and update current callers of gctl_set_param()
to call gctl_set_param_err() instead.
This makes gctl_set_param() much more usable in situations where
the caller knows better what to do with certain (apparent) error
conditions and setting an error on the request is not one of the
things that need to be done.
case panic on sparc64.
The problem is in MD5(9) implementation. The Encode() function takes
'unsigned char *output' as its first argument, which is then assigned to
'u_int32_t *op'. If the 'output' argument is not 4 byte aligned (and in
geli(8) case it is not), sparc64 machine will panic.
I don't know how to fix MD5(9) in a clean way, so I'm implementing a
work-around in geli(8).
Reported by: brueffer
MFC after: 3 days