the g_journal level needs to check whether it is holding a newer
copy of the block than that which exists on the disk. If so, it
needs to return its copy. If not, it should pass the request down
to the disk to fulfill. It currently considers six queues:
0) delayed queue,
1) unsent (current queue),
2) in-flight to the journal (flush queue),
3) active journal (active queue),
4) inactive journal (inactive queue), and
5) inflight to the disk (copy queue).
Checking on two of these queues is unnecessary:
0) The delayed requests should not be used for reads because they
have not yet been entered into the journal, so their value should
reflect the disk contents, not the future contents that are not
yet committed.
2) Because all the bio's in the flush queue are also found on the
active queue, there is no need to inspect the flush queue for
reads since they will be found when searching the active queue.
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
geom_bsd, geom_mbr and geom_sunlabel have been obsolete since Marcel
Moolenaar's geom_part was in FreeBSD 7. They haven't been in GENERIC
since FreeBSD 8. Add warning when used.
geom_vol_ffs has been obsolete since ufs support to geom_label was
committed in FreeBSD 5. It hasn't been in GENERIC since FreeBSD 5.
Add warning when used.
geom_fox has been obsolete since gmultipath was committed in FreeBSD 7.
(no warning added, since this is a very obscure class).
These will all be removed in FreeBSD 12.
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D11935
Note: Classes will be removed after MFC
No need to set any fields in the cloned device. devfs uses symlinks,
so the adev entries returned won't be presented to the drivers. Since
we don't save copies, nothing else will see them. This code came from
the old compat code, and it appears to be obsolete or never needed.
Submitted by: kib@
Differential Review: https://reviews.freebsd.org/D11919
Implement disk_add_alias to allow aliases to be added to disks. All
disk have a primary name (say "foo") can also have secondary names
(say "bar") such that all instances of "foo" also have a "bar"
alias. So if you have foo0, foo0p1, foo1, foo1s1 and foo1s1a nodes
created by the foo driver and gpart, device nodes bar0, bar0p1, bar1,
bar1s1 and bar1s1a will appear as symlinks back to the original nodes.
This generalizes to multiple aliases. However, since the unit number
follows the primary name, multiple device drivers can't create the
same aliases unless those drives coorinate the unit number space (eg
you couldn't add an alias 'disk' to both 'da' and 'ada' because it's
possible to have da0 and ada0, because 'disk0' is ambiguous).
Differential Revision: https://reviews.freebsd.org/D11873
When we're creating new providers for each of the partitions, add
aliases to the geom before we create the provider so when geom_dev
tastes the provider, the aliases are in place so the proper /dev
entries are created. So foo5p6 gets created as an alias for bar5p6
when foo is an alias for bar in the geom we're partitioning with
g_part. This also copies aliases from the container geom (eg disk) to
the label geom (the disk with GPT partitioning) so that aliases nest
properly.
Differential Revision: https://reviews.freebsd.org/D11873
Add an alias name list to geoms. Use them in geom_dev to create
aliases. Previously, geom_dev would create an device node for the name
of the geom. Now, additional nodes are created pointing back to the
primary node with make_dev_alias_p. Aliases must be in place on the
geom before any tasting occurs.
Differential Revision: https://reviews.freebsd.org/D11873
in the flush_queue:
1 2 3 4 5 6 7 8 9 10
and another 10 bio's go into the flush queue after only the first five
bio's are removed from the flush queue, the queue should look like:
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20,
but because of the bug we end up with
6 11 12 13 14 15 16 17 18 19 20 7 8 9 10.
So the sequence of the bio's is damaged in the flush queue (and
therefore in the journal on disk !). This error can be triggered by
ffs_snapshot() when a block is read with readblock() and gjournal finds
this block in the broken flush queue before it goes to the correct
active queue.
The fix is to place all new blocks at the end of the queue.
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
system having over 4GB RAM. That's due to:
1) the limit being u_int instead of u_long like vm.kmem_size (the limit is
half of vm.kmem_size by default for amd64);
2) sysctl handler g_journal_cache_limit_sysctl() using u_int instead of u_long.
The fix is to replace u_int with u_long for the kern.geom.journal.cache.limit
sysctl variable.
PR: 198500
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Reported by: Eugene Grosbein
Discussed with: kib
MFC after: 1 week
RPI1-B, Alix and APU2 boards as well as NanoBSD with the following message:
vnode_pager_generic_getpages_done: I/O read error 5
Seems the breakage was because it was missed to include acr in glabel update.
Reported by: Peter Blok <pblok@bsd4all.org>,
madpilot, imp and trasz.
Reviewed by: trasz
Tested by: Peter Blok and madpilot.
MFC after: 3 days.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11365
Add -o [no]verify option to mdconfig (and document in man page.)
Implement GEOM attribute MNT::verified to ask md if the backing vnode is
verified.
Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if
the underlying device has been verified.
Reviewed by: rwatson
Approved by: sjg (mentor)
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D2902
During gmirror startup, if component mirrors are found to be dirty as is
typical after a system crash, the mirrors are synchronized to the mirror
with highest priority. However if a gmirror starts without all of its
mirrors present, for example because of some transient delays during
tasting, the remaining mirrors must be synchronized before they may become
active.
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Before this change it was impossible to set number of PKCS#5v2 iterations,
required to set passphrase, if it has two keys and never had any passphrase.
Due to present metadata format limitations there are still cases when number
of iterations can not be changed, but now it works in cases when it can.
PR: 218512
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D10338
At this point we have not rendezvous'ed with the mirror worker thread, and
I/O may still be in flight. Various I/O completion paths expect to be able
to obtain a reference to the mirror softc from the GEOM, so setting it to
NULL may result in various NULL pointer dereferences if the mirror is
stopped with -f or the kernel is shut down while a mirror is
synchronizing. The worker thread will clear the softc pointer before
exiting.
Tested by: pho
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
We are otherwise susceptible to a race with a concurrent teardown of the
mirror provider, causing the I/O to be left uncompleted after the mirror
started withering.
Tested by: pho
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Regular I/O requests may be blocked by concurrent synchronization requests
targeted to the same LBAs, in which case they are moved to a holding queue
until the conflicting I/O completes. We therefore want to stop
synchronization before completing pending I/O in g_mirror_destroy_provider()
since this ensures that blocked I/O requests are completed as well.
Tested by: pho
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Entries may be removed and freed if an I/O error occurs during mirror
synchronization, so we cannot assume that all entries of ds_bios are
valid.
Also ensure that a synchronization BIO's array index is preserved after
a successful write.
Reported and tested by: pho
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
This patch adds a general mechanism for providing encryption keys to the
kernel from the boot loader. This is intended to enable GELI support at
boot time, providing a better mechanism for passing keys to the kernel
than environment variables. It is designed to be extensible to other
applications, and can easily handle multiple encrypted volumes with
different keys.
This mechanism is currently used by the pending GELI EFI work.
Additionally, this mechanism can potentially be used to interface with
GRUB, opening up options for coreboot+GRUB configurations with completely
encrypted disks.
Another benefit over the existing system is that it does not require
re-deriving the user key from the password at each boot stage.
Most of this patch was written by Eric McCorkle. It was extended by
Allan Jude with a number of minor enhancements and extending the keybuf
feature into boot2.
GELI user keys are now derived once, in boot2, then passed to the loader,
which reuses the key, then passes it to the kernel, where the GELI module
destroys the keybuf after decrypting the volumes.
Submitted by: Eric McCorkle <eric@metricspace.net> (Original Version)
Reviewed by: oshogbo (earlier version), cem (earlier version)
MFC after: 3 weeks
Relnotes: yes
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D9575
In GELI, anywhere we are zeroing out possibly sensitive data, like
the metadata struct, the metadata sector (both contain the encrypted
master key), the user key, or the master key, use explicit_bzero.
Didn't touch the bzero() used to initialize structs.
Reviewed by: delphij, oshogbo
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D9809
A request may be queued while the queue lock is dropped when the mirror is
being destroyed. The corresponding wakeup would be lost, possibly resulting
in an apparent hang of the mirror worker thread.
Tested by: pho (part of a larger patch)
MFC after: 1 week
Sponsored by: Dell EMC Isilon
The worker thread will destroy the mirror provider as part of its teardown
sequence. The call made sense in the initial revision of gmirror, but
became unnecessary in r137248.
Tested by: pho (part of a larger diff)
MFC afteR: 2 weeks
Sponsored by: Dell EMC Isilon
- Don't execute any of g_mirror_shutdown_post_sync() when panicking. We
cannot safely idle the mirror or stop synchronization in that state, and
the current attempts to do so complicate debugging of gmirror itself.
- Check for a non-NULL panicstr instead of using SCHEDULER_STOPPED(). The
latter was added for use in the locking primitives.
Reviewed by: mav, pjd
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
gpart(8) has functionality to change the label of an GPT partition.
This functionality works like it should, however, after a label change
the /dev/gpt/ entries remain unchanged. glabel(8) status output remains
unchanged. The change only takes effect after a reboot.
PR: 162690
Submitted by: sub.mesa@gmail, Ben RUBSON <ben.rubson@gmail.com>, ae
Reviewed by: allanjude, bapt, bcr
MFC after: 6 weeks.
Differential Revision: https://reviews.freebsd.org/D9935
with geom_flashmap(4) and teach it about MMC for slicing enhanced
user data area partitions. The FDT slicer still is the default for
CFI, NAND and SPI flash on FDT-enabled platforms.
- In addition to a device_t, also pass the name of the GEOM provider
in question to the slicers as a single device may provide more than
provider.
- Build a geom_flashmap.ko.
- Use MODULE_VERSION() so other modules can depend on geom_flashmap(4).
- Remove redundant/superfluous GEOM routines that either do nothing
or provide/just call default GEOM (slice) functionality.
- Trim/adjust includes
Submitted by: jhibbits (RouterBoard bits)
Reviewed by: jhibbits
The PBKDF2 in sys/geom/eli/pkcs5v2.c is around half the speed it could be
GELI's PBKDF2 uses a simple benchmark to determine a number of iterations
that will takes approximately 2 seconds. The security provided is actually
half what is expected, because an attacker could use the optimized
algorithm to brute force the key in half the expected time.
With this change, all newly generated GELI keys will be approximately 2x
as strong. Previously generated keys will talk half as long to calculate,
resulting in faster mounting of encrypted volumes. Users may choose to
rekey, to generate a new key with the larger default number of iterations
using the geli(8) setkey command.
Security of existing data is not compromised, as ~1 second per brute force
attempt is still a very high threshold.
PR: 202365
Original Research: https://jbp.io/2015/08/11/pbkdf2-performance-matters/
Submitted by: Joe Pixton <jpixton@gmail.com> (Original Version), jmg (Later Version)
Reviewed by: ed, pjd, delphij
Approved by: secteam, pjd (maintainer)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D8236
Don't start switcher kproc until the first GEOM is created.
Reviewed by: pjd
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8576
Some g_raid tasters attempt metadata reads in multiples of the provider
sectorsize. Reads larger than MAXPHYS are invalid, so detect and abort
in such situations.
Spiritually similar to r217305 / PR 147851.
PR: 214721
Sponsored by: Dell EMC Isilon
With clang 4.0.0, I'm getting the following warnings:
sys/geom/vinum/geom_vinum_state.c:186:7: error: logical not is only
applied to the left hand side of this bitwise operator
[-Werror,-Wlogical-not-parentheses]
if (!flags & GV_SETSTATE_FORCE)
^ ~
The logical not operator should obiously be called after masking.
Reviewed by: mav, pfg
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D9093
Changes include modifications in kernel crash dump routines, dumpon(8) and
savecore(8). A new tool called decryptcore(8) was added.
A new DIOCSKERNELDUMP I/O control was added to send a kernel crash dump
configuration in the diocskerneldump_arg structure to the kernel.
The old DIOCSKERNELDUMP I/O control was renamed to DIOCSKERNELDUMP_FREEBSD11 for
backward ABI compatibility.
dumpon(8) generates an one-time random symmetric key and encrypts it using
an RSA public key in capability mode. Currently only AES-256-CBC is supported
but EKCD was designed to implement support for other algorithms in the future.
The public key is chosen using the -k flag. The dumpon rc(8) script can do this
automatically during startup using the dumppubkey rc.conf(5) variable. Once the
keys are calculated dumpon sends them to the kernel via DIOCSKERNELDUMP I/O
control.
When the kernel receives the DIOCSKERNELDUMP I/O control it generates a random
IV and sets up the key schedule for the specified algorithm. Each time the
kernel tries to write a crash dump to the dump device, the IV is replaced by
a SHA-256 hash of the previous value. This is intended to make a possible
differential cryptanalysis harder since it is possible to write multiple crash
dumps without reboot by repeating the following commands:
# sysctl debug.kdb.enter=1
db> call doadump(0)
db> continue
# savecore
A kernel dump key consists of an algorithm identifier, an IV and an encrypted
symmetric key. The kernel dump key size is included in a kernel dump header.
The size is an unsigned 32-bit integer and it is aligned to a block size.
The header structure has 512 bytes to match the block size so it was required to
make a panic string 4 bytes shorter to add a new field to the header structure.
If the kernel dump key size in the header is nonzero it is assumed that the
kernel dump key is placed after the first header on the dump device and the core
dump is encrypted.
Separate functions were implemented to write the kernel dump header and the
kernel dump key as they need to be unencrypted. The dump_write function encrypts
data if the kernel was compiled with the EKCD option. Encrypted kernel textdumps
are not supported due to the way they are constructed which makes it impossible
to use the CBC mode for encryption. It should be also noted that textdumps don't
contain sensitive data by design as a user decides what information should be
dumped.
savecore(8) writes the kernel dump key to a key.# file if its size in the header
is nonzero. # is the number of the current core dump.
decryptcore(8) decrypts the core dump using a private RSA key and the kernel
dump key. This is performed by a child process in capability mode.
If the decryption was not successful the parent process removes a partially
decrypted core dump.
Description on how to encrypt crash dumps was added to the decryptcore(8),
dumpon(8), rc.conf(5) and savecore(8) manual pages.
EKCD was tested on amd64 using bhyve and i386, mipsel and sparc64 using QEMU.
The feature still has to be tested on arm and arm64 as it wasn't possible to run
FreeBSD due to the problems with QEMU emulation and lack of hardware.
Designed by: def, pjd
Reviewed by: cem, oshogbo, pjd
Partial review: delphij, emaste, jhb, kib
Approved by: pjd (mentor)
Differential Revision: https://reviews.freebsd.org/D4712
It is quite specific mode of operation without storing on-disk metadata.
It can be useful in some cases in combination with some external control
tools handling mirror creation and disks hot-plug.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.