Commit Graph

2416 Commits

Author SHA1 Message Date
Xin LI
fcf69f3dbc Consistently use gctl_get_provider instead of home-grown variants.
Reviewed by:		cem, imp
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D25739
2020-07-22 02:15:21 +00:00
Xin LI
0ab851aac3 gctl_get_class, gctl_get_geom and gctl_get_provider: provide feedback
when the requested argument is missing.

Reviewed by:		cem
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D25738
2020-07-22 02:14:27 +00:00
Alan Somers
aafaa8b794 Fix geli's null cipher, and add a test case
PR:		247954
Submitted by:	jhb (sys), asomers (tests)
Reviewed by:	jhb (tests), asomers (sys)
MFC after:	2 weeks
Sponsored by:	Axcient
2020-07-21 19:18:29 +00:00
Xin LI
82b17c8e91 Fix indent for if clause.
MFC after:	2 weeks
2020-07-20 01:55:19 +00:00
Xin LI
b23a7fbaab g_concat_find_device: trim /dev/ if it is present, like other GEOM
classes.

Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25596
2020-07-09 08:00:46 +00:00
Xin LI
8510f61acd sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25565
2020-07-09 02:52:39 +00:00
Alan Somers
6f818c1fb0 geli: enable direct dispatch
geli does all of its crypto operations in a separate thread pool, so
g_eli_start, g_eli_read_done, and g_eli_write_done don't actually do very
much work. Enabling direct dispatch eliminates the g_up/g_down bottlenecks,
doubling IOPs on my system. This change does not affect the thread pool.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25587
2020-07-08 17:12:12 +00:00
Conrad Meyer
64612d4e44 geom(4): Kill GEOM_PART_EBR_COMPAT option
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.

Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234."  However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.

Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0.  This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered.  That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.

(This change does not add support for the 'bootcode' verb to EBR.)

PR:		232463
Reported by:	Manish Jain <bourne.identity AT hotmail.com>
Discussed with:	ae (no objection)
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D24939
2020-07-01 02:16:36 +00:00
John Baldwin
6572e5ff66 Use explicit_bzero() instead of bzero() for sensitive data.
Reviewed by:	delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25441
2020-06-25 20:25:35 +00:00
John Baldwin
b172f23dd7 Use zfree() instead of bzero() and free().
These bzero's should have been explicit_bzero's.

Reviewed by:	cem, delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25437
2020-06-25 20:20:22 +00:00
John Baldwin
4a711b8d04 Use zfree() instead of explicit_bzero() and free().
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().

Suggested by:	cem
Reviewed by:	cem, delphij
Approved by:	csprng (cem)
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25435
2020-06-25 20:17:34 +00:00
Kirk McKusick
9407f25df2 Optimize g_journal's superblock update by noting that the summary
information is neither read nor written so it need not be written
out when updating the superblock.

PR:           247425
Sponsored by: Netflix
2020-06-23 21:44:00 +00:00
Baptiste Daroussin
5b990a9463 Revert r362466
Such change should not have happen without prior discussion and review.

With hat:	transitioning core
2020-06-22 07:46:24 +00:00
Hans Petter Selasky
7747001b12 Improve wording to be more precise and clear.
No functional change intended.

s/Master Boot/Main Boot/ (also called MBR)

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-21 13:34:08 +00:00
Kirk McKusick
34816cb9ae Move the pointers stored in the superblock into a separate
fs_summary_info structure. This change was originally done
by the CheriBSD project as they need larger pointers that
do not fit in the existing superblock.

This cleanup of the superblock eases the task of the commit
that immediately follows this one.

Suggested by: brooks
Reviewed by:  kib
PR:           246983
Sponsored by: Netflix
2020-06-19 01:02:53 +00:00
John Baldwin
a3d565a118 Add a crypto capability flag for accelerated software drivers.
Use this in GELI to print out a different message when accelerated
software such as AESNI is used vs plain software crypto.

While here, simplify the logic in GELI a bit for determing which type
of crypto driver was chosen the first time by examining the
capabilities of the matched driver after a single call to
crypto_newsession rather than making separate calls with different
flags.

Reviewed by:	delphij
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25126
2020-06-09 22:26:07 +00:00
Conrad Meyer
a9ca503b52 Revert r361838
Reported by:	delphij
2020-06-06 14:19:16 +00:00
Conrad Meyer
5b9b571cb3 geom_label: Use provider aliasing to alias upstream geoms
For synthetic aliases (just pseudonyms inferred from metadata like GPT or
UFS labels, GPT UUIDs, etc), use the GEOM provider aliasing system to create
a symlink to the real device instead of creating an independent device.
This makes it more clear which labels and devices correspond, and we can
safely have multiple labels to a single device accessed at once.

The confusingly named geom_label on-disk construct continues to behave
identically to how it did before.

This requires teaching GEOM's provider aliasing about the possibility
that aliases might be added later in time, and GEOM's devfs interaction
layer not to worry about existing aliases during retaste.

Discussed with:	imp
Relnotes:	sure, if we don't end up reverting it
Differential Revision:	https://reviews.freebsd.org/D24968
2020-06-05 16:12:21 +00:00
Conrad Meyer
c726a670df geom: Don't re-add duplicate aliases
Reviewed by:	imp (informal +1; extracted from phab 24968)
2020-06-05 16:05:09 +00:00
Conrad Meyer
b71dc87559 geom_part: Dispatch to partitions to create providers and aliases
This allows partitions to create additional aliases of their own.  The
default method implementations preserve the existing behavior.

No functional change.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D24938
2020-05-29 19:44:18 +00:00
Alan Somers
2a2306099d geli: fix a livelock during panic
During any kind of shutdown, kern_reboot calls geli's pre_sync event hook,
which tries to destroy all unused geli devices. But during a panic, geli
can't destroy any devices, because the scheduler is stopped, so it can't
switch threads. A livelock results, and the system never dumps core.

This commit fixes the problem by refusing to destroy any devices during
panic, used or otherwise.

PR:		246207
Reviewed by:	jhb
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D24697
2020-05-27 19:13:26 +00:00
Chuck Silvers
d79ff54b5c This commit enables a UFS filesystem to do a forcible unmount when
the underlying media fails or becomes inaccessible. For example
when a USB flash memory card hosting a UFS filesystem is unplugged.

The strategy for handling disk I/O errors when soft updates are
enabled is to stop writing to the disk of the affected file system
but continue to accept I/O requests and report that all future
writes by the file system to that disk actually succeed. Then
initiate an asynchronous forced unmount of the affected file system.

There are two cases for disk I/O errors:

   - ENXIO, which means that this disk is gone and the lower layers
     of the storage stack already guarantee that no future I/O to
     this disk will succeed.

   - EIO (or most other errors), which means that this particular
     I/O request has failed but subsequent I/O requests to this
     disk might still succeed.

For ENXIO, we can just clear the error and continue, because we
know that the file system cannot affect the on-disk state after we
see this error. For EIO or other errors, we arrange for the geom_vfs
layer to reject all future I/O requests with ENXIO just like is
done when the geom_vfs is orphaned. In both cases, the file system
code can just clear the error and proceed with the forcible unmount.

This new treatment of I/O errors is needed for writes of any buffer
that is involved in a dependency. Most dependencies are described
by a structure attached to the buffer's b_dep field. But some are
created and processed as a result of the completion of the dependencies
attached to the buffer.

Clearing of some dependencies require a read. For example if there
is a dependency that requires an inode to be written, the disk block
containing that inode must be read, the updated inode copied into
place in that buffer, and the buffer then written back to disk.

Often the needed buffer is already in memory and can be used. But
if it needs to be read from the disk, the read will fail, so we
fabricate a buffer full of zeroes and pretend that the read succeeded.
This zero'ed buffer can be updated and written back to disk.

The only case where a buffer full of zeros causes the code to do
the wrong thing is when reading an inode buffer containing an inode
that still has an inode dependency in memory that will reinitialize
the effective link count (i_effnlink) based on the actual link count
(i_nlink) that we read. To handle this case we now store the i_nlink
value that we wrote in the inode dependency so that it can be
restored into the zero'ed buffer thus keeping the tracking of the
inode link count consistent.

Because applications depend on knowing when an attempt to write
their data to stable storage has failed, the fsync(2) and msync(2)
system calls need to return errors if data fails to be written to
stable storage. So these operations return ENXIO for every call
made on files in a file system where we have otherwise been ignoring
I/O errors.

Coauthered by: mckusick
Reviewed by:   kib
Tested by:     Peter Holm
Approved by:   mckusick (mentor)
Sponsored by:  Netflix
Differential Revision:  https://reviews.freebsd.org/D24088
2020-05-25 23:47:31 +00:00
John Baldwin
9c0e3d3a53 Add support for optional separate output buffers to in-kernel crypto.
Some crypto consumers such as GELI and KTLS for file-backed sendfile
need to store their output in a separate buffer from the input.
Currently these consumers copy the contents of the input buffer into
the output buffer and queue an in-place crypto operation on the output
buffer.  Using a separate output buffer avoids this copy.

- Create a new 'struct crypto_buffer' describing a crypto buffer
  containing a type and type-specific fields.  crp_ilen is gone,
  instead buffers that use a flat kernel buffer have a cb_buf_len
  field for their length.  The length of other buffer types is
  inferred from the backing store (e.g. uio_resid for a uio).
  Requests now have two such structures: crp_buf for the input buffer,
  and crp_obuf for the output buffer.

- Consumers now use helper functions (crypto_use_*,
  e.g. crypto_use_mbuf()) to configure the input buffer.  If an output
  buffer is not configured, the request still modifies the input
  buffer in-place.  A consumer uses a second set of helper functions
  (crypto_use_output_*) to configure an output buffer.

- Consumers must request support for separate output buffers when
  creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag and are
  only permitted to queue a request with a separate output buffer on
  sessions with this flag set.  Existing drivers already reject
  sessions with unknown flags, so this permits drivers to be modified
  to support this extension without requiring all drivers to change.

- Several data-related functions now have matching versions that
  operate on an explicit buffer (e.g. crypto_apply_buf,
  crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf).

- Most of the existing data-related functions operate on the input
  buffer.  However crypto_copyback always writes to the output buffer
  if a request uses a separate output buffer.

- For the regions in input/output buffers, the following conventions
  are followed:
  - AAD and IV are always present in input only and their
    fields are offsets into the input buffer.
  - payload is always present in both buffers.  If a request uses a
    separate output buffer, it must set a new crp_payload_start_output
    field to the offset of the payload in the output buffer.
  - digest is in the input buffer for verify operations, and in the
    output buffer for compute operations.  crp_digest_start is relative
    to the appropriate buffer.

- Add a crypto buffer cursor abstraction.  This is a more general form
  of some bits in the cryptosoft driver that tried to always use uio's.
  However, compared to the original code, this avoids rewalking the uio
  iovec array for requests with multiple vectors.  It also avoids
  allocate an iovec array for mbufs and populating it by instead walking
  the mbuf chain directly.

- Update the cryptosoft(4) driver to support separate output buffers
  making use of the cursor abstraction.

Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D24545
2020-05-25 22:12:04 +00:00
Warner Losh
ae1cce524e Reimplement aliases in geom
The alias needs to be part of the provider instead of the geom to work
properly. To bind the DEV geom, we need to look at the provider's names and
aliases and create the dev entries from there. If this lives in the GEOM, then
it won't propigate down the tree properly. Remove it from geom, add it provider.

Update geli, gmountver, gnop, gpart, and guzip to use it, which handles the bulk
of the uses in FreeBSD. I think this is all the providers that create a new name
based on their parent's name.
2020-05-13 19:17:28 +00:00
Conrad Meyer
844b743d31 geom(4) mirror: Do not panic on gmirror(8) insert, resize
Geom_mirror initialization occurs in spurts and the present of a
non-destroyed g_mirror softc does not always indicate that the geom has
launched (i.e., has an sc_provider).

Some gmirror(8) commands (via g_mirror_ctl) depend on a g_mirror's
sc_provider (insert and resize).  For those commands, g_mirror_ctl is
modified to sleep-poll in an interruptible way until the target geom is
either launched or destroyed.

Reviewed by:	markj
Tested by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D24780
2020-05-11 22:39:53 +00:00
Pawel Jakub Dawidek
cefbc0d19b Add g_topology_locked() macro that returns true if we already hold the GEOM
topology lock.
2020-04-25 21:41:09 +00:00
John Baldwin
bfe26b9707 Mark eli_metadata_crypto_supported inline.
This quiets warnings about it not being always used.

Reported by:	kevans
2020-04-15 18:27:28 +00:00
John Baldwin
e2b9919398 Remove support for geli(4) algorithms deprecated in r348206.
This removes support for reading and writing volumes using the
following algorithms:

- Triple DES
- Blowfish
- MD5 HMAC integrity

In addition, this commit adds an explicit whitelist of supported
algorithms to give a better error message when an invalid or
unsupported algorithm is used by an existing volume.

Reviewed by:	cem
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24343
2020-04-15 00:14:50 +00:00
Warner Losh
9cf738228d Now that we don't have special-case geom hacking defined in md_var.h, stop
including it. sparc64 was the last straggler here, but these weren't removed at
the time.
2020-04-07 22:23:22 +00:00
Mark Johnston
c205ac921b geom_journal: Only stop the switcher process if one was started.
PR:		243196
MFC after:	1 week
2020-04-03 13:57:41 +00:00
John Baldwin
c034143269 Refactor driver and consumer interfaces for OCF (in-kernel crypto).
- The linked list of cryptoini structures used in session
  initialization is replaced with a new flat structure: struct
  crypto_session_params.  This session includes a new mode to define
  how the other fields should be interpreted.  Available modes
  include:

  - COMPRESS (for compression/decompression)
  - CIPHER (for simply encryption/decryption)
  - DIGEST (computing and verifying digests)
  - AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
  - ETA (combined auth and encryption using encrypt-then-authenticate)

  Additional modes could be added in the future (e.g. if we wanted to
  support TLS MtE for AES-CBC in the kernel we could add a new mode
  for that.  TLS modes might also affect how AAD is interpreted, etc.)

  The flat structure also includes the key lengths and algorithms as
  before.  However, code doesn't have to walk the linked list and
  switch on the algorithm to determine which key is the auth key vs
  encryption key.  The 'csp_auth_*' fields are always used for auth
  keys and settings and 'csp_cipher_*' for cipher.  (Compression
  algorithms are stored in csp_cipher_alg.)

- Drivers no longer register a list of supported algorithms.  This
  doesn't quite work when you factor in modes (e.g. a driver might
  support both AES-CBC and SHA2-256-HMAC separately but not combined
  for ETA).  Instead, a new 'crypto_probesession' method has been
  added to the kobj interface for symmteric crypto drivers.  This
  method returns a negative value on success (similar to how
  device_probe works) and the crypto framework uses this value to pick
  the "best" driver.  There are three constants for hardware
  (e.g. ccr), accelerated software (e.g. aesni), and plain software
  (cryptosoft) that give preference in that order.  One effect of this
  is that if you request only hardware when creating a new session,
  you will no longer get a session using accelerated software.
  Another effect is that the default setting to disallow software
  crypto via /dev/crypto now disables accelerated software.

  Once a driver is chosen, 'crypto_newsession' is invoked as before.

- Crypto operations are now solely described by the flat 'cryptop'
  structure.  The linked list of descriptors has been removed.

  A separate enum has been added to describe the type of data buffer
  in use instead of using CRYPTO_F_* flags to make it easier to add
  more types in the future if needed (e.g. wired userspace buffers for
  zero-copy).  It will also make it easier to re-introduce separate
  input and output buffers (in-kernel TLS would benefit from this).

  Try to make the flags related to IV handling less insane:

  - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv'
    member of the operation structure.  If this flag is not set, the
    IV is stored in the data buffer at the 'crp_iv_start' offset.

  - CRYPTO_F_IV_GENERATE means that a random IV should be generated
    and stored into the data buffer.  This cannot be used with
    CRYPTO_F_IV_SEPARATE.

  If a consumer wants to deal with explicit vs implicit IVs, etc. it
  can always generate the IV however it needs and store partial IVs in
  the buffer and the full IV/nonce in crp_iv and set
  CRYPTO_F_IV_SEPARATE.

  The layout of the buffer is now described via fields in cryptop.
  crp_aad_start and crp_aad_length define the boundaries of any AAD.
  Previously with GCM and CCM you defined an auth crd with this range,
  but for ETA your auth crd had to span both the AAD and plaintext
  (and they had to be adjacent).

  crp_payload_start and crp_payload_length define the boundaries of
  the plaintext/ciphertext.  Modes that only do a single operation
  (COMPRESS, CIPHER, DIGEST) should only use this region and leave the
  AAD region empty.

  If a digest is present (or should be generated), it's starting
  location is marked by crp_digest_start.

  Instead of using the CRD_F_ENCRYPT flag to determine the direction
  of the operation, cryptop now includes an 'op' field defining the
  operation to perform.  For digests I've added a new VERIFY digest
  mode which assumes a digest is present in the input and fails the
  request with EBADMSG if it doesn't match the internally-computed
  digest.  GCM and CCM already assumed this, and the new AEAD mode
  requires this for decryption.  The new ETA mode now also requires
  this for decryption, so IPsec and GELI no longer do their own
  authentication verification.  Simple DIGEST operations can also do
  this, though there are no in-tree consumers.

  To eventually support some refcounting to close races, the session
  cookie is now passed to crypto_getop() and clients should no longer
  set crp_sesssion directly.

- Assymteric crypto operation structures should be allocated via
  crypto_getkreq() and freed via crypto_freekreq().  This permits the
  crypto layer to track open asym requests and close races with a
  driver trying to unregister while asym requests are in flight.

- crypto_copyback, crypto_copydata, crypto_apply, and
  crypto_contiguous_subsegment now accept the 'crp' object as the
  first parameter instead of individual members.  This makes it easier
  to deal with different buffer types in the future as well as
  separate input and output buffers.  It's also simpler for driver
  writers to use.

- bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer.
  This understands the various types of buffers so that drivers that
  use DMA do not have to be aware of different buffer types.

- Helper routines now exist to build an auth context for HMAC IPAD
  and OPAD.  This reduces some duplicated work among drivers.

- Key buffers are now treated as const throughout the framework and in
  device drivers.  However, session key buffers provided when a session
  is created are expected to remain alive for the duration of the
  session.

- GCM and CCM sessions now only specify a cipher algorithm and a cipher
  key.  The redundant auth information is not needed or used.

- For cryptosoft, split up the code a bit such that the 'process'
  callback now invokes a function pointer in the session.  This
  function pointer is set based on the mode (in effect) though it
  simplifies a few edge cases that would otherwise be in the switch in
  'process'.

  It does split up GCM vs CCM which I think is more readable even if there
  is some duplication.

- I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC
  as an auth algorithm and updated cryptocheck to work with it.

- Combined cipher and auth sessions via /dev/crypto now always use ETA
  mode.  The COP_F_CIPHER_FIRST flag is now a no-op that is ignored.
  This was actually documented as being true in crypto(4) before, but
  the code had not implemented this before I added the CIPHER_FIRST
  flag.

- I have not yet updated /dev/crypto to be aware of explicit modes for
  sessions.  I will probably do that at some point in the future as well
  as teach it about IV/nonce and tag lengths for AEAD so we can support
  all of the NIST KAT tests for GCM and CCM.

- I've split up the exising crypto.9 manpage into several pages
  of which many are written from scratch.

- I have converted all drivers and consumers in the tree and verified
  that they compile, but I have not tested all of them.  I have tested
  the following drivers:

  - cryptosoft
  - aesni (AES only)
  - blake2
  - ccr

  and the following consumers:

  - cryptodev
  - IPsec
  - ktls_ocf
  - GELI (lightly)

  I have not tested the following:

  - ccp
  - aesni with sha
  - hifn
  - kgssapi_krb5
  - ubsec
  - padlock
  - safe
  - armv8_crypto (aarch64)
  - glxsb (i386)
  - sec (ppc)
  - cesa (armv7)
  - cryptocteon (mips64)
  - nlmsec (mips64)

Discussed with:	cem
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D23677
2020-03-27 18:25:23 +00:00
John Baldwin
47172feb8d Use the newer EINTEGRITY error when authentication fails.
GELI used to fail with EINVAL when a read request spanned a disk
sector whose contents did not match the sector's authentication tag.
The recently-added EINTEGRITY more closely matches to the error in
this case.

Reviewed by:	cem, mckusick
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D24131
2020-03-23 21:26:32 +00:00
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Pawel Biernacki
53a6215c83 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (12 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Approved by:	kib (mentor, blanket)
Differential Revision:	https://reviews.freebsd.org/D23637
2020-02-24 10:42:56 +00:00
Kyle Evans
c81929d343 geli taste: allow GELIBOOT tagged providers as well
Currently the installer will tag geliboot partitions with both BOOT and
GELIBOOT; the former allows the kernel to taste it at boot, while the latter
is what loaders keys off of.

However, it seems reasonable to assume that if a provider's been tagged with
GELIBOOT that the kernel should also take that as a hint to taste/attach at
boot. This would allow us to stop tagging GELIBOOT partitions with BOOT in
bsdinstall, but I'm not sure that there's a compelling reason to do so any
time soon.

Reviewed by:	oshogbo
Differential Revision:	https://reviews.freebsd.org/D23387
2020-02-07 21:36:14 +00:00
Warner Losh
9133f3d097 Supress not supported message
For the moment, supress the operation not supported messages at this level.  In
the fullness of time, we will have better error tracking so we can diagnose
issues in the future.

Reviewed by: scottl@
2020-02-07 17:47:08 +00:00
Pawel Jakub Dawidek
76b47dfb8f The error variable is not really needed. Remove it. 2020-02-01 10:15:23 +00:00
Konstantin Belousov
fd99699d7e Fix aggregating geoms for BIO_SPEEDUP.
If the bio was split into several bios going down, completion computes
bio_completed of the original bio as sum of the bio_completes of the
splits.  For BIO_SETUP, bio_length means something different than the
length. it is the requested speedup amount, and is duplicated into the
splits, which is in fact reasonable, since we cannot know how the
previous activity was distributed among subordinate geoms.  Obviously,
the sum of n bio_length is greater than bio_length for n > 1, which
triggers assert that bio_length >= bio_completed for e.g. geom_stripe
and geom_raid3.

Fix this by reassigning bio_completed from bio_length for completed
BIO_SPEEDED, I do not think it really mattters what we return in
bio_completed.

Reported and tested by:	pho
Reviewed by:	imp
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D23380
2020-01-27 13:15:16 +00:00
Conrad Meyer
151e04b3fe GEOM label: strip leading/trailing space synthesizing devfs names
%20%20%20 is ugly and doesn't really help make human-readable devfs names.

PR:		243318
Reported by:	Peter Eriksson <pen AT lysator.liu.se>
Relnotes:	yes
2020-01-18 03:33:44 +00:00
Warner Losh
3cf5dd8401 Use buf to send speedup
It turns out there's a problem with using g_io to send the speedup. It leads to
a race when there's a resource shortage when a disk fails.

Instead, send BIO_SPEEDUP via struct buf. This is pretty straight forward,
except we need to transfer the bio_flags from b_ioflags for BIO_SPEEDUP commands
in g_vfs_strategy.

Reviewed by: kirk, chs
Differential Revision: https://reviews.freebsd.org/D23117
2020-01-17 01:16:19 +00:00
Warner Losh
8b522bdae6 Pass BIO_SPEEDUP through all the geom layers
While some geom layers pass unknown commands down, not all do. For the ones that
don't, pass BIO_SPEEDUP down to the providers that constittue the geom, as
applicable. No changes to vinum or virstor because I was unsure how to add this
support, and I'm also unsure how to test these. gvinum doesn't implement
BIO_FLUSH either, so it may just be poorly maintained. gvirstor is for testing
and not supportig BIO_SPEEDUP is fine.

Reviewed by: chs
Differential Revision: https://reviews.freebsd.org/D23183
2020-01-17 01:15:55 +00:00
Mateusz Guzik
879e0604ee Add KERNEL_PANICKED macro for use in place of direct panicstr tests 2020-01-12 06:07:54 +00:00
Mateusz Guzik
c8b3463dd0 vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)
The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by:	kib (previous version)
Tested by:	pho (in a larger patch, previous version)
Differential Revision:	https://reviews.freebsd.org/D23036
2020-01-07 15:56:24 +00:00
Alexander Motin
0aabbeff36 Remove extra check for provider being closed.
We already checked for that earlier, and since we hold topology lock
it could not change.

MFC after:	1 week
2020-01-02 20:30:53 +00:00
Alexander Motin
4aa1289a38 Avoid few memory accesses in g_disk_done(). 2019-12-31 03:43:13 +00:00
Alexander Motin
024932aae9 Use atomic for start_count in devstat_start_transaction().
Combined with earlier nstart/nend removal it allows to remove several locks
from request path of GEOM and few other places.  It would be cool if we had
more SMP-friendly statistics, but this helps too.

Sponsored by:	iXsystems, Inc.
2019-12-30 03:13:38 +00:00
Alexander Motin
9794a803fd Retire nstart/nend counters.
Those counters were abused for decade to workaround broken orphanization
process in different classes by delaying the call while there are active
requests.  But from one side it did not close all the races, while from
another was quite expensive on SMP due to trashing twice per request cache
lines of consumer and provider and requiring locks.  It lost its sense
after I manually went through all the GEOM classes in base and made
orphanization wait for either provider close or request completion.

Consumer counters are still used under INVARIANTS to detect premature
consumer close and detach.  Provider counters are removed completely.

Sponsored by:	iXsystems, Inc.
2019-12-30 00:46:10 +00:00
Alexander Motin
86c06ff886 Remove GEOM_SCHED class and gsched tool.
This code was not actively maintained since it was introduced 10 years ago.
It lacks support for many later GEOM features, such as direct dispatch,
unmapped I/O, stripesize/stripeoffset, resize, etc.  Plus it is the only
remaining use of GEOM nstart/nend request counters, used there to implement
live insertion/removal, questionable by itself.  Plus, as number of people
commented, GEOM is not the best place for I/O scheduler, since it has
limited information about layers both above and below it, required for
efficient scheduling.  Plus with the modern shift to SSDs there is just no
more significant need for this kind of scheduling.

Approved by:	imp, phk, luigi
Relnotes:	yes
2019-12-29 21:16:03 +00:00
Alexander Motin
cfdb91850c Missed part of r356162.
If we postpone consumer destruction till close, then the close calls should
not be ignored.  Delay geom withering till the last close too.

MFC after:	2 weeks
X-MFC-with:	r356162
Sponsored by:	iXsystems, Inc.
2019-12-29 19:33:41 +00:00
Alexander Motin
1d301810d3 Fix GEOM_VIRSTOR orphanization.
Previous code closed and destroyed consumer even with I/O in progress.
This patch postpones the destruction till the last close.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-29 19:21:29 +00:00
Alexander Motin
d2d5fee931 Fix GEOM_MOUNTVER orphanization.
Previous code closed and detached consumer even with I/O still in progress.
This patch adds locking and request counting to postpone the close till
the last of running requests completes.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-29 17:10:21 +00:00
Mariusz Zaborski
645532a448 gnop: change the "count until fail" option
Change the "count_until_fail" option of gnop, now it enables the failing
rating instead of setting them to 100%.

The original patch introduced the new flag, which sets the fail/rate to 100%
after N requests. In some cases, we don't want to have 100% of failure
probabilities. We want to start failing at some point.
For example, on the early stage, we may like to allow some read/writes requests
before having some requests delayed - when we try to mount the partition,
or when we are trying to import the pool.
Another case may be to check how scrub in ZFS will behave on different stages.

This allows us to cover more cases.
The previous behavior still may be configured.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D22632
2019-12-29 15:47:37 +00:00
Mariusz Zaborski
80e63e0a90 gnop: allow to change the name of created device
Thanks to this option we can create more then one gnop provider from
single provider. This may be useful for temporary labeling some data
on the disk.

Reviewed by:	markj, allanjude, bcr
Differential Revision:	https://reviews.freebsd.org/D22304
2019-12-29 15:40:02 +00:00
Alexander Motin
6a8eef35b5 Fix GEOM_SHSEC orphanization.
Previous code closed and destroyed consumer even with I/O in progress.
This patch postpones the destruction till the last close, identical to
GEOM_STRIPE, since they seem to have common origin.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-28 23:21:53 +00:00
Alexander Motin
351d0fa6df Fix GEOM_GATE orphanization.
Previous code closed and destroyed direct read consumer even with I/O still
in progress.  This patch adds locking and request counting to postpone the
close till the last of running requests completes.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-28 17:52:53 +00:00
Alexander Motin
2178f45b86 Fix GEOM_UZIP orphanization.
Previous code destroyed softc even with provider still open, that resulted
in panic under load.  This change postpones the free till the final close,
when we know for sure there will be no more I/O requests.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-27 21:44:13 +00:00
Alexander Motin
a29df733fa Reimplement gvinum orphanization.
gvinum was the only GEOM class, using consumer nstart/nend fields. Making
it do its own accounting for orphanization purposes allows in perspective
to remove burden of that expensive for SMP accounting from GEOM.

Also the previous implementation spinned in a tight event loop, waiting
for all active BIOs to complete, while the new one knows exactly when it
is possible to close the consumer.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2019-12-27 01:36:53 +00:00
Warner Losh
b182c79211 Add BIO_SPEEDUP
Add BIO_SPEEDUP bio command and g_io_speedup wrapper. It tells the
lower layers that the upper layers are dealing with some shortage
(dirty pages and/or disk blocks). The lower layers should do what they
can to speed up anything that's been delayed.

The first use will be to tell the CAM I/O scheduler that any TRIM
shaping should be short-circuited because the system needs
blocks. We'll also call it when there's too many resources used by
UFS.

Reviewed by: kirk, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18351
2019-12-17 00:13:35 +00:00
Edward Tomasz Napierala
2006d590d6 Add kern.geom.part.separator tunable. This makes it possible
to specify an optional separator to insert before partition name;
eg if it's set to "c/", you'll get "ada0c/s1" instead of "ada0s1".
(It cannot be set to just “/“, since ada0 is a device node, not
a directory.)

Reviewed by:	imp
MFC after:	2 weeks
Sponsored by:	Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D22193
2019-12-13 09:28:44 +00:00
Alexander Motin
5ccbeea1c5 Remove some branching from GEOM_DISK hot path.
pp->private just can not be NULL in those places.

In g_disk_start() and g_disk_ioctl() both dp != NULL and !dp->d_destroyed
should always be true if disk_gone() and disk_destroy() are used properly,
since GEOM does not send requests to errored providers.  If the protocol is
not followed, then no amount of additional checks here give real safety.

In g_disk_access() though the checks are useful, since GEOM blocks only
new opens for errored providers, but allows closes.  It should not happen
if disk_gone() and disk_destroy() are used properly, but may otherwise.

To improve cases when disk_gone() is not used, call it from disk_destroy().
It does not give full guaranties, but it errors the provider and makes
GEOM block unwanted requests at least after some race.

MFC after:	2 weeks
2019-12-06 16:48:36 +00:00
Alexander Motin
19cfcf253e Block ioctls for dying GEOM_DEV instances.
For normal I/Os consumer and provider statuses are checked by g_io_check().
But ioctl calls often do not go through it, being dispatched directly. This
change makes their semantics more alike, protecting lower levels.

MFC after:	2 weeks
2019-12-06 03:46:38 +00:00
Alexander Motin
6b3c68bf09 Make GEOM_DEV code slightly more compact.
Should be no functional change.

MFC after:	2 weeks
2019-12-06 03:18:37 +00:00
Alan Somers
67f72211dd gmultipath: add ATF tests
Add ATF tests for most gmultipath operations. Add some dtrace probes too,
primarily for configuration changes that happen in response to provider
errors.

PR:		178473
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D22235
2019-12-06 00:12:14 +00:00
Alexander Motin
c4c88d4718 Remove duplicate g_debugflags declaration.
While there, define G_F_FOOTSHOOTING instead of numeric constants.

MFC after:	13 days
X-MFX-with:	r355412
2019-12-05 15:07:32 +00:00
Alexander Motin
2efaef42e4 Wrap g_trace() into a macro to avoid unneeded calls.
In most cases with debug disabled this function does nothing, but argument
passing and the call still cost measurable time due to cache misses, etc.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-05 04:52:19 +00:00
Alexander Motin
e61ed7983e Switch GEOM_DEV from make_dev_p() to make_dev_s().
It closes the race condition and so allows to remove few NULL checks.

Also while there, use dev->si_drv1 in addition to cp->private to store
softc pointer.  For calls coming from the dev side it gives reliable cache
hit instead of often miss before.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-12-05 04:03:08 +00:00
Alexander Motin
61322a0a8a Mark some more hot global variables with __read_mostly.
MFC after:	1 week
2019-12-04 21:26:03 +00:00
Warner Losh
283a5a3796 We don't even need Giant here. It isn't protecting anything internal
to geom, and nothing we call requires it to be held. It's left over
from a time when the latter wasn't the case. Retire it.

Reviewed in concept: scottl@
2019-11-23 23:44:00 +00:00
Edward Tomasz Napierala
b5961be1ab Add GEOM attribute to report physical device name, and report it
via 'diskinfo -v'.  This avoids the need to track it down via CAM,
and should also work for disks that don't use CAM.  And since it's
inherited thru the GEOM hierarchy, in most cases one doesn't need
to walk the GEOM graph either, eg you can use it on a partition
instead of disk itself.

Reviewed by:	allanjude, imp
Sponsored by:	Klara Inc
Differential Revision:	https://reviews.freebsd.org/D22249
2019-11-09 17:30:19 +00:00
Chuck Silvers
30738a349d Make all the gnop parameters optional in the request from userland,
filling in the same defaults that the current userland module uses.
This allows an old geom_nop.so userland module to work with a new kernel.

Approved by:	imp (mentor)
Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21972
2019-10-16 21:49:44 +00:00
Chuck Silvers
1e00bb458d Add a new gctl_get_paraml_opt() interface to extract optional parameters from
the request.  It is the same as gctl_get_paraml() except that the request
is not marked with an error if the parameter is not present.

Approved by:	imp (mentor)
Reviewed by:	cem
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D21972
2019-10-16 21:49:39 +00:00
Chuck Silvers
090a3ea3c2 Add a "count_until_fail" option to gnop, which says to start failing
I/O requests after the given number have been allowed though.

Approved by:    imp (mentor)
Reviewed by:    rpokala kib 0mp mckusick
Sponsored by:   Netflix
Differential Revision:  https://reviews.freebsd.org/D21593
2019-09-13 23:03:56 +00:00
Kyle Evans
ef03f57dd2 Allow more nesting of GEOM partitioning schemes
GEOM is supposed to be topology-agnostic, but the GPT and BSD partition code
has arbitrary restrictions on nesting that are annoying in cases such as
running VMs on raw partitions (since the VM's partitioning scheme is not
visible to the host).

This patch adds sysctls to disable the restrictions except in the case of
BSD label (and similar) partitions with offset 0 (where we need to avoid
recursively recognizing the label).

Submitted by:	Andrew Gierth
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D21350
2019-09-03 20:57:20 +00:00
Conrad Meyer
eefd8f96fb geom_uzip(4), mkuzip(8): Add Zstd image mode
The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility
with older systems.  Support in geom_uzip(4) is conditional on the ZSTDIO
kernel option, which is enabled in amd64 GENERIC, but not all in-tree
configurations.

mkuzip(8) was modified slightly to always initialize the nblocks + 1'th
offset in the CLOOP file format.  Previously, it was only initialized in the
case where the final compressed block happened to be unaligned w.r.t.
DEV_BSIZE.  The "Fake" last+1 block change in r298619 means that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be BSIZE-aligned.  This happened in about 1 out of every 512
cases.  The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect.

Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
nblocks + 1'th offset when it is known to be initialized to a good value.
This corrects the calculated final real cluster compressed length to match
that printed by mkuzip(8).

mkuzip(8) was refactored somewhat to reduce code duplication and increase
ease of adding other compression formats.

  * Input block size validation was pulled out of individual compression
    init routines into main().

  * Init routines now validate a user-provided compression level or select
    an algorithm-specific default, if none was provided.

  * A new interface for calculating the maximal compressed size of an
    incompressible input block was added for each driver.  The generic code
    uses it to validate against MAXPHYS as well as to allocate compression
    result buffers in the generic code.

  * Algorithm selection is now driven by a table lookup, to increase ease of
    adding other formats in the future.

mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'.  The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.

Rather than select lzma or zlib with '-L' or its absense, respectively, a
new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or
'zstd'.  '-L' is considered deprecated, but will probably never be removed.

All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.

Relnotes:	yes
2019-08-13 23:32:56 +00:00
Conrad Meyer
ac8e5d02cf Remove deprecated GEOM classes
Follow-up on r322318 and r322319 and remove the deprecated modules.

Shift some now-unused kernel files into userspace utilities that incorporate
them.  Remove references to removed GEOM classes in userspace utilities.

Reviewed by:	imp (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21249
2019-08-13 20:06:55 +00:00
Xin LI
2b0cabbdae Update geom_uzip to use new zlib:
- Use new zlib headers;
 - Removed z_alloc and z_free to use the common sys/dev/zlib version.
 - Replace z_compressBound with compressBound from zlib.

While there, limit LZMA CFLAGS to apply only for g_uzip_lzma.c.

PR:		229763
Submitted by:	Yoshihiro Ota <ota j email ne jp> (with changes,
		bugs are mine)
Differential Revision:	https://reviews.freebsd.org/D20271
2019-08-08 06:27:39 +00:00
Conrad Meyer
ac03832ef3 GEOM: Reduce unnecessary log interleaving with sbufs
Similar to what was done for device_printfs in r347229.

Convert g_print_bio() to a thin shim around g_format_bio(), which acts on an
sbuf; documented in g_bio.9.

Reviewed by:	markj
Discussed with:	rlibby
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D21165
2019-08-07 19:28:35 +00:00
Kirk McKusick
e9660daffb Ignore UFS/FFS superblock check hash failures so as to allow a higher
level in the filesystem stack to decide what to do about them.

Reported by:  Peter Holm
Tested by:    Peter Holm
Sponsored by: Netflix
2019-08-06 18:28:44 +00:00
Mariusz Zaborski
4d7486c30f gnop: style nits 2019-07-31 17:51:06 +00:00
Mariusz Zaborski
4f80c85519 gnop: Introduce requests delay.
This allows to simulated disk that is responding slowly to the IO requests.

Reviewed by:	markj, bcr, pjd (previous version)
Differential Revision:	https://reviews.freebsd.org/D21052
2019-07-31 17:47:12 +00:00
Ryan Libby
9167705c8c g_mirror_taste: avoid deadlock, always clear tasting flag
If g_mirror_taste encountered an error at g_mirror_add_disk, it might
try to g_mirror_destroy the device with the G_MIRROR_DEVICE_FLAG_TASTING
flag still set.  This would wait on a worker to complete the destruction
with g_mirror_try_destroy, but that function bails out if the tasting
flag is set, resulting in a deadlock.  Clear the tasting flag before
trying to destroy the device.

Test Plan:
sysctl debug.fail_point.mnowait="1%return"
kyua test -k /usr/tests/sys/geom/class/mirror/Kyuafile

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20744
2019-07-01 22:06:36 +00:00
Ryan Libby
3bb6e0f0c7 g_eli_create: only dec g_access acw if we inc'd it
Reviewed by:	cem, markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D20743
2019-07-01 22:06:16 +00:00
Warner Losh
f5a95d9a07 Remove NAND and NANDFS support
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.

Numerous posts to arch@ and other locations have found no actual users
for this software.

Relnotes:	Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
2019-06-25 04:50:09 +00:00
Alexander Motin
49ee0fcea5 Use sbuf_cat() in GEOM confxml generation.
When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2019-06-19 15:36:02 +00:00
Alexander Motin
5c32e9fcb2 Optimize kern.geom.conf* sysctls.
On large systems those sysctls may generate megabytes of output.  Before
this change sbuf(9) code was resizing buffer by 4KB each time many times,
generating tons of TLB shootdowns.  Unfortunately in this case existing
sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not
applicable, since all the sbuf writes are done in different kernel thread.

This change improves situation in two ways:
 - on first sysctl call, not providing any output buffer, it sets special
sbuf drain function, just counting the data and so not needing big buffer;
 - on second sysctl call it uses as initial buffer size value saved on
previous call, so that in most cases there will be no reallocation, unless
GEOM topology changed significantly.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-06-18 21:05:10 +00:00
Xin LI
f89d207279 Separate kernel crc32() implementation to its own header (gsb_crc32.h) and
rename the source to gsb_crc32.c.

This is a prerequisite of unifying kernel zlib instances.

PR:		229763
Submitted by:	Yoshihiro Ota <ota at j.email.ne.jp>
Differential Revision:	https://reviews.freebsd.org/D20193
2019-06-17 19:49:08 +00:00
Mariusz Zaborski
a802439365 geli: style nits 2019-06-12 19:29:48 +00:00
Mariusz Zaborski
e7630efbe6 geli: partially revert r348709
Let's change the unsigned arguments to the signed one, but let's don't
change pointers to the array notation.

Requested by:	pjd
2019-06-12 19:29:12 +00:00
Mariusz Zaborski
1808673cc4 geli: build warning fixes
Submitted by:	Aaron Prieger <aprieger@llnw.com>
Reviewed by:	sbruno
Differential Revision:	https://reviews.freebsd.org/D11068
2019-06-05 22:46:18 +00:00
Kirk McKusick
10519e1398 When using the destroy option to shut down a nop GEOM module, I/O
operations already in its queue were not being properly drained.
The GEOM framework does the queue draining, but the module needs
to wait for the draining to happen. The waiting is done by adding
a g_nop_providergone() function to wait for the I/O operations to
finish up. This change is similar to change -r345758 made to the
memory-disk driver.

Submitted by: Chuck Silvers
Tested by:    Chuck Silvers
MFC after:    1 week
Sponsored by: Netflix
2019-05-25 00:07:49 +00:00
John Baldwin
5c420aae3b Add deprecation warnings for weaker algorithms to geli(4).
- Triple DES has been formally deprecated in Kerberos (RFC 8429)
  and is soon to be deprecated in IPsec (RFC 8221).
- Blowfish is deprecated.  FreeBSD doesn't support its successor
  (Twofish).
- MD5 is generally considered a weak digest that has known attacks.

geli refuses to create new volumes using these algorithms via 'geli
init'.  It also warns when attaching to existing volumes or creating
temporary volumes via 'geli onetime' .  The plan is to fully remove
support for these algorithms in FreeBSD 13.

Note that none of these algorithms have ever been the default
algorithm used by geli(8).  Users would have had to explicitly select
these algorithms when creating volumes in the past.

Reviewed by:	cem, delphij
MFC after:	3 days
Relnotes:	yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D20344
2019-05-23 22:31:55 +00:00
Conrad Meyer
6b6e2954dd List-ify kernel dump device configuration
Allow users to specify multiple dump configurations in a prioritized list.
This enables fallback to secondary device(s) if primary dump fails.  E.g.,
one might configure a preference for netdump, but fallback to disk dump as a
second choice if netdump is unavailable.

This change does not list-ify netdump configuration, which is tracked
separately from ordinary disk dumps internally; only one netdump
configuration can be made at a time, for now.  It also does not implement
IPv6 netdump.

savecore(8) is already capable of scanning and iterating multiple devices
from /etc/fstab or passed on the command line.

This change doesn't update the rc or loader variables 'dumpdev' in any way;
it can still be set to configure a single dump device, and rc.d/savecore
still uses it as a single device.  Only dumpon(8) is updated to be able to
configure the more complicated configurations for now.

As part of revving the ABI, unify netdump and disk dump configuration ioctl
/ structure, and leave room for ipv6 netdump as a future possibility.
Backwards-compatibility ioctls are added to smooth ABI transition,
especially for developers who may not keep kernel and userspace perfectly
synced.

Reviewed by:	markj, scottl (earlier version)
Relnotes:	maybe
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19996
2019-05-06 18:24:07 +00:00
Roger Pau Monné
b951b8f721 geom: fix initialization order
There's a race between the initialization of devsoftc.mtx (by devinit)
and the creation of the geom worker thread g_run_events, which calls
devctl_queue_data_f. Both of those are initialized at SI_SUB_DRIVERS
and SI_ORDER_FIRST, which means the geom worked thread can be created
before the mutex has been initialized, leading to the panic below:

 wpanic: mtx_lock() of spin mutex (null) @ /usr/home/osstest/build.135317.build-amd64-freebsd/freebsd/sys/kern/subr_bus.c:620
 cpuid = 3
 time = 1
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe003b968710
 vpanic() at vpanic+0x19d/frame 0xfffffe003b968760
 panic() at panic+0x43/frame 0xfffffe003b9687c0
 __mtx_lock_flags() at __mtx_lock_flags+0x145/frame 0xfffffe003b968810
 devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfffffe003b968840
 g_dev_taste() at g_dev_taste+0x463/frame 0xfffffe003b968a00
 g_load_class() at g_load_class+0x1bc/frame 0xfffffe003b968a30
 g_run_events() at g_run_events+0x197/frame 0xfffffe003b968a70
 fork_exit() at fork_exit+0x84/frame 0xfffffe003b968ab0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003b968ab0
 --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
 KDB: enter: panic
 [ thread pid 13 tid 100029 ]
 Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

Fix this by initializing geom at SI_ORDER_SECOND instead of
SI_ORDER_FIRST.

Sponsored by:		Citrix Systems R&D
Reviewed by:		kevans, markj
Differential revision:	https://reviews.freebsd.org/D20148
2019-05-06 09:48:34 +00:00
Alexander Motin
9c498bd5c3 Call delist_dev() before destroy_dev_sched_cb().
destroy_dev_sched_cb() is excessively asynchronous, and during media change
retaste new provider may appear sooner then device of the previous one get
destroyed.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-04-24 19:56:02 +00:00
Conrad Meyer
83efd2885e gnop(8): Nopify configuration as a kernel dump device
As a dummy / no-op dump device, to facilitate dumpon(8) testing.

Reviewed by:	markj (earlier version)
Differential Revision:	https://reviews.freebsd.org/D19991
2019-04-22 03:25:49 +00:00
Pawel Jakub Dawidek
2f07cdf871 Implement automatic online expansion of GELI providers - if the underlying
provider grows, GELI will expand automatically and will move the metadata
to the new location of the last sector.

This functionality is turned on by default. It can be turned off with the
-R flag, but it is not recommended - if the underlying provider grows and
automatic expansion is turned off, it won't be possible to attach this
provider again, as the metadata is no longer located in the last sector.

If the automatic expansion is turned off and the underlying provider grows,
GELI will only log a message with the previous size of the provider, so
recovery can be easier.

Obtained from:	Fudo Security
2019-04-03 23:57:37 +00:00
Pawel Jakub Dawidek
e6b0d5eb9f Introduce new event SIZECHANGE within GEOM system to inform about GEOM
providers mediasize changes.

While here, use GEOM nomenclature to describe providers instead of calling
them device nodes.

Obtained from:	Fudo Security
Tested in:	AWS
2019-03-30 07:24:34 +00:00
Ian Lepore
91a3f3588a Support device-independent labels for geom_flashmap slices.
While geom_flashmap has always supported label names for its slices, it does
so by appending "s.labelname" to the provider device name, meaning you still
have to know the name and unit of the hardware device to use the labels.

These changes add support for device-independent geom_flashmap labels, using
the standard geom_label infrastructure. geom_flashmap now creates a softc
struct attached to its geom, and as it creates slices it stores the label
into an array in the softc. The new geom_label_flashmap uses those labels
when tasting a geom_flashmap provider.

Differential Revision:	https://reviews.freebsd.org/D19535
2019-03-24 19:11:45 +00:00
Conrad Meyer
54533f66c9 stack(9): Drop unused API mode and comment that referenced it
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D19601
2019-03-15 22:39:55 +00:00
Marcel Moolenaar
96937e3b23 Revert revision 254095
In revision 254095, gpt_entries is not set to match the on-disk
hdr_entries, but rather is computed based on available space.
There are 2 problems with this:

1.  The GPT backend respects hdr_entries and only reads and writes
    that number of partition entries.  On top of that, CRC32 is
    computed over the table that has hdr_entries elements.  When
    the common code works on what is possibly a larger number, the
    behaviour becomes inconsistent and problematic.  In particular,
    it would be possible to add a new partition that on a reboot
    isn't there anymore.
2.  The calculation of gpt_entries is based on flawed assumptions.
    The GPT specification does not dictate that sectors are layed
    out in a particular way that the available space can be
    determined by looking at LBAs.  In practice, implementations
    do the same thing, because there's no reason to do it any
    other way.  Still, GPT allows certain freedoms that can be
    exploited in some form or shape if the need arises.

PR:		229977
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19438
2019-03-05 04:15:34 +00:00
Konstantin Belousov
e8643b01e6 Modularize xz.
Embedded lzma decompression library becomes a module usable by other
consumers, in addition to geom_uzip.

Most important code changes are
- removal of XZ_DEC_SINGLE define, we need the code to work
  with XZ_DEC_DYNALLOC;
- xz_crc32_init() call is removed from geom_uzip, xz module handles
  initialization on its own.

xz is no longer embedded into geom_uzip, instead the depend line for
the module is provided, and corresponding kernel option is added to
each MIPS kernel config file using geom_uzip.

The commit also carries unrelated cleanup by removing excess "device geom_uzip"
in places which were missed in r344479.

Reviewed by:	cem, hselasky, ray, slavash (previous versions)
Sponsored by:	Mellanox Technologies
Differential revision:	https://reviews.freebsd.org/D19266
MFC after:	3 weeks
2019-02-26 19:55:03 +00:00
Mark Johnston
3843d88ca8 Add a missing return statement to g_concat_kernel_dump().
The error occurs when upper layers attempt an out-of-bounds write.

Submitted by:	Noah Bergbauer <noah.bergbauer@tum.de>
MFC after:	1 week
2019-02-26 18:30:51 +00:00
Mark Johnston
cd2e908669 Define a constant for the maximum number of GEOM_CTL arguments.
Reviewed by:	eugen
MFC with:	r344305
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19271
2019-02-20 17:07:08 +00:00
Mark Johnston
d4fbe32c65 Limit the number of entries allocated for a REPORT_ZONES command.
The DIOCGETZONE ioctl can be used to fetch the zone list of an SMR
drive, and the caller specifies the number of entries it wants to fetch.
Clamp the caller's request to a sane limit so that a user cannot attempt
large allocations. Callers already need to invoke the ioctl multiple
times to fetch the full list in general, so there's no harm in limiting
the number of entries returned.

Fix style while here.

admbug:		807
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	asomers, ken
Tested by:	ken
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19249
2019-02-19 21:33:02 +00:00
Mark Johnston
60a92c781d Impose a limit on the number of GEOM_CTL arguments.
Otherwise a privileged user can trigger a memory allocation of
unbounded size, or an integer overflow in the subsequent
geom_alloc_copyin() call, leading to out-of-bounds accesses.

Hard-code a large limit to circumvent this problem.

admbug:		854
Reported by:	Anonymous of the Shellphish Grill Team
Reviewed by:	ae
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19251
2019-02-19 21:22:22 +00:00
Andriy Voskoboinyk
81df432ecf geom_uzip(4): set 'gp != NULL' assertion on top of the function
There was yet another access to this variable in g_trace() few
lines upper.

PR:		203499
Reported by:	cem
MFC after:	5 days
MFC with:	343473
2019-01-26 17:17:25 +00:00
Andriy Voskoboinyk
34fd9d7000 geom_uzip(4): move NULL pointer KASSERT check before it is dereferenced
PR:		203499
Submitted by:	<chadf@triularity.org>
MFC after:	5 days
2019-01-26 14:54:06 +00:00
Conrad Meyer
797f009d59 gmirror: Relocate DEVICE_FLAGS to adjacent lines
gmirror's sc_flags is shared between some on-disk state and some runtime
only state.  There's no real reason for that and they could probably be
split up.  Until they are, locate all of the flags for the same field
nearby each other in the source, for clarity.

No functional change.

Sponsored by:	Dell EMC Isilon
2019-01-23 16:44:21 +00:00
Mark Johnston
438622af06 Use g_handleattr() to reply to GEOM::candelete queries.
g_handleattr() fills out bp->bio_completed; otherwise, g_getattr()
returns an error in response to the query.  This caused BIO_DELETE
support to not be propagated through stacked configurations, e.g.,
a gconcat of gmirror volumes would not handle BIO_DELETE even when
the gmirrors do.  g_io_getattr() was not affected by the problem.

PR:		232676
Reported and tested by:	noah.bergbauer@tum.de
MFC after:	1 week
2019-01-02 15:52:16 +00:00
Alexander Motin
02a9923034 Switch from mutexes to atomics in GEOM_DEV I/O path.
Mutexes in I/O path there were used twice per I/O to atomically access
several variables to close and/or destroy the device on last request
completion.  I found the way to fit all required info into one integer,
suitable for atomic operations.  It opened race window on device close,
but addition of timeout to the msleep() there should cover it.

Profiling shows removal of significant spinning time on those mutexes
and IOPS increase from ~600K to >800K to NVMe on 72-core systems.

MFC after:	1 month
Sponsored by:	iXsystems, Inc.
2018-12-27 19:15:24 +00:00
Conrad Meyer
d2d82bfc90 gmirror: Remove a last-minute INVARIANTS breakage in r341840
I mistakenly added a lock assertion to this routine at the last minute
without confirming it was held during g_mirror_create.  It isn't (it isn't
even initialized yet).  Mea culpa.  Access is exclusive in both callers,
just not always by that particular lock.

Reported by:	lwhsu
X-MFC-With:	r341840, r341674
2018-12-12 18:13:56 +00:00
Conrad Meyer
23c25bd8b1 gmirror: Fix a bug introduced in r341674
r341674 inadvertently introduced a bug where newer mirror components being
tasted would clear the high sc_flags that are not controlled by component
metadata, such as G_MIRROR_DEVICE_FLAG_TASTING.  This could plausibly expose
a small window of time during STARTING where device destruction might race
with mirror component addition, probably resulting in a crash.

Reviewed by:	markj
X-MFC-With:	r341674
Differential Revision:	https://reviews.freebsd.org/D18521
2018-12-12 05:48:27 +00:00
Conrad Meyer
af7dcae0e2 gmirror: Evaluate mirror components against newest metadata copy
Re-apply r341665 with format strings fixed.

If we happen to taste a stale mirror component first, don't reject valid,
newer components that have differing metadata from the stale component
(during STARTING).  Instead, update our view of the most recent metadata as
we taste components.

Like mediasize beforehand, remove some checks from g_mirror_check_metadata
which would evict valid components due to metadata that can change over a
mirror's lifetime.  g_mirror_check_metadata is invoked long before we check
genid/syncid and decide which component(s) are newest and whether or not we
have quorum.

Before checking if we can enter RUNNING (i.e., we have quorum) after a NEW
component is added, first remove any known stale or inconsistent disks from
the mirrorset, rather than removing them *after* deciding we have quorum.
Check if we have quorum after removing these components.

Additionally, add a knob, kern.geom.mirror.launch_mirror_before_timeout, to
force gmirrors to wait out the full timeout (kern.geom.mirror.timeout)
before transitioning from STARTING to RUNNING.  This is a kludge to help
ensure all eligible, boot-time available mirror components are tasted before
RUNNING a gmirror.

Add a basic test case for STARTING -> RUNNING startup behavior around stale
genids.

PR:		232671, 232835
Submitted by:	Cindy Yang <cyang AT isilon.com> (previous version)
Reviewed by:	markj (kernel portions)
Discussed with:	asomers, Cindy Yang
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D18062
2018-12-07 02:44:04 +00:00
Conrad Meyer
c4e87bdfc1 Revert r341665 due to tinderbox breakage
I didn't notice that some format strings were non-portable.  Will fix and
re-commit later.
2018-12-07 00:47:05 +00:00
Conrad Meyer
bc1ee0be2d gmirror: Evaluate mirror components against newest metadata copy
If we happen to taste a stale mirror component first, don't reject valid,
newer components that have differing metadata from the stale component
(during STARTING).  Instead, update our view of the most recent metadata as
we taste components.

Like mediasize beforehand, remove some checks from g_mirror_check_metadata
which would evict valid components due to metadata that can change over a
mirror's lifetime.  g_mirror_check_metadata is invoked long before we check
genid/syncid and decide which component(s) are newest and whether or not we
have quorum.

Before checking if we can enter RUNNING (i.e., we have quorum) after a NEW
component is added, first remove any known stale or inconsistent disks from
the mirrorset, rather than removing them *after* deciding we have quorum.
Check if we have quorum after removing these components.

Additionally, add a knob, kern.geom.mirror.launch_mirror_before_timeout, to
force gmirrors to wait out the full timeout (kern.geom.mirror.timeout)
before transitioning from STARTING to RUNNING.  This is a kludge to help
ensure all eligible, boot-time available mirror components are tasted before
RUNNING a gmirror.

When we are instructed to forget mirror components, bump the generation id
to avoid confusion with such stale components later.

Add a basic test case for STARTING -> RUNNING startup behavior around stale
genids.

PR:		232671, 232835
Submitted by:	Cindy Yang <cyang AT isilon.com> (previous version)
Reviewed by:	markj (kernel portions)
Discussed with:	asomers, Cindy Yang
Tested by:	pho
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D18062
2018-12-06 23:55:39 +00:00
Kirk McKusick
fb14e73cb4 Normally when an attempt is made to mount a UFS/FFS filesystem whose
superblock has a check-hash error, an error message noting the
superblock check-hash failure is printed and the mount fails. The
administrator then runs fsck to repair the filesystem and when
successful, the filesystem can once again be mounted.

This approach fails if the filesystem in question is a root filesystem
from which you are trying to boot. Here, the loader fails when trying
to access the filesystem to get the kernel to boot. So it is necessary
to allow the loader to ignore the superblock check-hash error and make
a best effort to read the kernel. The filesystem may be suffiently
corrupted that the read attempt fails, but there is no harm in trying
since the loader makes no attempt to write to the filesystem.

Once the kernel is loaded and starts to run, it attempts to mount its
root filesystem. Once again, failure means that it breaks to its prompt
to ask where to get its root filesystem. Unless you have an alternate
root filesystem, you are stuck.

Since the root filesystem is initially mounted read-only, it is
safe to make an attempt to mount the root filesystem with the failed
superblock check-hash. Thus, when asked to mount a root filesystem
with a failed superblock check-hash, the kernel prints a warning
message that the root filesystem superblock check-hash needs repair,
but notes that it is ignoring the error and proceeding. It does
mark the filesystem as needing an fsck which prevents it from being
enabled for writing until fsck has been run on it. The net effect
is that the reboot fails to single user, but at least at that point
the administrator has the tools at hand to fix the problem.

Reported by:    Rick Macklem (rmacklem@)
Discussed with: Warner Losh (imp@)
Sponsored by:   Netflix
2018-12-06 00:09:39 +00:00
Maxim Sobolev
9dcafe16d4 Another attempt to fix issue with the DIOCGDELETE ioctl(2) not
handling slightly out-of-bound requests properly (r340187).
Perform range check here rather then rely on g_delete_data() to DTRT.

The g_delete_data() would always return success for requests
starting just the next byte after providers media boundary.

MFC after:	4 weeks
2018-12-04 21:48:56 +00:00
Dag-Erling Smørgrav
cdd2df880d Add a “skip_dsn” option to g_part's bootcode verb to prevent g_part_mbr
from setting the volume serial number.  This unbreaks older boot blocks
that don't support serial numbers, and allows boot0cfg to set the serial
number itself if requested by the user.

Submitted by:	lev@, yuripv@
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17386
2018-11-27 14:58:19 +00:00
Maxim Sobolev
de66da7374 Revert r340187, it breaks EOD (end-of-device) detection logic. Turns out,
i/o into last_sector+N is handled differently for N==1 and N>1 cases to
accomodate that, so some other approach would be needed to fix DIOCGDELETE
ioctl(2).
2018-11-07 16:28:09 +00:00
Maxim Sobolev
8948179aba Don't allow BIO_READ, BIO_WRITE or BIO_DELETE requests that are
fully beyond the end of providers media. The only exception is made
for the zero length transfers which are allowed to be just on the
boundary. Previously, any requests starting on the boundary (i.e. next
byte after the last one) have been allowed to go through.

No response from:	freebsd-geom@, phk
MFC after:		1 month
2018-11-06 15:55:41 +00:00
Mark Johnston
25c9cca757 Have gconcat advertise delete support if one of its disks does.
This follows the example set by other multi-disk GEOM classes.

PR:		232676
Tested by:	noah.bergbauer@tum.de
MFC after:	1 month
2018-10-30 00:22:14 +00:00
Eugene Grosbein
6d305ab0b2 Extend stripeoffset and stripesize of GEOMs from u_int to off_t
GEOM's stripeoffset overflows at 4 gigabyte margin (2^32)
because of its u_int type. This leads to incorrect data in the output
generated by "sysctl kern.geom.confxml" command, "graid list" etc.
when GEOM array has volumes larger than 4G, for example.

This change does not affect ABI but changes KBI. No MFC planned.

Differential Revision:	https://reviews.freebsd.org/D13426
2018-10-27 16:14:42 +00:00
Xin LI
0db665bb98 Restore backward compatibility for "attach" verb.
In r332361 and r333439, two new parameters were added to geli attach
verb using gctl_get_paraml, which requires the value to be present.
This would prevent old geli(8) binary from attaching geli(4) device
as they have no knowledge about the new parameters.

Restore backward compatibility by treating the absense of these two
values as seeing the default value supplied by userland.

PR:		232595
Reviewed by:	oshogbo
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17680
2018-10-27 03:37:14 +00:00
Glen Barber
01d4e2149e MFH r338661 through r339200.
Sponsored by:	The FreeBSD Foundation
2018-10-05 17:53:47 +00:00
Alexander Motin
cf4a52cf67 Fix use-after-free in RAID0 error reporting of GEOM_RAID.
PR:		231510
Submitted by:	yangx92@hotmail.com
Approved by:	re (gjb)
MFC after:	1 week
2018-09-24 16:58:55 +00:00
Jung-uk Kim
9c40dcbe5f Make geli(8) buildable. 2018-09-19 07:08:04 +00:00
Conrad Meyer
1b0909d51a OpenCrypto: Convert sessions to opaque handles instead of integers
Track session objects in the framework, and pass handles between the
framework (OCF), consumers, and drivers.  Avoid redundancy and complexity in
individual drivers by allocating session memory in the framework and
providing it to drivers in ::newsession().

Session handles are no longer integers with information encoded in various
high bits.  Use of the CRYPTO_SESID2FOO() macros should be replaced with the
appropriate crypto_ses2foo() function on the opaque session handle.

Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to
the opaque handle interface.  Discard existing session tracking as much as
possible (quick pass).  There may be additional code ripe for deletion.

Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to handle-style
interface.  The conversion is largely mechnical.

The change is documented in crypto.9.

Inspired by
https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html .

No objection from:	ae (ipsec portion)
Reported by:	jhb
2018-07-18 00:56:25 +00:00
Conrad Meyer
1df7f41560 OCF: Convert consumers to the session id typedef
These were missed in the earlier r336269.

No functional change.

Sponsored by:	Dell EMC Isilon
2018-07-16 19:01:05 +00:00
Mariusz Zaborski
78f79a9a08 Let geli deal with lost devices without crashing.
PR:		162036
Submitted by:	Fabian Keil <fk@fabiankeil.de>
Obtained from:	ElectroBSD
Discussed with: pjd@
2018-07-15 18:03:19 +00:00
Warner Losh
4bae19e9b8 g_eli_key_cmp is used only in the kernel, so only define it in the
kernel.
2018-07-13 18:21:38 +00:00
Mikolaj Golub
874774c5d4 geom_gate: enable resize
Reviewed By:	pjd
Approved By:	pjd
Differential Revision:	https://reviews.freebsd.org/D11531
2018-07-13 07:08:06 +00:00
Ed Maste
76db6c8773 gpart: add EFI alias for MBR partition scheme
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D15870
2018-06-17 20:10:48 +00:00
Ed Maste
a0a8412b2a Sort geom/part mbr/ebr/ldm alias table entries
Having the table entries in alpha order simplifies future additions.

Sponsored by:	The FreeBSD Foundation
2018-06-17 20:06:27 +00:00
Mariusz Zaborski
31f7586d73 Introduce the 'n' flag for the geli attach command.
If the 'n' flag is provided the provided key number will be used to
decrypt device. This can be used combined with dryrun to verify if the key
is set correctly. This can be also used to determine which key slot we want to
change on already attached device.

Reviewed by:	allanjude
Differential Revision:	https://reviews.freebsd.org/D15309
2018-05-09 20:53:38 +00:00
Mark Johnston
bd92e6b6f5 Refactor some of the MI kernel dump code in preparation for netdump.
- Add clear_dumper() to complement set_dumper().
- Drain netdump's preallocated mbuf pool when clearing the dumper.
- Don't do bounds checking for dumpers with mediasize 0.
- Add dumper callbacks for initialization for writing out headers.

Reviewed by:	sbruno
MFC after:	1 month
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D15252
2018-05-06 00:22:38 +00:00
Mark Johnston
681554d70b Remove a redundant assertion.
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
2018-05-06 00:05:03 +00:00
Mark Johnston
40e805221b Avoid dropping the topology lock in gmirror's dumpconf implementation.
Doing so introduces races which can lead to a use-after-free when
grabbing a snapshot of the GEOM mesh.

To ensure that a mirror's disk list remains stable, change its locking
protocol: both the softc lock and the topology lock are now required
to modify the list, so either lock is sufficient for traversal.

Tested by:	pho
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-05-06 00:03:24 +00:00
Ed Maste
b525a10ac0 gpart: add fat32lba MBR partition type
FAT32 partition with LBA addressing.

Reviewed by:	marcel
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D15266
2018-05-04 00:34:27 +00:00
Kyle Evans
74d6c131cb Annotate geom modules with MODULE_VERSION
GEOM ELI may double ask the password during boot. Once at loader time, and
once at init time.

This happens due a module loading bug. By default GEOM ELI caches the
password in the kernel, but without the MODULE_VERSION annotation, the
kernel loads over the kernel module, even if the GEOM ELI was compiled into
the kernel. In this case, the newly loaded module
purges/invalidates/overwrites the GEOM ELI's password cache, which causes
the double asking.

MFC Note: There's a pc98 component to the original submission that is
omitted here due to pc98 removal in head. This part will need to be revived
upon MFC.

Reviewed by:	imp
Submitted by:	op
Obtained from:	opBSD
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D14992
2018-04-10 19:18:16 +00:00
Mariusz Zaborski
8f1c45c20a Introduce dry run option for attaching the device.
This will allow us to verify if passphrase and key is valid without
decrypting whole device.

Reviewed by:	cem@, allanjude@
Differential Revision:	https://reviews.freebsd.org/D15000
2018-04-10 13:22:48 +00:00
Kyle Evans
2967ace894 Retire the geom_aes class
It's had a good life, but it's not really configurable and not really used.

Obtained from:	opBSD (with some changes)
Differential Revision:	https://reviews.freebsd.org/D14991
2018-04-09 17:30:30 +00:00
Brooks Davis
6469bdcdb6 Move most of the contents of opt_compat.h to opt_global.h.
opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compabibility improvements may add over 100 more so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c.  A fake _COMPAT_LINUX option ensure opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by:	kib, cem, jhb, jtl
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D14941
2018-04-06 17:35:35 +00:00
Sean Bruno
2c385d51ce Squash error from geom by sizing ident strings to DISK_IDENT_SIZE.
Display attribute in future error strings and differentiate g_handleattr()
error messages for ease of debugging in the future.

"g_handleattr: md1 bio_length 24 strlen 31 -> EFAULT"

Reported by:	swills
Reviewed by:	imp cem avg
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D14962
2018-04-05 13:56:40 +00:00
Kirk McKusick
fb15890a8c When freeing a superblock returned by ffs_sbget, be sure to also
free the superblock summary information.

Reported by: Peter Holm (pho@)
Tested by: Peter Holm (pho@)
2018-03-24 15:36:25 +00:00
Mariusz Zaborski
9ea857cf0f Remove unneeded variable which was introduced in r328472.
Pointed out by:	pjd@
2018-03-18 15:09:55 +00:00
Andriy Gapon
aca41af247 g_access: deal with races created by geoms that drop the topology lock
The problem is that g_access() must be called with the GEOM topology
lock held.  And that gives a false impression that the lock is indeed
held across the call.  But this isn't always true because many classes,
ZVOL being one of the many, need to drop the lock.  It's either to
perform an I/O on the first open or to acquire a different lock (like in
g_mirror_access).

That, of course, can break many assumptions.  For example,
g_slice_access() adds an extra exclusive count on the first open. As
described above, an underlying geom may drop the topology lock and that
would open a race with another thread that would also request another
extra exclusive count.  In general, two consumers may be granted
incompatible accesses.

To avoid this problem the code is changed to mark a geom with special
flag before calling its access method and clear the flag afterwards.  If
another thread sees that flag, then it means that the topology lock has
been dropped (either by the geom in question or downstream from it), so
it is not safe to make another access call.  So, the second thread would
use g_topology_sleep() to wait until the flag is cleared and only then
would it proceed with the access.

Also see http://docs.freebsd.org/cgi/mid.cgi?809d9254-ee56-59d8-69a4-08838e985cea

PR:		225960
Reported by:	asomers
Reviewed by:	markj, mav
MFC after:	3 weeks
Differential Revision: https://reviews.freebsd.org/D14533
2018-03-15 09:16:10 +00:00
Conrad Meyer
ee4d316fe7 g_part_gpt: Fix memory leak in error path
If g_part_gpt_read() encountered a disk with bad primary and secondary
tables, it could leak memory.

Reported by:	Coverity
Sponsored by:	Dell EMC Isilon
2018-03-07 01:55:50 +00:00
Conrad Meyer
90575a0ec9 g_label_ufs: Fix typo from r330264
Reported by:	O. Hartmann <o.hartmann AT walstatt.org>
Sponsored by:	Dell EMC Isilon
2018-03-02 06:02:54 +00:00
Kirk McKusick
efbf396426 This change is some refactoring of Mark Johnston's changes in r329375
to fix the memory leak that I introduced in r328426. Instead of
trying to clear up the possible memory leak in all the clients, I
ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures
that memory is always freed if it returns an error).

The original change in r328426 was a bit sparse in its description.
So I am expanding on its description here (thanks cem@ and rgrimes@
for your encouragement for my longer commit messages).

In preparation for adding check hashing to superblocks, r328426 is
a refactoring of the code to get the reading/writing of the superblock
into one place. Unlike the cylinder group reading/writing which
ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel
and cgget/cgput in libufs), I have the core superblock functions
just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is
already imported into utilities like fsck_ffs as well as libufs to
implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions
take a function pointer to do the actual I/O for which there are
four variants:

    ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem

    g_use_g_read_data / g_use_g_write_data for kernel geom clients

    ufs_use_sa_read for the standalone code (stand/libsa/ufs.c
	but not stand/libsa/ufsread.c which is size constrained)

    use_pread / use_pwrite for libufs

Uses of these interfaces are in the UFS filesystem, geoms journal &
label, libsa changes, and libufs. They also permeate out into the
filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck,
fsirand, fstyp, and quot. Some of these utilities should probably be
converted to directly use libufs (like dumpfs was for example), but
there does not seem to be much win in doing so.

Tested by: Peter Holm (pho@)
2018-03-02 04:34:53 +00:00
Mark Johnston
16759360d4 Fix a memory leak introduced in r328426.
ffs_sbget() may return a superblock buffer even if it fails, so the
caller must be prepared to free it in this case. Moreover, when tasting
alternate superblock locations in a loop, ffs_sbget()'s readfunc
callback must free the previously allocated buffer.

Reported and tested by:	pho
Reviewed by:		kib (previous version)
Differential Revision:	https://reviews.freebsd.org/D14390
2018-02-16 15:41:03 +00:00