.\" Copyright (c) 2002, 2003 Hiten M. Pandya.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions, and the following disclaimer,
.\" without modification, immediately at the beginning of the file.
.\" 2. The name of the author may not be used to endorse or promote products
.\" derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR, CONTRIBUTORS OR THE
.\" VOICES IN HITEN PANDYA'S HEAD BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
.\" SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
.\" TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
.\" PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
.\" LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
.\" NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
.\" SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
.\"
.\" Copyright (c) 1996, 1997, 1998, 2001 The NetBSD Foundation, Inc.
.\" All rights reserved.
.\"
.\" This code is derived from software contributed to The NetBSD Foundation
.\" by Jason R. Thorpe of the Numerical Aerospace Simulation Facility,
.\" NASA Ames Research Center.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
.\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
.\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
.\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\" $NetBSD: bus_dma.9,v 1.25 2002/10/14 13:43:16 wiz Exp $
.\"
.Dd May 25, 2020
.Dt BUS_DMA 9
.Os
.Sh NAME
.Nm bus_dma ,
.Nm bus_dma_tag_create ,
.Nm bus_dma_tag_destroy ,
.Nm bus_dma_template_init ,
.Nm bus_dma_template_tag ,
.Nm bus_dma_template_clone ,
.Nm bus_dma_template_fill ,
.Nm BUS_DMA_TEMPLATE_FILL ,
.Nm bus_dmamap_create ,
.Nm bus_dmamap_destroy ,
.Nm bus_dmamap_load ,
.Nm bus_dmamap_load_bio ,
.Nm bus_dmamap_load_ccb ,
.Nm bus_dmamap_load_crp ,
.Nm bus_dmamap_load_crp_buffer ,
.Nm bus_dmamap_load_mbuf ,
.Nm bus_dmamap_load_mbuf_sg ,
.Nm bus_dmamap_load_uio ,
.Nm bus_dmamap_unload ,
.Nm bus_dmamap_sync ,
.Nm bus_dmamem_alloc ,
.Nm bus_dmamem_free
.Nd Bus and Machine Independent DMA Mapping Interface
.Sh SYNOPSIS
.In machine/bus.h
.Ft int
.Fn bus_dma_tag_create "bus_dma_tag_t parent" "bus_size_t alignment" \
"bus_addr_t boundary" "bus_addr_t lowaddr" "bus_addr_t highaddr" \
"bus_dma_filter_t *filtfunc" "void *filtfuncarg" "bus_size_t maxsize" \
"int nsegments" "bus_size_t maxsegsz" "int flags" "bus_dma_lock_t *lockfunc" \
"void *lockfuncarg" "bus_dma_tag_t *dmat"
.Ft int
.Fn bus_dma_tag_destroy "bus_dma_tag_t dmat"
.Ft void
.Fo bus_dma_template_init
.Fa "bus_dma_template_t *template"
.Fa "bus_dma_tag_t parent"
.Fc
.Ft int
.Fo bus_dma_template_tag
.Fa "bus_dma_template_t *template"
.Fa "bus_dma_tag_t *dmat"
.Fc
.Ft void
.Fo bus_dma_template_clone
.Fa "bus_dma_template_t *template"
.Fa "bus_dma_tag_t dmat"
.Fc
.Ft void
.Fo bus_dma_template_fill
.Fa "bus_dma_template_t *template"
.Fa "bus_dma_param_t params[]"
.Fa "u_int count"
.Fc
.Fo BUS_DMA_TEMPLATE_FILL
.Fa "bus_dma_template_t *template"
.Fa "bus_dma_param_t param ..."
.Fc
.Ft int
.Fn bus_dmamap_create "bus_dma_tag_t dmat" "int flags" "bus_dmamap_t *mapp"
.Ft int
.Fn bus_dmamap_destroy "bus_dma_tag_t dmat" "bus_dmamap_t map"
.Ft int
.Fn bus_dmamap_load "bus_dma_tag_t dmat" "bus_dmamap_t map" "void *buf" \
"bus_size_t buflen" "bus_dmamap_callback_t *callback" "void *callback_arg" \
"int flags"
.Ft int
.Fn bus_dmamap_load_bio "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"struct bio *bio" "bus_dmamap_callback_t *callback" "void *callback_arg" \
"int flags"
.Ft int
.Fn bus_dmamap_load_ccb "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"union ccb *ccb" "bus_dmamap_callback_t *callback" "void *callback_arg" \
"int flags"
.Ft int
.Fn bus_dmamap_load_crp "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"struct cryptop *crp" "bus_dmamap_callback_t *callback" "void *callback_arg" \
"int flags"
.Ft int
.Fn bus_dmamap_load_crp_buffer "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"struct crypto_buffer *cb" "bus_dmamap_callback_t *callback" \
"void *callback_arg" "int flags"
.Ft int
.Fn bus_dmamap_load_mbuf "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"struct mbuf *mbuf" "bus_dmamap_callback2_t *callback" "void *callback_arg" \
"int flags"
.Ft int
.Fn bus_dmamap_load_mbuf_sg "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"struct mbuf *mbuf" "bus_dma_segment_t *segs" "int *nsegs" "int flags"
.Ft int
.Fn bus_dmamap_load_uio "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"struct uio *uio" "bus_dmamap_callback2_t *callback" "void *callback_arg" \
"int flags"
.Ft void
.Fn bus_dmamap_unload "bus_dma_tag_t dmat" "bus_dmamap_t map"
.Ft void
.Fn bus_dmamap_sync "bus_dma_tag_t dmat" "bus_dmamap_t map" \
"bus_dmasync_op_t op"
.Ft int
.Fn bus_dmamem_alloc "bus_dma_tag_t dmat" "void **vaddr" \
"int flags" "bus_dmamap_t *mapp"
.Ft void
.Fn bus_dmamem_free "bus_dma_tag_t dmat" "void *vaddr" \
"bus_dmamap_t map"
.Sh DESCRIPTION
Direct Memory Access (DMA) is a method of transferring data
without involving the CPU, thus providing higher performance.
A DMA transaction can move data from device to memory,
device to device, or memory to memory.
.Pp
The
.Nm
API is a bus, device, and machine-independent (MI) interface to
DMA mechanisms.
It provides the client with flexibility and simplicity by
abstracting machine-dependent details such as setting up
DMA mappings, managing caches, and handling bus-specific features
and limitations.
.Sh OVERVIEW
A tag structure
.Vt ( bus_dma_tag_t )
is used to describe the properties of a group of related DMA
transactions.
One way to view this is that a tag describes the limitations of a DMA engine.
For example, if a DMA engine in a device is limited to 32-bit addresses,
that limitation is specified by a parameter when creating the tag
for that device.
Similarly, a tag can be marked as requiring buffers whose addresses are
aligned to a specific boundary.
.Pp
Some devices may require multiple tags to describe DMA
transactions with differing properties.
For example, a device might require 16-byte alignment of its descriptor ring
while permitting arbitrary alignment of I/O buffers.
In this case,
the driver must create one tag for the descriptor ring and a separate tag for
I/O buffers.
If a device has restrictions that are common to all DMA transactions
in addition to restrictions that differ between unrelated groups of
transactions,
the driver can first create a
.Dq parent
tag that describes the common restrictions.
The per-group tags can then inherit these restrictions from this
.Dq parent
tag rather than having to list them explicitly when creating the per-group tags.
.Pp
A mapping structure
.Vt ( bus_dmamap_t )
represents a mapping of a memory region for DMA.
On systems with I/O MMUs,
the mapping structure tracks any I/O MMU entries used by a request.
For DMA requests that require bounce pages,
the mapping tracks the bounce pages used.
.Pp
To prepare for one or more DMA transactions,
a mapping must be bound to a memory region by calling one of the
.Fn bus_dmamap_load
functions.
These functions configure the mapping, which can include programming entries
in an I/O MMU and/or allocating bounce pages.
An output of these functions
(either directly or indirectly by invoking a callback routine)
is the list of scatter/gather address ranges a consumer can pass to a DMA
engine to access the memory region.
When a mapping is no longer needed,
the mapping must be unloaded via
.Fn bus_dmamap_unload .
.Pp
Before and after each DMA transaction,
.Fn bus_dmamap_sync
must be used to ensure that the correct data is used by the DMA engine and
the CPU.
If a mapping uses bounce pages,
the sync operations copy data between the bounce pages and the memory region
bound to the mapping.
Sync operations also handle architecture-specific details such as CPU cache
flushing and CPU memory operation ordering.
.Sh STATIC VS DYNAMIC
.Nm
handles two types of DMA transactions: static and dynamic.
Static transactions are used with a long-lived memory region that is reused
for many transactions such as a descriptor ring.
Dynamic transactions are used for transfers to or from transient buffers
such as I/O buffers holding a network packet or disk block.
Each transaction type uses a different subset of the
.Nm
API.
.Ss Static Transactions
Static transactions use memory regions allocated by
.Nm .
Each static memory region is allocated by calling
.Fn bus_dmamem_alloc .
This function requires a valid tag describing the properties of the
DMA transactions to this region such as alignment or address restrictions.
Multiple regions can share a single tag if they share the same restrictions.
.Pp
.Fn bus_dmamem_alloc
allocates a memory region along with a mapping object.
The associated tag, memory region, and mapping object must then be passed to
.Fn bus_dmamap_load
to bind the mapping to the allocated region and obtain the
scatter/gather list.
.Pp
It is expected that
.Fn bus_dmamem_alloc
will attempt to allocate memory requiring less expensive sync operations
(for example, implementations should not allocate regions requiring bounce
pages),
but sync operations should still be used.
For example, a driver should use
.Fn bus_dmamap_sync
in an interrupt handler before reading descriptor ring entries written by the
device prior to the interrupt.
.Pp
When a consumer is finished with a memory region,
it should unload the mapping via
.Fn bus_dmamap_unload
and then release the memory region and mapping object via
.Fn bus_dmamem_free .
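.Pp
The static transaction life cycle described above might be sketched as
follows for a hypothetical descriptor ring.
The names
.Vt struct foo_softc ,
.Fn foo_ring_cb ,
.Fn foo_ring_alloc ,
.Fn foo_ring_free ,
and
.Dv FOO_RING_SIZE
are illustrative assumptions and are not part of
.Nm .
Cleanup on failure is omitted for brevity.
.Bd -literal
#define	FOO_RING_SIZE	4096	/* hypothetical ring size in bytes */

struct foo_softc {
	bus_dma_tag_t	ring_tag;
	bus_dmamap_t	ring_map;
	void		*ring_vaddr;
	bus_addr_t	ring_busaddr;
};

/* Load callback: record the bus address of the single segment. */
static void
foo_ring_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	if (error != 0)
		return;
	*(bus_addr_t *)arg = segs[0].ds_addr;
}

static int
foo_ring_alloc(device_t dev, struct foo_softc *sc)
{
	int error;

	/* One 16-byte aligned segment, no address restrictions. */
	error = bus_dma_tag_create(bus_get_dma_tag(dev), 16, 0,
	    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL,
	    FOO_RING_SIZE, 1, FOO_RING_SIZE, 0, NULL, NULL,
	    &sc->ring_tag);
	if (error != 0)
		return (error);
	error = bus_dmamem_alloc(sc->ring_tag, &sc->ring_vaddr,
	    BUS_DMA_NOWAIT | BUS_DMA_ZERO, &sc->ring_map);
	if (error != 0)
		return (error);
	/* BUS_DMA_NOWAIT: fail rather than defer the callback. */
	return (bus_dmamap_load(sc->ring_tag, sc->ring_map,
	    sc->ring_vaddr, FOO_RING_SIZE, foo_ring_cb,
	    &sc->ring_busaddr, BUS_DMA_NOWAIT));
}

static void
foo_ring_free(struct foo_softc *sc)
{
	bus_dmamap_unload(sc->ring_tag, sc->ring_map);
	bus_dmamem_free(sc->ring_tag, sc->ring_vaddr, sc->ring_map);
	bus_dma_tag_destroy(sc->ring_tag);
}
.Ed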
.Ss Dynamic Transactions
Dynamic transactions map memory regions provided by other parts of the system.
A tag must be created via
.Fn bus_dma_tag_create
to describe the DMA transactions to and from these memory regions,
and a pool of mapping objects must be allocated via
.Fn bus_dmamap_create
to track the mappings of any in-flight transactions.
.Pp
When a consumer wishes to schedule a transaction for a memory region,
the consumer must first obtain an unused mapping object from its pool
of mapping objects.
The memory region must be bound to the mapping object via one of the
.Fn bus_dmamap_load
functions.
Before scheduling the transaction,
the consumer should sync the memory region via
.Fn bus_dmamap_sync
with one or more of the
.Dq PRE
flags.
After the transaction has completed,
the consumer should sync the memory region via
.Fn bus_dmamap_sync
with one or more of the
.Dq POST
flags.
The mapping can then be unloaded via
.Fn bus_dmamap_unload ,
and the mapping object can be returned to the pool of unused mapping objects.
.Pp
When a consumer is no longer scheduling DMA transactions,
the mapping objects should be freed via
.Fn bus_dmamap_destroy ,
and the tag should be freed via
.Fn bus_dma_tag_destroy .
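.Pp
The per-transaction steps above might look as follows for a single
transfer from a device into a caller-supplied buffer.
This is only a sketch under stated assumptions:
.Vt struct foo_io_softc
and its
.Va io_tag
member (a tag already created with
.Fn bus_dma_tag_create ) ,
together with
.Fn foo_io_cb ,
.Fn foo_program_segments ,
.Fn foo_kick_hardware ,
.Fn foo_start_read ,
and
.Fn foo_read_done ,
are hypothetical driver names.
The map is assumed to come from a pool allocated with
.Fn bus_dmamap_create .
.Bd -literal
struct foo_io_softc {
	bus_dma_tag_t	io_tag;		/* tag for I/O buffers */
};

/* Hypothetical hardware helpers provided elsewhere in the driver. */
void	foo_program_segments(struct foo_io_softc *,
	    bus_dma_segment_t *, int);
void	foo_kick_hardware(struct foo_io_softc *);

/* Load callback: pass the segment list to the hardware. */
static void
foo_io_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	struct foo_io_softc *sc = arg;

	if (error != 0)
		return;
	foo_program_segments(sc, segs, nseg);
}

static int
foo_start_read(struct foo_io_softc *sc, bus_dmamap_t map, void *buf,
    bus_size_t len)
{
	int error;

	/* BUS_DMA_NOWAIT: fail rather than defer the callback. */
	error = bus_dmamap_load(sc->io_tag, map, buf, len, foo_io_cb,
	    sc, BUS_DMA_NOWAIT);
	if (error != 0)
		return (error);
	/* The device will update host memory: a read operation. */
	bus_dmamap_sync(sc->io_tag, map, BUS_DMASYNC_PREREAD);
	foo_kick_hardware(sc);
	return (0);
}

/* Called once the device reports that the transfer has completed. */
static void
foo_read_done(struct foo_io_softc *sc, bus_dmamap_t map)
{
	bus_dmamap_sync(sc->io_tag, map, BUS_DMASYNC_POSTREAD);
	bus_dmamap_unload(sc->io_tag, map);
	/* The map can now be returned to the driver's pool. */
}
.Ed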
.Sh STRUCTURES AND TYPES
.Bl -tag -width indent
.It Vt bus_dma_tag_t
A machine-dependent (MD) opaque type that describes the
characteristics of a group of DMA transactions.
DMA tags are organized into a hierarchy, with each child
tag inheriting the restrictions of its parent.
This allows all devices along the path of DMA transactions
to contribute to the constraints of those transactions.
.It Vt bus_dma_template_t
A template is a structure for creating a
.Vt bus_dma_tag_t
from a set of defaults.
Once initialized with
.Fn bus_dma_template_init ,
a driver can override individual fields to suit its needs.
The following fields start with the indicated default values:
.Bd -literal
	alignment	1
	boundary	0
	lowaddr		BUS_SPACE_MAXADDR
	highaddr	BUS_SPACE_MAXADDR
	maxsize		BUS_SPACE_MAXSIZE
	nsegments	BUS_SPACE_UNRESTRICTED
	maxsegsize	BUS_SPACE_MAXSIZE
	flags		0
	lockfunc	NULL
	lockfuncarg	NULL
.Ed
.Pp
Descriptions of each field are documented with
.Fn bus_dma_tag_create .
Note that the
.Fa filtfunc
and
.Fa filtfuncarg
attributes of the DMA tag are not supported with templates.
.It Vt bus_dma_filter_t
Client specified address filter having the format:
.Bl -tag -width indent
.It Ft int
.Fn "client_filter" "void *filtarg" "bus_addr_t testaddr"
.El
.Pp
Address filters can be specified during tag creation to allow
for devices whose DMA address restrictions cannot be specified
by a single window.
The
.Fa filtarg
argument is specified by the client during tag creation to be passed to all
invocations of the callback.
The
.Fa testaddr
argument contains a potential starting address of a DMA mapping.
The filter function operates on the set of addresses from
.Fa testaddr
to
.Ql trunc_page(testaddr) + PAGE_SIZE - 1 ,
inclusive.
The filter function should return zero if any mapping in this range
can be accommodated by the device and non-zero otherwise.
.Pp
.Em Note: The use of filters is deprecated. Proper operation is not guaranteed.
.It Vt bus_dma_segment_t
A machine-dependent type that describes individual
DMA segments.
It contains the following fields:
.Bd -literal
bus_addr_t ds_addr;
bus_size_t ds_len;
.Ed
.Pp
The
.Fa ds_addr
field contains the device visible address of the DMA segment, and
.Fa ds_len
contains the length of the DMA segment.
Although the DMA segments returned by a mapping call will adhere to
all restrictions necessary for a successful DMA operation, some conversion
(e.g.\& a conversion from host byte order to the device's byte order) is
almost always required when presenting segment information to the device.
.It Vt bus_dmamap_t
A machine-dependent opaque type describing an individual mapping.
One map is used for each memory allocation that will be loaded.
Maps can be reused once they have been unloaded.
Multiple maps can be associated with one DMA tag.
While the value of the map may evaluate to
.Dv NULL
on some platforms under certain conditions,
it should never be assumed that it will be
.Dv NULL
in all cases.
.It Vt bus_dmamap_callback_t
Client specified callback for receiving mapping information resulting from
the load of a
.Vt bus_dmamap_t
via
.Fn bus_dmamap_load ,
.Fn bus_dmamap_load_bio ,
.Fn bus_dmamap_load_ccb ,
.Fn bus_dmamap_load_crp ,
or
.Fn bus_dmamap_load_crp_buffer .
Callbacks are of the format:
.Bl -tag -width indent
.It Ft void
.Fn "client_callback" "void *callback_arg" "bus_dma_segment_t *segs" \
"int nseg" "int error"
.El
.Pp
The
.Fa callback_arg
is the callback argument passed to dmamap load functions.
The
.Fa segs
and
.Fa nseg
arguments describe an array of
.Vt bus_dma_segment_t
structures that represent the mapping.
This array is only valid within the scope of the callback function.
The success or failure of the mapping is indicated by the
.Fa error
argument.
More information on the use of callbacks can be found in the
description of the individual dmamap load functions.
.It Vt bus_dmamap_callback2_t
Client specified callback for receiving mapping information resulting from
the load of a
.Vt bus_dmamap_t
via
.Fn bus_dmamap_load_uio
or
.Fn bus_dmamap_load_mbuf .
.Pp
Callback2s are of the format:
.Bl -tag -width indent
.It Ft void
.Fn "client_callback2" "void *callback_arg" "bus_dma_segment_t *segs" \
"int nseg" "bus_size_t mapsize" "int error"
.El
.Pp
Callback2's behavior is the same as
.Vt bus_dmamap_callback_t
with the addition that the length of the data mapped is provided via
.Fa mapsize .
.It Vt bus_dmasync_op_t
Memory synchronization operation specifier.
Bus DMA requires explicit synchronization of memory with its device
visible mapping in order to guarantee memory coherency.
The
.Vt bus_dmasync_op_t
allows the type of DMA operation that will be or has been performed
to be communicated to the system so that the correct coherency measures
are taken.
The operations are represented as bitfield flags that can be combined together,
though it only makes sense to combine PRE flags or POST flags, not both.
See the
.Fn bus_dmamap_sync
description below for more details on how to use these operations.
.Pp
All operations specified below are performed from the host memory point of view,
where a read implies data coming from the device to the host memory, and a write
implies data going from the host memory to the device.
Alternatively, the operations can be thought of in terms of driver operations,
where reading a network packet or storage sector corresponds to a read operation
in
.Nm .
.Bl -tag -width ".Dv BUS_DMASYNC_POSTWRITE"
.It Dv BUS_DMASYNC_PREREAD
Perform any synchronization required prior to an update of host memory by the
device.
.It Dv BUS_DMASYNC_PREWRITE
Perform any synchronization required after an update of host memory by the CPU
and prior to device access to host memory.
.It Dv BUS_DMASYNC_POSTREAD
Perform any synchronization required after an update of host memory by the
device and prior to CPU access to host memory.
.It Dv BUS_DMASYNC_POSTWRITE
Perform any synchronization required after device access to host memory.
.El
.It Vt bus_dma_lock_t
Client specified lock/mutex manipulation method.
This will be called from
within busdma whenever a client lock needs to be manipulated.
In its current form, the function will be called immediately before
the callback for a DMA load operation that has been deferred with
.Dv BUS_DMA_LOCK
and immediately after with
.Dv BUS_DMA_UNLOCK .
If the load operation does not need to be deferred, then it
will not be called since the function loading the map should
be holding the appropriate locks.
This method is of the format:
.Bl -tag -width indent
.It Ft void
.Fn "lockfunc" "void *lockfunc_arg" "bus_dma_lock_op_t op"
.El
.Pp
The
.Fa lockfuncarg
argument is specified by the client during tag creation to be passed to all
invocations of the callback.
The
.Fa op
argument specifies the lock operation to perform.
.Pp
Two
.Vt lockfunc
implementations are provided for convenience.
.Fn busdma_lock_mutex
performs standard mutex operations on the sleep mutex provided via
.Fa lockfuncarg .
.Fn dflt_lock
will generate a system panic if it is called.
It is substituted into the tag when
.Fa lockfunc
is passed as
.Dv NULL
to
.Fn bus_dma_tag_create
and is useful for tags that should not be used with deferred load operations.
.It Vt bus_dma_lock_op_t
Operations to be performed by the client-specified
.Fn lockfunc .
.Bl -tag -width ".Dv BUS_DMA_UNLOCK"
.It Dv BUS_DMA_LOCK
Acquires and/or locks the client locking primitive.
.It Dv BUS_DMA_UNLOCK
Releases and/or unlocks the client locking primitive.
.El
.El
.Sh FUNCTIONS
.Bl -tag -width indent
.It Fn bus_dma_tag_create "parent" "alignment" "boundary" "lowaddr" \
"highaddr" "*filtfunc" "*filtfuncarg" "maxsize" "nsegments" "maxsegsz" \
"flags" "lockfunc" "lockfuncarg" "*dmat"
Allocates a DMA tag, and initializes it according to
the arguments provided:
.Bl -tag -width ".Fa filtfuncarg"
.It Fa parent
A parent tag from which to inherit restrictions.
The restrictions passed in other arguments can only further tighten the
restrictions inherited from the parent tag.
.Pp
All tags created by a device driver must inherit from the tag returned by
.Fn bus_get_dma_tag
to honor restrictions between the parent bridge, CPU memory, and the
device.
.It Fa alignment
Alignment constraint, in bytes, of any mappings created using this tag.
The alignment must be a power of 2.
Hardware that can DMA starting at any address would specify
.Em 1
for byte alignment.
Hardware requiring DMA transfers to start on a multiple of 4K
would specify
.Em 4096 .
.It Fa boundary
Boundary constraint, in bytes, of the target DMA memory region.
The boundary indicates the set of addresses, all multiples of the
boundary argument, that cannot be crossed by a single
.Vt bus_dma_segment_t .
The boundary must be a power of 2 and must be no smaller than the
maximum segment size.
.Ql 0
indicates that there are no boundary restrictions.
.It Fa lowaddr , highaddr
Bounds of the window of bus address space that
.Em cannot
be directly accessed by the device.
The window contains all addresses greater than
.Fa lowaddr
and less than or equal to
.Fa highaddr .
For example, a device incapable of DMA above 4GB would specify a
.Fa highaddr
of
.Dv BUS_SPACE_MAXADDR
and a
.Fa lowaddr
of
.Dv BUS_SPACE_MAXADDR_32BIT .
Similarly, a device that can only perform DMA to addresses below
16MB would specify a
.Fa highaddr
of
.Dv BUS_SPACE_MAXADDR
and a
.Fa lowaddr
of
.Dv BUS_SPACE_MAXADDR_24BIT .
Some implementations require that some region of device visible
address space, overlapping available host memory, be outside the
window.
This area of
.Ql safe memory
is used to bounce requests that would otherwise conflict with
the exclusion window.
.It Fa filtfunc
Optional filter function (may be
.Dv NULL )
to be called for any attempt to
map memory into the window described by
.Fa lowaddr
and
.Fa highaddr .
A filter function is only required when the single window described
by
.Fa lowaddr
and
.Fa highaddr
cannot adequately describe the constraints of the device.
The filter function will be called for every machine page
that overlaps the exclusion window.
.Pp
.Em Note: The use of filters is deprecated. Proper operation is not guaranteed.
.It Fa filtfuncarg
Argument passed to all calls to the filter function for this tag.
May be
.Dv NULL .
.It Fa maxsize
Maximum size, in bytes, of the sum of all segment lengths in a given
DMA mapping associated with this tag.
.It Fa nsegments
Number of discontinuities (scatter/gather segments) allowed
in a DMA mapped region.
.It Fa maxsegsz
Maximum size, in bytes, of a segment in any DMA mapped region associated
with
.Fa dmat .
.It Fa flags
The following flags are supported:
.Bl -tag -width ".Dv BUS_DMA_ALLOCNOW"
.It Dv BUS_DMA_ALLOCNOW
Pre-allocate enough resources to handle at least one map load operation on
this tag.
If sufficient resources are not available,
.Er ENOMEM
is returned.
This should not be used for tags that only describe buffers that will be
allocated with
.Fn bus_dmamem_alloc .
Also, due to resource sharing with other tags, this flag does not guarantee
that resources will be allocated or reserved exclusively for this tag.
It should be treated only as a minor optimization.
.It Dv BUS_DMA_COHERENT
Indicate that the DMA engine and CPU are cache-coherent.
Cached memory may be used to back allocations created by
.Fn bus_dmamem_alloc .
For
.Fn bus_dma_tag_create ,
the
.Dv BUS_DMA_COHERENT
flag is currently implemented on arm64.
.El
.It Fa lockfunc
Optional lock manipulation function (may be
.Dv NULL )
to be called when busdma
needs to manipulate a lock on behalf of the client.
If
.Dv NULL
is specified,
.Fn dflt_lock
is used.
.It Fa lockfuncarg
Optional argument to be passed to the function specified by
.Fa lockfunc .
.It Fa dmat
Pointer to a
.Vt bus_dma_tag_t
where the resulting DMA tag will be stored.
.El
.Pp
Returns
.Er ENOMEM
if sufficient memory is not available for tag creation
or allocating mapping resources.
.It Fn bus_dma_tag_destroy "dmat"
Deallocate the DMA tag
.Fa dmat
that was created by
.Fn bus_dma_tag_create .
.Pp
Returns
.Er EBUSY
if any DMA maps remain associated with
.Fa dmat
or
.Ql 0
on success.
.It Fn bus_dma_template_init "*template" "parent"
Initializes a
.Vt bus_dma_template_t
structure.
If the
.Fa parent
argument is non-NULL, this parent tag is associated with the template and
will be compiled into the DMA tag that is later created.
The values of the parent are not copied into the template.
During tag creation in
.Fn bus_dma_template_tag ,
any parameters from the parent tag that are more restrictive than those
in the provided template override the template's values in the new tag.
.It Fn bus_dma_template_tag "*template" "*dmat"
Unpacks a template into a tag, and returns the tag via
.Fa dmat .
All return values are identical to
.Fn bus_dma_tag_create .
The template is not modified by this function, and can be reused and/or
freed upon return.
.It Fn bus_dma_template_clone "*template" "dmat"
Copies the fields from an existing tag to a template.
The template does not need to be initialized first.
All of its fields will be overwritten by the values contained in the tag.
When paired with
.Fn bus_dma_template_tag ,
this function is useful for creating copies of tags.
.It Fn bus_dma_template_fill "*template" "params[]" "count"
Fills in the selected fields of the template with the keyed values from the
.Fa params
array.
This is not meant to be called directly; use
.Fn BUS_DMA_TEMPLATE_FILL
instead.
.It Fn BUS_DMA_TEMPLATE_FILL "*template" "param ..."
Fills in the selected fields of the template with a variable number of
key-value parameters.
The macros listed below take an argument of the specified type and encapsulate
it into a key-value structure that is directly usable as a parameter argument.
Multiple parameters may be provided at once.
.Bd -literal
BD_PARENT() void *
BD_ALIGNMENT() uintmax_t
BD_BOUNDARY() uintmax_t
BD_LOWADDR() vm_paddr_t
BD_HIGHADDR() vm_paddr_t
BD_MAXSIZE() uintmax_t
BD_NSEGMENTS() uintmax_t
BD_MAXSEGSIZE() uintmax_t
BD_FLAGS() uintmax_t
BD_LOCKFUNC() void *
BD_LOCKFUNCARG() void *
.Ed
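.Pp
As a minimal sketch of how the template functions may be combined, a driver
might create a tag as shown below.
The helper name
.Fn foo_create_tag ,
the use of
.Xr bus_get_dma_tag 9
to obtain a parent tag, and the specific limits are assumptions chosen for
illustration only.
.Bd -literal
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <machine/bus.h>

/* Hypothetical helper: tag for one page-aligned buffer of up to 64KB. */
static int
foo_create_tag(device_t dev, bus_dma_tag_t *tagp)
{
	bus_dma_template_t t;

	/*
	 * Associate the parent tag with the template.  Its values are
	 * not copied here; they are applied during tag creation.
	 */
	bus_dma_template_init(&t, bus_get_dma_tag(dev));

	/* Set only the fields this (assumed) device cares about. */
	BUS_DMA_TEMPLATE_FILL(&t,
	    BD_ALIGNMENT(PAGE_SIZE),
	    BD_LOWADDR(BUS_SPACE_MAXADDR_32BIT),
	    BD_MAXSIZE(65536),
	    BD_NSEGMENTS(1),
	    BD_MAXSEGSIZE(65536));

	/* Same return values as bus_dma_tag_create(). */
	return (bus_dma_template_tag(&t, tagp));
}
.Ed
.Pp
Because the template is not modified by
.Fn bus_dma_template_tag ,
the same template could be adjusted with another
.Fn BUS_DMA_TEMPLATE_FILL
call and reused to create a second, slightly different tag.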
.It Fn bus_dmamap_create "dmat" "flags" "*mapp"
Allocates and initializes a DMA map.
Arguments are as follows:
.Bl -tag -width ".Fa nsegments"
.It Fa dmat
DMA tag.
.It Fa flags
Are as follows:
.Bl -tag -width ".Dv BUS_DMA_COHERENT"
.It Dv BUS_DMA_COHERENT
Attempt to map the memory loaded with this map such that cache sync
operations are as cheap as possible.
This flag is typically set on maps when the memory loaded with these will
be accessed frequently by both a CPU and a DMA engine, such as control data,
as opposed to streamable data such as receive and transmit buffers.
Use of this flag does not remove the requirement of using
.Fn bus_dmamap_sync ,
but it may reduce the cost of performing these operations.
.El
.It Fa mapp
Pointer to a
.Vt bus_dmamap_t
where the resulting DMA map will be stored.
.El
.Pp
Returns
.Er ENOMEM
if sufficient memory is not available for creating the
map or allocating mapping resources.
.It Fn bus_dmamap_destroy "dmat" "map"
Frees all resources associated with a given DMA map.
Arguments are as follows:
.Bl -tag -width ".Fa dmat"
.It Fa dmat
DMA tag used to allocate
.Fa map .
.It Fa map
The DMA map to destroy.
.El
.Pp
Returns
.Er EBUSY
if a mapping is still active for
.Fa map .
.It Fn bus_dmamap_load "dmat" "map" "buf" "buflen" "*callback" \
"callback_arg" "flags"
Creates a mapping in device visible address space of
.Fa buflen
bytes of
.Fa buf ,
associated with the DMA map
.Fa map .
This call will always return immediately and will not block for any reason.
Arguments are as follows:
.Bl -tag -width ".Fa buflen"
.It Fa dmat
DMA tag used to allocate
.Fa map .
.It Fa map
A DMA map without a currently active mapping.
.It Fa buf
A kernel virtual address pointer to a contiguous (in KVA) buffer, to be
mapped into device visible address space.
.It Fa buflen
The size of the buffer.
.It Fa callback Fa callback_arg
The callback function, and its argument.
This function is called once sufficient mapping resources are available for
the DMA operation.
If resources are temporarily unavailable, this function will be deferred until
later, but the load operation will still return immediately to the caller.
Thus, callers should not assume that the callback will be called before the
load returns, and code should be structured appropriately to handle this.
See below for specific flags and error codes that control this behavior.
.It Fa flags
Are as follows:
.Bl -tag -width ".Dv BUS_DMA_NOWAIT"
.It Dv BUS_DMA_NOWAIT
The load should not be deferred in case of insufficient mapping resources,
and instead should return immediately with an appropriate error.
.It Dv BUS_DMA_NOCACHE
The generated transactions to and from the virtual page are non-cacheable.
.El
.El
.Pp
Return values to the caller are as follows:
.Bl -tag -width ".Er EINPROGRESS"
.It 0
The callback has been called and completed.
The status of the mapping has been delivered to the callback.
.It Er EINPROGRESS
The mapping has been deferred for lack of resources.
The callback will be called as soon as resources are available.
Callbacks are serviced in FIFO order.
.Pp
Note that subsequent load operations for the same tag that do not require
extra resources will still succeed.
This may result in out-of-order processing of requests.
If the caller requires the order of requests to be preserved,
then the caller is required to stall subsequent requests until a pending
request's callback is invoked.
.It Er ENOMEM
The load request has failed due to insufficient resources, and the caller
specifically used the
.Dv BUS_DMA_NOWAIT
flag.
.It Er EINVAL
The load request was invalid.
The callback has been called and has been provided the same error.
This error value may indicate that
.Fa dmat ,
.Fa map ,
.Fa buf ,
or
.Fa callback
were invalid, or
.Fa buflen
was larger than the
.Fa maxsize
argument used to create the DMA tag
.Fa dmat .
.El
.Pp
When the callback is called, it is presented with an error value
indicating the disposition of the mapping.
Error may be one of the following:
.Bl -tag -width ".Er EINPROGRESS"
.It 0
The mapping was successful and the
.Fa dm_segs
callback argument contains an array of
.Vt bus_dma_segment_t
elements describing the mapping.
This array is only valid during the scope of the callback function.
.It Er EFBIG
A mapping could not be achieved within the segment constraints provided
in the tag even though the requested allocation size was less than
.Fa maxsize .
.El
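.Pp
The interaction between the load call and its callback can be sketched as
follows.
This is an illustration only: the
.Vt struct foo_softc
layout and the function names are hypothetical, and the tag is assumed to
have been created with
.Fa nsegments
set to 1 so that a single segment describes the whole buffer.
.Bd -literal
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <machine/bus.h>

struct foo_softc {			/* hypothetical driver state */
	bus_dma_tag_t	tag;
	bus_dmamap_t	map;
	bus_addr_t	busaddr;	/* device visible address */
	int		load_error;	/* status seen by callback */
};

static void
foo_load_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	struct foo_softc *sc = arg;

	sc->load_error = error;
	if (error != 0)
		return;
	/* The segment array is only valid here; copy what is needed. */
	sc->busaddr = segs[0].ds_addr;
}

static int
foo_load(struct foo_softc *sc, void *buf, bus_size_t len)
{
	int error;

	/* BUS_DMA_NOWAIT: fail with ENOMEM rather than defer. */
	error = bus_dmamap_load(sc->tag, sc->map, buf, len,
	    foo_load_cb, sc, BUS_DMA_NOWAIT);
	if (error != 0)
		return (error);
	/* The callback has run; report the status it was given. */
	return (sc->load_error);
}
.Ed
.Pp
A caller that omits
.Dv BUS_DMA_NOWAIT
would additionally have to treat
.Er EINPROGRESS
as a pending rather than failed load and continue its work from the callback.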
.It Fn bus_dmamap_load_bio "dmat" "map" "bio" "callback" "callback_arg" "flags"
This is a variation of
.Fn bus_dmamap_load
which maps buffers pointed to by
.Fa bio
for DMA transfers.
.Fa bio
may point to either a mapped or unmapped buffer.
.It Fn bus_dmamap_load_ccb "dmat" "map" "ccb" "callback" "callback_arg" "flags"
This is a variation of
.Fn bus_dmamap_load
which maps data pointed to by
.Fa ccb
for DMA transfers.
The data for
.Fa ccb
may be any of the following types:
.Bl -tag -width ".Er CAM_DATA_SG_PADDR"
.It CAM_DATA_VADDR
The data is a single KVA buffer.
.It CAM_DATA_PADDR
The data is a single bus address range.
.It CAM_DATA_SG
The data is a scatter/gather list of KVA buffers.
.It CAM_DATA_SG_PADDR
The data is a scatter/gather list of bus address ranges.
.It CAM_DATA_BIO
The data is contained in a
.Vt struct bio
attached to the CCB.
.El
.Pp
.Fn bus_dmamap_load_ccb
supports the following CCB XPT function codes:
.Pp
.Bl -item -offset indent -compact
.It
XPT_ATA_IO
.It
XPT_CONT_TARGET_IO
.It
XPT_SCSI_IO
.El
.It Fn bus_dmamap_load_crp "dmat" "map" "crp" "callback" "callback_arg" "flags"
This is a variation of
.Fn bus_dmamap_load
which maps the input buffer pointed to by
.Fa crp
for DMA transfers.
The
.Dv BUS_DMA_NOWAIT
flag is implied, thus no callback deferral will happen.
.It Fn bus_dmamap_load_crp_buffer "dmat" "map" "cb" "callback" "callback_arg" \
"flags"
This is a variation of
.Fn bus_dmamap_load
which maps the crypto data buffer pointed to by
.Fa cb
for DMA transfers.
The
.Dv BUS_DMA_NOWAIT
flag is implied, thus no callback deferral will happen.
.It Fn bus_dmamap_load_mbuf "dmat" "map" "mbuf" "callback2" "callback_arg" \
"flags"
This is a variation of
.Fn bus_dmamap_load
which maps mbuf chains
for DMA transfers.
A
.Vt bus_size_t
argument is also passed to the callback routine, which
contains the mbuf chain's packet header length.
The
.Dv BUS_DMA_NOWAIT
flag is implied, thus no callback deferral will happen.
.Pp
Mbuf chains are assumed to be in kernel virtual address space.
.Pp
Besides the error values listed for
.Fn bus_dmamap_load ,
.Er EINVAL
will be returned if the size of the mbuf chain exceeds the maximum limit of the
DMA tag.
.It Fn bus_dmamap_load_mbuf_sg "dmat" "map" "mbuf" "segs" "nsegs" "flags"
This is just like
.Fn bus_dmamap_load_mbuf
except that it returns immediately without calling a callback function.
It is provided for efficiency.
The scatter/gather segment array
.Va segs
is provided by the caller and filled in directly by the function.
The
.Va nsegs
argument is returned with the number of segments filled in.
Returns the same errors as
.Fn bus_dmamap_load_mbuf .
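.Pp
A transmit path using this function might look like the sketch below.
The names
.Dv FOO_MAXSEGS ,
.Fn foo_write_desc ,
and
.Fn foo_encap
are hypothetical, and the caller-provided array is assumed to be at least as
large as the
.Fa nsegments
value of the tag.
.Bd -literal
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/bus.h>
#include <sys/mbuf.h>
#include <machine/bus.h>

#define	FOO_MAXSEGS	8	/* assumed to match the tag's nsegments */

/* Hypothetical routine that programs one hardware descriptor. */
void	foo_write_desc(int idx, bus_addr_t addr, bus_size_t len);

static int
foo_encap(bus_dma_tag_t tag, bus_dmamap_t map, struct mbuf *m)
{
	bus_dma_segment_t segs[FOO_MAXSEGS];
	int error, i, nsegs;

	/* No callback: segs[] and nsegs are filled in directly. */
	error = bus_dmamap_load_mbuf_sg(tag, map, m, segs, &nsegs,
	    BUS_DMA_NOWAIT);
	if (error != 0)
		return (error);

	for (i = 0; i < nsegs; i++)
		foo_write_desc(i, segs[i].ds_addr, segs[i].ds_len);

	/* The CPU filled the mbuf data; flush before the device reads. */
	bus_dmamap_sync(tag, map, BUS_DMASYNC_PREWRITE);
	return (0);
}
.Ed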
.It Fn bus_dmamap_load_uio "dmat" "map" "uio" "callback2" "callback_arg" "flags"
This is a variation of
.Fn bus_dmamap_load
which maps buffers pointed to by
.Fa uio
for DMA transfers.
A
.Vt bus_size_t
argument is also passed to the callback routine, which contains the size of
.Fa uio ,
i.e.
.Fa uio->uio_resid .
The
.Dv BUS_DMA_NOWAIT
flag is implied, thus no callback deferral will happen.
Returns the same errors as
.Fn bus_dmamap_load .
.Pp
If
.Fa uio->uio_segflg
is
.Dv UIO_USERSPACE ,
then it is assumed that the buffer,
.Fa uio ,
is in
.Fa "uio->uio_td->td_proc" Ns 's
address space.
User space memory must be in-core and wired prior to attempting a map
load operation.
Pages may be locked using
.Xr vslock 9 .
.It Fn bus_dmamap_unload "dmat" "map"
Unloads a DMA map.
Arguments are as follows:
.Bl -tag -width ".Fa dmat"
.It Fa dmat
DMA tag used to allocate
.Fa map .
.It Fa map
The DMA map that is to be unloaded.
.El
.Pp
.Fn bus_dmamap_unload
will not perform any implicit synchronization of DMA buffers.
This must be done explicitly by a call to
.Fn bus_dmamap_sync
prior to unloading the map.
.It Fn bus_dmamap_sync "dmat" "map" "op"
Performs synchronization of a device visible mapping with the CPU visible
memory referenced by that mapping.
Arguments are as follows:
.Bl -tag -width ".Fa dmat"
.It Fa dmat
DMA tag used to allocate
.Fa map .
.It Fa map
The DMA mapping to be synchronized.
.It Fa op
Type of synchronization operation to perform.
See the definition of
.Vt bus_dmasync_op_t
for a description of the acceptable values for
.Fa op .
.El
.Pp
The
.Fn bus_dmamap_sync
function
is the method used to keep the CPU's view and a device's direct
memory access (DMA) view of shared
memory coherent.
For example, the CPU might be used to set up the contents of a buffer
that is to be made available to a device.
To ensure that the data are visible via the device's mapping of that
memory, the buffer must be loaded and a DMA sync operation of
.Dv BUS_DMASYNC_PREWRITE
must be performed after the CPU has updated the buffer and before the device
access is initiated.
If the CPU modifies this buffer again later, another
.Dv BUS_DMASYNC_PREWRITE
sync operation must be performed before an additional device
access.
Conversely, suppose a device updates memory that is to be read by a CPU.
In this case, the buffer must be loaded, and a DMA sync operation of
.Dv BUS_DMASYNC_PREREAD
must be performed before the device access is initiated.
The CPU will only be able to see the results of this memory update
once the DMA operation has completed and a
.Dv BUS_DMASYNC_POSTREAD
sync operation has been performed.
.Pp
If read and write operations are not preceded and followed by the
appropriate synchronization operations, behavior is undefined.
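.Pp
The two cases above can be sketched as follows.
This is only an illustration: the
.Vt struct foo_softc
type, its fields, and
.Fn foo_hw_start
are hypothetical, and both maps are assumed to have been loaded already.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

/* Hypothetical driver state; not part of the bus_dma API. */
struct foo_softc {
	bus_dma_tag_t	dmat;
	bus_dmamap_t	tx_map;		/* buffer the device will read */
	bus_dmamap_t	rx_map;		/* buffer the device will fill */
};

/* Hypothetical helper that starts the transfer on the hardware. */
void	foo_hw_start(struct foo_softc *);

static void
foo_start(struct foo_softc *sc)
{
	/* The CPU has finished writing the transmit buffer. */
	bus_dmamap_sync(sc->dmat, sc->tx_map, BUS_DMASYNC_PREWRITE);
	/* The device is about to write the receive buffer. */
	bus_dmamap_sync(sc->dmat, sc->rx_map, BUS_DMASYNC_PREREAD);
	foo_hw_start(sc);
}

static void
foo_rx_done(struct foo_softc *sc)
{
	/* DMA has completed; make the device's update visible. */
	bus_dmamap_sync(sc->dmat, sc->rx_map, BUS_DMASYNC_POSTREAD);
	/* The CPU may now read the receive buffer. */
}
.Ed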
.It Fn bus_dmamem_alloc "dmat" "**vaddr" "flags" "*mapp"
Allocates memory that is mapped into KVA at the address returned
in
.Fa vaddr
and that is permanently loaded into the newly created
.Vt bus_dmamap_t
returned via
.Fa mapp .
Arguments are as follows:
.Bl -tag -width ".Fa alignment"
.It Fa dmat
DMA tag describing the constraints of the DMA mapping.
.It Fa vaddr
Pointer to a pointer that will hold the returned KVA mapping of
the allocated region.
.It Fa flags
Flags are defined as follows:
.Bl -tag -width ".Dv BUS_DMA_NOWAIT"
.It Dv BUS_DMA_WAITOK
The routine can safely wait (sleep) for resources.
.It Dv BUS_DMA_NOWAIT
The routine is not allowed to wait for resources.
If resources are not available,
.Dv ENOMEM
is returned.
.It Dv BUS_DMA_COHERENT
Attempt to map this memory in a coherent fashion.
See
.Fn bus_dmamap_create
above for a description of this flag.
For
.Fn bus_dmamem_alloc ,
the
.Dv BUS_DMA_COHERENT
flag is currently implemented on arm and arm64.
.It Dv BUS_DMA_ZERO
Causes the allocated memory to be set to all zeros.
.It Dv BUS_DMA_NOCACHE
The allocated memory will not be cached in the processor caches.
All memory accesses appear on the bus and are executed
without reordering.
For
.Fn bus_dmamem_alloc ,
the
.Dv BUS_DMA_NOCACHE
flag is currently implemented on amd64 and i386, where it causes the
Strong Uncacheable PAT to be set for the allocated virtual address range.
.El
.It Fa mapp
Pointer to a
.Vt bus_dmamap_t
where the resulting DMA map will be stored.
.El
.Pp
The size of memory to be allocated is
.Fa maxsize
as specified in the call to
.Fn bus_dma_tag_create
for
.Fa dmat .
.Pp
The current implementation of
.Fn bus_dmamem_alloc
will allocate all requests as a single segment.
.Pp
An initial load operation is required to obtain the bus address of the allocated
memory, and an unload operation is required before freeing the memory, as
described below in
.Fn bus_dmamem_free .
Maps are automatically handled by this function and should not be explicitly
allocated or destroyed.
.Pp
Although an explicit load is not required for each access to the memory
referenced by the returned map, the synchronization requirements
as described in the
.Fn bus_dmamap_sync
section still apply and should be used to achieve portability on architectures
without coherent buses.
.Pp
Returns
.Er ENOMEM
if sufficient memory is not available for completing
the operation.
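.Pp
The allocation and initial load might look like the sketch below.
It assumes a tag that was created with
.Fa nsegments
set to 1 and
.Fa maxsize
equal to the
.Fa size
argument; the
.Vt struct foo_softc
type, its fields, and the
.Fn foo_*
functions are hypothetical.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

/* Hypothetical driver state; not part of the bus_dma API. */
struct foo_softc {
	bus_dma_tag_t	dmat;
	bus_dmamap_t	ring_map;
	void		*ring_vaddr;
	bus_addr_t	ring_busaddr;
};

/* Load callback: record the bus address of the single segment. */
static void
foo_ring_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	if (error != 0)
		return;
	*(bus_addr_t *)arg = segs[0].ds_addr;
}

static int
foo_ring_alloc(struct foo_softc *sc, bus_size_t size)
{
	int error;

	/* The map is created here; do not call bus_dmamap_create(). */
	error = bus_dmamem_alloc(sc->dmat, &sc->ring_vaddr,
	    BUS_DMA_WAITOK | BUS_DMA_ZERO, &sc->ring_map);
	if (error != 0)
		return (error);

	/* Initial load, needed to learn the bus address. */
	error = bus_dmamap_load(sc->dmat, sc->ring_map, sc->ring_vaddr,
	    size, foo_ring_cb, &sc->ring_busaddr, BUS_DMA_NOWAIT);
	if (error != 0)
		bus_dmamem_free(sc->dmat, sc->ring_vaddr, sc->ring_map);
	return (error);
}
.Ed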
.It Fn bus_dmamem_free "dmat" "*vaddr" "map"
Frees memory previously allocated by
.Fn bus_dmamem_alloc .
Any mappings
will be invalidated.
Arguments are as follows:
.Bl -tag -width ".Fa vaddr"
.It Fa dmat
DMA tag.
.It Fa vaddr
Kernel virtual address of the memory.
.It Fa map
DMA map to be invalidated.
.El
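.Pp
A minimal teardown sketch, assuming the memory was obtained from
.Fn bus_dmamem_alloc
and is still loaded.
The
.Vt struct foo_softc
type, its fields, and
.Fn foo_ring_free
are hypothetical.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/bus.h>
#include <machine/bus.h>

/* Hypothetical driver state; not part of the bus_dma API. */
struct foo_softc {
	bus_dma_tag_t	dmat;
	bus_dmamap_t	ring_map;
	void		*ring_vaddr;
};

static void
foo_ring_free(struct foo_softc *sc)
{
	/* Unload before freeing. */
	bus_dmamap_unload(sc->dmat, sc->ring_map);
	/* This also disposes of the map; no bus_dmamap_destroy(). */
	bus_dmamem_free(sc->dmat, sc->ring_vaddr, sc->ring_map);
	sc->ring_vaddr = NULL;
}
.Ed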
.El
.Sh RETURN VALUES
Behavior is undefined if invalid arguments are passed to
any of the above functions.
If sufficient resources cannot be allocated for a given
transaction,
.Er ENOMEM
is returned.
All
routines that are not of type
.Vt void
will return 0 on success or an error
code on failure as discussed above.
.Pp
All
.Vt void
routines will succeed if provided with valid arguments.
.Sh LOCKING
Two locking protocols are used by
.Nm .
The first is a private global lock that is used to synchronize access to the
bounce buffer pool on the architectures that make use of them.
This lock is strictly a leaf lock that is only used internally to
.Nm
and is not exposed to clients of the API.
.Pp
The second protocol involves protecting various resources stored in the tag.
Since almost all
.Nm
operations are done through requests from the driver that created the tag,
the most efficient way to protect the tag resources is through the lock that
the driver uses.
In cases where
.Nm
acts on its own without being called by the driver, the lock primitive
specified in the tag is acquired and released automatically.
An example of this is when the
.Fn bus_dmamap_load
callback function is called from a deferred context instead of the driver
context.
This means that certain
.Nm
functions must always be called with the same lock held that is specified in the
tag.
These functions include:
.Pp
.Bl -item -offset indent -compact
.It
.Fn bus_dmamap_load
.It
.Fn bus_dmamap_load_bio
.It
.Fn bus_dmamap_load_ccb
.It
.Fn bus_dmamap_load_mbuf
.It
.Fn bus_dmamap_load_mbuf_sg
.It
.Fn bus_dmamap_load_uio
.It
.Fn bus_dmamap_unload
.It
.Fn bus_dmamap_sync
.El
.Pp
There is one exception to this rule.
It is common practice to call some of these functions during driver start-up
without any locks held.
So long as there is a guarantee of no possible concurrent use of the tag by
different threads during this operation, it is safe to not hold a lock for
these functions.
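.Pp
The tag locking protocol can be sketched as follows, assuming a driver that
protects its state with a single mutex and passes
.Fn busdma_lock_mutex
as the tag's lock function.
The
.Vt struct foo_softc
type, its fields, the
.Fn foo_*
functions, and
.Dv FOO_MAXSIZE
are hypothetical.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/bus.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <machine/bus.h>

#define	FOO_MAXSIZE	4096	/* hypothetical transfer limit */

/* Hypothetical driver state; not part of the bus_dma API. */
struct foo_softc {
	struct mtx	mtx;		/* driver lock named in the tag */
	bus_dma_tag_t	dmat;
	bus_dmamap_t	map;
	bus_addr_t	busaddr;
};

static void
foo_load_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
{
	struct foo_softc *sc = arg;

	/* Held by the caller, or taken by bus_dma when deferred. */
	mtx_assert(&sc->mtx, MA_OWNED);
	if (error == 0)
		sc->busaddr = segs[0].ds_addr;
}

/* Called from attach with no non-sleepable locks held. */
static int
foo_tag_create(device_t dev, struct foo_softc *sc)
{
	return (bus_dma_tag_create(bus_get_dma_tag(dev), /* parent */
	    1, 0,			/* alignment, boundary */
	    BUS_SPACE_MAXADDR,		/* lowaddr */
	    BUS_SPACE_MAXADDR,		/* highaddr */
	    NULL, NULL,			/* filtfunc, filtfuncarg */
	    FOO_MAXSIZE, 1, FOO_MAXSIZE, /* maxsize, nsegments, maxsegsz */
	    0,				/* flags */
	    busdma_lock_mutex, &sc->mtx, /* lockfunc, lockfuncarg */
	    &sc->dmat));
}

static int
foo_load(struct foo_softc *sc, void *buf, bus_size_t len)
{
	int error;

	mtx_lock(&sc->mtx);
	error = bus_dmamap_load(sc->dmat, sc->map, buf, len,
	    foo_load_cb, sc, 0);
	/* EINPROGRESS: the callback runs later with sc->mtx held. */
	mtx_unlock(&sc->mtx);
	return (error);
}
.Ed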
.Pp
Certain
.Nm
operations should not be called with the driver lock held, either because
they are already protected by an internal lock, or because they might sleep
due to memory or resource allocation.
The following functions must not be
called with any non-sleepable locks held:
.Pp
.Bl -item -offset indent -compact
.It
.Fn bus_dma_tag_create
.It
.Fn bus_dmamap_create
.It
.Fn bus_dmamem_alloc
.El
.Pp
All other functions do not have a locking protocol and can thus be
called with or without any system or driver locks held.
.Sh SEE ALSO
.Xr devclass 9 ,
.Xr device 9 ,
.Xr driver 9 ,
.Xr rman 9 ,
.Xr vslock 9
.Pp
.Rs
.%A "Jason R. Thorpe"
.%T "A Machine-Independent DMA Framework for NetBSD"
.%J "Proceedings of the Summer 1998 USENIX Technical Conference"
.%Q "USENIX Association"
.%D "June 1998"
.Re
.Sh HISTORY
The
.Nm
interface first appeared in
.Nx 1.3 .
.Pp
The
.Nm
API was adopted from
.Nx
for use in the CAM SCSI subsystem.
The alterations to the original API were aimed at removing the need for
a
.Vt bus_dma_segment_t
array stored in each
.Vt bus_dmamap_t
while allowing callers to queue up on scarce resources.
.Sh AUTHORS
The
.Nm
interface was designed and implemented by
.An Jason R. Thorpe
of the Numerical Aerospace Simulation Facility, NASA Ames Research Center.
Additional input on the
.Nm
design was provided by
.An -nosplit
.An Chris Demetriou ,
.An Charles Hannum ,
.An Ross Harvey ,
.An Matthew Jacob ,
.An Jonathan Stone ,
and
.An Matt Thomas .
.Pp
The
.Nm
interface in
.Fx
benefits from the contributions of
.An Justin T. Gibbs ,
.An Peter Wemm ,
.An Doug Rabson ,
.An Matthew N. Dodd ,
.An Sam Leffler ,
.An Maxime Henrion ,
.An Jake Burkholder ,
.An Takahashi Yoshihiro ,
.An Scott Long ,
and many others.
.Pp
This manual page was written by
.An Hiten M. Pandya
and
.An Justin T. Gibbs .