Inline Encryption

Background

Inline encryption hardware sits logically between memory and disk, and can en/decrypt data as it goes in/out of the disk. For each I/O request, software can control exactly how the inline encryption hardware will en/decrypt the data in terms of key, algorithm, data unit size (the granularity of en/decryption), and data unit number (a value that determines the initialization vector(s)).

Some inline encryption hardware accepts all encryption parameters including raw keys directly in low-level I/O requests. However, most inline encryption hardware instead has a fixed number of "keyslots" and requires that the key, algorithm, and data unit size first be programmed into a keyslot. Each low-level I/O request then just contains a keyslot index and data unit number.

Note that inline encryption hardware is very different from traditional crypto accelerators, which are supported through the kernel crypto API. Traditional crypto accelerators operate on memory regions, whereas inline encryption hardware operates on I/O requests. Thus, inline encryption hardware needs to be managed by the block layer, not the kernel crypto API.

Inline encryption hardware is also very different from "self-encrypting drives", such as those based on the TCG Opal or ATA Security standards. Self-encrypting drives don't provide fine-grained control of encryption and provide no way to verify the correctness of the resulting ciphertext. Inline encryption hardware provides fine-grained control of encryption, including the choice of key and initialization vector for each sector, and can be tested for correctness.

Objective

We want to support inline encryption in the kernel. To make testing easier, we also want support for falling back to the kernel crypto API when actual inline encryption hardware is absent. We also want inline encryption to work with layered devices like device-mapper and loopback (i.e. we want to be able to use the inline encryption hardware of the underlying devices if present, or else fall back to crypto API en/decryption).

Constraints and notes

  • We need a way for upper layers (e.g. filesystems) to specify an encryption context to use for en/decrypting a bio, and device drivers (e.g. UFSHCD) need to be able to use that encryption context when they process the request. Encryption contexts also introduce constraints on bio merging; the block layer needs to be aware of these constraints.

  • Different inline encryption hardware has different supported algorithms, supported data unit sizes, maximum data unit numbers, etc. We call these properties the "crypto capabilities". We need a way for device drivers to advertise crypto capabilities to upper layers in a generic way.

  • Inline encryption hardware usually (but not always) requires that keys be programmed into keyslots before being used. Since programming keyslots may be slow and there may not be very many keyslots, we shouldn't just program the key for every I/O request, but rather keep track of which keys are in the keyslots and reuse an already-programmed keyslot when possible.

  • Upper layers typically define a specific end-of-life for crypto keys, e.g. when an encrypted directory is locked or when a crypto mapping is torn down. At these times, keys are wiped from memory. We must provide a way for upper layers to also evict keys from any keyslots they are present in.

  • When possible, device-mapper devices must be able to pass through the inline encryption support of their underlying devices. However, it doesn't make sense for device-mapper devices to have keyslots themselves.

Basic design

We introduce struct blk_crypto_key to represent an inline encryption key and how it will be used. This includes the actual bytes of the key; the size of the key; the algorithm and data unit size the key will be used with; and the number of bytes needed to represent the maximum data unit number the key will be used with.

We introduce struct bio_crypt_ctx to represent an encryption context. It contains a data unit number and a pointer to a blk_crypto_key. We add pointers to a bio_crypt_ctx to struct bio and struct request; this allows users of the block layer (e.g. filesystems) to provide an encryption context when creating a bio and have it be passed down the stack for processing by the block layer and device drivers. Note that the encryption context doesn't explicitly say whether to encrypt or decrypt, as that is implicit from the direction of the bio; WRITE means encrypt, and READ means decrypt.

We also introduce struct blk_crypto_profile to contain all generic inline encryption-related state for a particular inline encryption device. The blk_crypto_profile serves as the way that drivers for inline encryption hardware advertise their crypto capabilities and provide certain functions (e.g., functions to program and evict keys) to upper layers. Each device driver that wants to support inline encryption will construct a blk_crypto_profile, then associate it with the disk's request_queue.

The blk_crypto_profile also manages the hardware's keyslots, when applicable. This happens in the block layer, so that users of the block layer can just specify encryption contexts and don't need to know about keyslots at all, nor do device drivers need to care about most details of keyslot management.

Specifically, for each keyslot, the block layer (via the blk_crypto_profile) keeps track of which blk_crypto_key that keyslot contains (if any), and how many in-flight I/O requests are using it. When the block layer creates a struct request for a bio that has an encryption context, it grabs a keyslot that already contains the key if possible. Otherwise it waits for an idle keyslot (a keyslot that isn't in-use by any I/O), then programs the key into the least-recently-used idle keyslot using the function the device driver provided. In both cases, the resulting keyslot is stored in the crypt_keyslot field of the request, where it is then accessible to device drivers and is released after the request completes.

struct request also contains a pointer to the original bio_crypt_ctx. Requests can be built from multiple bios, and the block layer must take the encryption context into account when trying to merge bios and requests. For two bios/requests to be merged, they must have compatible encryption contexts: both unencrypted, or both encrypted with the same key and contiguous data unit numbers. Only the encryption context for the first bio in a request is retained, since the remaining bios have been verified to be merge-compatible with the first bio.

To make it possible for inline encryption to work with request_queue based layered devices, when a request is cloned, its encryption context is cloned as well. When the cloned request is submitted, it is then processed as usual; this includes getting a keyslot from the clone's target device if needed.

blk-crypto-fallback

It is desirable for the inline encryption support of upper layers (e.g. filesystems) to be testable without real inline encryption hardware, and likewise for the block layer's keyslot management logic. It is also desirable to allow upper layers to just always use inline encryption rather than have to implement encryption in multiple ways.

Therefore, we also introduce blk-crypto-fallback, which is an implementation of inline encryption using the kernel crypto API. blk-crypto-fallback is built into the block layer, so it works on any block device without any special setup. Essentially, when a bio with an encryption context is submitted to a block_device that doesn't support that encryption context, the block layer will handle en/decryption of the bio using blk-crypto-fallback.

For encryption, the data cannot be encrypted in-place, as callers usually rely on it being unmodified. Instead, blk-crypto-fallback allocates bounce pages, fills a new bio with those bounce pages, encrypts the data into those bounce pages, and submits that "bounce" bio. When the bounce bio completes, blk-crypto-fallback completes the original bio. If the original bio is too large, multiple bounce bios may be required; see the code for details.

For decryption, blk-crypto-fallback "wraps" the bio's completion callback (bi_complete) and private data (bi_private) with its own, unsets the bio's encryption context, then submits the bio. If the read completes successfully, blk-crypto-fallback restores the bio's original completion callback and private data, then decrypts the bio's data in-place using the kernel crypto API. Decryption happens from a workqueue, as it may sleep. Afterwards, blk-crypto-fallback completes the bio.

In both cases, the bios that blk-crypto-fallback submits no longer have an encryption context. Therefore, lower layers only see standard unencrypted I/O.

blk-crypto-fallback also defines its own blk_crypto_profile and has its own "keyslots"; its keyslots contain struct crypto_skcipher objects. The reason for this is twofold. First, it allows the keyslot management logic to be tested without actual inline encryption hardware. Second, similar to actual inline encryption hardware, the crypto API doesn't accept keys directly in requests but rather requires that keys be set ahead of time, and setting keys can be expensive; moreover, allocating a crypto_skcipher can't happen on the I/O path at all due to the locks it takes. Therefore, the concept of keyslots still makes sense for blk-crypto-fallback.

Note that regardless of whether real inline encryption hardware or blk-crypto-fallback is used, the ciphertext written to disk (and hence the on-disk format of data) will be the same (assuming that both the inline encryption hardware's implementation and the kernel crypto API's implementation of the algorithm being used adhere to spec and function correctly).

blk-crypto-fallback is optional and is controlled by the CONFIG_BLK_INLINE_ENCRYPTION_FALLBACK kernel configuration option.

API presented to users of the block layer

blk_crypto_config_supported() allows users to check ahead of time whether inline encryption with particular crypto settings will work on a particular block_device -- either via hardware or via blk-crypto-fallback. This function takes in a struct blk_crypto_config which is like blk_crypto_key, but omits the actual bytes of the key and instead just contains the algorithm, data unit size, etc. This function can be useful if blk-crypto-fallback is disabled.

blk_crypto_init_key() allows users to initialize a blk_crypto_key.

Users must call blk_crypto_start_using_key() before actually starting to use a blk_crypto_key on a block_device (even if blk_crypto_config_supported() was called earlier). This is needed to initialize blk-crypto-fallback if it will be needed. This must not be called from the data path, as this may have to allocate resources, which may deadlock in that case.

Next, to attach an encryption context to a bio, users should call bio_crypt_set_ctx(). This function allocates a bio_crypt_ctx and attaches it to a bio, given the blk_crypto_key and the data unit number that will be used for en/decryption. Users don't need to worry about freeing the bio_crypt_ctx later, as that happens automatically when the bio is freed or reset.

Finally, when done using inline encryption with a blk_crypto_key on a block_device, users must call blk_crypto_evict_key(). This ensures that the key is evicted from all keyslots it may be programmed into and unlinked from any kernel data structures it may be linked into.

In summary, for users of the block layer, the lifecycle of a blk_crypto_key is as follows:

  1. blk_crypto_config_supported() (optional)

  2. blk_crypto_init_key()

  3. blk_crypto_start_using_key()

  4. bio_crypt_set_ctx() (potentially many times)

  5. blk_crypto_evict_key() (after all I/O has completed)

  6. Zeroize the blk_crypto_key (this has no dedicated function)

If a blk_crypto_key is being used on multiple block_devices, then blk_crypto_config_supported() (if used), blk_crypto_start_using_key(), and blk_crypto_evict_key() must be called on each block_device.

API presented to device drivers

A device driver that wants to support inline encryption must set up a blk_crypto_profile in the request_queue of its device. To do this, it first must call blk_crypto_profile_init() (or its resource-managed variant devm_blk_crypto_profile_init()), providing the number of keyslots.

Next, it must advertise its crypto capabilities by setting fields in the blk_crypto_profile, e.g. modes_supported and max_dun_bytes_supported.

It then must set function pointers in the ll_ops field of the blk_crypto_profile to tell upper layers how to control the inline encryption hardware, e.g. how to program and evict keyslots. Most drivers will need to implement keyslot_program and keyslot_evict. For details, see the comments for struct blk_crypto_ll_ops.

Once the driver registers a blk_crypto_profile with a request_queue, I/O requests the driver receives via that queue may have an encryption context. All encryption contexts will be compatible with the crypto capabilities declared in the blk_crypto_profile, so drivers don't need to worry about handling unsupported requests. Also, if a nonzero number of keyslots was declared in the blk_crypto_profile, then all I/O requests that have an encryption context will also have a keyslot which was already programmed with the appropriate key.

If the driver implements runtime suspend and its blk_crypto_ll_ops don't work while the device is runtime-suspended, then the driver must also set the dev field of the blk_crypto_profile to point to the struct device that will be resumed before any of the low-level operations are called.

If there are situations where the inline encryption hardware loses the contents of its keyslots, e.g. device resets, the driver must handle reprogramming the keyslots. To do this, the driver may call blk_crypto_reprogram_all_keys().

Finally, if the driver used blk_crypto_profile_init() instead of devm_blk_crypto_profile_init(), then it is responsible for calling blk_crypto_profile_destroy() when the crypto profile is no longer needed.

Layered Devices

Request queue based layered devices like dm-rq that wish to support inline encryption need to create their own blk_crypto_profile for their request_queue, and expose whatever functionality they choose. When a layered device wants to pass a clone of that request to another request_queue, blk-crypto will initialize and prepare the clone as necessary.

Interaction between inline encryption and blk integrity

At the time of this patch, there is no real hardware that supports both these features. However, these features do interact with each other, and it's not completely trivial to make them both work together properly. In particular, when a WRITE bio wants to use inline encryption on a device that supports both features, the bio will have an encryption context specified, after which its integrity information is calculated (using the plaintext data, since the encryption will happen while data is being written), and the data and integrity info is sent to the device. Obviously, the integrity info must be verified before the data is encrypted. After the data is encrypted, the device must not store the integrity info that it received with the plaintext data since that might reveal information about the plaintext data. As such, it must re-generate the integrity info from the ciphertext data and store that on disk instead. Another issue with storing the integrity info of the plaintext data is that it changes the on disk format depending on whether hardware inline encryption support is present or the kernel crypto API fallback is used (since if the fallback is used, the device will receive the integrity info of the ciphertext, not that of the plaintext).

Because there isn't any real hardware yet, it seems prudent to assume that hardware implementations might not implement both features together correctly, and disallow the combination for now. Whenever a device supports integrity, the kernel will pretend that the device does not support hardware inline encryption (by setting the blk_crypto_profile in the request_queue of the device to NULL). When the crypto API fallback is enabled, this means that all bios with and encryption context will use the fallback, and IO will complete as usual. When the fallback is disabled, a bio with an encryption context will be failed.