Blobs Anatomy - Dynamic Links - part 2
This is the second post in the series on the anatomy of dynamic links. Last time I described the public layer of dynamic links. Now it’s time to dive into the private one.
As a quick recap - the public layer prevents propagation of incorrect data by performing cryptographic validation but is only allowed to work on the encrypted dataset. The private layer on the other hand works on the decrypted data relying on the validation that was already made on the public layer. Additional data validation checks are performed in this layer too to rule out inconsistencies that can only be detected with access to the plaintext data.
As in the previous post, before we start diving into details let’s bring up an accompanying picture:
Note that this is a simplified version of the whole private data flow but covers all the parts that I’ll discuss in this post. A more complete visualization will be available in the next post in this series 😉.
And as before, here’s an obligatory warning: this content requires some knowledge about cryptography and deals with non-trivial content. Some terms have links attached which may help exploring that knowledge a bit further.
Different keys used in dynamic links
A dynamic link operates on two sets of keys, so let’s make a clear distinction here:
- asymmetric keys for signature generation - those keys are used on the public layer and protect the public dataset from tampering attempts
- symmetric keys for plaintext data protection - those keys are used to encrypt and decrypt link’s data and protect the private section of the link.
Keep in mind that this post will mostly talk about the symmetric keys since we’re dealing with the private layer.
From plaintext to ciphertext and back again
Starting from the top - we begin with the link’s encrypted data stream. You may remember this part from the previous post about dynamic links. That is the last section placed at the end of the serialized dynamic link.
This section is protected by the link’s signature so we can safely assume that it was prepared by someone who knew the private key associated with the link i.e. the publisher.
Conversion between the encrypted and plaintext forms of the link’s data is done through a well-known encryption algorithm - XChaCha20.
Why XChaCha20?
Similarly to the current signature algorithm used by dynamic links, I’ve chosen a single fixed stream cipher for now so that I could start with a small codebase and have an algorithm that comes with certain properties. I’ve selected XChaCha20 for a few reasons:
- Stream cipher
By using a stream cipher we can easily encrypt and decrypt data of arbitrary size (in bytes) without any padding and without having to process data in blocks rather than bytes. The downside is that the ciphertext leaks the size of the plaintext, which also happens with static blobs. Some blurring mechanism to hide this information may be necessary later.
- Hardware support
Both ARM and x86 architectures have fast XChaCha20 implementations that do not need any dedicated CPU instructions. The cipher is built only from additions, rotations and XORs, with no secret-dependent table lookups, which makes constant-time implementations straightforward and reduces exposure to timing side-channel attacks.
I initially considered AES in CTR mode as the encryption algorithm; AES is still the gold standard. But it could cause performance issues on mobile devices or non-x86 CPUs that do not have AES-dedicated instructions.
- IV size
The XChaCha20 variant uses a 24-byte nonce. It is an extended version of the base ChaCha20, which uses only 12 bytes and would be insufficient for Cinode. A nonce is only required to be unique for every message encrypted under the same key. The way the IV is calculated makes it both unique and unpredictable (pseudorandom), so it guarantees much more than a nonce needs. For that reason I will still refer to it as IV.
Due to the birthday paradox, the base ChaCha20 variant with its 12-byte IV can be securely used for no more than about 2^32 different plaintext values when IVs are pseudorandom. That could be an issue for very frequently changing links. Generating over 2^32 different inputs is within the reach of a common computer nowadays - e.g. if a new link is generated every millisecond then it would take roughly 50 days to cross the safe limit.
The version value in the link is 64-bit to avoid this issue and the IV should also be large enough to match such a potential quantity of input values. XChaCha20 uses a 24-byte IV and sustains 2^64 different input values. That’s already a lot. And if for any reason the number of links generated would cross the magic 2^64 barrier by reusing version numbers, it is still possible to make a safe construct by creating recursive links that point to other links, which I’d like to explore a bit more in some future post.
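The numbers in the IV size discussion can be checked with a few lines of arithmetic. The sketch below is only an illustration and assumes the IVs behave like uniformly random values, so that the usual birthday approximation p ≈ n^2 / 2^(bits+1) applies. The function names are made up for this example and are not part of Cinode.

```python
def log2_collision_probability(nonce_bits: int, log2_messages: int) -> int:
    """Return log2 of the approximate chance that two of 2**log2_messages
    uniformly random nonces of the given size collide.

    Uses the birthday approximation p ~ n**2 / 2**(nonce_bits + 1),
    which is only accurate while p is small.
    """
    return 2 * log2_messages - (nonce_bits + 1)


def days_to_generate(log2_messages: int, links_per_second: int) -> float:
    """Days needed to produce 2**log2_messages links at a constant rate."""
    seconds = 2 ** log2_messages / links_per_second
    return seconds / 86_400  # seconds in a day


if __name__ == "__main__":
    # 12-byte (96-bit) IV, 2^32 links
    print(log2_collision_probability(96, 32))
    # 24-byte (192-bit) IV, 2^64 links
    print(log2_collision_probability(192, 64))
    # one new link every millisecond
    print(round(days_to_generate(32, 1000), 1))
```

Under this approximation, 2^32 values with a 12-byte IV give a collision chance of about 2^-33, while 2^64 values with a 24-byte IV give about 2^-65. Producing 2^32 links at one per millisecond takes about 49.7 days, which matches the "roughly 50 days" estimate.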
Summary:
XChaCha20 is currently the only symmetric cipher used in dynamic links. It fulfils all technical requirements needed by those links. Also focusing on a single cipher simplifies the initial implementation.
Symmetric encryption Key and IV
To perform symmetric encryption and decryption, XChaCha20 requires both the key and the IV. There’s a specific way in which those values are generated but since that’s a broader topic I decided to write a separate post dedicated to the generation of those values.
Authenticated data
Looking at the data flow it may be surprising that I’ve chosen to use a low-level stream cipher instead of an authenticated encryption scheme such as XChaCha20-Poly1305. To answer that question let’s recall what authenticated encryption guarantees.
Authenticating the source message ensures that there’s no easy way for an attacker to inject a ciphertext of their choice into the system. Unauthenticated ciphers on the other hand blindly take the input, treat it as valid ciphertext and decrypt it even if such decryption makes no sense. This can be exploited through chosen-ciphertext attacks, which in the end may completely defeat the whole encryption.
When taking a broader look at dynamic links though and considering what happens on the public layer, it is clearly visible that the authentication part is realized through the signature. A malicious actor cannot inject a custom ciphertext into the system because it will not have a correct signature and thus it will be rejected - exactly what is expected from an authenticated encryption scheme.
Summary:
The signature checked in the public part of the link authenticates the ciphertext, so the link as a whole behaves like an authenticated encryption scheme.
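The order of operations described here (check authenticity first, decrypt only afterwards) can be sketched with nothing but the Python standard library. Everything cryptographic in this sketch is a stand-in and an assumption made for illustration. HMAC-SHA256 plays the role of the link’s signature, although it is a symmetric MAC and not an asymmetric signature. A toy SHA-256 counter keystream plays the role of XChaCha20 and must never be used for real data. The names `keystream_xor`, `sign` and `open_link` are hypothetical, and the exact set of fields covered by the real signature is not modelled.

```python
import hashlib
import hmac


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.

    Stand-in for XChaCha20, for illustration only. Like any stream
    cipher it needs no padding, so the output has the same length as
    the input, and the same call both encrypts and decrypts.
    """
    stream = bytearray()
    for counter in range((len(data) + 31) // 32):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
    return bytes(a ^ b for a, b in zip(data, stream))


def sign(auth_key: bytes, ciphertext: bytes) -> bytes:
    """Stand-in for the link signature (HMAC instead of a real signature)."""
    return hmac.new(auth_key, ciphertext, hashlib.sha256).digest()


def open_link(auth_key: bytes, enc_key: bytes, iv: bytes,
              ciphertext: bytes, signature: bytes) -> bytes:
    """Reject unauthenticated ciphertext before any decryption happens."""
    if not hmac.compare_digest(sign(auth_key, ciphertext), signature):
        raise ValueError("invalid signature: ciphertext rejected")
    return keystream_xor(enc_key, iv, ciphertext)


if __name__ == "__main__":
    enc_key, auth_key, iv = b"k" * 32, b"a" * 32, b"n" * 24
    ciphertext = keystream_xor(enc_key, iv, b"target blob reference")
    signature = sign(auth_key, ciphertext)
    print(open_link(auth_key, enc_key, iv, ciphertext, signature))
```

A ciphertext modified by even a single bit fails the check in `open_link` and never reaches the decryption step, which is the property an authenticated encryption scheme would otherwise provide.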
Inside of the link’s plaintext data
After the data is decrypted, we’re a bit closer to our final goal - finding the information that points to the target blob. But there’s one more layer that has to be interpreted. Putting just the link data into the plain dataset would be too easy, wouldn’t it 😉?
Format version prefix
Similarly to the public link part, we start the data with a format version byte which indicates how the data in the plaintext is laid out. Currently it must be 0 and acts as a safety fuse in case the format has to be updated in the future. This format version byte is independent from the one in the public part of the link. And since we’re not using it anywhere to create a Blob Name, we can introduce fixes to this part of the design and the Blob Name can stay unchanged.
Summary:
Format version allows introducing fixes to the data format used for link’s plaintext.
Key validation block
This part of the link plaintext may look strange. I don’t recall seeing something similar in other cryptographic systems so far. Its purpose is to ensure that the symmetric encryption key for the link was generated properly. For now let’s assume it is just some magic black box full of random data and in the next post about dynamic links I’ll explain exactly why it is needed.
This field is length-prefixed and is related to the asymmetric key scheme used for the link. Using a different scheme would change the length of this field thus I assumed that it’s better to introduce the length field now even though we only use a single asymmetric key algorithm. Currently that length is encoded using a single byte and can hold values in the 0..127 range (highest bit is reserved for future extensions).
Summary:
Key validation block contains crucial information needed to verify the symmetric key that was used to decrypt the link’s data.
Link data
Finally! Last but not least here is the essence of the dynamic link - the data that points to the target blob. All the other layers are merely to ensure that this section is protected against various attack vectors.
Again as in the case of the public part - it is at the end of data to allow processing it in a streaming fashion. At this level we don’t interpret it. Internally it uses protobuf-encoded identification of the target blob, but from the technical point of view it is not strictly required. It may actually be the case in the future that I’ll lift the restriction and dynamic links would in fact just store some arbitrary data just like static blobs and links to other blobs would be only one of their usages.
Summary:
Link data contains the target blob information and uses protobuf for data serialization.
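Putting the three plaintext parts together, a minimal parsing sketch could look like the one below. It follows the layout described in this post: a format version byte equal to 0, a single length byte in the 0..127 range, the key validation block, and then the link data up to the end of the stream. The names `LinkPlaintext`, `LinkFormatError` and `parse_link_plaintext` are made up for this example, and rejecting a length byte with the highest bit set is an assumption about how a reader might treat the reserved bit. The link data is returned uninterpreted, without any protobuf decoding.

```python
from dataclasses import dataclass


class LinkFormatError(ValueError):
    """Raised when the decrypted link data does not match the expected layout."""


@dataclass(frozen=True)
class LinkPlaintext:
    key_validation_block: bytes
    link_data: bytes  # left uninterpreted at this level


def parse_link_plaintext(plaintext: bytes) -> LinkPlaintext:
    """Split decrypted link data into its key validation block and link data."""
    if len(plaintext) < 2:
        raise LinkFormatError("plaintext too short")

    # Format version prefix: currently it must be 0.
    if plaintext[0] != 0:
        raise LinkFormatError(f"unsupported format version {plaintext[0]}")

    # Single-byte length prefix; the highest bit is reserved.
    # Assumption: a reader rejects values that have this bit set.
    length = plaintext[1]
    if length & 0x80:
        raise LinkFormatError("reserved length bit is set")

    end = 2 + length
    if len(plaintext) < end:
        raise LinkFormatError("key validation block is truncated")

    # Everything after the key validation block is the link data.
    return LinkPlaintext(plaintext[2:end], plaintext[end:])


if __name__ == "__main__":
    parsed = parse_link_plaintext(bytes([0, 3]) + b"kvb" + b"link-data")
    print(parsed.key_validation_block, parsed.link_data)
```

Because the link data sits at the end, a streaming reader only needs to buffer the two header bytes and the key validation block before it can start passing the rest of the data along.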
Next in the series
Let’s end our journey today. We still have some topics to cover regarding dynamic links so expect at least one more post dealing with dynamic links in the future.
Author BYO
LastMod 2023-09-27