Blobs Anatomy - Dynamic Links - part 3

In this blog post, I’m reviving my series on the internals of data blobs in Cinode. Dynamic links are a cornerstone of Cinode’s design, enabling secure, updatable references to data in a decentralized network. So far, we’ve covered topics related to dynamic links, most notably:

Data layout and verification of dynamic links on the public layer, where I describe how the network is protected against the propagation of bogus data and how the forward-progress rule is applied to dynamic links
Encryption of the link target, where we see how a dynamic link encrypts its data and what is needed to do this safely

The previous post on encryption briefly introduced a peculiar construct called the ‘key validation block.’ Today’s post will delve into this enigmatic piece of data. With these additional details, the picture becomes significantly more intricate. Here it is:

Dynamic Links - Private Layer — View of dynamic link private layers with validation flows.

Before we go further, I’d like to emphasize that this content may be hard to follow without some knowledge of cryptographic terms. I also encourage you to read the first two posts in the series before this one.

Deterministic generation of keys and IVs

Cinode is designed for use in hostile environments where the trustworthiness of other network actors cannot be fully guaranteed. The majority of attack vectors are mitigated through encryption. All network data is transmitted securely, without plaintext, and the network rejects any data that cannot be verifiably authenticated. However, this represents just one layer of protection.

Another point of concern lies within the private layer—data transmitted between users. What if a publisher, even with the correct key pairs, cannot be trusted? This issue might not always stem from malicious intent. It could also arise due to implementation errors or technical limitations. A publisher could also be the target of a malicious attack. I’d like to now focus on one such technically challenging area: cryptographically secure random generation. Many cryptographic schemes require a robust random source. Any bias or predictability in this random data can severely compromise security.

In practice, generating strong random data is challenging. Think of it like rolling a die—if the die is biased, the outcomes aren’t truly random, and an attacker could predict the results. Serious flaws have been identified in system-wide random generators, such as those in Microsoft Windows or Debian’s OpenSSL library. On Linux, the transition to virtual machines has led to concerns about the quality of random data. There was also a debate on the Linux kernel mailing lists not so long ago about whether to disable the latest AMD CPU extensions for random generation due to bugs and stability issues.

There’s another property of random numbers—often a desired one—that simultaneously alarms me. Given some random value, we cannot distinguish between a truly random one and some arbitrary value injected there on purpose. That’s a perfect place to insert carefully crafted values introducing some kind of a backdoor. Even now, some people believe that certain magic numbers in popular ciphers were not really chosen randomly, questioning the security of those constructs.

Cinode often adopts a cautious, almost paranoid approach to using random sources. Maybe a bit too aggressively, but I’ve started avoiding random sources whenever possible. For example, the early version of the design allowed random keys for static blob encryption, whereas the current implementation strictly forbids such random keys; the reader must reject a blob if the encryption key was not generated in a deterministic and verifiable way.

Speaking of dynamic links, there are two very specific places where a replacement for random data is needed—the symmetric encryption key and the corresponding initialization vector (IV) value. Let’s take a look at both cases and see how the random source was replaced with a deterministic input.

IV generation

Generating the IV is a bit simpler than generating the encryption key, so let’s start with that one:

Recalling some knowledge from the cryptography world, the initialization vector (IV) must be unique for different plaintext inputs whenever the same key is used. The construct chosen for dynamic links makes it not only unique for different link data but also unpredictable, making it a good choice not only for XChaCha20 but also for other ciphers requiring pseudorandom IV values.

Every dynamic link will only have one encryption key assigned for all its versions, which means that different versions of the same link need different IV values. How can we ensure the IV is unique for each input dataset? Let’s break this down:

Uniqueness Requirement: The IV must be unique to prevent reuse with the same key, which could expose patterns in the encrypted data.
Deterministic Derivation: The only way to make it deterministic is to derive it from the link data itself. Only the plaintext data of the link can be used, since the IV is needed before the encrypted version can be produced.
Preventing Data Leakage: We have to be especially careful to avoid leaking the data of the link through the IV value.

The solution used here is the SHA-256 hash function. For different link versions, the SHA-256 value will be different (otherwise, we’d have found a SHA-256 collision), and thus the IV will also be unique. Also, the one-way nature of a cryptographically secure hash function prevents the IV from leaking the link data.

Using a hashing function does not automatically mean that this part of the system is secure. Every time secure hashing is involved, we must ensure that the attacker cannot easily fool the system, forcing it to use the same input somewhere else. For that reason, the SHA-256 value is computed from the following fields:

Prefix indicating generation of the IV - a constant byte with the fixed value 0x02. This keeps the hash input, and thus the hash output, distinct from data hashed for other purposes
One byte indicating the encryption scheme in use - a constant byte with the fixed value 0x00. Since the encryption scheme affects the hash, switching to a different scheme will produce a different IV value
One byte indicating the type of the link - a constant byte with the fixed value 0x02. That way a similar IV generation scheme could be used in the future for other blob types and would still avoid identical inputs and thus identical hashes
Blob name - similarly to how the signature of a dynamic link is generated, including the name of the blob in the hash means that there’s no way to get the same IV for different links, even if all other parameters are the same. This value is length-prefixed, which is necessary to avoid issues if the length of a blob name ever has to change

The output of SHA-256 is 32 bytes long, whereas XChaCha20 needs only a 24-byte IV. To convert one to the other, we simply take the first 24 bytes of the SHA-256 digest and treat them as the IV. This construct is still secure because a truncated output of a cryptographically secure hash retains its security properties as long as the truncated size is sufficiently large.
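The IV derivation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cinode’s actual code: the one-byte length prefix for the blob name, the position of the plaintext in the hash input (appended last), and all names such as `derive_iv` are hypothetical. The field list above does not spell out how the plaintext of the link enters the hash, so that part in particular is a guess.

```python
import hashlib

IV_PREFIX = 0x02               # domain separator: this hash is used as an IV
ENCRYPTION_SCHEME = 0x00       # encryption scheme identifier from the field list
BLOB_TYPE_DYNAMIC_LINK = 0x02  # blob type identifier from the field list
IV_SIZE = 24                   # XChaCha20 uses a 24-byte IV


def derive_iv(blob_name: bytes, plaintext: bytes) -> bytes:
    """Derive a deterministic IV from the blob name and the link plaintext."""
    if len(blob_name) > 255:
        raise ValueError("blob name too long for a one-byte length prefix")
    h = hashlib.sha256()
    h.update(bytes([IV_PREFIX, ENCRYPTION_SCHEME, BLOB_TYPE_DYNAMIC_LINK]))
    h.update(bytes([len(blob_name)]))  # ASSUMPTION: one-byte length prefix
    h.update(blob_name)
    h.update(plaintext)                # ASSUMPTION: plaintext is appended last
    return h.digest()[:IV_SIZE]        # keep the first 24 of the 32 digest bytes
```

Calling `derive_iv(b"example-blob-name", b"link content, version 1")` twice returns the same 24 bytes, while changing either the blob name or the plaintext changes the result.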

Key generation

Now that I’ve shown how the IV is generated, let’s focus on the more complex part—the symmetric encryption key.

Similarly to the IV, the symmetric encryption key must be calculated deterministically to eliminate any risk of tampering with its quality.

We will use a similar approach here, again taking the output of a hash function, but this time with the following source fields:

Prefix indicating generation of the key - a constant byte with the fixed value 0x01, to avoid collisions with hashes used for other purposes, e.g. IV generation
One byte indicating the encryption scheme in use - a constant byte with the fixed value 0x00, same as in the case of the IV
One byte indicating the type of the link - a constant byte with the fixed value 0x02, which also follows the same scheme as the IV
Key source signature - a so far mysterious piece of data that I’ll describe later in this post
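As a rough sketch, the key derivation from these fields could look as follows in Python. The key source signature is treated here as an opaque byte string. The use of SHA-256 (the text only says "a hashing function"), the lack of a length prefix before the signature, and the name `derive_key` are assumptions made for illustration only.

```python
import hashlib

KEY_PREFIX = 0x01              # domain separator: this hash is used as a key
ENCRYPTION_SCHEME = 0x00       # encryption scheme identifier from the field list
BLOB_TYPE_DYNAMIC_LINK = 0x02  # blob type identifier from the field list


def derive_key(key_source_signature: bytes) -> bytes:
    """Derive the symmetric key from the key source signature."""
    h = hashlib.sha256()  # ASSUMPTION: same hash function as for the IV
    h.update(bytes([KEY_PREFIX, ENCRYPTION_SCHEME, BLOB_TYPE_DYNAMIC_LINK]))
    h.update(key_source_signature)  # ASSUMPTION: raw bytes, no length prefix
    return h.digest()  # 32 bytes, the key size XChaCha20 expects
```

Because the first byte differs from the IV prefix (0x01 versus 0x02), the same trailing bytes can never produce the same hash input for a key and for an IV.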

Since the symmetric encryption key must be deterministic, it needs to be derived from something tied to the link’s identity. It cannot rely on the link content. If it did, then on every new version of the link, all the references containing both the blob name and the encryption key would have to be updated as well. This is against the idea of a dynamic link where the link’s reference must stay constant while the content of the link can change.

Another important requirement is that the symmetric encryption key must be secret; only authorized readers must be allowed to read the link data. To ensure this is the case, the key must be derived from some private data. Otherwise, anybody would be able to come up with the encryption key by simply scanning the public link’s data and decrypt the content behind such a link. This requirement, confronted with the previous one, means that the symmetric encryption key must be derived from the private key of the link.

Of course, the private key must be known only to the person who creates and later updates the link, not the reader who is supposed to do the validation of the symmetric encryption key. Deriving that symmetric encryption key must thus be done in such a way that the private link key does not leak. We could try something trivial such as hashing the private key representation. That itself would generate pseudorandom symmetric encryption keys. But then the reader wouldn’t be able to verify if the key was generated properly or was chosen on purpose. That’s because the private key would be necessary for such a verification, and only the publisher, not the reader, has access to it.

Because we cannot use the private key itself, let’s use something derived from it: a signature generated with that key can be turned into the symmetric encryption key without exposing the private key. It works like this: using the private key, we sign some input data, feed that signature into a hash function, and take the hash output as the symmetric encryption key. Knowing the public key, the reader can make sure the signature was made by someone in possession of the private key. So far, so good, but what should be the input for the signature?

A good choice is the name of the blob. It embeds all the unchanging parts of the link. By signing it, we make the verification process very strong and deterministic. Of course, a signature over the bare blob name is not good enough. The same private key is also used for other signatures, so we must guarantee that the publisher cannot be fooled into signing some other content that, in the end, turns out to be the blob name. That would reveal the symmetric key. To avoid such a mistake, we prepend a content type prefix here as well: a fixed value of 0x01.

Key Source Signature — Key source signature generation.
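The publisher side of this construct might be sketched like this. Python’s standard library has no signature primitives, so the signing function is passed in as a callable. The `Signer` type, `key_source_message`, `create_key` and the use of SHA-256 are hypothetical names and assumptions, not Cinode’s API. The sketch also assumes the signature scheme is deterministic, so that signing the same blob name again yields the same key.

```python
import hashlib
from typing import Callable, Tuple

KEY_SOURCE_PREFIX = 0x01       # content type prefix of the signed message
KEY_PREFIX = 0x01              # prefix of the key derivation hash
ENCRYPTION_SCHEME = 0x00
BLOB_TYPE_DYNAMIC_LINK = 0x02

# Hypothetical signer: takes the message and returns a signature made with
# the link's private key (must be deterministic for a stable key).
Signer = Callable[[bytes], bytes]


def key_source_message(blob_name: bytes) -> bytes:
    """Build the message that is signed to obtain the key source signature."""
    return bytes([KEY_SOURCE_PREFIX]) + blob_name


def create_key(sign: Signer, blob_name: bytes) -> Tuple[bytes, bytes]:
    """Return (key source signature, symmetric key) for a link."""
    signature = sign(key_source_message(blob_name))
    header = bytes([KEY_PREFIX, ENCRYPTION_SCHEME, BLOB_TYPE_DYNAMIC_LINK])
    key = hashlib.sha256(header + signature).digest()  # ASSUMPTION: SHA-256
    return signature, key
```

The signature is returned alongside the key because the publisher needs to embed it in the encrypted link data for the reader to check later.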

Now let’s recall how the link data is signed. The link’s data signature uses the same public/private key pair for signature generation. But for link data, the prefix value is 0x00, and additionally, the input is hashed before being signed. There’s a slight misalignment between the blob name signature used for symmetric key generation and the link data signature. Currently, it poses no risk since the signed messages differ in length, and the hash used for link data makes it impossible to obtain a signature over chosen input data. Still, this is something that should be fixed in an updated format version of the dynamic link.

The last topic to cover is how the reader verifies that the symmetric key is built deterministically from the correct input. To prove that, the reader must attempt to recreate the key from the data it is able to access. For that reason, the signature used for symmetric key generation is added to the encrypted dataset of the link:

Key Validation Block — Reader-side key validation using the embedded key source signature.

Key verification occurs after the data is decrypted. It is performed in two steps. First, we check whether the signature is genuine by validating it using the public key of the link and the blob name. Once it is verified as genuine, the symmetric key is regenerated from the signature, and it must match the symmetric key used to decrypt the data itself. If the first step succeeds but the second one fails, the link data must still be rejected as invalid.
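The two validation steps can be illustrated with the following sketch. As before, signature verification is injected as a callable because the Python standard library has no signature primitives. The `Verifier` type, the function names and the use of SHA-256 for key derivation are assumptions made for this example.

```python
import hashlib
import hmac
from typing import Callable

KEY_SOURCE_PREFIX = 0x01       # content type prefix of the signed message
KEY_PREFIX = 0x01              # prefix of the key derivation hash
ENCRYPTION_SCHEME = 0x00
BLOB_TYPE_DYNAMIC_LINK = 0x02

# Hypothetical verifier: (public_key, message, signature) -> is it valid?
Verifier = Callable[[bytes, bytes, bytes], bool]


def derive_key(key_source_signature: bytes) -> bytes:
    """Recreate the symmetric key from the key source signature."""
    header = bytes([KEY_PREFIX, ENCRYPTION_SCHEME, BLOB_TYPE_DYNAMIC_LINK])
    return hashlib.sha256(header + key_source_signature).digest()


def validate_key(verify: Verifier, public_key: bytes, blob_name: bytes,
                 key_source_signature: bytes, decryption_key: bytes) -> bool:
    """Return True only if the decryption key was derived deterministically."""
    # Step 1: the signature must be genuine for (prefix || blob name).
    message = bytes([KEY_SOURCE_PREFIX]) + blob_name
    if not verify(public_key, message, key_source_signature):
        return False
    # Step 2: the key recreated from the signature must match the key that
    # was actually used to decrypt the link data (constant-time comparison).
    return hmac.compare_digest(derive_key(key_source_signature), decryption_key)
```

A reader would call `validate_key` right after decryption and discard the link data whenever it returns `False`, regardless of which of the two steps failed.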

Key validation is not performed on the public network layer. This is a caveat for which I have yet to find a solution. Ideally, the public network should also contribute to the rejection of such a link, but my gut feeling tells me that it would be easier to prove that such validation is impossible on the public network layer (due to the encrypted nature of the keys) than to find a way for the public network to reject invalid symmetric encryption keys. Potential mitigations could include rate‑limiting, publisher reputation, or future privacy‑preserving proofs.

Quick summary

In the last three posts of the deep dive series, I’ve covered all the internals of dynamic links. We went through public layer validation, then link data encryption and decryption, and finally, in this post, the additional mechanisms that ensure the strength of the encryption keys.

Future changes to dynamic links

While working on this content, I was already able to collect a few potential future changes to dynamic links:

While the link signature is currently located at the beginning of the link data, it could also be added at the end of the link. Placing it at the beginning simplified stream processing for the reader but complicated it for the publisher. Storing the signature at the end would allow both reader and publisher to work in a truly streaming way with only slightly more complex code for the reader. This change would improve efficiency in data handling for both parties.
Private link keys are used for two different purposes: signing the link data and producing the key source signature in the key validation block. The way those signatures are generated is inconsistent and should be unified.

Coming soon to the deep dive series

A new blob type is already in the works so if you liked the deep dive series so far there’s a lot more content coming up next. To reveal a little, the new construct will be even more complex than dynamic links but powerful enough to enable numerous new applications within the Cinode network. Stay tuned.

Links to all articles from the series:
