Another step in the implementation journey

This time we’ll take a look at the implementation of the blob encryption layer. Before we start, here’s a puzzle for you: there’s one serious security flaw in the current implementation. I wonder if you’ll be able to spot it. I plan to show and fix it in the next post.

General idea

Blob encryption will be another layer of code, separate from the first one, the datastore. To keep it isolated, it will have its own namespace: blenc. (Yup, I was too lazy to come up with a better name ;))

What would be the purpose of this layer? It will handle three major operations:

  1. Encrypt data
  2. Decrypt data
  3. Generate encryption parameters (keys, ciphers, IVs)

Encryption and decryption are pretty obvious here: on one end of the layer we’ll be talking in plaintext, on the other we’ll only see the encrypted stuff.

This layer also handles the selection of ciphers and their parameters. Since the code that selects a cipher sits right next to the code that generates its encryption parameters, it will be easier to match them correctly and avoid subtle bugs leading to security disasters.

Ciphers

I decided to use stream ciphers. This simplifies the implementation since no secure padding is needed to round data up to the block size. Also, the encrypted blob is exactly the same size as the plaintext. This is both good and bad: good since we’re not occupying more space due to encryption, bad because it reveals the size of the original data. It’s not yet a security flaw, but it is something to think about in higher layers, where we can handle the issue much more efficiently.
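To make the size property concrete, here is a minimal sketch using only the Go standard library. It assumes AES-256 in CTR mode with a 16-byte IV; the helper name encryptCTR is made up for this illustration and is not part of blenc.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// encryptCTR encrypts plaintext with AES in CTR mode; a 32-byte key selects
// AES-256. CTR is symmetric, so the same call also decrypts.
// The name and signature are illustrative, not taken from blenc.
func encryptCTR(key, iv, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	out := make([]byte, len(plaintext))
	cipher.NewCTR(block, iv).XORKeyStream(out, plaintext)
	return out, nil
}

func main() {
	for _, n := range []int{0, 1, 15, 16, 17, 1000} {
		// Fresh random key and IV for every blob.
		key := make([]byte, 32)
		iv := make([]byte, aes.BlockSize)
		if _, err := rand.Read(key); err != nil {
			panic(err)
		}
		if _, err := rand.Read(iv); err != nil {
			panic(err)
		}
		ct, err := encryptCTR(key, iv, make([]byte, n))
		if err != nil {
			panic(err)
		}
		fmt.Printf("plaintext %4d bytes -> ciphertext %4d bytes\n", n, len(ct))
	}
}
```

Every line printed should show matching sizes, including the lengths that are not a multiple of the 16-byte AES block. That is the padding-free behaviour described above, and also the reason an observer learns the exact plaintext length.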

Currently there are two well-known ciphers supported in the code: AES256-CTR and ChaCha20. AES was an obvious choice for its wide adoption and years of cryptanalysis. Used in CTR mode, it makes a really strong stream cipher. ChaCha20 turns out to be a good choice too. Its strength is its great performance (especially on mobile devices, where there’s often no hardware acceleration for AES). If you wonder whether anybody you know is using it in production now, take a look at Cloudflare and Google.

I have selected ChaCha20 as the default cipher. I believe it has great potential and will become more widely adopted in the world of smaller devices (IoT is just around the corner). The implementation available in Go is also very fast, easily outperforming AES when no hardware acceleration is present.

KeyGens

Key generation was a bit tricky. First I implemented a trivial key generator that always returns predefined key data. Because the same key and IV are used for every blob, this implementation is really unsafe and fundamentally broken: it will generate the same key+IV pair for every blob. Because of that, it is internal to the module and I use it only for testing purposes.

Another simple implementation was a random key generator. It’s almost as simple as the constant-key one, but it does provide high-quality random keys. As long as we’re not relying on data deduplication, this method is the best way to generate keys.

The last key generator I implemented in this iteration was a contents-based key generator. Its purpose is to create the key from the hash of the blob’s contents. It is secure because it will generate different keys for different blobs, which is guaranteed by the collision resistance of the cryptographic hash function. This generator is the only one so far that requires reading the whole data buffer (to calculate the hash) before we start encrypting it. The data source is given only as an io.Reader interface, meaning we cannot seek back. This forces us to store the data in some kind of temporary buffer, and this was the hardest part to implement for this key generation scheme.

Tricky buffer

There are a few naïve approaches to a temporary buffer. The first is to store everything in memory. Of course, this would limit the size of blobs we could handle. Another idea is to store the data in a temporary file and then read it back. It sounds like a much better approach and has indeed been used in my implementation, but if implemented incorrectly it would lead to serious security flaws.

First of all, the data stored in temporary files could accidentally be made readable by other users on the same system, revealing plaintext data to them. Second, this data would end up being written to the local filesystem and disk. Secure erasure of data is not a trivial problem and, in the case of flash-based storage, may not be possible at all. That’s why in my implementation I’m storing the data in an encrypted form. The key is random and is kept in memory only. Losing the key means losing access to the plaintext, which is exactly what we need. There’s a large encryption overhead though (3x: encrypt data when storing to the temporary file, decrypt it when reading back, and do the final encryption), but I believe it’s worth it. The implementation of the secure temporary buffer is here.

Even if we store data encrypted in a temporary location, we may still fail to protect against tampering with the encrypted data, a common mistake when data integrity is not checked. To overcome this issue in my implementation, I created a reader that validates the hash of its data. Its purpose is to check whether the data read through it has the given hash value and to reject the stream if it does not. Since we already have the hash of the original, unencrypted data (it will be used to generate the encryption key), we can use it to check the integrity when we read back from the temporary buffer.

Not all done yet

I didn’t implement everything in this layer yet. Still on the to-do list are a key generator using cryptographic signatures and an HTTP interface. I’m leaving those for future updates. I believe enough has been implemented to move on to the next layer, which would create a solid base for applications hosted in Cinode.

See you in the next post.