Exploring ideas - secretive folders

This blog post is about an idea I had recently, something I call “secretive folders”. It’s still fresh in my head, and I haven’t analyzed it deeply yet. That’s what this post is for: to take you along on that exploration.

Idea inception

I was thinking about one disadvantage of the current CinodeFS API: it has no method to iterate over folder entries. For example, if there’s a folder with some photos, the CinodeFS interface won’t let one figure out which files are in that folder in order to show previews, etc. That seemed to me like a huge gap in the design. The only reason it was missing was that I hadn’t needed such functionality so far. But then I thought about it a bit more and started to realize that the crucial part here is the “no need”. Directory listing was not needed, and I was still able to build pretty complex software such as Cinode Maps.

So maybe it is not a gap after all? Maybe the system should, in fact, forbid an operation such as listing? Listing can be a source of problems of its own. Consider a gigantic folder with a lot of entries. Iterating over such a structure could become a performance bottleneck and might need a lot of local resources. Without the list operation, looking up a specific entry can easily be optimized by using nested structures or hash maps.

Hash maps… those have made my jaw drop a few times already. I admire their simplicity and yet their algorithmic power. Cryptographic hash functions are, in my humble opinion, pure beauty. In the case of CinodeFS, we could use hash maps to speed up lookups in large folders. Usually, such a folder might store its entries as a sorted list. To find out whether an entry exists in this list, we could use binary search, which takes O(log(n)) time. We could also build a hash map, reducing the lookup cost even further, down to O(1) on average. An entry in such a hash map would contain the file name and the target link.
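To make the two lookup strategies concrete, here is a minimal Python sketch. The entry names and link strings are made up for illustration, and nothing here reflects how CinodeFS actually stores folders.

```python
import bisect

# Hypothetical folder content: (file name, target link) pairs.
entries = [("notes.txt", "link-3"), ("a.jpg", "link-1"), ("b.jpg", "link-2")]

# Variant 1: sorted list + binary search, O(log n) per lookup.
sorted_entries = sorted(entries)
sorted_names = [name for name, _ in sorted_entries]

def lookup_sorted(name):
    """Return the link for `name`, or None if the folder has no such entry."""
    i = bisect.bisect_left(sorted_names, name)
    if i < len(sorted_names) and sorted_names[i] == name:
        return sorted_entries[i][1]
    return None

# Variant 2: hash map, O(1) per lookup on average.
entry_map = dict(entries)

def lookup_map(name):
    """Return the link for `name`, or None if the folder has no such entry."""
    return entry_map.get(name)
```

Both functions answer "what is the link for this name?" without ever walking the whole folder, which is the only operation the argument above relies on.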

But let’s think about this a bit more. The idea of a hash map is to take the key - the filename in our case - compute an index into the table from that key (I purposely ignore key collisions for now), and then do a simple lookup into the table to get the entry - the original filename and the target link. But since the name is already used to calculate the hash table index, do we really need to store it in the hash table entry? I don’t think so…

… and this is how the idea of secretive folders was conceived.

Properties of secretive folders

The essence of secretive folders is that they do not store their entries using filenames. Instead, a strong cryptographic hash is used to encode entry names. When looking up an entry with a specific name in such a folder, that name is first passed through a one-way hash function, and then the lookup is done using the resulting hash.

[Figure: Filename hashing]

We diverge a bit from a standard hash map here. With a cryptographic hash function, collisions are so unlikely that we can ignore them. But at the same time, the digest is far too large to be used directly as an index into a table, so we cannot simply create a large hash map that can be looked up in constant time. The entries would still have to be kept as a list sorted by hash and found with a logarithmic binary search.
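A minimal sketch of such a folder in Python, assuming SHA-256 as the hash function (the post does not pick one). The class name, link strings and file names are hypothetical.

```python
import bisect
import hashlib

def name_hash(name):
    """One-way hash of an entry name (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(name.encode("utf-8")).digest()

class SecretiveFolder:
    """Stores only (hash of name, link) pairs, sorted by hash."""

    def __init__(self, entries):
        pairs = sorted((name_hash(n), link) for n, link in entries.items())
        self.hashes = [h for h, _ in pairs]
        self.links = [link for _, link in pairs]

    def lookup(self, name):
        """Binary search for the hash of `name`; None if it is not present."""
        h = name_hash(name)
        i = bisect.bisect_left(self.hashes, h)
        if i < len(self.hashes) and self.hashes[i] == h:
            return self.links[i]
        return None
```

Note that the stored data no longer contains any name, so there is nothing meaningful to list: all one can do is ask about a name one already knows.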

Since we’re only storing hashes of entry names instead of the names themselves, those names cannot be recovered from the raw folder data and are thus kept secret. But as with everything related to cryptography, there are some gotchas. Let’s dig in.

Salting to prevent rainbow table attacks

Rainbow table attacks use huge databases of precomputed hashes. Hashed folder entry names are a perfect target for them. File names are not that unique - consider most GitHub projects as an example. One will find an entry called README.md in most of them. This means that using a naive hash of the filename itself may still leak a lot of information, since one could create a giant rainbow table consisting of hashes of well-known file names.
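The leak is easy to demonstrate. The sketch below uses a plain precomputed dictionary rather than a real rainbow table (which is a more compact time-memory trade-off), and the file names and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib

def naive_hash(name):
    """Unsalted hash of a file name."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()

# Attacker side: computed once, reusable against every unsalted folder.
COMMON_NAMES = ["README.md", "index.html", "LICENSE", ".gitignore", "Makefile"]
precomputed = {naive_hash(n): n for n in COMMON_NAMES}

# Victim side: a folder that stores only unsalted name hashes.
folder_hashes = [
    naive_hash("README.md"),
    naive_hash("q3-layoff-plan.docx"),
    naive_hash("LICENSE"),
]

# Every well-known name in the folder is recovered with one table lookup.
recovered = [precomputed[h] for h in folder_hashes if h in precomputed]
```

Here `recovered` ends up holding `README.md` and `LICENSE`, while the unusual name stays hidden.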

A common technique to protect against rainbow table attacks, a must-have for password-based authentication, is to use a random salt that is used as additional data fed into the hashing function. Such a salt can be stored in plaintext form, and in the case of authentication systems, it is usually stored as an internal property of a given user. Using a different salt for every user means that even if two users use the same password, their password hashes will be different.

In the case of Cinode and hashing of folder entries, we must work in the context of the whole folder. Storing a salt per entry would be impractical because we would lose the ability to quickly look up entries: to find one, we would have to scan the entries one by one, hashing the name with each entry's salt in turn. The most straightforward alternative is then to add a random salt as a property of the whole folder.

Assuming every folder stored in Cinode uses a different random salt (which would imply a sufficiently large salt to overcome the birthday paradox), two folders would generate different hashes even if their entries were the same or overlapped.
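As a sketch of the per-folder salt, assuming a 32-byte random salt simply prepended to the name before hashing with SHA-256 (both are choices made for this example, not decisions from the Cinode design):

```python
import hashlib
import secrets

SALT_SIZE = 32  # bytes; an assumption of this sketch

def new_folder_salt():
    """Generate a fresh random salt, stored once per folder."""
    return secrets.token_bytes(SALT_SIZE)

def salted_name_hash(salt, name):
    """Hash an entry name in the context of one folder's salt."""
    return hashlib.sha256(salt + name.encode("utf-8")).digest()

salt_a = new_folder_salt()
salt_b = new_folder_salt()

# The same name yields unrelated hashes in two different folders...
hash_in_a = salted_name_hash(salt_a, "README.md")
hash_in_b = salted_name_hash(salt_b, "README.md")

# ...but stays stable within one folder, so lookups keep working.
hash_in_a_again = salted_name_hash(salt_a, "README.md")
```

With a 256-bit salt, the chance that any two of n folders share a salt is roughly n² / 2^257, which is negligible for any realistic number of folders, provided the random source is good.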

[Figure: Filename hashing with salt]

The security of such a solution then depends on the quality of the randomness of the salt value. As in other parts of the system, Cinode tries to minimize its reliance on random data so that weak random sources cannot be exploited. This rule should be applied to secretive folders as well. While I worked on that part, it turned out to be a pretty large topic, so I decided to deal with it in another blog post.

Protecting the link itself

Hashing entry names ensures that the name has to be known to find the proper entry. But can we do more? Can we protect the target link itself? The goal would be to avoid leaking any information about the entry as long as it is not requested with a correct filename. Let’s try to draft an encryption scheme for the entry point with all the data dependencies:

[Figure: Target link encryption]

In this design, the target link is encrypted using a key that is derived from the filename and the salt through a key derivation function (KDF). Using the filename here means that every entry will use a different key. Adding the salt ensures that two folders with different salts will produce different keys even if the filename is the same.

Besides the key, we also supply the initialization vector (IV), which is taken from the hashed filename. An IV must never repeat under the same encryption key. In our case, every entry should have its own unique key, so the IV could even be constant, but it does not hurt to take it from another unpredictable value in the system.
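The data flow of this scheme can be sketched with the Python standard library alone. Everything concrete here is an assumption: PBKDF2-HMAC-SHA256 as the KDF, the iteration count, the 16-byte IV, and in particular the cipher. The keystream built from HMAC-SHA256 is only a stand-in so the example runs without third-party packages; a real design would use a vetted cipher such as AES-CTR or ChaCha20, ideally in an authenticated mode.

```python
import hashlib
import hmac

KDF_ITERATIONS = 100_000  # assumption: arbitrary work factor for this sketch

def salted_name_hash(salt, name):
    """Hash stored in the folder and used for lookup."""
    return hashlib.sha256(salt + name.encode("utf-8")).digest()

def derive_key(salt, name):
    """Per-entry key derived from the filename and the folder salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", name.encode("utf-8"), salt, KDF_ITERATIONS, dklen=32
    )

def keystream_xor(key, iv, data):
    """Stand-in stream cipher: XOR data with HMAC-SHA256(key, iv || counter)."""
    out = bytearray()
    for counter, offset in enumerate(range(0, len(data), 32)):
        block = hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)

def encrypt_link(salt, name, link):
    """Encrypt the target link so only someone knowing `name` can read it."""
    key = derive_key(salt, name)
    iv = salted_name_hash(salt, name)[:16]  # IV taken from the hashed filename
    return keystream_xor(key, iv, link)

def decrypt_link(salt, name, ciphertext):
    """XOR with the same keystream reverses the encryption."""
    return encrypt_link(salt, name, ciphertext)
```

Decrypting with a wrong name produces garbage rather than an error, because this sketch carries no integrity check. Whether a lookup with a wrong name should be detectable at this layer is a design question the sketch leaves open.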

Security Enhancements Summary

Here’s what we’ve been able to achieve so far:

- Hashing directory entries prevents name leakage.
- Salting the hashes ensures that the same name produces different hashes in different folders.
- Encrypting the entry point link with a key derived from the entry name additionally protects the target info.

This effectively means that one has to know the entry name in order to access any data about that directory entry.

Is this good protection? Well, it depends.

File names are not that unique: certain names, such as README.md or index.html, are commonly used across many systems and projects. Even with salt, anyone who can read the folder data, salt included, can simply hash these common names and check which of the most likely entries exist. The good thing, though, is that a weak name for one entry does not automatically mean that other entries will be weak as well.
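A small sketch of such a guessing attempt, assuming the attacker can read both the folder salt and the stored hashes, and assuming the same SHA-256 salted hash as a stand-in for whatever the real design would use. The file names are invented.

```python
import hashlib

def salted_name_hash(salt, name):
    """Hash an entry name in the context of one folder's salt."""
    return hashlib.sha256(salt + name.encode("utf-8")).digest()

def guess_entries(salt, stored_hashes, candidates):
    """Return the candidate names whose salted hash is present in the folder."""
    stored = set(stored_hashes)
    return [name for name in candidates if salted_name_hash(salt, name) in stored]

# Hypothetical folder: one common name, one hard-to-guess name.
salt = bytes(32)
stored_hashes = [
    salted_name_hash(salt, "README.md"),
    salted_name_hash(salt, "x7-merger-draft.txt"),
]

found = guess_entries(salt, stored_hashes, ["README.md", "index.html", "LICENSE"])
```

The salt forces the attacker to redo this work for every folder, but it does not make a common name any harder to guess. Only `README.md` is found here; the unusual name survives.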

Let’s think more about this and try to transform this into an advantage. A folder may have a lot of entries, some of them easy to guess. Those easy-to-guess entries could be just a kind of smokescreen protecting the more important entries. In fact, secretive folders have a property of deniability. Without knowing the filename, one cannot prove that such a filename exists in the folder. For example, if a folder contains an entry named confidential.txt, its hash would be indistinguishable from a hash of a non-existent file like randomfile123.txt. It is possible to add a lot of fake entries made of completely random and non-existing data, significantly increasing the effort needed for any kind of brute-force attacks.
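Padding a folder with decoys could look roughly like this. The helper name and the hash function are assumptions of the sketch; it only covers the name hashes, and a real folder would also need decoy links that look like genuine encrypted links.

```python
import hashlib
import secrets

def salted_name_hash(salt, name):
    """Hash an entry name in the context of one folder's salt."""
    return hashlib.sha256(salt + name.encode("utf-8")).digest()

def build_folder_hashes(salt, names, decoys):
    """Mix real entry hashes with random decoys and sort them together."""
    real = [salted_name_hash(salt, n) for n in names]
    fake = [secrets.token_bytes(32) for _ in range(decoys)]
    return sorted(real + fake)

salt = secrets.token_bytes(32)
hashes = build_folder_hashes(salt, ["confidential.txt"], decoys=99)
```

All 100 values are 32 random-looking bytes, so without the name there is no way to tell which one belongs to `confidential.txt`, or whether any of them corresponds to a real file at all.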

Another interesting idea is to keep such easy-to-guess entries as a honeypot. A honeypot is a decoy mechanism designed to attract and detect unauthorized access attempts. In the context of secretive folders, these entries should never be accessed during standard folder usage. Instead, they would target some blobs that look normal but are tainted in such a way that any attempt to access them would trigger an alert. Such an alert would mean that someone tried guessing folder entries, helping to identify potential attackers.
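One way to picture the honeypot is a blob store that remembers which blobs are bait. This is purely a hypothetical sketch: the class, the blob names and the in-memory alert list are invented, and how tainted blobs would actually be detected in Cinode is not decided in this post.

```python
class MonitoredStore:
    """Toy blob store that records every read of a tainted (bait) blob."""

    def __init__(self, blobs, tainted):
        self.blobs = dict(blobs)
        self.tainted = set(tainted)
        self.alerts = []

    def read(self, link):
        """Return the blob content; reading a bait blob also raises an alert."""
        if link in self.tainted:
            self.alerts.append(link)
        return self.blobs.get(link)

store = MonitoredStore(
    {"blob-real": b"actual data", "blob-bait": b"looks perfectly normal"},
    tainted={"blob-bait"},
)
```

Reading `blob-real` leaves `store.alerts` empty, while reading `blob-bait` returns ordinary-looking content and quietly adds an alert, which is the signal that someone has been guessing entry names.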

While secretive folders offer strong confidentiality, they come with certain drawbacks that may limit their applicability:

Performance impact:
- Additional hashes require extra computation.
- Decryption adds overhead to processing.
- Verification steps consume CPU resources.
No folder traversal - this blocks or complicates operations such as:
- Automatic backup of entire folder structures.
- Exposing the folder through standard interfaces such as FUSE.
- Processing based on folder content, such as finding all photos and creating thumbnails.

It all depends on the use case. For now, I would consider secretive folders a powerful but niche tool, particularly useful in scenarios where confidentiality is paramount, such as storing sensitive configuration files, protecting intellectual property in software projects, or securing access to private datasets in collaborative environments.

This post is an exploratory one and does not end with an implementation yet. In a future post, I’d like to take a look at the salt value and see if we could protect it against weak random source attacks.
