One of the rules I set in the previous post was to practically test the theory that is forming here. Thus it is high time to make use of what has been said so far before moving forward.

What I have discussed so far fits perfectly into a static web page where every page consists of rarely changing blobs of data. An example of such a page is this blog - it is compiled using hugo and the result can easily be served by a simple web server such as apache or nginx.

Since the data compiles down to a static set of files, it can be stored in the form of a datastore with encrypted content.

Live cinode - the goal

The goal is to do the following:

  1. Compile the blog into a static web site
  2. Convert the plain web page to an encrypted datastore
  3. Expose the result through a simple web server that can read and decrypt the datastore

Of course there are a lot of small details which I will not focus on right now. Everything is done with docker containers, built through gitlab-ci, deployed using skaffold and kubernetes, and finally there's a cloudflare proxy in front of the web server (to ensure the server doesn't die from some trivial DoS attack). But to keep things focused, let me skip those in this post.

Structure of directories

The web page consists of two types of objects - files and directories. Files are pretty simple - they are plain streams of bytes with extra properties such as the mime type. Directories are a bit more complex - their content must be interpreted by the web server, so some data structure that can list directory entries is needed.

My first attempt used a simple hand-written serialization format. However, after giving it some thought I decided to use an existing, well-established serialization method - protobuf. It is efficient, extensible and well-tested in mission-critical servers, which makes it a perfect fit for cinode.

The definition of the directory structure can be found here. It's really trivial, so let's bring it in:

syntax = "proto3";

option go_package = ".;structure";

// Directory represents the content of a static directory
message Directory {
    message Entry {
        string bid = 1;
        string key = 2;
        string mimeType = 3;
    }
    map<string, Entry> entries = 1;
}

Each directory consists of a map of entries where the key is the name of the file and the value contains the blob name (bid), the decryption key (key) and the mime type of the given entry (mimeType). Because this data structure is encoded using protobuf, it can easily be extended to handle more complex scenarios.
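
To make this concrete, here is a minimal sketch of how such a directory could be built and serialized in Go. The entry values are made-up placeholders, not real blob ids or keys:

// buildExampleDirectory shows how a single-entry directory is built and
// serialized; exampleBid and exampleKey stand in for a real blob name
// and decryption key.
func buildExampleDirectory() ([]byte, error) {
    dir := structure.Directory{
        Entries: map[string]*structure.Directory_Entry{
            "index.html": {
                Bid:      "exampleBid",
                Key:      "exampleKey",
                MimeType: "text/html; charset=utf-8",
            },
        },
    }
    // The serialized form is a compact binary blob which can itself be
    // encrypted and stored in the datastore.
    return proto.Marshal(&dir)
}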

Building up the datastore

The server needs an existing datastore to serve the data from. Most of the machinery to create an encrypted datastore is already there, built in previous steps. What's missing is a tool that brings everything together. The heart of the compiler code is as follows:

// compileOneLevel builds a datastore blob for a single filesystem path,
// returning the blob id and the encryption key of the resulting blob.
func compileOneLevel(path string, be blenc.BE) (string, string, error) {
    st, err := os.Stat(path)
    if err != nil {
        return "", "", fmt.Errorf("Couldn't check path: %w", err)
    }

    if st.IsDir() {
        return compileDir(path, be)
    }

    if st.Mode().IsRegular() {
        return compileFile(path, be)
    }

    return "", "", fmt.Errorf("Neither dir nor a regular file: %v", path)
}

// compileFile stores the contents of a single file as an encrypted blob.
func compileFile(path string, be blenc.BE) (string, string, error) {
    fmt.Println(" *", path)
    fl, err := os.Open(path)
    if err != nil {
        return "", "", fmt.Errorf("Couldn't read file %v: %w", path, err)
    }
    return be.Save(fl, blenc.ContentsHashKey())
}

// compileDir recursively compiles a directory tree and stores its
// serialized Directory protobuf as a blob of its own.
func compileDir(p string, be blenc.BE) (string, string, error) {
    fileList, err := ioutil.ReadDir(p)
    if err != nil {
        return "", "", fmt.Errorf("Couldn't read contents of dir %v: %w", p, err)
    }
    dirStruct := structure.Directory{
        Entries: make(map[string]*structure.Directory_Entry),
    }
    for _, e := range fileList {
        subPath := path.Join(p, e.Name())
        name, key, err := compileOneLevel(subPath, be)
        if err != nil {
            return "", "", err
        }
        // Directories are marked with a cinode-specific mime type; regular
        // files get their type from the file extension or, failing that,
        // from content sniffing (mirroring Go's built-in file server).
        contentType := "application/cinode-dir"
        if !e.IsDir() {
            contentType = mime.TypeByExtension(filepath.Ext(e.Name()))
            if contentType == "" {
                file, err := os.Open(subPath)
                if err != nil {
                    return "", "", fmt.Errorf("Can not detect content type for %v: %w", subPath, err)
                }
                buffer := make([]byte, 512)
                n, err := io.ReadFull(file, buffer)
                file.Close()
                if err != nil && err != io.ErrUnexpectedEOF {
                    return "", "", fmt.Errorf("Can not detect content type for %v: %w", subPath, err)
                }
                contentType = http.DetectContentType(buffer[:n])
            }
        }
        dirStruct.Entries[e.Name()] = &structure.Directory_Entry{
            Bid:      name,
            Key:      key,
            MimeType: contentType,
        }
    }

    data, err := proto.Marshal(&dirStruct)
    if err != nil {
        return "", "", fmt.Errorf("Can not serialize directory %v: %w", p, err)
    }

    return be.Save(ioutil.NopCloser(bytes.NewReader(data)), blenc.ContentsHashKey())
}

The algorithm is a simple recursive function: compileOneLevel. Its purpose is to build a datastore blob for one path on the local filesystem. If the path points to a file, its corresponding blob is generated in compileFile. For directories, the function calls compileDir - a slightly more complex function generating the datastore representation of a directory.

The blob for a directory is generated by listing all its files and sub-directories, adding one datastore entry per physical location. For each entry, the blob id and key are calculated through a recursive call to compileOneLevel. The only thing left is the detection of the content type, which is equivalent to what golang's standard library does in the default implementation of its file server.

The only missing piece of the puzzle is the entrypoint to the root directory. The server must somehow know where to start - it needs to know the root blob id and its key. At the moment this information is stored in a file called entrypoint.txt which is read by the server during initialization. Of course this is just a simple workaround - knowing the entrypoint allows decrypting the whole tree stored in the datastore. In the future, the entrypoint will be extracted to a separate, more secure location.
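
Putting the pieces together, the compiler's top level could look roughly like the sketch below. Note that the two-line entrypoint.txt format used here (blob id on the first line, key on the second) is an assumption for illustration, not necessarily the actual format:

// compileAndSaveEntrypoint compiles the directory tree rooted at root
// into the datastore behind be and writes the resulting entrypoint to
// disk (assumed format: bid on the first line, key on the second).
func compileAndSaveEntrypoint(root string, be blenc.BE) error {
    bid, key, err := compileOneLevel(root, be)
    if err != nil {
        return err
    }
    return ioutil.WriteFile("entrypoint.txt", []byte(bid+"\n"+key+"\n"), 0600)
}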

Serving content from the datastore

The initial server implementation is as simple as possible. There's no caching, no preloading, no ability to upload content, no access control etc. The current step is just to prove that the content can be reliably served from an encrypted datastore.

The main function of the server's implementation is handleDir:

// handleDir serves a request for subPath relative to the directory blob
// identified by bid and key, recursing into sub-directories as needed.
func handleDir(be blenc.BE, bid, key string, w http.ResponseWriter, r *http.Request, subPath string) {

    if subPath == "" {
        subPath = "index.html"
    }

    pathParts := strings.SplitN(subPath, "/", 2)

    dirData, err := be.Open(bid, key)
    if err != nil {
        http.Error(w, "Internal error", http.StatusInternalServerError)
        return
    }

    dirBytes, err := ioutil.ReadAll(dirData)
    dirData.Close()
    if err != nil {
        http.Error(w, "Internal error", http.StatusInternalServerError)
        return
    }

    dir := structure.Directory{}
    if err := proto.Unmarshal(dirBytes, &dir); err != nil {
        http.Error(w, "Internal error", http.StatusInternalServerError)
        return
    }

    entry, exists := dir.GetEntries()[pathParts[0]]
    if !exists {
        http.NotFound(w, r)
        return
    }

    if entry.GetMimeType() == "application/cinode-dir" {
        // A directory reached as the last path segment means the URL is
        // missing its trailing slash - redirect so that relative links
        // inside the page resolve correctly.
        if len(pathParts) == 1 {
            http.Redirect(w, r, r.URL.Path+"/", http.StatusPermanentRedirect)
            return
        }
        handleDir(be, entry.GetBid(), entry.GetKey(), w, r, pathParts[1])
        return
    }

    if len(pathParts) > 1 {
        http.NotFound(w, r)
        return
    }

    data, err := be.Open(entry.GetBid(), entry.GetKey())
    if err != nil {
        http.Error(w, "Internal error", http.StatusInternalServerError)
        return
    }
    defer data.Close()
    w.Header().Set("Content-Type", entry.GetMimeType())
    if _, err = io.Copy(w, data); err != nil {
        // TODO: Log this, can't send an error back, it's too late
    }
}

This function parses a directory blob and tries to find the entry matching the next part of the URL path. If the entry is a directory itself, the sub-directory is recursively scanned; otherwise the path is expected to point to a single file. The only thing left to have a functional server is the internal redirection of a directory listing to its index.html file (the lack of this file will end up with a 404 response).
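
For completeness, here is a sketch of how handleDir could be wired into an HTTP server. The entrypoint parsing mirrors the assumed two-line format from the compiler sketch above, and the listen address is just an example:

// serve reads the entrypoint file and exposes the datastore content
// over HTTP, delegating every request to handleDir.
func serve(be blenc.BE) error {
    raw, err := ioutil.ReadFile("entrypoint.txt")
    if err != nil {
        return err
    }
    lines := strings.SplitN(strings.TrimSpace(string(raw)), "\n", 2)
    if len(lines) != 2 {
        return fmt.Errorf("malformed entrypoint file")
    }
    rootBid, rootKey := lines[0], lines[1]

    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Strip the leading "/" so that the lookup starts with the first
        // path segment relative to the root directory.
        handleDir(be, rootBid, rootKey, w, r, strings.TrimPrefix(r.URL.Path, "/"))
    })
    return http.ListenAndServe(":8080", nil)
}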

Live in action

To see the server in action, simply go to https://blog.cinodenet.org - the content served there is generated through cinode. There is a cloudflare proxy in front, but it does not generate the content itself.

Furthermore, the raw datastore can be browsed at https://blog-datastore.cinodenet.org, although without the decryption layer the data won't be of much use. Currently one can also find the entrypoint file there, so the whole tree can be decrypted.