Today we will do an exercise: I will show how to create your own Cinode dataset from a static web page. As the source, let’s use a simple blog similar to the one you’re viewing right now. In the example below I’m using hugo, but any statically generated web page will do. Even hand-written HTML will work fine here 😉.

To make things simple (well, that was my goal; whether I succeeded you must judge yourself) I’ve prepared ready-to-use docker images. I also started using semantic versioning for tags to make things easier; the current version is 0.0.2 (the docker images tagged 0.0.1 were too hard to use).

Preparing static content

First let’s compile our hugo page into a static dataset:

$ hugo
Start building sites …
hugo v0.110.0-e32a493b7826d02763c3b79623952e625402b168+extended linux/amd64 BuildDate=2023-01-17T12:16:09Z VendorInfo=snap:0.110.0

                   | EN
-------------------+-----
  Pages            | 34
  Paginator pages  |  2
  Non-page files   |  0
  Static files     |  1
  Processed images |  0
  Aliases          |  1
  Sitemaps         |  1
  Cleaned          |  0

Total in 129 ms

$ ls public
404.html  ananke  categories  images  index.html  index.xml  posts  sitemap.xml  tags

By default hugo stores the result of the compilation in the public folder. Depending on your static page generator it may be somewhere else; the point is to know the folder where all those static files are generated.

Compiling cinode dataset

To prepare the cinode datastore we’ll use the static_datastore_builder docker image with the compile option:

$ docker pull ghcr.io/cinode/static_datastore_builder:0.0.2
0.0.2: Pulling from cinode/static_datastore_builder
63b65145d645: Already exists
e4e39906bc75: Pull complete
4f4fb700ef54: Pull complete
7c00bcb8de40: Pull complete
Digest: sha256:2a941ab6ebcc3cc76dcf961978ad70a29ac7e5d19702642d4feda146e9c8563f
Status: Downloaded newer image for ghcr.io/cinode/static_datastore_builder:0.0.2
ghcr.io/cinode/static_datastore_builder:0.0.2

$ docker run --rm -i ghcr.io/cinode/static_datastore_builder:0.0.2 compile --help

The compile command can be used to create an encrypted datastore from
a content with static files that can then be used to serve through a
simple http server.

Usage:
  static_datastore compile --source <src_dir> --destination <dst_dir> [flags]

Flags:
  -d, --destination string        Destination directory for blobs
  -h, --help                      help for compile
  -r, --raw-filesystem            If set to true, use raw filesystem instead of the optimized one, can be used to create dataset for a standard http server
  -s, --source string             Source directory with content to compile
  -t, --static                    If set to true, compile only the static dataset, do not create or update dynamic link
  -w, --writer-info string        Writer info for the root dynamic link, if neither writer info nor writer info file is specified, a random writer info will be generated and printed out
  -f, --writer-info-file string   Name of the file containing writer info for the root dynamic link, if neither writer info nor writer info file is specified, a random writer info will be generated and printed out

Here’s the complete invocation:

$ docker run \
    --rm \
    --interactive \
    --volume "$(pwd):/data" \
    --user $(id -u ${USER}):$(id -g ${USER}) \
    ghcr.io/cinode/static_datastore_builder:0.0.2 \
    compile \
    --source /data/public \
    --destination /data/encrypted \
    --raw-filesystem
{
  "entrypoint": "9g7Ffa****************L1",
  "result": "OK",
  "writer-info": "9Q*****************************************************************T"
}
2023/02/18 21:27:44 DONE

Let me quickly outline some of the docker options used:

  • --volume "$(pwd):/data" - this docker option mounts the local folder into the docker container
  • --user $(id -u ${USER}):$(id -g ${USER}) - inside the docker container use the same user as on the host; that way files generated by the builder will be owned by the current user

And some of the options used by the datastore compiler itself:

  • --source /data/public - compile the contents of the public folder generated by hugo
  • --destination /data/encrypted - put the result into the encrypted folder
  • --raw-filesystem - use raw filesystem output mode.

The last option may need a bit of an explanation. Files created in raw filesystem mode can be exposed directly by any http server, and other cinode network members can talk to such a server as if it were a cinode network member. We’ll use nginx in the following steps, so the raw filesystem mode is necessary. The default behavior, on the other hand, is to generate an optimized folder layout that splits data into sub-folders; this avoids issues caused by a large number of files stored in a single directory (definitely not something to worry about in this exercise). Such an optimized layout requires a cinode datastore binary to act as the connection point for other network members.

The compilation result generated in the encrypted folder consists of encrypted blobs:

$ ls encrypted
21gHBhvtzZ7q3boH9W46DiUJhP4P8yBxWSdEoi9c6T11Td  bE6RVudVXhgt9zAyw4XjMv7D8Mr5NVDFm3vKxE5tS9Lrr  q96QT9Q6PTt1uwJKpeXbMtfy5j9qXhJ1XYCFx1Gks35AG
233gjbv8ihNStTxSzczcJBcpqG2xi4itvuWdhkjSDnuV8P  Bjndo6kGJdqLYM7nLvpkSDAHLbGes2pAKhSx6gkXdwG5R  qJEsr6N7AW5uyTBvqARPSc2AvEtfSuxHnxxmDcJTLVHtW
236hrThNqDJJfEz69hHpjChnazWpmKzmFcig5rTpp5PhnH  bXVsTGTtPVSq7DKXb8DQUfSeEaAf7Hxap9ttHPQcsJy8v  QLyJtrZesCyUgd7r3XLLf9YMnKMHucMUnxMq9DFu8kpEY
23rzJ8J81TFedYH3bhZEobVartTatpSi3ps5EFzmvTidb6  Ci9DspRXjc5b61pfRoVyBiWSxHkhwDo5mmCNV7FyCXYWG  Qopm7aj3ate8uTWuExn7XKpZureshdTpZBdCmTtNWffox
25RR5feUxq2jEkf4nmBPFkjLB6VCVHE8eBh6gUQdUnimLi  E5Qd6YnEvY79mTWcXa5Ry8H9oJyZEpzkseCmGFm5zcHAV  qPJbscUDpb97dzmMP2Jeiwvq4jbpJdKomLuk9woC6VAG9
....

You may also have noticed that the compilation process finished with some JSON output. The writer-info and entrypoint values are purposefully masked here and in the following code snippets; make sure to replace them with your own values and protect them from unauthorized access.

{
  "entrypoint": "9g7Ffa****************L1",
  "result": "OK",
  "writer-info": "9Q*****************************************************************T"
}

The entrypoint and writer-info are not written into the output folder, but they are necessary to read the content of the encrypted blobs (entrypoint) or to update it (writer-info).
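Since the JSON output is the only place where the writer-info appears, it’s worth capturing it right away. Here’s a minimal sketch; the compile-output.json file name and the sed extraction are my own choices (a real JSON parser such as jq would be more robust), and the sample values are the masked ones from this post. In practice you would redirect the compiler’s real output into the file:

```shell
# Sample of the compiler's JSON output, with values masked as in this post;
# in practice redirect the real output of the compile command into this file.
cat > compile-output.json <<'EOF'
{
  "entrypoint": "9g7Ffa****************L1",
  "result": "OK",
  "writer-info": "9Q*****************************************************************T"
}
EOF

# Extract the writer-info field and store it with owner-only permissions;
# the resulting file can later be fed back to the compiler via --writer-info-file.
sed -n 's/.*"writer-info": "\([^"]*\)".*/\1/p' compile-output.json > writer.info
chmod 600 writer.info
```

Keeping the writer info in a file also pairs nicely with the compiler’s -f/--writer-info-file flag shown in the help output above.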

Expose data through a web proxy

Before doing some more complex setup, let’s see if the encrypted dataset is valid:

$ docker pull ghcr.io/cinode/web_proxy:0.0.2
0.0.2: Pulling from cinode/web_proxy
008cd561e933: Pull complete
046531da0d00: Pull complete
Digest: sha256:a4a54cbaa05a516d72644f3c9ecd19074dfd42aa9ce9e4784bef7d8c04d14501
Status: Downloaded newer image for ghcr.io/cinode/web_proxy:0.0.2
ghcr.io/cinode/web_proxy:0.0.2

$ docker run \
    --rm \
    --interactive \
    --publish 8080:8080 \
    --volume "$(pwd)/encrypted:/encrypted" \
    --env CINODE_ENTRYPOINT="9g7Ffa****************L1" \
    --env CINODE_MAIN_DATASTORE="file-raw:///encrypted" \
    ghcr.io/cinode/web_proxy:0.0.2
2023/02/18 21:39:28 Listening on http://localhost:8080

Now if you open http://localhost:8080 it will show the web page 🎉.

Important options used here:

  • --env CINODE_ENTRYPOINT="9g7Ffak...." - cinode expects the entrypoint information in either an environment variable or a file (whose name is then given in the CINODE_ENTRYPOINT_FILE env var); here we’re passing it directly
  • --env CINODE_MAIN_DATASTORE="file-raw:///encrypted" - here we specify that the encrypted data is in the /encrypted folder; it is inside the docker container and points to the volume mounted through the --volume docker option

Updating the content

By default, the dataset compilation process creates a dynamic link for the directory root so that the content can be updated later (the --static switch would skip this step and return the entrypoint of the static blob representing the root folder instead). Because of that, we can update the content without changing the entrypoint information. It only requires passing the writer-info value through an extra argument:

$ hugo # rebuild static files after some changes
....
$ docker run \
    --rm \
    --interactive \
    --volume "$(pwd):/data" \
    --user $(id -u ${USER}):$(id -g ${USER}) \
    ghcr.io/cinode/static_datastore_builder:0.0.2 \
    compile \
    --source /data/public \
    --destination /data/encrypted \
    --raw-filesystem \
    --writer-info "9Q*****************************************************************T"
2023/02/18 21:43:21 DONE
{
  "entrypoint": "9g7Ffa****************L1",
  "result": "OK"
}

The entrypoint returned will stay the same; we don’t have to pass it because it is embedded in the writer-info data.

Note: A single encrypted datasets folder may contain many different compiled filesystems, limited only by the underlying storage. What differentiates them is the entrypoint information. You can also accumulate different versions of the dataset for the same entrypoint: the root link will be overwritten with newer data, new static blobs will be added, and those that did not change will be shared between versions (an interesting side effect of using the content hash as the blob name).
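The blob-sharing behavior follows directly from content-addressed naming: identical content yields an identical name, so it is stored only once. Here is a quick illustration of the idea using plain sha256 (cinode’s actual blob naming also involves encryption, so this is only a sketch of the principle, not its real scheme):

```shell
# Two identical payloads hash to the same digest, so a content-addressed
# store would keep a single blob for both; a changed payload gets a new name.
printf 'unchanged page' > v1.bin
printf 'unchanged page' > v2.bin
printf 'edited page'    > v3.bin
sha256sum v1.bin v2.bin v3.bin
```

The first two digests are identical, the third differs: re-compiling a mostly unchanged site re-uses most of the existing blobs.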

Publishing encrypted dataset

Creating a datastore node from an encrypted dataset compiled with the --raw-filesystem option is as simple as publishing the encrypted directory with some http server. Let’s try it out with a simple nginx docker image. The web_proxy server can then talk to such an http server to retrieve blob data.

Here’s the Dockerfile, just two lines:

FROM nginx
COPY encrypted /usr/share/nginx/html
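Because the blob names are content hashes, an existing blob never changes (updates only add new blobs), so it is safe to let clients cache blobs aggressively. A hypothetical nginx snippet along these lines — the default.conf file name and the cache settings are my own, not part of any cinode image, and you’d need an extra COPY line in the Dockerfile to install it under /etc/nginx/conf.d/:

```nginx
# /etc/nginx/conf.d/default.conf -- serve immutable, content-addressed blobs
server {
    listen 80;
    root /usr/share/nginx/html;

    location / {
        # Blob names are content hashes, so existing blobs never change;
        # long-lived caching is therefore safe.
        add_header Cache-Control "public, max-age=31536000, immutable";
        try_files $uri =404;
    }
}
```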

Let’s build it then:

$ docker build --tag cinode_datastore . -f Dockerfile
[+] Building 3.1s (7/7) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                             0.1s
 => => transferring dockerfile: 85B                                                                                                                              0.0s
 => [internal] load .dockerignore                                                                                                                                0.1s
 => => transferring context: 2B                                                                                                                                  0.0s
 => [internal] load metadata for docker.io/library/nginx:latest                                                                                                  2.4s
 => [internal] load build context                                                                                                                                0.1s
 => => transferring context: 889.10kB                                                                                                                            0.0s
 => [1/2] FROM docker.io/library/nginx@sha256:6650513efd1d27c1f8a5351cbd33edf85cc7e0d9d0fcb4ffb23d8fa89b601ba8                                                   0.4s
 => => resolve docker.io/library/nginx@sha256:6650513efd1d27c1f8a5351cbd33edf85cc7e0d9d0fcb4ffb23d8fa89b601ba8                                                   0.1s
 => => sha256:6650513efd1d27c1f8a5351cbd33edf85cc7e0d9d0fcb4ffb23d8fa89b601ba8 1.86kB / 1.86kB                                                                   0.0s
 => => sha256:7f797701ded5055676d656f11071f84e2888548a2e7ed12a4977c28ef6114b17 1.57kB / 1.57kB                                                                   0.0s
 => => sha256:3f8a00f137a0d2c8a2163a09901e28e2471999fde4efc2f9570b91f1c30acf94 7.66kB / 7.66kB                                                                   0.0s
 => [2/2] COPY encrypted /usr/share/nginx/html                                                                                                                   0.1s
 => exporting to image                                                                                                                                           0.1s
 => => exporting layers                                                                                                                                          0.1s
 => => writing image sha256:3e8ad01c9b00b759be8a734def2d4ad7c29c0c316051057bccf2607d87ed9ea0                                                                     0.0s
 => => naming to docker.io/library/cinode_datastore                                                                                                              0.0s

We will use an internal docker network; that way the web_proxy can connect to the datastore that’s running as another container within the docker environment:

$ docker network create cinode
fd1675531b64c1a822823b37e1d9eec0972e7266869878ff15f0452e194a3e2a

Now we can run the datastore container. The --name option sets up an internal hostname within the docker network so that the datastore container can be easily addressed later.

$ docker run \
    --rm \
    --detach \
    --network cinode \
    --name datastore \
    cinode_datastore
5c948468e2b7c6da8f2d05729d7c4335820469f7db479e433defd245c79b3b35

Next let’s run the cinode proxy node; since it is in the same network as the datastore, it can access that datastore through the http://datastore address.

Note that we’re using the CINODE_ADDITIONAL_DATASTORE_1 environment variable here instead of CINODE_MAIN_DATASTORE; this ensures that the proxy has its own in-memory datastore that acts as a cache and minimizes the number of network requests for data.

$ docker run \
    --rm \
    --interactive \
    --publish 8080:8080 \
    --network cinode \
    --env CINODE_ENTRYPOINT="9g7Ffa****************L1" \
    --env CINODE_ADDITIONAL_DATASTORE_1="http://datastore/" \
    ghcr.io/cinode/web_proxy:0.0.2
  ...

What just happened is that we’ve created a simple cinode network: one node serves encrypted blobs, and the other, knowing only the entrypoint, asks the first node for blobs it does not have. This is essentially the same setup as in the previous exercise.

Once we’re finished let’s stop the datastore container that was running in the background:

$ docker stop datastore
datastore

All-in-one dockerfile

And here’s one more bonus Dockerfile: this one runs hugo and the cinode compilation and finally produces the nginx-based image, all in a single docker build:

FROM klakegg/hugo:0.101.0-onbuild AS hugo

FROM ghcr.io/cinode/static_datastore_builder:0.0.2 AS cinode
COPY --from=hugo /target /cinode/public
ARG WRITER_INFO
RUN static_datastore_builder \
    compile \
    --source=/cinode/public \
    --destination=/cinode/encrypted \
    --raw-filesystem \
    --writer-info="${WRITER_INFO}"

FROM nginx
COPY --from=cinode /cinode/encrypted /usr/share/nginx/html

To build the image:

$ docker build . \
    --build-arg WRITER_INFO="9Q*****************************************************************T" \
    --tag cinode_datastore

It can then be used in place of the previous cinode_datastore.

Note that the writer info is passed through a build arg here. It could be done with a docker secret as well; in both cases the final image will not leak the writer info, because it is only used in an intermediate build stage and never ends up in the final image’s layers.
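For completeness, here’s a sketch of the docker secret variant using BuildKit secret mounts; the secret id writer_info is my own choice, and the secret file is exposed to the RUN step under the default /run/secrets/ path:

```dockerfile
# syntax=docker/dockerfile:1
FROM klakegg/hugo:0.101.0-onbuild AS hugo

FROM ghcr.io/cinode/static_datastore_builder:0.0.2 AS cinode
COPY --from=hugo /target /cinode/public
# The secret is mounted only for this RUN step and is never stored in a layer
RUN --mount=type=secret,id=writer_info \
    static_datastore_builder \
    compile \
    --source=/cinode/public \
    --destination=/cinode/encrypted \
    --raw-filesystem \
    --writer-info-file=/run/secrets/writer_info

FROM nginx
COPY --from=cinode /cinode/encrypted /usr/share/nginx/html
```

It would then be built with docker build --secret id=writer_info,src=writer.info --tag cinode_datastore . instead of the --build-arg variant.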

Compose

Now that we have such a nice Dockerfile, let’s wrap the whole setup up in a single docker-compose file:

services:
  datastore:
    build:
      context: .
      args:
        WRITER_INFO: "9Q*****************************************************************T"
    restart: always

  web-proxy:
    image: ghcr.io/cinode/web_proxy:0.0.2
    environment:
      CINODE_ENTRYPOINT: "9g7Ffa****************L1"
      CINODE_ADDITIONAL_DATASTORE_1: "http://datastore/"
    depends_on:
      - datastore
    ports:
      - "8080:8080"
    restart: always

Now after docker-compose up everything works like a charm:

$ docker-compose up
Building datastore
[+] Building 6.8s (16/16) FINISHED                                                                                                                                    
 => [internal] load .dockerignore                                                                                                                                0.1s
 => => transferring context: 2B                                                                                                                                  0.0s
 => [internal] load build definition from Dockerfile                                                                                                             0.1s
 => => transferring dockerfile: 440B                                                                                                                             0.0s
 => [internal] load metadata for docker.io/library/nginx:latest                                                                                                  2.1s
 => [internal] load metadata for docker.io/klakegg/hugo:0.101.0-onbuild                                                                                          2.0s
 => [internal] load metadata for ghcr.io/cinode/static_datastore_builder:0.0.2                                                                                   2.2s
 => [cinode 1/3] FROM ghcr.io/cinode/static_datastore_builder:0.0.2@sha256:2a941ab6ebcc3cc76dcf961978ad70a29ac7e5d19702642d4feda146e9c8563f                      1.5s
 => => resolve ghcr.io/cinode/static_datastore_builder:0.0.2@sha256:2a941ab6ebcc3cc76dcf961978ad70a29ac7e5d19702642d4feda146e9c8563f                             0.1s
  ...
web-proxy_1  | 2023/02/18 23:08:25 Listening on http://localhost:8080
 ...