Today we will do an exercise in which I will show how to create your own Cinode dataset from a static web page. As the source, let's use a simple blog similar to the one you're viewing right now. In the example below I'm using hugo, but any statically generated web page will do. Even hand-written html will work fine here 😉.
To make things simple (well, that was my goal - whether I succeeded you must judge yourself) I've prepared ready-to-use docker images. I also started using semantic versioning for tags to make things easier; the current version is 0.0.2 (the docker images tagged 0.0.1 were too hard to use).
Preparing static content
First, let's compile our hugo page into a set of static files:
```
$ hugo
Start building sites …
hugo v0.110.0-e32a493b7826d02763c3b79623952e625402b168+extended linux/amd64 BuildDate=2023-01-17T12:16:09Z VendorInfo=snap:0.110.0

                   | EN
-------------------+-----
  Pages            | 34
  Paginator pages  |  2
  Non-page files   |  0
  Static files     |  1
  Processed images |  0
  Aliases          |  1
  Sitemaps         |  1
  Cleaned          |  0

Total in 129 ms
$ ls public
404.html ananke categories images index.html index.xml posts sitemap.xml tags
```
By default, the result of the hugo compilation is stored in the `public` folder. Depending on your static page generator it may be somewhere else - the point is to know the folder into which all those static files are generated.
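If you'd like to sanity-check those files before encrypting them, you can serve the folder with any static file server. Here's a minimal sketch using Python's built-in server (assuming `python3` is available; the port is an arbitrary choice):

```
# serve the generated static files for a quick local preview
$ cd public
$ python3 -m http.server 1313
# now open http://localhost:1313 in a browser
```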
Compiling cinode dataset
To prepare the cinode datastore we'll use the `static_datastore_builder` docker image with the `compile` option:
```
$ docker pull ghcr.io/cinode/static_datastore_builder:0.0.2
0.0.2: Pulling from cinode/static_datastore_builder
63b65145d645: Already exists
e4e39906bc75: Pull complete
4f4fb700ef54: Pull complete
7c00bcb8de40: Pull complete
Digest: sha256:2a941ab6ebcc3cc76dcf961978ad70a29ac7e5d19702642d4feda146e9c8563f
Status: Downloaded newer image for ghcr.io/cinode/static_datastore_builder:0.0.2
ghcr.io/cinode/static_datastore_builder:0.0.2
$ docker run --rm -i ghcr.io/cinode/static_datastore_builder:0.0.2 compile --help
The compile command can be used to create an encrypted datastore from
a content with static files that can then be used to serve through a
simple http server.

Usage:
  static_datastore compile --source <src_dir> --destination <dst_dir> [flags]

Flags:
  -d, --destination string        Destination directory for blobs
  -h, --help                      help for compile
  -r, --raw-filesystem            If set to true, use raw filesystem instead of the optimized one, can be used to create dataset for a standard http server
  -s, --source string             Source directory with content to compile
  -t, --static                    If set to true, compile only the static dataset, do not create or update dynamic link
  -w, --writer-info string        Writer info for the root dynamic link, if neither writer info nor writer info file is specified, a random writer info will be generated and printed out
  -f, --writer-info-file string   Name of the file containing writer info for the root dynamic link, if neither writer info nor writer info file is specified, a random writer info will be generated and printed out
```
Here’s the complete invocation:
```
$ docker run \
    --rm \
    --interactive \
    --volume "$(pwd):/data" \
    --user $(id -u ${USER}):$(id -g ${USER}) \
    ghcr.io/cinode/static_datastore_builder:0.0.2 \
    compile \
    --source /data/public \
    --destination /data/encrypted \
    --raw-filesystem
{
  "entrypoint": "9g7Ffa****************L1",
  "result": "OK",
  "writer-info": "9Q*****************************************************************T"
}
2023/02/18 21:27:44 DONE
```
Let me quickly outline some of the docker options used:
- `--volume "$(pwd):/data"` - mounts the local folder into the docker container
- `--user $(id -u ${USER}):$(id -g ${USER})` - use the same user inside the docker container as on the host; that way files generated by the builder will be owned by the current user
And some of the options used by the datastore compiler itself:
- `--source /data/public` - compile the contents of the `public` folder generated by hugo
- `--destination /data/encrypted` - put the result into the `encrypted` folder
- `--raw-filesystem` - use the raw filesystem output mode
The last option may need a bit of explanation. Files created in raw filesystem mode can be exposed directly by any http server, and other cinode network members can talk to such an http server as if it were a cinode node. We'll use nginx in the following steps, so the raw filesystem mode is necessary here. The default behavior, on the other hand, is to generate an optimized folder layout that splits data into sub-folders. That way we won't run into issues caused by a large number of files stored in a single directory (definitely not something to worry about in this exercise). Such an optimized folder layout needs the cinode datastore binary to act as the connection point for other network members.
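To visualize the difference between the two modes, here's a purely hypothetical sketch - the actual sharding scheme of the optimized mode is an internal detail of cinode, so treat the sub-folder names below as made up:

```
# raw filesystem mode: all blobs live in a single flat directory,
# so a plain http server can map URL paths straight to files
encrypted/21gHBhvtzZ7q3boH9W46DiUJhP4P8yBxWSdEoi9c6T11Td
encrypted/233gjbv8ihNStTxSzczcJBcpqG2xi4itvuWdhkjSDnuV8P

# optimized mode (hypothetical layout): blobs are spread across
# sub-folders to keep the number of files per directory low
encrypted/21/gHBhvtzZ7q3boH9W46DiUJhP4P8yBxWSdEoi9c6T11Td
encrypted/23/3gjbv8ihNStTxSzczcJBcpqG2xi4itvuWdhkjSDnuV8P
```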
The compilation result generated in the `encrypted` folder consists of encrypted blobs:
```
$ ls encrypted
21gHBhvtzZ7q3boH9W46DiUJhP4P8yBxWSdEoi9c6T11Td bE6RVudVXhgt9zAyw4XjMv7D8Mr5NVDFm3vKxE5tS9Lrr q96QT9Q6PTt1uwJKpeXbMtfy5j9qXhJ1XYCFx1Gks35AG
233gjbv8ihNStTxSzczcJBcpqG2xi4itvuWdhkjSDnuV8P Bjndo6kGJdqLYM7nLvpkSDAHLbGes2pAKhSx6gkXdwG5R qJEsr6N7AW5uyTBvqARPSc2AvEtfSuxHnxxmDcJTLVHtW
236hrThNqDJJfEz69hHpjChnazWpmKzmFcig5rTpp5PhnH bXVsTGTtPVSq7DKXb8DQUfSeEaAf7Hxap9ttHPQcsJy8v QLyJtrZesCyUgd7r3XLLf9YMnKMHucMUnxMq9DFu8kpEY
23rzJ8J81TFedYH3bhZEobVartTatpSi3ps5EFzmvTidb6 Ci9DspRXjc5b61pfRoVyBiWSxHkhwDo5mmCNV7FyCXYWG Qopm7aj3ate8uTWuExn7XKpZureshdTpZBdCmTtNWffox
25RR5feUxq2jEkf4nmBPFkjLB6VCVHE8eBh6gUQdUnimLi E5Qd6YnEvY79mTWcXa5Ry8H9oJyZEpzkseCmGFm5zcHAV qPJbscUDpb97dzmMP2Jeiwvq4jbpJdKomLuk9woC6VAG9
....
```
You may also have noticed that the compilation process finished with some json output. The `writer-info` and `entrypoint` values are purposefully masked here and in the following code snippets - make sure to replace them with your own values and to protect them from unauthorized access.
```
{
  "entrypoint": "9g7Ffa****************L1",
  "result": "OK",
  "writer-info": "9Q*****************************************************************T"
}
```
The `entrypoint` and `writer-info` are not written into the output folder, but they are necessary to read the content of the encrypted blobs (`entrypoint`) or to update that content (`writer-info`).
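Since the writer info has to stay secret, one way to handle it (a sketch only - it assumes the json goes to stdout while the log line goes to stderr, and that `jq` is installed) is to capture the compiler's output and keep the writer info in a file with restricted permissions:

```
# capture the json output of the compiler
$ docker run --rm -i \
    --volume "$(pwd):/data" \
    --user $(id -u ${USER}):$(id -g ${USER}) \
    ghcr.io/cinode/static_datastore_builder:0.0.2 \
    compile --source /data/public --destination /data/encrypted --raw-filesystem \
    > compile-output.json

# store the writer info in a file readable only by the current user
$ jq -r '."writer-info"' compile-output.json > writer-info.txt
$ chmod 600 writer-info.txt
```

On later updates the file can be mounted into the container and passed through the `--writer-info-file` flag shown in the help output above, instead of putting the secret on the command line.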
Expose data through a web proxy
Before doing a more complex setup, let's check that the encrypted dataset is valid:
```
$ docker pull ghcr.io/cinode/web_proxy:0.0.2
0.0.2: Pulling from cinode/web_proxy
008cd561e933: Pull complete
046531da0d00: Pull complete
Digest: sha256:a4a54cbaa05a516d72644f3c9ecd19074dfd42aa9ce9e4784bef7d8c04d14501
Status: Downloaded newer image for ghcr.io/cinode/web_proxy:0.0.2
ghcr.io/cinode/web_proxy:0.0.2

$ docker run \
    --rm \
    --interactive \
    --publish 8080:8080 \
    --volume "$(pwd)/encrypted:/encrypted" \
    --env CINODE_ENTRYPOINT="9g7Ffa****************L1" \
    --env CINODE_MAIN_DATASTORE="file-raw:///encrypted" \
    ghcr.io/cinode/web_proxy:0.0.2
2023/02/18 21:39:28 Listening on http://localhost:8080
```
Now if you open http://localhost:8080 you will see the web page 🎉.
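You can also verify it from the command line; a quick sketch using `curl`:

```
# fetch the decrypted front page through the proxy
$ curl -s http://localhost:8080/ | head
```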
Important options used here:
- `--env CINODE_ENTRYPOINT="9g7Ffak...."` - cinode expects the entrypoint information either in an environment variable or in a file (the file name is then passed through the `CINODE_ENTRYPOINT_FILE` env var, see the sketch after this list); here we're passing it directly
- `--env CINODE_MAIN_DATASTORE="file-raw:///encrypted"` - specifies that the encrypted data is in the `/encrypted` folder; that path is inside the docker container and points to the volume mounted through the `--volume` docker option
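Here's what the file-based variant mentioned in the first bullet could look like - a sketch where the file name and its mount path are my own choices:

```
$ echo -n "9g7Ffa****************L1" > entrypoint.txt
$ docker run \
    --rm \
    --interactive \
    --publish 8080:8080 \
    --volume "$(pwd)/encrypted:/encrypted" \
    --volume "$(pwd)/entrypoint.txt:/entrypoint.txt" \
    --env CINODE_ENTRYPOINT_FILE="/entrypoint.txt" \
    --env CINODE_MAIN_DATASTORE="file-raw:///encrypted" \
    ghcr.io/cinode/web_proxy:0.0.2
```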
Updating the content
By default, the dataset compilation process will create a dynamic link for the directory root to ensure the content can be updated later (using the `--static` switch would skip this step and give the entrypoint of the static blob representing the root folder instead). Because of that, we can update the content without changing the entrypoint information. It only requires passing the `writer-info` value through an extra argument:
```
$ hugo # rebuild static files after some changes
....
$ docker run \
    --rm \
    --interactive \
    --volume "$(pwd):/data" \
    --user $(id -u ${USER}):$(id -g ${USER}) \
    ghcr.io/cinode/static_datastore_builder:0.0.2 \
    compile \
    --source /data/public \
    --destination /data/encrypted \
    --raw-filesystem \
    --writer-info "9Q*****************************************************************T"
2023/02/18 21:43:21 DONE
{
  "entrypoint": "9g7Ffa****************L1",
  "result": "OK"
}
```
The `entrypoint` returned will stay the same. We don't have to pass it explicitly because it is embedded in the `writer-info` data.
Note: a single encrypted datastore folder may contain many different compiled filesystems; the only limit is the underlying storage. What differentiates those filesystems is the entrypoint information. You can also accumulate different versions of the dataset for the same entrypoint - the root link will be overwritten with newer data, new static blobs will be added, and blobs that did not change will be shared between versions (an interesting side effect of using the content hash as the blob name).
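For example, a second, independent site could be compiled into the very same folder - a sketch assuming its static files live in `other-site/public` (a path of my own choosing):

```
# compile a second site into the same datastore folder;
# it gets its own entrypoint and writer info
$ docker run --rm -i \
    --volume "$(pwd):/data" \
    --user $(id -u ${USER}):$(id -g ${USER}) \
    ghcr.io/cinode/static_datastore_builder:0.0.2 \
    compile \
    --source /data/other-site/public \
    --destination /data/encrypted \
    --raw-filesystem
```

Both sites now share the `encrypted` folder, and which one a given proxy serves depends solely on the entrypoint it is handed.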
Publishing encrypted dataset
Creating a datastore node from an encrypted dataset built with the `--raw-filesystem` option is as simple as publishing the `encrypted` directory through some http server. Let's try it out with a simple nginx docker image. The web_proxy server can then talk to that http server to retrieve blob data.
Here's the `Dockerfile`, just two lines:
```
FROM nginx
COPY encrypted /usr/share/nginx/html
```
Let’s build it then:
```
$ docker build --tag cinode_datastore . -f Dockerfile
[+] Building 3.1s (7/7) FINISHED
 => [internal] load build definition from Dockerfile 0.1s
 => => transferring dockerfile: 85B 0.0s
 => [internal] load .dockerignore 0.1s
 => => transferring context: 2B 0.0s
 => [internal] load metadata for docker.io/library/nginx:latest 2.4s
 => [internal] load build context 0.1s
 => => transferring context: 889.10kB 0.0s
 => [1/2] FROM docker.io/library/nginx@sha256:6650513efd1d27c1f8a5351cbd33edf85cc7e0d9d0fcb4ffb23d8fa89b601ba8 0.4s
 => => resolve docker.io/library/nginx@sha256:6650513efd1d27c1f8a5351cbd33edf85cc7e0d9d0fcb4ffb23d8fa89b601ba8 0.1s
 => => sha256:6650513efd1d27c1f8a5351cbd33edf85cc7e0d9d0fcb4ffb23d8fa89b601ba8 1.86kB / 1.86kB 0.0s
 => => sha256:7f797701ded5055676d656f11071f84e2888548a2e7ed12a4977c28ef6114b17 1.57kB / 1.57kB 0.0s
 => => sha256:3f8a00f137a0d2c8a2163a09901e28e2471999fde4efc2f9570b91f1c30acf94 7.66kB / 7.66kB 0.0s
 => [2/2] COPY encrypted /usr/share/nginx/html 0.1s
 => exporting to image 0.1s
 => => exporting layers 0.1s
 => => writing image sha256:3e8ad01c9b00b759be8a734def2d4ad7c29c0c316051057bccf2607d87ed9ea0 0.0s
 => => naming to docker.io/library/cinode_datastore 0.0s
```
We will use an internal docker network - that way the cinode proxy can connect to the datastore that's running as another container within the docker environment:
```
$ docker network create cinode
fd1675531b64c1a822823b37e1d9eec0972e7266869878ff15f0452e194a3e2a
```
Now we can run the datastore container. The `--name` option sets up an internal hostname within the docker network so that the datastore container can easily be reached later:
```
$ docker run \
    --rm \
    --detach \
    --network cinode \
    --name datastore \
    cinode_datastore
5c948468e2b7c6da8f2d05729d7c4335820469f7db479e433defd245c79b3b35
```
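Before starting the proxy, we can verify that nginx really serves the raw blobs over plain http - a sketch using the `curlimages/curl` image (an arbitrary choice of a curl-capable container) and one of the blob names listed earlier:

```
# request one encrypted blob directly from the datastore container
$ docker run --rm --network cinode curlimages/curl \
    -sI http://datastore/21gHBhvtzZ7q3boH9W46DiUJhP4P8yBxWSdEoi9c6T11Td
```

A `200 OK` response means the blob is reachable by other nodes in the network.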
Next, let's run the cinode proxy node. It is in the same network as the datastore and thus can access it through the http://datastore address.
Note that we're using the `CINODE_ADDITIONAL_DATASTORE_1` environment variable here instead of `CINODE_MAIN_DATASTORE` - this ensures that the proxy has its own in-memory datastore that acts as a cache and minimizes the number of network requests for data:
```
$ docker run \
    --rm \
    --interactive \
    --publish 8080:8080 \
    --network cinode \
    --env CINODE_ENTRYPOINT="9g7Ffa****************L1" \
    --env CINODE_ADDITIONAL_DATASTORE_1="http://datastore/" \
    ghcr.io/cinode/web_proxy:0.0.2
...
```
What just happened is that we've created a simple cinode network - one node serves encrypted blobs, while the other knows only the entrypoint and asks the first node for any blobs it does not have. This is essentially the same setup as in the previous exercise.
Once we're finished, let's stop the datastore container that was running in the background:
```
$ docker stop datastore
datastore
```
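If you want to clean up completely, the docker network created earlier can be removed as well:

```
$ docker network rm cinode
```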
All-in-one dockerfile
And here's one more bonus `Dockerfile` - this one runs the hugo build and the cinode compilation, and finally produces the nginx-based image, all in a single docker build:
```
FROM klakegg/hugo:0.101.0-onbuild AS hugo

FROM ghcr.io/cinode/static_datastore_builder:0.0.2 AS cinode
COPY --from=hugo /target /cinode/public
ARG WRITER_INFO
RUN static_datastore_builder \
    compile \
    --source=/cinode/public \
    --destination=/cinode/encrypted \
    --raw-filesystem \
    --writer-info="${WRITER_INFO}"

FROM nginx
COPY --from=cinode /cinode/encrypted /usr/share/nginx/html
```
To build the image:
```
$ docker build . \
    --build-arg WRITER_INFO="9Q*****************************************************************T" \
    --tag cinode_datastore
```
It can then be used in place of the previous `cinode_datastore` image.
Note that the writer info is passed through a build arg here. It could be done with a docker secret as well (see the sketch below); in both cases the writer info is only used in an intermediate build stage, so the final image will not leak it.
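For completeness, here's a sketch of the docker secret variant. It assumes BuildKit is enabled and that the writer info was saved to `writer-info.txt`; BuildKit mounts the secret under `/run/secrets/<id>` by default, and the `--writer-info-file` flag comes from the compiler's help output shown earlier:

```
# syntax=docker/dockerfile:1
FROM klakegg/hugo:0.101.0-onbuild AS hugo

FROM ghcr.io/cinode/static_datastore_builder:0.0.2 AS cinode
COPY --from=hugo /target /cinode/public
# the secret is mounted only for this single RUN step and
# never becomes part of any image layer
RUN --mount=type=secret,id=writer_info \
    static_datastore_builder \
    compile \
    --source=/cinode/public \
    --destination=/cinode/encrypted \
    --raw-filesystem \
    --writer-info-file=/run/secrets/writer_info

FROM nginx
COPY --from=cinode /cinode/encrypted /usr/share/nginx/html
```

The build command then becomes `docker build . --secret id=writer_info,src=writer-info.txt --tag cinode_datastore`.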
Compose
Now that we have such a nice Dockerfile, let's wrap up the whole setup in a single docker-compose file:
```
services:
  datastore:
    build:
      context: .
      args:
        WRITER_INFO: "9Q*****************************************************************T"
    restart: always

  web-proxy:
    image: ghcr.io/cinode/web_proxy:0.0.2
    environment:
      CINODE_ENTRYPOINT: "9g7Ffa****************L1"
      CINODE_ADDITIONAL_DATASTORE_1: "http://datastore/"
    depends_on:
      - datastore
    ports:
      - "8080:8080"
    restart: always
```
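If you'd rather not keep the writer info in the compose file itself, compose supports environment variable substitution - a sketch where the `WRITER_INFO` host variable name is my own choice:

```
services:
  datastore:
    build:
      context: .
      args:
        # taken from the WRITER_INFO variable in the host environment
        WRITER_INFO: "${WRITER_INFO}"
    restart: always
```

It can then be started with `WRITER_INFO="<your writer info>" docker-compose up`.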
Now, after `docker-compose up`, everything works like a charm:
```
$ docker-compose up
Building datastore
[+] Building 6.8s (16/16) FINISHED
 => [internal] load .dockerignore 0.1s
 => => transferring context: 2B 0.0s
 => [internal] load build definition from Dockerfile 0.1s
 => => transferring dockerfile: 440B 0.0s
 => [internal] load metadata for docker.io/library/nginx:latest 2.1s
 => [internal] load metadata for docker.io/klakegg/hugo:0.101.0-onbuild 2.0s
 => [internal] load metadata for ghcr.io/cinode/static_datastore_builder:0.0.2 2.2s
 => [cinode 1/3] FROM ghcr.io/cinode/static_datastore_builder:0.0.2@sha256:2a941ab6ebcc3cc76dcf961978ad70a29ac7e5d19702642d4feda146e9c8563f 1.5s
 => => resolve ghcr.io/cinode/static_datastore_builder:0.0.2@sha256:2a941ab6ebcc3cc76dcf961978ad70a29ac7e5d19702642d4feda146e9c8563f 0.1s
...
web-proxy_1  | 2023/02/18 23:08:25 Listening on http://localhost:8080
...
```