Tough decisions

Before I jump to a description of the code itself, let’s first clarify what technology I’ll be using to write the Cinode prototype. I decided to use golang. I find it rather nice to work with, although it has some thorns here and there. Why would I like to use it then? It turns out to be very practical, especially in the field of network services. Goroutines are just great: no need to think in terms of callbacks anymore, just straight, sequential code.

Golang also has everything we’ll need to build the prototype: all the crypto primitives and a nice toolset for HTTP-based services. I don’t expect this prototype to be a performance rocket, but the speed that golang gives us should be enough.

I must also admit that I don’t yet care too much about cross-process memory safety. There will be a lot of keys and passphrases in memory, but we don’t want to protect against an attacker who can read them; it’s still a prototype, right? Keeping that in mind, golang is just a perfect match here.

I’ll be developing on Linux, which is my primary OS, but I’ll try not to use any OS-specific constructs. Of course I strongly recommend Linux for any project dealing with networking, especially if it supports our freedom. But I won’t force you to use it. Freedom to choose what you want is something I do respect. If for some reason this project doesn’t work well on other OSes, please keep in mind that this wasn’t intentional. I also kindly ask you to let me know about it.

To avoid such OS misbehavior and to keep the codebase at good quality, I’ll do test-driven development, maybe not in its strict sense, but my goal is to have everything well tested. Golang has very good support for testing and coverage reports, so we’ll save some time setting up the right tools.

The source code will be stored in a GitHub repository, and tests will be run through Travis CI. Coverage analysis will be done using Coveralls. I decided to use those tools to get most of the stuff up and running quickly. Even though I had some initial issues with setting everything up, the time win is huge here. Besides, having everything public in public tools creates higher pressure to keep a high level of quality from the beginning.

Let’s have some baseline

Before any serious code was put down, I set a few rules I wanted to follow:

  • Build a clean layered architecture: split the system into layers (matching Cinode’s design). Each layer should be a separate module with its own namespace in Go.
  • Each module must have a clear set of interfaces it provides to communicate with other modules and application code.
  • Each module’s interface must have an appropriate set of test cases covering all interface methods.
  • For each module there should be a reference implementation of its interfaces. Such an implementation does not have to store data permanently between sessions. It will be used during testing as the reference behavior of the interface.
  • If there are multiple implementations of the module interface, all interface tests must be executed on all interface implementations and pass cleanly. Additional tests should be used to cover corner cases of specific interface implementations.
  • I opt for high code coverage from the beginning. Although it does not prove that the code is bug-free, it at least ensures that most code paths were executed while testing.

Enough of boring talk, let’s have some code!

CAS interface

I first started with a draft interface. In CAS we need some obvious functionality: uploading a blob and downloading an existing one. We also need to delete an existing blob and check whether one exists. I also knew I wouldn’t use buffers to pass blob data; streams fit much better here, because that way we can put and get blobs of arbitrary size without worrying about the amount of memory consumed.

I started with a trivial implementation that stored data in memory. It was my base when I was writing interface tests.

Once I felt I had enough of the interface covered, I started working on a more practical implementation that stores data on the local filesystem. During implementation I also made a small change to the interface: I split uploading a blob into two separate functions. The first is used to upload a blob when its name is known; such an upload fails if the name does not match the contents. The alternative method takes only the data and returns the calculated name.

When everything was up and running, I wanted to add one more piece of functionality to the module. In order to make a first usable application, I wanted to let us play with CAS using a plain web browser. For this purpose I made a web interface. Its purpose is to take any implementation of the CAS interface plus some configuration and expose CAS functionality through HTTP. The implementation worked like a charm. It is very simple and follows REST design:

  • GET reads a blob
  • POST uploads a new blob if we don’t know the name
  • PUT uploads a blob if we know the name
  • DELETE removes a blob
  • HEAD finds out if a blob exists

Initially POST and PUT took raw data as the request body. I later extended them to also support the multipart/form-data content type to allow uploading from an HTML form.

Once the web interface was ready, I needed an easy way to test it. What makes perfect sense here is a new implementation of the CAS interface that talks to a remote web backend with a compatible protocol (such as the web interface itself). Once I had it, I could reuse the common interface testing code. Tests were run on a CAS implementation connected to a local web interface, which in turn was attached to a local memory-based implementation.

Let’s make it practical

OK, we have all those blocks of code, but none of them is a useful standalone application yet. It’s just a bunch of code inside the cas library which can’t be used without writing some Go code. We can expose CAS through HTTP, so I created a simple application that exposes data from a local directory (that is, the filesystem-based CAS implementation) in the form of a simple web server. Let’s give it a try:

# Fetch and compile server binary
go install github.com/cinode/go/cas/cmd/cinode_cas_fileserver

# Start the server, configuration is passed through environment variables
CN_CAS_LISTEN_ADDR=127.0.0.1:8080 \
CN_CAS_DATA_FOLDER=/tmp/cinode/ \
  $GOPATH/bin/cinode_cas_fileserver

In a separate shell:

# Upload some data blob (extra echo needed since CAS server doesn't output
# newline character)
curl -d "Hello world!" http://127.0.0.1:8080/; echo
# Will output: XB5P3GWTHnHTud55BijFsGmBjHZtwtc44es4HsZnqMGM

# Download the blob (extra echo needed for reasons same as above)
curl http://127.0.0.1:8080/XB5P3GWTHnHTud55BijFsGmBjHZtwtc44es4HsZnqMGM; echo
# Will output: Hello world!

# Upload some file
echo "Hello world from a file" > /tmp/upload_file.txt
curl --form "fileupload=@/tmp/upload_file.txt;filename=file.txt" \
  http://127.0.0.1:8080/; echo
# Will output: WHa8NVuf1RfUVrqsyqmo2Vh1xyuVTqJqJePhjSxHwZmy

# Download the blob (no need for extra echo since the uploaded file had newline)
curl http://127.0.0.1:8080/WHa8NVuf1RfUVrqsyqmo2Vh1xyuVTqJqJePhjSxHwZmy
# Will output: Hello world from a file

A small note about those strange blob names: they look like neither hex strings nor base64-encoded values. Instead I decided to use Base58. It’s commonly used in Bitcoin and encodes any binary data to a string that always consists of alphanumeric characters, a perfect choice if we don’t want to deal with encoding incompatibilities in the future. This encoding may not be very fast, but our blob names are not that long anyway.
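For the curious, Base58 encoding with the Bitcoin alphabet can be sketched in a few lines of Go. This only illustrates the encoding itself; it makes no claim about how the blob names above are derived from the blob contents, and the function name `encodeBase58` is made up for this example.

```go
package main

import (
	"fmt"
	"math/big"
)

// Bitcoin's Base58 alphabet: digits and letters without 0, O, I and l,
// which are easy to confuse with each other.
const alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// encodeBase58 treats data as a big-endian number and writes it in base 58.
// Leading zero bytes are encoded as leading '1' characters.
func encodeBase58(data []byte) string {
	n := new(big.Int).SetBytes(data)
	base := big.NewInt(58)
	mod := new(big.Int)

	var out []byte
	for n.Sign() > 0 {
		n.DivMod(n, base, mod)
		out = append(out, alphabet[mod.Int64()])
	}
	// Each leading zero byte becomes one '1' (the zero digit of the alphabet).
	for _, b := range data {
		if b != 0 {
			break
		}
		out = append(out, alphabet[0])
	}
	// Digits were produced least significant first, so reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

func main() {
	fmt.Println(encodeBase58([]byte{0, 0, 58})) // prints "1121"
}
```

The repeated big-number division is what makes this slower than hex or base64, which work on fixed groups of bits; for names of a few dozen bytes that cost is negligible.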

OK, now that we have a web server and can upload some data to it, maybe we should make some more practical use of it? First let’s try to use CAS as storage for HTML pages:

curl -d '<!doctype html><html><body><h1>Hello world!</h1></html>' \
  http://127.0.0.1:8080/; echo
# Will output: QH9EmJveKGD1ppUgDVjBqZ8XKw4crBuSNHApgjK9RbgR

# Open in web browser
x-www-browser http://127.0.0.1:8080/QH9EmJveKGD1ppUgDVjBqZ8XKw4crBuSNHApgjK9RbgR

I also made a simple dataset with an HTML page that lets you upload files directly from the web browser:

git clone https://github.com/cinode/testdata-cas-filesystem.git /tmp/test-data
cd /tmp/test-data
git checkout v2016-07-29
CN_CAS_LISTEN_ADDR=127.0.0.1:8080 \
CN_CAS_DATA_FOLDER=/tmp/test-data/ \
  $GOPATH/bin/cinode_cas_fileserver

Then, in a separate shell, open it in the web browser:

x-www-browser http://127.0.0.1:8080/bR9AEqo79uoYLoGx2Xf1JViBbTz5aZWBy9qkHMZxeWYM

That’s it for today. Apart from CAS there’s still an encryption layer to be implemented. I hope to cover it in the next post.

Bye