Non-python workers, Multiple Executors

Tierkreis is easiest to use with Python workers, but it also supports arbitrary workers that provide a binary to run. In this example we will use a shell script as a worker. Conceptually, the same approach works with other programming languages, for example if you want to use a legacy HPC application.

To make this work we need to introduce three changes:

  1. (Optional) We should provide an interface definition using a type spec. This allows tierkreis to generate the API stubs, which we can then use during graph construction.

  2. Rewrite or wrap the script so that it follows the tierkreis contract. In short, we have to change how it reads inputs and writes outputs.

  3. Use a suitable executor. There are two flavors we will discuss in a bit.

Preliminaries

For this example we’re going to use the auth worker, which already has some predefined functionality. Make sure to familiarize yourself with its contents. In addition, we’re going to use the following shell script, contained in the openssl_worker, as a worker.

#!/usr/bin/env bash
pk_file=$1
passphrase=$2
numbits=$3

openssl genrsa -out "$pk_file" -aes128 -passout "file:$passphrase" "$numbits"
openssl rsa -in "$pk_file" -passin "file:$passphrase" -pubout -out public-out

Defining the interface

To describe the interface of the script, we can write a TypeSpec definition.

@portmapping
model Outputs {
    private_key: bytes
    public_key: bytes
}

interface openssl_worker {
    genrsa (
        numbits: int,
        passphrase: bytes
    ): Outputs 
}

It defines a function genrsa that consumes numbits and a passphrase and produces two outputs, private_key and public_key. The most generic type for the keys and the passphrase is bytes, as we will be reading from and writing to files directly. Now we can generate the stubs using the CLI command tkr init stubs or from Python.

%pip install tierkreis
from pathlib import Path
from tierkreis.namespace import Namespace


if __name__ == "__main__":
    tsp_path = (
        Path().parent / "example_workers" / "openssl_worker" / "src" / "schema.tsp"
    )
    namespace = Namespace.from_spec_file(tsp_path)
    namespace.write_stubs(tsp_path.parent / "api" / "stubs.py")
1 file reformatted
Found 10 errors (10 fixed, 0 remaining).

Adapting the script to the tierkreis contract

Tierkreis writes intermediate values to storage and expects outputs to be written there as well. To make this less verbose, tierkreis exports the corresponding file locations as environment variables that follow a fixed naming scheme. For each input, e.g. numbits, there is a variable $input_numbits_file (in general $input_<name>_file), which you can read in your script with numbits=$(cat $input_numbits_file). Analogously, for each output, e.g. private_key, there is a variable $output_private_key_file (in general $output_<name>_file), which you can write to in your script, e.g. with tee. This is convenient if your script already reads from and writes to files. If you would rather use the values directly, you can set up the executor (we will see this later) to pass the values in environment variables instead. This is only available for inputs, whose variables then have the form $input_<name>_value.

Adding these changes to the script above yields:

numbits=$(cat "$input_numbits_file")
openssl genrsa -out "$output_private_key_file" -aes128 -passout "file:$input_passphrase_file" "$numbits"
openssl rsa -in "$output_private_key_file" -passin "file:$input_passphrase_file" -pubout -out "$output_public_key_file"
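The same contract can be followed from any language that can read environment variables and files. The following is a minimal sketch in Python, assuming only the naming scheme described above ($input_<name>_file and $output_<name>_file). The helper names read_input and write_output, the output name echo, and the temporary paths are hypothetical and not part of tierkreis. The sketch also assumes the value is stored as plain text, as the cat call in the shell script suggests.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping


def read_input(name: str, env: Mapping[str, str] = os.environ) -> bytes:
    """Read the input `name` from the file named in $input_<name>_file."""
    return Path(env[f"input_{name}_file"]).read_bytes()


def write_output(name: str, data: bytes, env: Mapping[str, str] = os.environ) -> None:
    """Write `data` to the file named in $output_<name>_file."""
    Path(env[f"output_{name}_file"]).write_bytes(data)


def demo() -> bytes:
    # Simulate the environment a worker would see; these paths are made up.
    with tempfile.TemporaryDirectory() as tmp:
        env = {
            "input_numbits_file": str(Path(tmp) / "numbits"),
            "output_echo_file": str(Path(tmp) / "echo"),  # hypothetical output
        }
        Path(env["input_numbits_file"]).write_bytes(b"4096")

        # Assumption: the stored value is plain text that parses as an integer.
        numbits = int(read_input("numbits", env).decode())
        write_output("echo", str(numbits).encode(), env)
        return Path(env["output_echo_file"]).read_bytes()
```

Passing the environment as an explicit mapping keeps the helpers easy to test. A real worker would rely on the default os.environ.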

Building a graph using the script

We’re going to build a graph that checks whether we successfully signed a message with a generated private key.

from tierkreis.builder import GraphBuilder
from tierkreis.models import TKR, EmptyModel

from auth_worker import sign, verify
from openssl_worker import genrsa, Outputs


def signing_graph():
    g = GraphBuilder(EmptyModel, TKR[bool])
    message = g.const("dummymessage")
    passphrase = g.const(b"dummypassphrase")

    key_pair: Outputs = g.task(genrsa(g.const(4096), passphrase))
    private_key: TKR[bytes] = key_pair.private_key
    public_key: TKR[bytes] = key_pair.public_key

    signing_result = g.task(sign(private_key, passphrase, message)).hex_signature
    verification_result = g.task(verify(public_key, signing_result, message))
    g.outputs(verification_result)

    return g

We can now run this graph, but first we have to set up the storage.

from uuid import UUID
from tierkreis.storage import FileStorage

storage = FileStorage(UUID(int=105))
storage.clean_graph_files()

Setting up the correct executors

In the graph above, we use two types of workers: Python and shell. This means we also need to set up the executors accordingly. For the Python workers we can use the UvExecutor as before, while for shell scripts there is the ShellExecutor. Since we can only pass a single executor to the run, we have to combine them using the MultipleExecutor.

from tierkreis.executor import MultipleExecutor, UvExecutor, ShellExecutor

registry_path = Path().parent / "example_workers"
uv = UvExecutor(registry_path, storage.logs_path)
shell = ShellExecutor(
    registry_path, storage.workflow_dir
)  # passing export_values=True would additionally expose input values via env vars
executor = MultipleExecutor(uv, {"shell": shell}, {"openssl_worker": "shell"})

The MultipleExecutor takes a default executor (uv), a dictionary of named executors ({"shell": shell}), and a mapping from worker to executor name ({"openssl_worker": "shell"}). With the executor defined, we can now run the graph.

from tierkreis.storage import read_outputs
from tierkreis import run_graph

run_graph(storage, executor, signing_graph().get_data(), {})
is_verified = read_outputs(signing_graph().get_data(), storage)
print(is_verified)
True
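To make the dispatch rule of the MultipleExecutor concrete, here is a minimal sketch of the lookup described above: a worker listed in the mapping goes to its named executor, and every other worker falls back to the default. This is an illustration only and not the tierkreis implementation. The function resolve_executor is hypothetical, and plain strings stand in for executor objects.

```python
from typing import Mapping, TypeVar

E = TypeVar("E")


def resolve_executor(
    worker: str,
    default: E,
    executors: Mapping[str, E],
    assignments: Mapping[str, str],
) -> E:
    """Pick the executor for `worker` (hypothetical helper, for illustration).

    `assignments` maps worker names to executor names, and `executors` maps
    executor names to executor objects. Unassigned workers use `default`.
    """
    executor_name = assignments.get(worker)
    if executor_name is None:
        return default
    return executors[executor_name]


# Strings stand in for the uv and shell executors from the example.
chosen_for_openssl = resolve_executor(
    "openssl_worker", "uv", {"shell": "shell-executor"}, {"openssl_worker": "shell"}
)
chosen_for_auth = resolve_executor(
    "auth_worker", "uv", {"shell": "shell-executor"}, {"openssl_worker": "shell"}
)
```

With the configuration from the example, openssl_worker resolves to the shell executor, while auth_worker is not listed in the mapping and therefore uses the default.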

Running simple scripts

The above example is the most common way of running scripts with multiple inputs and outputs. There is an even simpler way if your script meets the following conditions:

  • It has a single input which it reads from stdin

  • It has a single output which it writes to stdout

For example, the standard tee utility does exactly that. For such scripts, we can use the script function in conjunction with the StdInOut executor.

from tierkreis.builder import script
from tierkreis.controller.executor.stdinout import StdInOut


def stdinout_graph():
    g = GraphBuilder(EmptyModel, TKR[bytes])
    message = g.const(b"dummymessage")
    output = g.task(script("tee", message))

    g.outputs(output)
    return g


storage.clean_graph_files()
stdinout = StdInOut(registry_path, storage.workflow_dir)
run_graph(storage, stdinout, stdinout_graph().get_data(), {})
out = read_outputs(stdinout_graph().get_data(), storage)
print(out)
dummymessage
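Conceptually, the two conditions above (one input on stdin, one output on stdout) amount to piping bytes through a process. The following sketch shows that contract with the Python standard library only. It is not how the StdInOut executor is implemented, and the helper run_stdinout as well as the ECHO command are hypothetical stand-ins.

```python
import subprocess
import sys
from typing import Sequence


def run_stdinout(command: Sequence[str], data: bytes) -> bytes:
    """Feed `data` to the command's stdin and return what it writes to stdout."""
    result = subprocess.run(command, input=data, capture_output=True, check=True)
    return result.stdout


# A portable stand-in for `tee`: copy stdin to stdout unchanged.
ECHO = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

echoed = run_stdinout(ECHO, b"dummymessage")
```

On a system where tee is installed, run_stdinout(["tee"], b"dummymessage") should behave the same way, since tee with no file arguments copies its input to stdout.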