Non-python workers, Multiple Executors¶
Tierkreis is easiest to use with Python workers. Still, it supports arbitrary workers that provide a binary to run. In this example we will look at using a shell script as a worker. Conceptually, this also works with other programming languages, for example if you want to use a legacy HPC application.
To make this work we need to introduce three changes:
(Optional) Provide an interface definition using TypeSpec. This allows Tierkreis to generate API stubs that we can use during graph construction.
Rewrite or wrap the script so that it follows the Tierkreis contract. In short, we have to change how we read inputs and write outputs.
Use a suitable executor. There are two flavors, which we will discuss below.
Preliminaries¶
For this example we’re going to use the auth worker which already has some predefined functionality.
Make sure to familiarize yourself with its contents.
The worker itself is the following shell script, which is part of the openssl_worker.
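#!/usr/bin/env bash
pk_file=$1
passphrase=$2
numbits=$3
# Generate an AES-128-encrypted RSA private key; the passphrase is read from a file.
openssl genrsa -out "$pk_file" -aes128 -passout "file:$passphrase" "$numbits"
# Derive the matching public key and write it to public-out.
openssl rsa -in "$pk_file" -passin "file:$passphrase" -pubout -out public-out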
Defining the interface¶
To describe the worker's interface, we can write a definition in the TypeSpec DSL.
@portmapping
model Outputs {
  private_key: bytes
  public_key: bytes
}
interface openssl_worker {
  genrsa (
    numbits: int,
    passphrase: bytes
  ): Outputs
}
It defines a function genrsa that consumes numbits and a passphrase and produces two outputs, private_key and public_key.
The most generic type for such values is bytes, as we will be reading from and writing to files directly.
Now we can generate the stubs using the CLI command tkr init stubs or from Python.
%pip install tierkreis
from pathlib import Path
from tierkreis.namespace import Namespace

if __name__ == "__main__":
    tsp_path = (
        Path().parent / "example_workers" / "openssl_worker" / "src" / "schema.tsp"
    )
    namespace = Namespace.from_spec_file(tsp_path)
    namespace.write_stubs(tsp_path.parent / "api" / "stubs.py")
1 file reformatted
Found 10 errors (10 fixed, 0 remaining).
Adapting the script to the tierkreis contract¶
Tierkreis writes intermediate values to storage and expects outputs to be written there as well.
To make this less verbose, tierkreis will export the corresponding locations as environment variables with the following schema.
For an input, e.g. numbits, there will be a variable $input_numbits_file ($input_<name>_file), which you can read in your script as numbits=$(cat $input_numbits_file).
Analogously, for each output, e.g. private_key, there will be a variable $output_private_key_file ($output_<name>_file), which you can write to in your script, e.g. with tee.
This is convenient if your script already reads from and writes to files.
Alternatively, if you want to use the values directly, you can set up the executor (we will see this later) to pass the values in environment variables.
This is only available for inputs, which will then have the form $input_<name>_value.
Adding these changes to the script above yields:
numbits=$(cat "$input_numbits_file")
openssl genrsa -out "$output_private_key_file" -aes128 -passout "file:$input_passphrase_file" "$numbits"
openssl rsa -in "$output_private_key_file" -passin "file:$input_passphrase_file" -pubout -out "$output_public_key_file"
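The same contract carries over to workers written in other languages. As a minimal sketch, assuming only the environment-variable naming described above (input_<name>_file and output_<name>_file), a Python program could read and write its ports roughly as follows. The helpers read_input and write_output and the placeholder output are hypothetical names for illustration and are not part of Tierkreis.

```python
import os
from pathlib import Path


def read_input(name: str) -> bytes:
    """Read input port `name` from the file named in $input_<name>_file."""
    return Path(os.environ[f"input_{name}_file"]).read_bytes()


def write_output(name: str, data: bytes) -> None:
    """Write `data` to the file named in $output_<name>_file."""
    Path(os.environ[f"output_{name}_file"]).write_bytes(data)


def main() -> None:
    # Mirrors `numbits=$(cat $input_numbits_file)` from the shell script.
    numbits = int(read_input("numbits").decode().strip())
    # Hypothetical placeholder; a real worker would write key material here.
    write_output("private_key", f"requested {numbits} bits".encode())
```

A real worker would call main() as its entry point once the executor has exported the variables.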
Building a graph using the script¶
We're going to build a graph that checks whether a message signed with a generated private key can be verified with the matching public key.
from tierkreis.builder import GraphBuilder
from tierkreis.models import TKR, EmptyModel
from auth_worker import sign, verify
from openssl_worker import genrsa, Outputs


def signing_graph():
    g = GraphBuilder(EmptyModel, TKR[bool])
    message = g.const("dummymessage")
    passphrase = g.const(b"dummypassphrase")
    key_pair: Outputs = g.task(genrsa(g.const(4096), passphrase))
    private_key: TKR[bytes] = key_pair.private_key
    public_key: TKR[bytes] = key_pair.public_key
    signing_result = g.task(sign(private_key, passphrase, message)).hex_signature
    verification_result = g.task(verify(public_key, signing_result, message))
    g.outputs(verification_result)
    return g
We can now run this graph, but first we have to set up the storage.
from uuid import UUID
from tierkreis.storage import FileStorage
storage = FileStorage(UUID(int=105))
storage.clean_graph_files()
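The UUID identifies the workflow, and its string form shows up in storage paths such as the ones in the log output further down. A quick check with the standard library, independent of Tierkreis:

```python
from uuid import UUID

workflow_id = UUID(int=105)
# 105 is 0x69 in hexadecimal, so the UUID ends in ...0069.
print(workflow_id)  # 00000000-0000-0000-0000-000000000069
```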
Setting up the correct executors¶
In the graph above, we use two types of workers: python and shell.
This means we need to also set up the executors accordingly.
For the Python ones we can use the UvExecutor as before, while for shell scripts there is the ShellExecutor.
Since we can only provide a single executor to a run, we have to combine them using the MultipleExecutor.
from tierkreis.executor import MultipleExecutor, UvExecutor, ShellExecutor
registry_path = Path().parent / "example_workers"
uv = UvExecutor(registry_path, storage.logs_path)
shell = ShellExecutor(
    registry_path, storage.workflow_dir
)  # export_values=True enables passing values via env vars
executor = MultipleExecutor(uv, {"shell": shell}, {"openssl_worker": "shell"})
The MultipleExecutor takes a default executor uv, a dictionary of named executors {"shell": shell}, and a mapping from worker to executor name {"openssl_worker": "shell"}.
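To make the lookup rule concrete, here is a toy model of that dispatch in plain Python. It is not Tierkreis's implementation; the function pick_executor and the string stand-ins for executors are hypothetical and only illustrate the idea of a default plus per-worker overrides.

```python
def pick_executor(
    worker: str,
    default: str,
    executors: dict[str, str],
    assignments: dict[str, str],
) -> str:
    """Return the executor assigned to `worker`, or `default` if none is assigned."""
    name = assignments.get(worker)
    return default if name is None else executors[name]


# Strings stand in for real executor objects.
executors = {"shell": "ShellExecutor"}
assignments = {"openssl_worker": "shell"}

print(pick_executor("openssl_worker", "UvExecutor", executors, assignments))  # ShellExecutor
print(pick_executor("auth_worker", "UvExecutor", executors, assignments))  # UvExecutor
```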
With the executor defined, we can now run the graph.
from tierkreis.storage import read_outputs
from tierkreis import run_graph
run_graph(storage, executor, signing_graph().get_data(), {})
is_verified = read_outputs(signing_graph().get_data(), storage)
print(is_verified)
tee: 00000000-0000-0000-0000-000000000069/-.N3/logs: No such file or directory
tee: 00000000-0000-0000-0000-000000000069/-.N3/logs: No such file or directory
True
Running simple scripts¶
The above example is the most common way of running scripts with multiple inputs and outputs. There is an even simpler way if your script meets the following conditions:
It has a single input which it reads from stdin
It has a single output which it writes to stdout
For example, the standard tee utility does exactly that.
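The stdin/stdout convention can be tried out without Tierkreis. The sketch below, which assumes a Unix-like system with tee on the PATH, pipes bytes into the program and captures what it prints. It only illustrates the contract and makes no claim about how the StdInOut executor is implemented.

```python
import subprocess

# Send the input on stdin and collect the output from stdout.
result = subprocess.run(
    ["tee"], input=b"dummymessage", capture_output=True, check=True
)
print(result.stdout)  # b'dummymessage'
```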
For such scripts, we can use the script function in conjunction with the StdInOut executor.
from tierkreis.builder import script
from tierkreis.controller.executor.stdinout import StdInOut
def stdinout_graph():
    g = GraphBuilder(EmptyModel, TKR[bytes])
    message = g.const(b"dummymessage")
    output = g.task(script("tee", message))
    g.outputs(output)
    return g
storage.clean_graph_files()
stdinout = StdInOut(registry_path, storage.workflow_dir)
run_graph(storage, stdinout, stdinout_graph().get_data(), {})
out = read_outputs(stdinout_graph().get_data(), storage)
print(out)
dummymessage