Errors and Debugging

Most runtime errors originate in worker tasks. Using the correct types is a first line of defence, but things can still go wrong. In this example we produce an error and investigate it with the debugging tools in Tierkreis.

Worker Errors

Worker errors can occur in several ways. For Python workers, an error occurs when an uncaught exception is raised. For all workers, including Python ones, a non-zero exit code also produces an error.
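
The two failure modes are related: in plain Python, an uncaught exception also ends the process with a non-zero exit code. The following is a minimal sketch of that general mechanism using only the standard library. It does not use Tierkreis, and the one-line script is a hypothetical stand-in for a worker.

```python
import subprocess
import sys

# Hypothetical stand-in for a worker script: it raises an uncaught exception.
script = "raise Exception('I refuse!')"

result = subprocess.run(
    [sys.executable, "-c", script],
    capture_output=True,
    text=True,
    check=False,  # do not raise here; we want to inspect the exit code ourselves
)

# The interpreter exits with a non-zero code and writes the traceback to stderr.
print("exit code:", result.returncode)
print("last stderr line:", result.stderr.strip().splitlines()[-1])
```

A caller that only sees the exit code and stderr can therefore detect the failure without knowing which language the worker was written in.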

First, we define a graph that will always raise an error:

from tierkreis.builder import GraphBuilder
from tierkreis.controller.data.core import EmptyModel
from tierkreis.controller.data.models import TKR

from error_worker import fail


def error_graph() -> GraphBuilder:
    g = GraphBuilder(EmptyModel, TKR[str])
    output = g.task(fail())
    g.outputs(output)
    return g

The task fail raises an exception ("I refuse!") when it runs, which run_graph surfaces as a TierkreisError:

from pathlib import Path
from uuid import UUID

from tierkreis.controller import run_graph
from tierkreis.controller.executor.uv_executor import UvExecutor
from tierkreis.controller.storage.filestorage import ControllerFileStorage
from tierkreis.exceptions import TierkreisError

workflow_id = UUID(int=103)
storage = ControllerFileStorage(workflow_id, name="error_handling", do_cleanup=True)

registry_path = Path().parent / "example_workers"
executor = UvExecutor(registry_path=registry_path, logs_path=storage.logs_path)
try:
    run_graph(
        storage,
        executor,
        error_graph().data,
        {"value": "world!"},
        polling_interval_seconds=0.1,
    )
except TierkreisError:  # we will catch this here
    output = storage.read_errors()

Debugging

In this example we will only investigate the root cause of the error. In the next one we will see how we can resume a graph from its checkpoint.

The first avenue for debugging is enabling fine-grained logging. The Tierkreis logger inherits its properties from the root logger, so it suffices to call logging.basicConfig; note that this only changes the logger of the controller. When running a Python worker, Tierkreis checks the environment variables $TKR_LOG_LEVEL, $TKR_LOG_FORMAT and $TKR_DATE_FORMAT for logger information as detailed here.

import contextlib
import logging

logging.basicConfig(
    format="%(asctime)s: %(message)s",
    datefmt="%Y-%m-%dT%H:%M:%S%z",
    level=logging.DEBUG,
)

storage.clean_graph_files()
with contextlib.suppress(TierkreisError):
    run_graph(
        storage,
        executor,
        error_graph(),
        {"value": "world!"},
        polling_interval_seconds=0.1,
    )

2026-03-02T16:36:18+0000: START error_worker 00000000-0000-0000-0000-000000000067/-.N0/definition
2026-03-02T16:36:18+0000: Node -.N0 has encountered an error.
2026-03-02T16:36:18+0000: 

Graph finished with errors.
2026-03-02T16:36:18+0000: I refuse!Traceback (most recent call last):
  File "/home/runner/work/tierkreis/tierkreis/tierkreis/tierkreis/worker/worker.py", line 187, in run
    function(node_definition)
  File "/home/runner/work/tierkreis/tierkreis/tierkreis/tierkreis/worker/worker.py", line 156, in wrapper
    results = func(**kwargs)
              ^^^^^^^^^^^^^^
  File "/home/runner/work/tierkreis/tierkreis/docs/source/examples/example_workers/error_worker/src/main.py", line 13, in fail
    raise Exception(msg)
Exception: I refuse!

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/tierkreis/tierkreis/docs/source/examples/example_workers/error_worker/src/main.py", line 18, in <module>
    worker.app(argv)
  File "/home/runner/work/tierkreis/tierkreis/tierkreis/tierkreis/worker/worker.py", line 214, in app
    self.run(Path(argv[1]))
  File "/home/runner/work/tierkreis/tierkreis/tierkreis/tierkreis/worker/worker.py", line 198, in run
    raise TierkreisWorkerError(
tierkreis.worker.worker.TierkreisWorkerError: Worker error_worker encountered error when executing fail.
2026-03-02T16:36:18+0000: Node: '-.N0' encountered an error.
2026-03-02T16:36:18+0000: Stderr information is available at /home/runner/.tierkreis/checkpoints/00000000-0000-0000-0000-000000000067/-.N0/logs.
2026-03-02T16:36:18+0000: --- Tierkreis graph errors above this line. ---
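
The environment variables mentioned above configure logging on the worker side. As a rough illustration of how such variables could be turned into a logging configuration, here is a sketch using only the standard library. The helper logging_config_from_env is hypothetical, and the fallback values and the assumption that $TKR_LOG_LEVEL holds a standard level name are ours; this is not the code Tierkreis itself uses.

```python
import logging
import os
from typing import Mapping, Optional


def logging_config_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: build logging.basicConfig kwargs from TKR_* variables."""
    if env is None:
        env = os.environ
    # Assumption: the level is given as a standard name such as "DEBUG" or "INFO".
    level = logging.getLevelName(env.get("TKR_LOG_LEVEL", "WARNING").upper())
    if not isinstance(level, int):
        # Unknown names come back as strings; fall back to WARNING.
        level = logging.WARNING
    return {
        "level": level,
        # Assumed fallbacks, mirroring the basicConfig call used in this example.
        "format": env.get("TKR_LOG_FORMAT", "%(asctime)s: %(message)s"),
        "datefmt": env.get("TKR_DATE_FORMAT", "%Y-%m-%dT%H:%M:%S%z"),
    }


config = logging_config_from_env({"TKR_LOG_LEVEL": "debug"})
print(config["level"] == logging.DEBUG)
```

Passing the mapping explicitly keeps the helper easy to try out without modifying the real process environment.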

For most use cases, Tierkreis can also take advantage of Python breakpoint debugging, provided that the graph uses only Python workers. To do this, use an alternative executor, together with a storage that keeps the graph information in memory:

from tierkreis.controller.executor.in_memory_executor import InMemoryExecutor
from tierkreis.storage import InMemoryStorage

storage = InMemoryStorage(UUID(int=103))
executor = InMemoryExecutor(registry_path, storage)

try:
    run_graph(
        storage,
        executor,
        error_graph().data,
        {"value": "world!"},
    )
except Exception:  # Note the different exception type here
    pass

2026-03-02T16:36:18+0000: Could not log to file, logging to std out instead.
2026-03-02T16:36:18+0000: START error_worker 00000000-0000-0000-0000-000000000067/-.N0/definition
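
Once an exception reaches your own code as an ordinary Python exception, standard tooling applies to it. The sketch below shows one way to locate the frame that raised an exception and look at its local variables. It uses only the standard library; the function fail here is a hypothetical stand-in for the worker task, not a call into Tierkreis, and we assume the exception propagates to the caller as the different exception type above suggests.

```python
import traceback


def fail() -> str:  # hypothetical stand-in for the failing worker task
    msg = "I refuse!"
    raise Exception(msg)


try:
    fail()
except Exception as exc:
    # Walk the traceback; the last entry is the frame where the exception was raised.
    frames = list(traceback.walk_tb(exc.__traceback__))
    frame, lineno = frames[-1]
    origin = frame.f_code.co_name
    local_vars = dict(frame.f_locals)
    print(f"raised in {origin}() at line {lineno}; locals: {local_vars}")
```

For interactive inspection, the same except block could call pdb.post_mortem(exc.__traceback__) instead of printing.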