{ "cells": [ { "cell_type": "markdown", "id": "921ca20d", "metadata": {}, "source": [ "# Errors and Debugging\n", "\n", "The most common runtime errors stem from worker tasks.\n", "A first measure to prevent them is using the correct types, but something might still go wrong.\n", "In this example we will produce an error and investigate it with the debugging tools in Tierkreis.\n", "\n", "## Worker Errors\n", "\n", "Worker errors can occur in multiple ways.\n", "For Python workers, an error occurs when an exception is raised and not caught.\n", "For all workers (including Python workers), a non-zero exit code also produces an error.\n", "\n", "The following graph always produces an error:" ] }, { "cell_type": "code", "execution_count": null, "id": "5a90e33e", "metadata": {}, "outputs": [], "source": [ "from tierkreis.builder import GraphBuilder\n", "from tierkreis.controller.data.core import EmptyModel\n", "from tierkreis.controller.data.models import TKR\n", "\n", "from error_worker import fail\n", "\n", "\n", "def error_graph() -> GraphBuilder:\n", "    # The graph takes no inputs and returns a single string\n", "    g = GraphBuilder(EmptyModel, TKR[str])\n", "    output = g.task(fail())\n", "    g.outputs(output)\n", "    return g" ] }, { "cell_type": "markdown", "id": "b1420449", "metadata": {}, "source": [ "The task `fail` raises a `TierkreisError` (`\"I refuse!\"`) when it runs:" ] }, { "cell_type": "code", "execution_count": null, "id": "11955c82", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from uuid import UUID\n", "\n", "from tierkreis.controller import run_graph\n", "from tierkreis.controller.executor.uv_executor import UvExecutor\n", "from tierkreis.controller.storage.filestorage import ControllerFileStorage\n", "from tierkreis.exceptions import TierkreisError\n", "\n", "workflow_id = UUID(int=103)\n", "storage = ControllerFileStorage(workflow_id, name=\"error_handling\", do_cleanup=True)\n", "\n", "registry_path = Path().parent / \"example_workers\"\n", "executor = 
UvExecutor(registry_path=registry_path, logs_path=storage.logs_path)\n", "try:\n", "    run_graph(\n", "        storage,\n", "        executor,\n", "        error_graph().data,\n", "        {\"value\": \"world!\"},\n", "        polling_interval_seconds=0.1,\n", "    )\n", "except TierkreisError:  # the failing task surfaces here\n", "    output = storage.read_errors()" ] }, { "cell_type": "markdown", "id": "642644d9", "metadata": {}, "source": [ "## Debugging\n", "\n", "In this example we will only investigate the root cause of the error.\n", "In the next example we will see how to resume a graph from its checkpoint.\n", "\n", "The first avenue for debugging is enabling fine-grained logging.\n", "Tierkreis logging inherits its properties from the root logger, so it suffices to call `logging.basicConfig`; this changes **only** the logger of the controller.\n", "When running a Python worker, Tierkreis checks the environment variables `$TKR_LOG_LEVEL`, `$TKR_LOG_FORMAT` and `$TKR_DATE_FORMAT` for logger settings, as detailed [here](../logging_and_errors.md)."
] }, { "cell_type": "code", "execution_count": null, "id": "bdc800bd", "metadata": {}, "outputs": [], "source": [ "import contextlib\n", "import logging\n", "\n", "# Configure the root logger; the controller's logger inherits these settings\n", "logging.basicConfig(\n", "    format=\"%(asctime)s: %(message)s\",\n", "    datefmt=\"%Y-%m-%dT%H:%M:%S%z\",\n", "    level=logging.DEBUG,\n", ")\n", "\n", "storage.clean_graph_files()\n", "# The error is expected, so suppress it and inspect the log output instead\n", "with contextlib.suppress(TierkreisError):\n", "    run_graph(\n", "        storage,\n", "        executor,\n", "        error_graph(),\n", "        {\"value\": \"world!\"},\n", "        polling_interval_seconds=0.1,\n", "    )" ] }, { "cell_type": "markdown", "id": "1039bebc", "metadata": {}, "source": [ "For most use cases, Tierkreis can also make use of Python breakpoint debugging.\n", "This only works if the graph uses Python workers exclusively.\n", "To do this, use an alternative storage and executor that keep the graph information in memory:" ] }, { "cell_type": "code", "execution_count": null, "id": "e989e15b", "metadata": {}, "outputs": [], "source": [ "from tierkreis.controller.executor.in_memory_executor import InMemoryExecutor\n", "from tierkreis.storage import InMemoryStorage\n", "\n", "storage = InMemoryStorage(UUID(int=103))\n", "executor = InMemoryExecutor(registry_path, storage)\n", "\n", "try:\n", "    run_graph(\n", "        storage,\n", "        executor,\n", "        error_graph().data,\n", "        {\"value\": \"world!\"},\n", "    )\n", "except Exception:  # Note the different exception type here\n", "    pass" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.11" } }, "nbformat": 4, "nbformat_minor": 5 }