infoxtractor/src/ix/pipeline/step.py
Dirk Riemann dcd1bc764a
feat(pipeline): Step ABC + Pipeline runner + Timer (spec §3, §4)
Adds the transport-agnostic pipeline orchestrator. Each step implements
async validate + process; the runner wraps both in a Timer, writes
per-step entries to response.metadata.timings, and aborts on the first
IXException by writing response.error.

- Step exposes a step_name property (defaults to class name) so tests and
  logs label steps consistently.
- Timer is a plain context manager that appends one {step, elapsed_seconds}
  entry on exit regardless of whether the body raised, so the timeline
  stays reconstructable for failed steps.
- 9 unit tests cover ordering, skip-on-false, IXException in validate vs.
  process, timings populated for every executed step, and shared-response
  mutation across steps. Non-IX exceptions propagate.
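
The Timer behaviour described above can be sketched as follows. This is a minimal illustration, not the code from this commit: the constructor signature, the `timings` list argument, and the use of `time.perf_counter` are all assumptions. Only the `{step, elapsed_seconds}` entry shape and the "append on exit even if the body raised" rule come from the message.

```python
import time
from typing import Any


class Timer:
    """Hypothetical sketch: append one timing entry on exit, even on failure."""

    def __init__(self, step_name: str, timings: list[dict[str, Any]]) -> None:
        self.step_name = step_name  # label written to the entry
        self.timings = timings      # shared list the entry is appended to
        self._start = 0.0

    def __enter__(self) -> "Timer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # __exit__ runs whether or not the body raised, so failed steps
        # still leave an entry in the timeline.
        elapsed = time.perf_counter() - self._start
        self.timings.append({"step": self.step_name, "elapsed_seconds": elapsed})
        return False  # do not swallow the exception


timings: list[dict[str, Any]] = []

with Timer("OkStep", timings):
    pass

try:
    with Timer("FailingStep", timings):
        raise RuntimeError("boom")
except RuntimeError:
    pass  # the exception propagated, and the entry was still recorded
```

Returning `False` from `__exit__` is what lets the exception reach the caller while the entry is still written.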

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:06:46 +02:00


"""Step ABC — the pipeline-step contract (spec §3).

Every pipeline step implements two async hooks:

* :meth:`Step.validate` — returns ``True`` when the step should run for this
  request, ``False`` when it should be silently skipped. May raise
  :class:`~ix.errors.IXException` to abort the pipeline with an error code.
* :meth:`Step.process` — does the work; mutates the shared ``response_ix``
  and returns it. May raise :class:`~ix.errors.IXException` to abort.

Both hooks are async so steps that need I/O (file download, OCR, LLM) can
cooperate with the asyncio event loop without sync-async conversion dances.
"""

from __future__ import annotations

from abc import ABC, abstractmethod

from ix.contracts import RequestIX, ResponseIX


class Step(ABC):
    """Abstract base for every pipeline step.

    Subclasses override both hooks. The pipeline runner guarantees
    ``validate`` is called before ``process`` for a given step, and that
    ``process`` runs iff ``validate`` returned ``True``.

    The :attr:`step_name` property controls the label written to
    ``metadata.timings``. Defaults to the class name so production steps
    (``SetupStep``, ``OCRStep``, …) log under their own name; test doubles
    override it with the value under test.
    """

    @property
    def step_name(self) -> str:
        """Label used in ``metadata.timings``. Default: class name."""
        return type(self).__name__

    @abstractmethod
    async def validate(self, request_ix: RequestIX, response_ix: ResponseIX) -> bool:
        """Return ``True`` to run :meth:`process`, ``False`` to skip silently.

        Raise :class:`~ix.errors.IXException` to abort the pipeline with an
        error code on ``response_ix.error``.
        """

    @abstractmethod
    async def process(
        self, request_ix: RequestIX, response_ix: ResponseIX
    ) -> ResponseIX:
        """Run the step; return the (same, mutated) ``response_ix``.

        Raising :class:`~ix.errors.IXException` aborts the pipeline.
        """


__all__ = ["Step"]
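
A rough, self-contained sketch of how concrete steps and a runner might use this contract. Everything except the `Step` hooks and `step_name` is a stand-in invented for illustration: the `RequestIX`, `ResponseIX`, `Metadata`, and `IXException` classes, their fields, the three demo steps, and `run_pipeline` are hypothetical and do not reflect the real `ix.contracts`, `ix.errors`, or pipeline runner. The sketch also assumes one timing entry per step covering both hooks, including steps whose `validate` returned `False`.

```python
import asyncio
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


class IXException(Exception):
    """Stand-in for ix.errors.IXException; the real class is not shown here."""


@dataclass
class RequestIX:
    """Stand-in for ix.contracts.RequestIX with one hypothetical field."""
    text: str = ""


@dataclass
class Metadata:
    timings: list = field(default_factory=list)


@dataclass
class ResponseIX:
    """Stand-in for ix.contracts.ResponseIX; field names are assumptions."""
    result: dict = field(default_factory=dict)
    metadata: Metadata = field(default_factory=Metadata)
    error: Optional[str] = None


class Step(ABC):
    """Condensed copy of the Step contract so the sketch runs on its own."""

    @property
    def step_name(self) -> str:
        return type(self).__name__

    @abstractmethod
    async def validate(self, request_ix: RequestIX, response_ix: ResponseIX) -> bool: ...

    @abstractmethod
    async def process(self, request_ix: RequestIX, response_ix: ResponseIX) -> ResponseIX: ...


class UppercaseStep(Step):
    async def validate(self, request_ix, response_ix):
        return bool(request_ix.text)  # skip silently on empty input

    async def process(self, request_ix, response_ix):
        response_ix.result["upper"] = request_ix.text.upper()
        return response_ix  # same object, mutated in place


class SkippedStep(UppercaseStep):
    async def validate(self, request_ix, response_ix):
        return False  # the runner must not call process()


class FailingStep(UppercaseStep):
    @property
    def step_name(self) -> str:
        return "failing"  # test doubles may override the default label

    async def validate(self, request_ix, response_ix):
        raise IXException("demo failure")


async def run_pipeline(steps, request_ix, response_ix):
    """Hypothetical runner: validate, then process, abort on IXException."""
    for step in steps:
        start = time.perf_counter()
        try:
            if await step.validate(request_ix, response_ix):
                response_ix = await step.process(request_ix, response_ix)
        except IXException as exc:
            response_ix.error = str(exc)  # first IXException ends the run
            return response_ix
        finally:
            # Recorded for every step that was started, including failures.
            response_ix.metadata.timings.append(
                {"step": step.step_name, "elapsed_seconds": time.perf_counter() - start}
            )
    return response_ix


steps = [SkippedStep(), UppercaseStep(), FailingStep(), UppercaseStep()]
response = asyncio.run(run_pipeline(steps, RequestIX(text="hello"), ResponseIX()))
```

The last `UppercaseStep` never runs because the pipeline stops at the failing step, so `response.metadata.timings` holds three entries and `response.error` carries the exception message.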