Adds the two Protocol-based client contracts the pipeline steps depend on, plus test-oriented fakes. Real engines (Surya, Ollama) land in Chunk 4.

- ix.ocr.client.OCRClient: runtime_checkable Protocol with async ocr().
- ix.genai.client.GenAIClient: runtime_checkable Protocol with async invoke(); GenAIInvocationResult + GenAIUsage dataclasses carry the parsed model, token usage, and model name.
- FakeOCRClient / FakeGenAIClient: return canned results; both expose a raise_on_call hook for error-path tests.

8 unit tests across tests/unit/test_ocr_fake.py + test_genai_fake.py confirm protocol conformance, canned-return behaviour, usage/model-name defaults, and raise_on_call propagation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
32 lines
972 B
Python
"""OCRClient Protocol (spec §6.2).

Structural typing: any object with an async ``ocr(pages) -> OCRResult``
method satisfies the Protocol. :class:`~ix.pipeline.ocr_step.OCRStep`
depends on the Protocol, not a concrete class, so swapping engines
(``FakeOCRClient`` in tests, ``SuryaOCRClient`` in prod) stays a wiring
change at the app factory.
"""

from __future__ import annotations

from typing import Protocol, runtime_checkable

from ix.contracts import OCRResult, Page


@runtime_checkable
class OCRClient(Protocol):
    """Async OCR backend.

    Implementations receive the flat page list the pipeline built in
    :class:`~ix.pipeline.setup_step.SetupStep` and return an
    :class:`~ix.contracts.OCRResult` with one :class:`~ix.contracts.Page`
    per input page (in the same order).
    """

    async def ocr(self, pages: list[Page]) -> OCRResult:
        """Run OCR over the input pages; return the structured result."""
        ...


__all__ = ["OCRClient"]
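
The structural-typing claim in the module docstring can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: the `Page` and `OCRResult` dataclasses are stand-ins for the real `ix.contracts` models (whose fields are not shown in this file), and `SketchOCRClient` is a hypothetical fake, not the repository's `FakeOCRClient`. Only the `raise_on_call` name is borrowed from the commit message.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional, Protocol, runtime_checkable


# Hypothetical stand-ins for ix.contracts.Page / ix.contracts.OCRResult.
# The real models are not shown in this file and likely carry other fields.
@dataclass
class Page:
    text: str = ""


@dataclass
class OCRResult:
    pages: list[Page] = field(default_factory=list)


@runtime_checkable
class OCRClient(Protocol):
    """Same shape as the Protocol above, repeated so the sketch runs alone."""

    async def ocr(self, pages: list[Page]) -> OCRResult: ...


class SketchOCRClient:
    """Illustrative fake; note that it does not inherit from OCRClient."""

    def __init__(
        self, canned: OCRResult, raise_on_call: Optional[Exception] = None
    ) -> None:
        self._canned = canned
        self._raise_on_call = raise_on_call

    async def ocr(self, pages: list[Page]) -> OCRResult:
        # Error-path hook: raise the configured exception instead of returning.
        if self._raise_on_call is not None:
            raise self._raise_on_call
        return self._canned


class NotAClient:
    """Has no ``ocr`` attribute, so the isinstance check below fails."""


canned = OCRResult(pages=[Page(text="hello")])
client = SketchOCRClient(canned)

# runtime_checkable enables isinstance() against the Protocol.
conforms = isinstance(client, OCRClient)
rejects = isinstance(NotAClient(), OCRClient)

# The fake hands back the canned result unchanged.
result = asyncio.run(client.ocr([Page()]))
```

One caveat on the design: `isinstance()` against a `runtime_checkable` Protocol only checks that an `ocr` attribute exists. It does not verify the signature or that the method is a coroutine, so a static type checker remains the stronger guarantee of conformance.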