Runs Surya's detection + recognition over PIL images rendered from each
Page's source file (PDFs via PyMuPDF, images via Pillow). Lazy warm_up
so FastAPI lifespan start stays predictable. Deferred Surya/torch
imports keep the base install slim — the heavy deps stay under [ocr].
Extends OCRClient Protocol with optional files + page_metadata kwargs
so the engine can resolve each page back to its on-disk source; Fake
accepts-and-ignores to keep hermetic tests unchanged.
selfcheck() runs the predictors on a 1x1 PIL image — wired into /healthz
by Task 4.3.
Tests: 6 hermetic unit tests (Surya predictors mocked, no model
download); 2 live tests gated on IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real GenAIClient for the production pipeline. Sends `format=<pydantic JSON
schema>`, `stream=false`, and mapped options (`temperature`; drops
`reasoning_effort`). Content-parts lists joined to a single string since
MVP models don't speak native content-parts. Error mapping per spec:
connection/timeout/5xx → IX_002_000, schema violations → IX_002_001.
`selfcheck()` probes /api/tags with a fixed 5 s timeout for /healthz.
Tests: 10 hermetic pytest-httpx unit tests; 2 live tests gated on
IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>