# InfoXtractor (ix) MVP — Design

Date: 2026-04-18
Reference: `docs/spec-core-pipeline.md` (full, aspirational spec — MVP is a strict subset)
Status: Design approved (sections 1–8 walked through and accepted 2026-04-18)

## 0. One-paragraph summary

ix is an on-prem, async, LLM-powered microservice that extracts structured JSON from documents (PDFs, images, text) given a named *use case* (a Pydantic schema + system prompt). It returns the extracted fields together with per-field provenance (OCR segment IDs, bounding boxes, extracted-value agreement flags) that let calling services decide how much to trust each value. The MVP ships one use case (`bank_statement_header`), one OCR engine (Surya, pluggable), one LLM backend (Ollama, pluggable), and two transports in parallel (REST with optional webhook callback + a Postgres queue). Cloud services are explicitly forbidden. The first consumer is mammon, which uses ix as a fallback for `needs_parser` documents.

## 1. Guiding principles

- **On-prem always.** No OpenAI, Anthropic, Azure (DI/CV/OpenAI), AWS (Bedrock/Textract), Google Document AI, Mistral, etc. LLM = Ollama (:11434). OCR = local engines only. Secrets never leave the home server. The spec's cloud references are examples to replace, not inherit.
- **Grounded extraction, not DB truth.** ix returns best-effort fields with segment citations, provenance verification, and cross-OCR agreement flags. ix does *not* claim DB-grade truth. The reliability decision (trust / stage for review / reject) belongs to the caller.
- **Transport-agnostic pipeline core.** The pipeline (`RequestIX` → `ResponseIX`) knows nothing about HTTP, queues, or databases. Adapters (REST, Postgres queue) run alongside the core; both converge on one shared job store.
- **YAGNI for all spec features the MVP doesn't need.** Kafka, Config Server, Azure/AWS clients, vision, word-level provenance, reasoning-effort routing, Prometheus/OTEL exporters: deferred. The architecture leaves the interfaces in place so they can be added without touching the pipeline core.

## 2. Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│ infoxtractor container (Docker on 192.168.68.42, port 8994)      │
│                                                                  │
│  ┌──────────────────┐        ┌──────────────────────────┐        │
│  │ rest_adapter     │        │ pg_queue_adapter         │        │
│  │ (FastAPI)        │        │ (asyncio worker)         │        │
│  │ POST /jobs       │        │ LISTEN ix_jobs_new +     │        │
│  │ GET /jobs/{id}   │        │ SELECT ... FOR UPDATE    │        │
│  │ + callback_url   │        │   SKIP LOCKED            │        │
│  └────────┬─────────┘        └────────┬─────────────────┘        │
│           │                           │                          │
│           └──────────┬────────────────┘                          │
│                      ▼                                           │
│            ┌────────────────┐                                    │
│            │ ix_jobs table  │ ── postgis :5431, DB=infoxtractor  │
│            └────────┬───────┘                                    │
│                     ▼                                            │
│      ┌─────────────────────────────┐                             │
│      │ Pipeline core (spec §3–§4)  │                             │
│      │                             │                             │
│      │ SetupStep → OCRStep →       │                             │
│      │ GenAIStep → ReliabilityStep │                             │
│      │ → ResponseHandler           │                             │
│      │                             │                             │
│      │ Uses: OCRClient (Surya),    │                             │
│      │       GenAIClient (Ollama), │                             │
│      │       UseCaseRegistry       │                             │
│      └─────────────────────────────┘                             │
└──────────────────────────────────────────────────────────────────┘
            │                                  ▲
            ▼                                  │
 host.docker.internal:11434         mammon or any on-prem caller —
 — Ollama (gpt-oss:20b default)     polls GET /jobs/{id} until done
```

**Key shapes:**

- Spec's four steps + a new fifth: `ReliabilityStep` runs between `GenAIStep` and `ResponseHandlerStep` and computes the per-field `provenance_verified` and `text_agreement` flags. It is isolated so callers and tests can reason about reliability signals independently.
- Single worker at MVP (`PIPELINE_WORKER_CONCURRENCY=1`). Ollama + Surya both want the GPU serially.
- Two transports, one job store. REST is the primary; the pg queue is scaffolded and uses the same table and the same lifecycle.
- Use case registry in-repo (`ix/use_cases/__init__.py`); adding a new use case = new module + one registry line.

## 3. Data contracts

Subset of spec §2 / §9.3. Dropped fields are no-ops under the MVP's feature set.
### RequestIX

```python
class RequestIX(BaseModel):
    use_case: str                # registered name, e.g. "bank_statement_header"
    ix_client_id: str            # caller tag for logs/metrics, e.g. "mammon"
    request_id: str              # caller's correlation id; echoed back
    ix_id: Optional[str]         # caller MUST NOT set; transport assigns a 16-char hex id
    context: Context
    options: Options = Options()
    callback_url: Optional[str]  # optional webhook delivery (one-shot, no retry)


class Context(BaseModel):
    files: list[Union[str, FileRef]] = []  # URLs, file:// paths, or FileRef objects (for auth headers)
    texts: list[str] = []                  # extra text (e.g. Paperless OCR output)


class FileRef(BaseModel):
    """Used when a file URL requires auth headers (e.g. Paperless Token auth) or per-file overrides."""
    url: str                         # http(s):// or file://
    headers: dict[str, str] = {}     # e.g. {"Authorization": "Token …"}
    max_bytes: Optional[int] = None  # per-file override; defaults to IX_FILE_MAX_BYTES


class Options(BaseModel):
    ocr: OCROptions = OCROptions()
    gen_ai: GenAIOptions = GenAIOptions()
    provenance: ProvenanceOptions = ProvenanceOptions()


class OCROptions(BaseModel):
    use_ocr: bool = True
    ocr_only: bool = False
    include_ocr_text: bool = False
    include_geometries: bool = False
    service: Literal["surya"] = "surya"  # kept so the adapter point is visible


class GenAIOptions(BaseModel):
    gen_ai_model_name: Optional[str] = None  # None → use-case default → IX_DEFAULT_MODEL


class ProvenanceOptions(BaseModel):
    include_provenance: bool = True  # default ON (reliability is the point)
    max_sources_per_field: int = 10
```

**Dropped from spec (no-ops under MVP):** `OCROptions.computer_vision_scaling_factor`, `include_page_tags` (always on), `GenAIOptions.use_vision`/`vision_scaling_factor`/`vision_detail`/`reasoning_effort`, `ProvenanceOptions.granularity`/`include_bounding_boxes`/`source_type`/`min_confidence`, `RequestIX.version`.
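For illustration, a mammon-style submission under this contract might look like the sketch below. All concrete values (the Paperless URL, token, and correlation id) are invented for the example; only the field names come from the contract above.

```python
import json

# Hypothetical REST body for POST /jobs — the URL, token, and ids are
# illustrative placeholders, not real values.
payload = {
    "use_case": "bank_statement_header",
    "ix_client_id": "mammon",
    "request_id": "import-2026-04-0042",
    "context": {
        "files": [
            {   # FileRef form: a download URL that needs Token auth
                "url": "https://paperless.example/api/documents/123/download/",
                "headers": {"Authorization": "Token REDACTED"},
            }
        ],
        "texts": ["Paperless OCR text of the same statement goes here"],
    },
    "options": {"provenance": {"include_provenance": True}},
    "callback_url": None,  # this caller polls GET /jobs/{id} instead
}

body = json.dumps(payload)
```

Supplying both the file and the Paperless text is what makes `text_agreement` meaningful downstream; with `texts` empty the flag would be `None`.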
### ResponseIX

Identical to spec §2.2 except `FieldProvenance` gains two fields:

```python
class FieldProvenance(BaseModel):
    field_name: str
    field_path: str
    value: Any
    sources: list[ExtractionSource]
    confidence: Optional[float] = None   # reserved; always None in MVP
    provenance_verified: Optional[bool]  # NEW: cited segment actually contains value (normalized);
                                         #      None when verification is skipped (Literal / unset fields)
    text_agreement: Optional[bool]       # NEW: value appears in RequestIX.context.texts; None if no texts
```

`quality_metrics` gains two counters: `verified_fields`, `text_agreement_fields`.

### Job envelope (in `ix_jobs` table; returned by REST)

```python
class Job(BaseModel):
    job_id: UUID
    ix_id: str
    client_id: str
    request_id: str
    status: Literal["pending", "running", "done", "error"]
    request: RequestIX
    response: Optional[ResponseIX]
    callback_url: Optional[str]
    callback_status: Optional[Literal["pending", "delivered", "failed"]]
    attempts: int = 0
    created_at: datetime
    started_at: Optional[datetime]
    finished_at: Optional[datetime]
```

### Job lifecycle state machine

```
POST /jobs (or INSERT+NOTIFY)
        │
        ▼
   ┌────────┐  worker claims  ┌─────────┐  pipeline returns    ┌──────┐
   │pending │ ──────────────▶ │ running │ ───────────────────▶ │ done │
   └────────┘                 └────┬────┘  (response.error     └──────┘
        ▲                         │         is None)
        │                         │ pipeline raised /
        │                         │ pipeline returned
        │                         │ response_ix.error set
        │                         ▼
        │                     ┌───────┐
        │                     │ error │
        │                     └───────┘
        │
        │ worker startup sweep: rows with status='running' AND
        │ started_at < now() - 2 × IX_PIPELINE_REQUEST_TIMEOUT_SECONDS
        │ are reset to 'pending' and attempts++
        └──────────────────────────────────────────────────────────
```

- `status='done'` iff `Job.response.error is None`. Any non-None `error` in the response → `status='error'`. Both terminal states are stable; nothing moves out of them.
- The worker startup sweep protects against a row stuck in `running` after a crash mid-job. Orphan detection is time-based (2× the per-job timeout), so a still-running worker never reclaims its own job.
- After a terminal state, if `callback_url` is set, the worker makes one HTTP POST attempt and records `callback_status` (it never changes `status`). Callback failure does not undo the terminal state.

## 4. Job store

```sql
CREATE DATABASE infoxtractor;  -- on the existing postgis container

CREATE TABLE ix_jobs (
    job_id          UUID PRIMARY KEY,
    ix_id           TEXT NOT NULL,
    client_id       TEXT NOT NULL,
    request_id      TEXT NOT NULL,
    status          TEXT NOT NULL,
    request         JSONB NOT NULL,
    response        JSONB,
    callback_url    TEXT,
    callback_status TEXT,
    attempts        INT NOT NULL DEFAULT 0,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at      TIMESTAMPTZ,
    finished_at     TIMESTAMPTZ
);

CREATE INDEX ix_jobs_status_created ON ix_jobs (status, created_at) WHERE status = 'pending';
CREATE UNIQUE INDEX ix_jobs_client_request ON ix_jobs (client_id, request_id);

-- Postgres NOTIFY channel used by the pg_queue_adapter: 'ix_jobs_new'
```

Callers that prefer direct SQL (the `pg_queue_adapter` contract): insert a row with `status='pending'`, then `NOTIFY ix_jobs_new, ''`. The worker also falls back to a 10 s poll so a missed notify or an ix restart doesn't strand a job.

## 5. REST surface

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/jobs` | Body = `RequestIX` (+ optional `callback_url`). → `201 {job_id, ix_id, status: "pending"}`. Idempotent on `(ix_client_id, request_id)` — the same pair returns the existing `job_id` with `200`. |
| `GET` | `/jobs/{job_id}` | → full `Job`. Source of truth regardless of submission path or callback outcome. |
| `GET` | `/jobs?client_id=…&request_id=…` | Lookup-by-correlation (caller idempotency helper). The pair is UNIQUE in the table → at most one match. Returns the job or `404`. |
| `GET` | `/healthz` | `{postgres, ollama, ocr}`. See below for semantics. Used by the `infrastructure` monitoring dashboard. |
| `GET` | `/metrics` | Counters over the last 24 hours: `jobs_pending`, `jobs_running`, `jobs_done_24h`, `jobs_error_24h`, per-use-case avg seconds over the same window. Plain JSON, no Prometheus format for MVP. |

**`/healthz` semantics:**

- `postgres`: `SELECT 1` on the job store pool; `ok` iff the query returns within 2 s.
- `ollama`: `GET {IX_OLLAMA_URL}/api/tags` within 5 s; `ok` iff reachable AND the default model (`IX_DEFAULT_MODEL`) is listed in the tags response; `degraded` iff reachable but the model is missing (ops action: run `ollama pull <model>` on the host); `fail` on any other error.
- `ocr`: `SuryaOCRClient.selfcheck()` — returns `ok` iff CUDA is available and the Surya text-recognition model is loaded into GPU memory at process start. `fail` on any error.
- Overall HTTP status: `200` iff all three are `ok`; `503` otherwise. The monitoring dashboard only surfaces `200`/`non-200`.

**Callback delivery** (when `callback_url` is set): one POST of the full `Job` body, 10 s timeout. 2xx → `callback_status='delivered'`. Anything else → `'failed'`. No retry. Callers always have `GET /jobs/{id}` as the authoritative fallback.

## 6. Pipeline steps

Interface per spec §3 (`async validate` + `async process`). Pipeline orchestration per spec §4 (the first error aborts; each step is wrapped in a `Timer` landing in `Metadata.timings`).

### SetupStep

- **validate**: `request_ix` non-null; `context.files` or `context.texts` non-empty.
- **process**:
  - Copy `request_ix.context.texts` → `response_ix.context.texts`.
  - Normalize each `context.files` entry: plain `str` → `FileRef(url=str, headers={})`. `file://` URLs are read locally; `http(s)://` URLs are downloaded with the per-file `headers`.
  - Download files to `/tmp/ix/<ix_id>/` in parallel (asyncio + httpx). Per-file: connect timeout 10 s, read timeout 30 s, size cap `min(FileRef.max_bytes, IX_FILE_MAX_BYTES)` (default 50 MB). Any fetch failure (non-2xx, timeout, size exceeded) → `IX_000_007` with the offending URL and cause in the message. No retry.
  - MIME detection via `python-magic` on the downloaded bytes (do not trust the URL extension). Supported: PDF (`application/pdf`), PNG (`image/png`), JPEG (`image/jpeg`), TIFF (`image/tiff`). Unsupported → `IX_000_005`.
  - Load the use case: `entry = REGISTRY.get(request_ix.use_case)`; if `None` → `IX_001_001`. Store `(use_case_request, use_case_response)` instances in `response_ix.context`. Echo `use_case_request.use_case_name` → `response_ix.use_case_name`.
  - Build a flat `response_ix.context.pages`: one entry per PDF page (via PyMuPDF), one per image frame, one per text entry. Hard cap of 100 pages/PDF → `IX_000_006` on violation.

### OCRStep

- **validate**:
  - If `(include_geometries or include_ocr_text or ocr_only) and not context.files` → raise `IX_000_004` (the caller asked for OCR artifacts but gave nothing to OCR).
  - Else return `True` iff `(use_ocr or include_geometries or include_ocr_text or ocr_only) and context.files`. Otherwise `False` → step skipped (text-only requests).
  - If `use_ocr=False` but any of `include_geometries`/`include_ocr_text`/`ocr_only` is set, OCR still runs — the flag triad controls what is *retained*, not whether OCR happens.
- **process**: `ocr_result = await OCRClient.ocr(context.pages)` → `response_ix.ocr_result`. Always inject page tags (simplifies grounding). If `include_provenance`: build a `SegmentIndex` (line granularity, normalized bboxes 0–1) and store it in `context.segment_index`.
- **OCRClient interface**:

  ```python
  class OCRClient(Protocol):
      async def ocr(self, pages: list[Page]) -> OCRResult: ...
  ```

  MVP implementation: `SuryaOCRClient` (GPU via the `surya-ocr` PyPI package, CUDA on the RTX 3090).

### GenAIStep

- **validate**: `ocr_only` → `False` (skip). The use case must exist. OCR text or `context.texts` must be non-empty (else `IX_001_000`).
- **process**:
  - System prompt = `use_case_request.system_prompt`. If `include_provenance`: append the spec §9.2 citation instruction verbatim.
  - User text: segment-tagged (`[p1_l0] …`) when provenance is on; plain concatenated OCR + texts otherwise.
  - Response schema: the `UseCaseResponse` directly, or the dynamic `ProvenanceWrappedResponse(result=..., segment_citations=...)` per spec §7.2e when provenance is on.
  - Model: `request_ix.options.gen_ai.gen_ai_model_name` → `use_case_request.default_model` → `IX_DEFAULT_MODEL`.
  - Call `GenAIClient.invoke(request_kwargs, response_schema)`; parsed model → `ix_result.result`, usage + model_name → `ix_result.meta_data`.
  - If provenance: call `ProvenanceUtils.map_segment_refs_to_provenance(...)` per spec §9.4, write `response_ix.provenance`.
- **GenAIClient interface**:

  ```python
  class GenAIClient(Protocol):
      async def invoke(self, request_kwargs: dict,
                       response_schema: type[BaseModel]) -> GenAIInvocationResult: ...
  ```

  MVP implementation: `OllamaClient` — `POST {IX_OLLAMA_URL}/api/chat` with `format = <response JSON schema>` (Ollama structured outputs). Per-call timeout: `IX_GENAI_CALL_TIMEOUT_SECONDS` (default 1500 s, distinct from the per-job timeout so a frozen model doesn't eat the full 45-minute budget).
- **Failure modes (no retry in MVP; both surface as pipeline errors):**
  - Connection refused / timeout / 5xx → `IX_002_000` ("inference backend unavailable") with model name + endpoint.
  - A 2xx response body that cannot be parsed against the Pydantic schema (Ollama structured output violated the schema) → `IX_002_001` ("structured output parse failed") with a snippet of the offending body.

### ReliabilityStep (new; runs when `include_provenance` is True)

For each `FieldProvenance` in `response_ix.provenance.fields`:

- **`provenance_verified`**: for each cited segment, compare `text_snippet` to the extracted `value` after normalization (see below). If any cited segment agrees → `True`. Else `False`.
- **`text_agreement`**: if `request_ix.context.texts` is empty → `None`. Else run the same comparison against the concatenated texts → `True` / `False`.
**Per-field-type dispatch** (picks the comparator based on the Pydantic field annotation on the use-case response schema):

| Python type annotation | Comparator |
|---|---|
| `str` | String normalizer (NFKC, casefold, collapse whitespace, strip common punctuation); substring check |
| `int`, `float`, `Decimal` | Digits-and-sign only (strip currency symbols, thousands separators, decimal variants); exact match at 2 decimal places |
| `date`, `datetime` | Parse *both* sides with `dateutil.parser(dayfirst=True)`; compare as ISO strings |
| IBAN (`str` with `account_iban`-like names) | Upper-case, strip whitespace; exact match |
| `Literal[...]` | **Skipped** — verification is `None` (caller-controlled enum labels rarely appear verbatim in the source text). `text_agreement` also `None`. |
| `None` / unset value | **Skipped** — `provenance_verified = None`, `text_agreement = None`. The field still appears in the provenance output. |

**Short-value skip rule** (applies after comparator selection): if the stringified `value` is ≤ 2 chars, or a numeric `|value| < 10`, `text_agreement` is skipped (→ `None`). `provenance_verified` still runs — the bbox-anchored cite is stronger than a global text scan for short values.

The step only mutates the provenance structure; it does **not** drop fields. The caller sees every extracted field plus the flags. It writes the `quality_metrics.verified_fields` (count where `provenance_verified=True`) and `quality_metrics.text_agreement_fields` (count where `text_agreement=True`) summary counters; fields with `None` flags are not counted as either success or failure.

### ResponseHandlerStep

Per spec §8, unchanged. Attach flat OCR text when `include_ocr_text`; strip `ocr_result.pages` unless `include_geometries`; delete `context` before serialization.

## 7. Use case registry

```
ix/use_cases/
    __init__.py               # REGISTRY: dict[str, tuple[type[UseCaseRequest], type[UseCaseResponse]]]
    bank_statement_header.py
```

Adding a use case = a new module exporting a `Request(BaseModel)` (`use_case_name`, `default_model`, `system_prompt`) and a `UseCaseResponse(BaseModel)`, then one `REGISTRY["<name>"] = (Request, UseCaseResponse)` line.

### First use case: `bank_statement_header`

```python
class BankStatementHeader(BaseModel):
    bank_name: str
    account_iban: Optional[str]
    account_type: Optional[Literal["checking", "credit", "savings"]]
    currency: str
    statement_date: Optional[date]
    statement_period_start: Optional[date]
    statement_period_end: Optional[date]
    opening_balance: Optional[Decimal]
    closing_balance: Optional[Decimal]


class Request(BaseModel):
    use_case_name: str = "Bank Statement Header"
    default_model: str = "gpt-oss:20b"
    system_prompt: str = (
        "You extract header metadata from a single bank or credit-card statement. "
        "Return only facts that appear in the document; leave a field null if uncertain. "
        "Balances must use the document's numeric format (e.g. '1234.56' or '-123.45'); "
        "do not invent a currency symbol. Account type: 'checking' for current/Giro accounts, "
        "'credit' for credit-card statements, 'savings' otherwise. Always return the IBAN "
        "with spaces removed. Never fabricate a value to fill a required-looking field."
    )
```

**Why these fields:** each appears at most once per document (one cite per field → a strong `provenance_verified` signal); all reconcile against something mammon already stores (IBAN → `Account.iban`, period → the verified-range chain, closing_balance → next month's opening_balance and `StatementBalance`); the schema is flat (no nested arrays, where Ollama structured output tends to drift).

## 8. Errors and warnings

Error-code set (spec §12.2 subset + MVP-specific codes for on-prem failure modes):

| Code | Trigger |
|---|---|
| `IX_000_000` | `request_ix` is None |
| `IX_000_002` | No context (neither files nor texts) |
| `IX_000_004` | `include_geometries`, `include_ocr_text`, or `ocr_only` set but `context.files` empty |
| `IX_000_005` | File MIME type not supported (after byte-sniffing) |
| `IX_000_006` | PDF page-count cap exceeded |
| `IX_000_007` | File fetch failed (connect / timeout / non-2xx / size cap exceeded) |
| `IX_001_000` | Extraction context empty after setup (OCR produced nothing AND `context.texts` empty) |
| `IX_001_001` | Use case name not in `REGISTRY` |
| `IX_002_000` | Inference backend unavailable (Ollama connect / timeout / 5xx) |
| `IX_002_001` | Structured output parse failed (Ollama response body didn't match the schema) |

Warnings (non-fatal, appended to `response_ix.warning`): empty OCR result; provenance requested with `use_ocr=False`; requested model unavailable, falling back to `IX_DEFAULT_MODEL`; a very short or `Literal`-typed field skipped during the reliability check.

## 9. Configuration (`AppConfig` via `pydantic-settings`)

| Key env var | Default | Meaning |
|---|---|---|
| `IX_POSTGRES_URL` | `postgresql+asyncpg://infoxtractor:@host.docker.internal:5431/infoxtractor` | Job store. The password must be set in `.env`; `.env.example` ships with a placeholder. |
| `IX_OLLAMA_URL` | `http://host.docker.internal:11434` | LLM backend |
| `IX_DEFAULT_MODEL` | `gpt-oss:20b` | Fallback model |
| `IX_OCR_ENGINE` | `surya` | Adapter selector (only value in MVP) |
| `IX_TMP_DIR` | `/tmp/ix` | Download scratch |
| `IX_PIPELINE_WORKER_CONCURRENCY` | `1` | Worker semaphore cap |
| `IX_PIPELINE_REQUEST_TIMEOUT_SECONDS` | `2700` | Per-job timeout (45 min) |
| `IX_GENAI_CALL_TIMEOUT_SECONDS` | `1500` | Per-LLM-call timeout (distinct from per-job) |
| `IX_FILE_MAX_BYTES` | `52428800` | Default per-file download size cap (50 MB) |
| `IX_FILE_CONNECT_TIMEOUT_SECONDS` | `10` | Per-file connect timeout |
| `IX_FILE_READ_TIMEOUT_SECONDS` | `30` | Per-file read timeout |
| `IX_RENDER_MAX_PIXELS_PER_PAGE` | `75000000` | Per-page render cap |
| `IX_LOG_LEVEL` | `INFO` | |
| `IX_CALLBACK_TIMEOUT_SECONDS` | `10` | Webhook POST timeout |

No Azure, OpenAI, or AWS variables — those paths do not exist in the codebase.

## 10. Observability (minimal)

- **Logs**: JSON-structured via `logging` + a custom formatter. Every line carries `ix_id`, `client_id`, `request_id`, `use_case`. Steps emit `step_start` / `step_end` events with elapsed ms.
- **Timings**: every step's elapsed seconds are recorded in `response_ix.metadata.timings` (same shape as spec §2).
- **Traces**: OpenTelemetry span scaffolding is present, no exporter wired. Drop-in later.
- **Prometheus**: deferred.

## 11. Deployment

- Repo: `goldstein/infoxtractor` on Forgejo, plus a `server` bare-repo remote with a `post-receive` hook mirroring mammon.
- Port 8994 (LAN-only via UFW; not exposed publicly — internal service). No `infrastructure.docs_url` label, no VPS Caddy entry.
- Postgres: new `infoxtractor` database on the existing postgis container.
- Ollama reached via `host.docker.internal:11434`.
- Monitoring label: `infrastructure.web_url=http://192.168.68.42:8994`.
- Backup: `backup.enable=true`, `backup.type=postgres`, `backup.name=infoxtractor`.
- Dockerfile: CUDA-enabled base (`nvidia/cuda:12.4-runtime-ubuntu22.04` + Python 3.12) so Surya can use the 3090. CMD: `alembic upgrade head && uvicorn ix.app:create_app --factory --host 0.0.0.0 --port 8994`.
- Docker Compose gives the container GPU access: `runtime: nvidia` + a `deploy.resources.reservations` GPU entry (same shape as Immich ML / monitoring). The `docker run` equivalent used by post-receive hooks must include `--gpus all`.
- **Pre-deploy check:** the host must have `gpt-oss:20b` pulled into Ollama before the first deploy (`ollama pull gpt-oss:20b`). If the model is missing at startup, `/healthz` returns `503` with `ollama: "degraded"` and the monitoring dashboard surfaces the failure. The `post-receive` hook probes `/healthz` for 60 s after container restart; a `503` that doesn't resolve fails the deploy.

## 12. Testing strategy

Strict TDD — each unit is written test-first.

1. **Unit tests** (fast, hermetic): every `Step`, `SegmentIndex`, the provenance-verification normalizers, the `OCRClient` contract, the `GenAIClient` contract, error mapping. No DB, no Ollama, no network.
2. **Integration tests** (DB + fakes): pipeline end-to-end with a stub `OCRClient` (replays canned OCR results) and a stub `GenAIClient` (replays canned LLM JSON). Covers step wiring + transports + job lifecycle + callback success/failure + pg queue notify + the worker startup orphan sweep. Run against a real postgres service container in Forgejo Actions (mammon CI pattern). **Forgejo Actions runners have neither GPU nor network access to the LAN Ollama/Surya instances; all inference in CI is stubbed.** Real-Ollama tests are gated behind `IX_TEST_OLLAMA=1` and run only from the Mac.
3. **E2E smoke against the deployed app**: `scripts/e2e_smoke.py` on the Mac calls `POST http://192.168.68.42:8994/jobs` with a **synthetic** bank-statement fixture (`tests/fixtures/synthetic_giro.pdf` — generated from a template, no real personal data; a separate redacted-real-statement fixture lives outside git at `~/ix-fixtures/` if needed), polls `GET /jobs/{id}` until done, and asserts:
   - `status == "done"`
   - `provenance.fields["result.closing_balance"].provenance_verified is True`
   - `text_agreement is True` when Paperless-style texts are submitted
   - Timings under 60 s

   Runs after every `git push server main` as the deploy gate.

**Deploy-failure workflow:** if the smoke test fails, `git revert HEAD` creates a forward commit that undoes the change; push that revert commit to both `forgejo` and `server`. Never force-push to `main`; never rewrite history on deployed commits.

## 13. Mammon integration (sketch — outside this spec's scope, owned by mammon)

Belongs in a mammon-side follow-up spec. Captured here only so readers of ix know the MVP's first consumer.

- The Paperless poller keeps its current behavior for format-matched docs.
- For `needs_parser` docs: submit to ix (`use_case="bank_statement_header"`, `files=[paperless_download_url]`, `texts=[paperless_content]`).
- The ix job id is recorded on the `Import` row. A new poller on the mammon side checks `GET /jobs/{id}` until done.
- The result is staged (a new `pending_headers` table — not `StatementBalance`). A new "Investigate" panel surfaces staged headers with the per-field `provenance_verified` + `text_agreement` flags.
- The user confirms → write to `StatementBalance`. Over time, when a deterministic parser is added for the bank, compare ix's past extractions against the deterministic output to measure ix accuracy.

## 14. Deferred from full spec (explicit)

- Kafka transport (§15)
- Config Server (§9.1 in the full spec, §10 here): use cases are in-repo for MVP
- `use_case` as a URL or Base64-encoded definition (MVP accepts only registered-name strings)
- Azure DI / Computer Vision OCR backends
- OpenAI, Anthropic, AWS Bedrock GenAI backends
- S3 adapter
- `use_vision` + vision scaling/detail
- Word-level provenance granularity
- `reasoning_effort` parameter routing
- Prometheus exporter (`/metrics` stays JSON for MVP)
- OTEL gRPC exporter (spans present, no exporter)
- Legacy aliases (`prompt_template_base`, `kwargs_use_case`)
- Second-opinion multi-model ensembling
- Schema `version` field
- Per-request rate limiting
- Callback retries (one-shot only for MVP; callers poll as fallback)
- Multi-container workers (single worker in MVP; the `FOR UPDATE SKIP LOCKED` claim pattern is ready for horizontal scale when needed)

The `quality_metrics` shape retains the reference-spec counters (`fields_with_provenance`, `total_fields`, `coverage_rate`, `invalid_references`) and adds the two MVP counters (`verified_fields`, `text_agreement_fields`).

Every deferred item is additive: the `OCRClient` / `GenAIClient` / transport-adapter interfaces already leave the plug points, and the pipeline core is unaware of which implementation is in use.

## 15. Implementation workflow (habit reminder)

Every unit of work follows the cross-project habit:

1. `git checkout -b feat/<name>`
2. TDD: write a failing test, write code, green, refactor
3. Commit in small logical chunks; update `AGENTS.md` / `README.md` / `docs/` in the same commit as the code
4. `git push forgejo feat/<name>`
5. Create a PR via the Forgejo API
6. Wait for tests to pass
7. Merge
8. `git push server main` to deploy; run `scripts/e2e_smoke.py` against the live service

Never skip hooks, never force-push main, never bypass tests.
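The poll-until-done loop that `scripts/e2e_smoke.py` (and any REST caller) relies on can be sketched as below. This is a minimal sketch, not the actual script: `get_job` is an assumed injectable callable wrapping `GET /jobs/{job_id}`, and the timings are illustrative.

```python
import time
from typing import Callable

# Terminal statuses per the job lifecycle state machine (§3).
TERMINAL = {"done", "error"}


def poll_until_done(get_job: Callable[[], dict],
                    timeout_s: float = 60.0,
                    interval_s: float = 2.0,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll get_job() until the Job envelope reaches a terminal status.

    get_job is assumed to return the Job as a dict (the GET /jobs/{id} body).
    Raises TimeoutError once the budget is exhausted; sleep is injectable
    so tests can stub it out.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_job()
        if job["status"] in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout_s}s")
        sleep(interval_s)


# Usage with a stubbed transport (pending → running → done):
responses = iter([{"status": "pending"}, {"status": "running"}, {"status": "done"}])
job = poll_until_done(lambda: next(responses), sleep=lambda _: None)
# job["status"] == "done"
```

Injecting `sleep` and `get_job` keeps the loop testable in CI, where (per §12) no LAN services are reachable.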