Address spec review — auth, timeouts, lifecycle, error codes

- FileRef type added so callers (mammon/Paperless) can pass Authorization
  headers alongside URLs. context.files is now list[str | FileRef].
- Job lifecycle state machine pinned down, including worker-startup sweep
  for rows stuck in 'running' after a crash.
- Explicit IX_002_000 / IX_002_001 codes for Ollama unreachable and
  structured-output schema violations, with per-call timeout
  IX_GENAI_CALL_TIMEOUT_SECONDS distinct from the per-job timeout.
- IX_000_007 code for file-fetch failures; per-file size, connect, and
  read timeouts configurable via env.
- ReliabilityStep: Literal-typed fields and None values explicitly skipped
  from provenance verification (with reason); dates parse both sides
  before ISO comparison.
- /healthz semantics pinned down (CUDA + Surya loaded; Ollama reachable
  AND model available). /metrics window is last 24h.
- (client_id, request_id) is UNIQUE in ix_jobs, matching the idempotency
  claim.
- Deploy-failure workflow uses `git revert` forward commit, not
  force-push — aligned with AGENTS.md habits.
- Dockerfile / compose require --gpus all. Pre-deploy requires
  `ollama pull gpt-oss:20b`; /healthz verifies before deploy completes.
- CI clarified: Forgejo Actions runners are GPU-less and LAN-disconnected;
  all inference is stubbed there. Real-Ollama tests behind IX_TEST_OLLAMA=1.
- Fixture redaction stance: synthetic-template PDF committed; real
  redacted fixtures live out-of-repo.
- Deferred list picks up use_case URL/Base64, callback retries,
  multi-container workers. quality_metrics retains reference-spec counters
  plus the two new MVP ones.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Dirk Riemann 2026-04-18 10:28:43 +02:00
parent 124403252d
commit 5e007b138d


@@ -71,14 +71,20 @@ class RequestIX(BaseModel):
    use_case: str                # registered name, e.g. "bank_statement_header"
    ix_client_id: str            # caller tag for logs/metrics, e.g. "mammon"
    request_id: str              # caller's correlation id; echoed back
    ix_id: Optional[str]         # caller MUST NOT set; transport assigns a 16-char hex id
    context: Context
    options: Options = Options()
    callback_url: Optional[str]  # optional webhook delivery (one-shot, no retry)
class Context(BaseModel):
    files: list[Union[str, FileRef]] = []  # URLs, file:// paths, or FileRef objects (for auth headers)
    texts: list[str] = []                  # extra text (e.g. Paperless OCR output)
class FileRef(BaseModel):
    """Used when a file URL requires auth headers (e.g. Paperless Token auth) or per-file overrides."""
    url: str                         # http(s):// or file://
    headers: dict[str, str] = {}     # e.g. {"Authorization": "Token …"}
    max_bytes: Optional[int] = None  # per-file override; defaults to IX_FILE_MAX_BYTES
class Options(BaseModel):
    ocr: OCROptions = OCROptions()
@@ -138,6 +144,34 @@ class Job(BaseModel):
    finished_at: Optional[datetime]
```
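As a usage sketch of the `FileRef` normalization (plain dataclasses standing in for the Pydantic models above; `normalize_files` is an illustrative helper, not part of the spec):

```python
from dataclasses import dataclass, field
from typing import Optional, Union

@dataclass
class FileRef:
    url: str                                     # http(s):// or file://
    headers: dict = field(default_factory=dict)  # e.g. {"Authorization": "Token …"}
    max_bytes: Optional[int] = None              # falls back to IX_FILE_MAX_BYTES

def normalize_files(files: list) -> list:
    """Plain str entries become header-less FileRefs; FileRef entries pass through."""
    return [f if isinstance(f, FileRef) else FileRef(url=f) for f in files]

# Mixed list[str | FileRef], as a caller like Paperless would submit it:
refs = normalize_files([
    "file:///data/statement.pdf",
    FileRef(url="https://paperless.lan/api/documents/42/download/",
            headers={"Authorization": "Token abc123"}),
])
```

This keeps the common case (a bare URL string) as cheap as before while letting auth-requiring callers opt in per file.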
### Job lifecycle state machine
```
POST /jobs (or INSERT+NOTIFY)
┌────────┐ worker claims ┌─────────┐ pipeline returns ┌──────┐
│pending │ ─────────────────▶ │ running │ ──────────────────▶ │ done │
└────────┘ └────┬────┘ (response.error └──────┘
▲ │ is None)
│ │
│ pipeline raised / │ response_ix.error set
│ pipeline returned │
│ response_ix.error ▼
│ ┌───────┐
│ │ error │
│ └───────┘
│ worker startup sweep: rows with status='running' AND
│ started_at < now() - 2 × IX_PIPELINE_REQUEST_TIMEOUT_SECONDS
│ are reset to 'pending' and attempts++
└───────────────────────────────────
```
- `status='done'` iff `Job.response.error is None`. Any non-None `error` in the response → `status='error'`. Both terminal states are stable; nothing moves out of them.
- Worker startup sweep protects against "row stuck in `running`" after a crash mid-job. Orphan detection is time-based (2× the per-job timeout), so a still-running worker never reclaims its own job.
- After terminal state, if `callback_url` is set, the worker makes one HTTP POST attempt and records `callback_status` (never changes `status`). Callback failure does not undo the terminal state.
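The diagram above condenses to a small transition table; this is an illustrative sketch (names like `transition` and `is_orphan` are hypothetical, not the worker's actual code):

```python
from datetime import datetime, timedelta, timezone

TERMINAL = {"done", "error"}
TRANSITIONS = {
    ("pending", "claim"): "running",
    ("running", "success"): "done",          # Job.response.error is None
    ("running", "failure"): "error",         # pipeline raised / response_ix.error set
    ("running", "orphan_sweep"): "pending",  # startup sweep; attempts++ in same UPDATE
}

def transition(status: str, event: str) -> str:
    """Legal job-state transitions; terminal states are stable, everything else rejected."""
    nxt = TRANSITIONS.get((status, event))
    if status in TERMINAL or nxt is None:
        raise ValueError(f"illegal transition {status} -> {event}")
    return nxt

def is_orphan(started_at: datetime, job_timeout_s: int, now: datetime) -> bool:
    """Time-based orphan test: 2x the per-job timeout, so a live worker
    never reclaims its own still-running job."""
    return now - started_at > timedelta(seconds=2 * job_timeout_s)
```

With `IX_PIPELINE_REQUEST_TIMEOUT_SECONDS=2700`, a `running` row becomes sweepable only after 5400 s.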
## 4. Job store
```sql
@@ -159,7 +193,7 @@ CREATE TABLE ix_jobs (
  finished_at TIMESTAMPTZ
);
CREATE INDEX ix_jobs_status_created ON ix_jobs (status, created_at) WHERE status = 'pending';
CREATE UNIQUE INDEX ix_jobs_client_request ON ix_jobs (client_id, request_id);
-- Postgres NOTIFY channel used by the pg_queue_adapter: 'ix_jobs_new'
```
@@ -171,9 +205,15 @@ Callers that prefer direct SQL (the `pg_queue_adapter` contract): insert a row w
|---|---|---|
| `POST` | `/jobs` | Body = `RequestIX` (+ optional `callback_url`). → `201 {job_id, ix_id, status: "pending"}`. Idempotent on `(ix_client_id, request_id)` — same pair returns the existing `job_id` with `200`. |
| `GET` | `/jobs/{job_id}` | → full `Job`. Source of truth regardless of submission path or callback outcome. |
| `GET` | `/jobs?client_id=…&request_id=…` | Lookup-by-correlation (caller idempotency helper). The pair is UNIQUE in the table → at most one match. Returns the job or `404`. |
| `GET` | `/healthz` | `{postgres, ollama, ocr}`. See below for semantics. Used by `infrastructure` monitoring dashboard. |
| `GET` | `/metrics` | Counters over the last 24 hours: `jobs_pending`, `jobs_running`, `jobs_done_24h`, `jobs_error_24h`, per-use-case avg seconds over the same window. Plain JSON, no Prometheus format for MVP. |
**`/healthz` semantics:**
- `postgres`: `SELECT 1` on the job store pool; `ok` iff the query returns within 2 s.
- `ollama`: `GET {IX_OLLAMA_URL}/api/tags` within 5 s; `ok` iff reachable AND the default model (`IX_DEFAULT_MODEL`) is listed in the tags response; `degraded` iff reachable but the model is missing (ops action: run `ollama pull <model>` on the host); `fail` on any other error.
- `ocr`: `SuryaOCRClient.selfcheck()` — returns `ok` iff CUDA is available and the Surya text-recognition model is loaded into GPU memory at process start. `fail` on any error.
- Overall HTTP status: `200` iff all three are `ok`; `503` otherwise. The monitoring dashboard only surfaces `200`/`non-200`.
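The aggregation rule can be sketched as a pure function (hypothetical helper, not the service's actual handler):

```python
def healthz(postgres: str, ollama: str, ocr: str) -> tuple:
    """Aggregate probe results into (HTTP status, JSON body).
    'degraded' (Ollama reachable, model missing) still fails the overall check."""
    body = {"postgres": postgres, "ollama": ollama, "ocr": ocr}
    status = 200 if all(v == "ok" for v in body.values()) else 503
    return status, body
```

The dashboard only ever looks at the first element of the tuple.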
**Callback delivery** (when `callback_url` is set): one POST of the full `Job` body, 10 s timeout. 2xx → `callback_status='delivered'`. Anything else → `'failed'`. No retry. Callers always have `GET /jobs/{id}` as the authoritative fallback.
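The one-shot rule, sketched with an injected `post` callable standing in for the real httpx call (`deliver_callback` is an illustrative name):

```python
def deliver_callback(post, job: dict) -> str:
    """Single POST attempt; the result is recorded as callback_status,
    never retried, and never fed back into the job's terminal status."""
    try:
        status = post(job)   # real code: httpx POST with a 10 s timeout
    except Exception:
        return "failed"      # network error, timeout, bad URL -> one bucket
    return "delivered" if 200 <= status < 300 else "failed"
```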
@@ -186,13 +226,18 @@ Interface per spec §3 (`async validate` + `async process`). Pipeline orchestrat
- **validate**: `request_ix` non-null; `context.files` or `context.texts` non-empty.
- **process**:
  - Copy `request_ix.context.texts` → `response_ix.context.texts`.
  - Normalize each `context.files` entry: plain `str` → `FileRef(url=str, headers={})`. `file://` URLs are read locally; `http(s)://` URLs are downloaded with the per-file `headers`.
  - Download files to `/tmp/ix/<ix_id>/` in parallel (asyncio + httpx). Per-file: connect timeout 10 s, read timeout 30 s, size cap `min(FileRef.max_bytes, IX_FILE_MAX_BYTES)` (default 50 MB). Any fetch failure (non-2xx, timeout, size exceeded) → `IX_000_007` with the offending URL and cause in the message. No retry.
  - MIME detection via `python-magic` on the downloaded bytes (do not trust URL extension). Supported: PDF (`application/pdf`), PNG (`image/png`), JPEG (`image/jpeg`), TIFF (`image/tiff`). Unsupported → `IX_000_005`.
  - Load use case: `entry = REGISTRY.get(request_ix.use_case)`; if `None` → `IX_001_001`. Store `(use_case_request, use_case_response)` instances in `response_ix.context`. Echo `use_case_request.use_case_name` → `response_ix.use_case_name`.
  - Build flat `response_ix.context.pages`: one entry per PDF page (via PyMuPDF), one per image frame, one per text entry. Hard cap 100 pages/PDF → `IX_000_006` on violation.
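The size-cap behaviour can be sketched transport-agnostically (`fetch_capped` and the `(data, error_code)` return shape are illustrative; real code streams via httpx and raises into the pipeline error path):

```python
DEFAULT_MAX = 50 * 1024 * 1024  # IX_FILE_MAX_BYTES default (50 MB)

def fetch_capped(url: str, chunks, max_bytes=None):
    """Accumulate streamed chunks; (data, None) on success,
    (None, 'IX_000_007') once the effective cap is exceeded.
    `chunks` stands in for an httpx streaming-response iterator."""
    cap = min(max_bytes, DEFAULT_MAX) if max_bytes else DEFAULT_MAX
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > cap:
            return None, "IX_000_007"  # size cap exceeded fetching `url`
    return bytes(buf), None
```

Checking the cap per chunk (not after the full body) is what keeps an oversized file from filling `/tmp/ix`.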
### OCRStep
- **validate**:
  - If `(include_geometries or include_ocr_text or ocr_only) and not context.files` → raise `IX_000_004` (the caller asked for OCR artifacts but gave nothing to OCR).
  - Else return `True` iff `(use_ocr or include_geometries or include_ocr_text or ocr_only) and context.files`. Otherwise `False` → step skipped (text-only requests).
  - If `use_ocr=False` but any of `include_geometries`/`include_ocr_text`/`ocr_only` is set, OCR still runs — the flag triad controls what is *retained*, not whether OCR happens.
- **process**: `ocr_result = await OCRClient.ocr(context.pages)` → `response_ix.ocr_result`. Always inject `<page file="{item_index}" number="{page_no}">` tags (simplifies grounding). If `include_provenance`: build `SegmentIndex` (line granularity, normalized bboxes 0-1) and store in `context.segment_index`.
- **OCRClient interface**:
```python
@@ -216,7 +261,10 @@ Interface per spec §3 (`async validate` + `async process`). Pipeline orchestrat
class GenAIClient(Protocol):
    async def invoke(self, request_kwargs: dict, response_schema: type[BaseModel]) -> GenAIInvocationResult: ...
```
MVP implementation: `OllamaClient` → `POST {IX_OLLAMA_URL}/api/chat` with `format = <JSON schema from Pydantic>` (Ollama structured outputs). Per-call timeout: `IX_GENAI_CALL_TIMEOUT_SECONDS` (default 1500 s, distinct from the per-job timeout so a frozen model doesn't eat the full 45-minute budget).
- **Failure modes (no retry on MVP, both surface as pipeline error):**
- Connection refused / timeout / 5xx → `IX_002_000` ("inference backend unavailable") with model name + endpoint.
- 2xx response body cannot be parsed against the Pydantic schema (Ollama structured output violated the schema) → `IX_002_001` ("structured output parse failed") with a snippet of the offending body.
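A sketch of the two mappings, with a stub `post()` in place of the Ollama round-trip and a key-subset check standing in for full Pydantic validation (all names illustrative; failures surface as error codes rather than raising, purely to keep the sketch testable):

```python
import json

def invoke(post, required_keys: set):
    """Returns (payload, None) on success, (None, error_code) on failure.
    post() stands in for the HTTP round-trip to {IX_OLLAMA_URL}/api/chat."""
    try:
        status, body = post()
    except (ConnectionError, TimeoutError):
        return None, "IX_002_000"   # backend unreachable / call timed out
    if status >= 500:
        return None, "IX_002_000"   # backend unavailable
    try:
        payload = json.loads(body)
    except ValueError:
        return None, "IX_002_001"   # 2xx but not JSON at all
    if not isinstance(payload, dict) or not required_keys <= payload.keys():
        return None, "IX_002_001"   # structured output violated the schema
    return payload, None
```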
### ReliabilityStep (new; runs when `include_provenance` is True)
@@ -225,17 +273,22 @@ For each `FieldProvenance` in `response_ix.provenance.fields`:
- **`provenance_verified`**: for each cited segment, compare `text_snippet` to the extracted `value` after normalization (see below). If any cited segment agrees → `True`. Else `False`.
- **`text_agreement`**: if `request_ix.context.texts` is empty → `None`. Else run the same comparison against the concatenated texts → `True` / `False`.
**Per-field-type dispatch** (picks the comparator based on the Pydantic field annotation on the use-case response schema):
| Python type annotation | Comparator |
|---|---|
| `str` | String normalizer (NFKC, casefold, collapse whitespace, strip common punctuation); substring check |
| `int`, `float`, `Decimal` | Digits-and-sign only (strip currency symbols, thousands separators, decimal variants); exact match at 2 decimal places |
| `date`, `datetime` | Parse *both* sides with `dateutil.parser.parse(..., dayfirst=True)`; compare as ISO strings |
| IBAN (str with `account_iban`-like names) | Upper-case, strip whitespace; exact match |
| `Literal[...]` | **Skipped** — verification is `None` (caller-controlled enum labels rarely appear verbatim in the source text). `text_agreement` also `None`. |
| `None` / unset value | **Skipped**`provenance_verified = None`, `text_agreement = None`. Field still appears in provenance output. |
**Short-value skip rule** (applies after comparator selection): if the stringified `value` is ≤ 2 chars, or a numeric `|value| < 10`, `text_agreement` is skipped (→ `None`). `provenance_verified` still runs — the bbox-anchored cite is stronger than a global text scan for short values.
The step only mutates the provenance structure; it does **not** drop fields. Caller sees every extracted field + the flags.
Writes `quality_metrics.verified_fields` (count where `provenance_verified=True`) and `quality_metrics.text_agreement_fields` (count where `text_agreement=True`) summary counters; fields with `None` flags are not counted as either success or failure.
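Sketches of the string and amount comparators — the normalization details here are one plausible reading of the rules above, not the shipped code:

```python
import re
import unicodedata
from decimal import Decimal

_PUNCT = re.escape(".,;:!?'\"()[]")

def norm_str(s: str) -> str:
    """NFKC, casefold, strip common punctuation, collapse whitespace."""
    s = unicodedata.normalize("NFKC", s).casefold()
    s = re.sub(f"[{_PUNCT}]", "", s)
    return re.sub(r"\s+", " ", s).strip()

def norm_amount(s: str):
    """Digits-and-sign only: drop currency symbols, unify separator variants."""
    m = re.search(r"-?[\d.,]*\d", s.replace("\u00a0", ""))
    if not m:
        return None
    digits = m.group(0)
    # treat the last of '.'/',' as the decimal separator, drop the rest
    last = max(digits.rfind("."), digits.rfind(","))
    if last != -1:
        digits = digits[:last].replace(".", "").replace(",", "") + "." + digits[last + 1:]
    return Decimal(digits).quantize(Decimal("0.01"))

def amounts_agree(snippet: str, value) -> bool:
    """Exact match at 2 decimal places, as the table above requires."""
    return norm_amount(snippet) == Decimal(str(value)).quantize(Decimal("0.01"))
```

The last-separator heuristic makes `1.234,56` (German) and `1,234.56` (US) normalize to the same `Decimal`.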
### ResponseHandlerStep
@@ -282,31 +335,38 @@ class Request(BaseModel):
## 8. Errors and warnings
Error-code set (spec §12.2 subset + MVP-specific codes for on-prem failure modes):
| Code | Trigger |
|---|---|
| `IX_000_000` | `request_ix` is None |
| `IX_000_002` | No context (neither files nor texts) |
| `IX_000_004` | `include_geometries`, `include_ocr_text`, or `ocr_only` set but `context.files` empty |
| `IX_000_005` | File MIME type not supported (after byte-sniffing) |
| `IX_000_006` | PDF page-count cap exceeded |
| `IX_000_007` | File fetch failed (connect / timeout / non-2xx / size cap exceeded) |
| `IX_001_000` | Extraction context empty after setup (OCR produced nothing AND `context.texts` empty) |
| `IX_001_001` | Use case name not in `REGISTRY` |
| `IX_002_000` | Inference backend unavailable (Ollama connect / timeout / 5xx) |
| `IX_002_001` | Structured output parse failed (Ollama response body didn't match schema) |
Warnings (non-fatal, appended to `response_ix.warning`): empty OCR result, provenance requested with `use_ocr=False`, requested model unavailable and falling back to `IX_DEFAULT_MODEL`, very short or Literal-typed field skipped during reliability check.
## 9. Configuration (`AppConfig` via `pydantic-settings`)
| Key env var | Default | Meaning |
|---|---|---|
| `IX_POSTGRES_URL` | `postgresql+asyncpg://infoxtractor:<password>@host.docker.internal:5431/infoxtractor` | Job store. Password must be set in `.env`; `.env.example` ships with `<password>` as a placeholder. |
| `IX_OLLAMA_URL` | `http://host.docker.internal:11434` | LLM backend |
| `IX_DEFAULT_MODEL` | `gpt-oss:20b` | Fallback model |
| `IX_OCR_ENGINE` | `surya` | Adapter selector (only value in MVP) |
| `IX_TMP_DIR` | `/tmp/ix` | Download scratch |
| `IX_PIPELINE_WORKER_CONCURRENCY` | `1` | Worker semaphore cap |
| `IX_PIPELINE_REQUEST_TIMEOUT_SECONDS` | `2700` | Per-job timeout (45 min) |
| `IX_GENAI_CALL_TIMEOUT_SECONDS` | `1500` | Per-LLM-call timeout (distinct from per-job) |
| `IX_FILE_MAX_BYTES` | `52428800` | Default per-file download size cap (50 MB) |
| `IX_FILE_CONNECT_TIMEOUT_SECONDS` | `10` | Per-file connect timeout |
| `IX_FILE_READ_TIMEOUT_SECONDS` | `30` | Per-file read timeout |
| `IX_RENDER_MAX_PIXELS_PER_PAGE` | `75000000` | Per-page render cap |
| `IX_LOG_LEVEL` | `INFO` | |
| `IX_CALLBACK_TIMEOUT_SECONDS` | `10` | Webhook POST timeout |
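The real `AppConfig` is `pydantic-settings`-based; a stdlib approximation showing the same defaults and the `IX_`-prefix convention (illustrative only, and limited to a subset of the table):

```python
import os
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class AppConfig:
    ollama_url: str = "http://host.docker.internal:11434"
    default_model: str = "gpt-oss:20b"
    pipeline_request_timeout_seconds: int = 2700
    genai_call_timeout_seconds: int = 1500
    file_max_bytes: int = 52_428_800
    file_connect_timeout_seconds: int = 10
    file_read_timeout_seconds: int = 30
    callback_timeout_seconds: int = 10

    @classmethod
    def from_env(cls, env=None) -> "AppConfig":
        """Each field maps to IX_<FIELD_NAME>; unset vars keep the default."""
        env = os.environ if env is None else env
        kwargs = {}
        for f in fields(cls):
            raw = env.get(f"IX_{f.name.upper()}")
            if raw is not None:
                kwargs[f.name] = int(raw) if f.type is int else raw
        return cls(**kwargs)
```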
@@ -323,25 +383,27 @@ No Azure, OpenAI, or AWS variables — those paths do not exist in the codebase.
## 11. Deployment
- Repo: `goldstein/infoxtractor` on Forgejo, plus `server` bare-repo remote with `post-receive` hook mirroring mammon.
- Port 8994 (LAN-only via UFW; not exposed publicly — internal service). No `infrastructure.docs_url` label, no VPS Caddy entry.
- Postgres: new `infoxtractor` database on existing postgis container.
- Ollama reached via `host.docker.internal:11434`.
- Monitoring label: `infrastructure.web_url=http://192.168.68.42:8994`.
- Backup: `backup.enable=true`, `backup.type=postgres`, `backup.name=infoxtractor`.
- Dockerfile: CUDA-enabled base (`nvidia/cuda:12.4-runtime-ubuntu22.04` + Python 3.12) so Surya can use the 3090. CMD: `alembic upgrade head && uvicorn ix.app:create_app --factory --host 0.0.0.0 --port 8994`.
- Docker Compose gives the container GPU access: `runtime: nvidia` + a `deploy.resources.reservations` GPU entry (same shape as Immich ML / monitoring). The `docker run` equivalent used by post-receive hooks must include `--gpus all`.
- **Pre-deploy check:** the host must have `gpt-oss:20b` pulled into Ollama before first deploy (`ollama pull gpt-oss:20b`). If the model is missing at startup, `/healthz` returns `503` with `ollama: "degraded"` and the monitoring dashboard surfaces the failure. The `post-receive` hook probes `/healthz` for 60 s after container restart; a `503` that doesn't resolve fails the deploy.
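The hook's 60 s probe window can be sketched with an injected `probe` callable (the real hook is shell; `wait_healthy` is an illustrative name):

```python
def wait_healthy(probe, attempts: int = 12, sleep=lambda s: None) -> bool:
    """Poll /healthz up to `attempts` times (12 x 5 s = the 60 s hook budget).
    probe() returns the HTTP status; the deploy fails if 200 never shows up."""
    for _ in range(attempts):
        if probe() == 200:
            return True
        sleep(5)  # real hook: actually sleeps between probes
    return False
```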
## 12. Testing strategy
Strict TDD — each unit is written test-first.
1. **Unit tests** (fast, hermetic): every `Step`, `SegmentIndex`, provenance-verification normalizers, `OCRClient` contract, `GenAIClient` contract, error mapping. No DB, no Ollama, no network.
2. **Integration tests** (DB + fakes): pipeline end-to-end with stub `OCRClient` (replays canned OCR results) and stub `GenAIClient` (replays canned LLM JSON). Covers step wiring + transports + job lifecycle + callback success/failure + pg queue notify + worker startup orphan-sweep. Run against a real postgres service container in Forgejo Actions (mammon CI pattern). **Forgejo Actions runners have neither GPU nor network access to the LAN Ollama/Surya instances; all inference in CI is stubbed.** Real-Ollama tests are gated behind `IX_TEST_OLLAMA=1` and run only from the Mac.
3. **E2E smoke against deployed app**: `scripts/e2e_smoke.py` on the Mac calls `POST http://192.168.68.42:8994/jobs` with a **synthetic** bank-statement fixture (`tests/fixtures/synthetic_giro.pdf` — generated from a template, no real personal data; a separate redacted-real-statement fixture lives outside git at `~/ix-fixtures/` if needed), polls `GET /jobs/{id}` until done, asserts:
   - `status == "done"`
   - `provenance.fields["result.closing_balance"].provenance_verified is True`
   - `text_agreement is True` when Paperless-style texts are submitted
   - Timings under 60 s
Runs after every `git push server main` as the deploy gate. **Deploy-failure workflow:** if the smoke test fails, `git revert HEAD` creates a forward commit that undoes the change; that revert commit is then pushed to both `forgejo` and `server`. Never force-push to `main`; never rewrite history on deployed commits.
## 13. Mammon integration (sketch — outside this spec's scope, owned by mammon)
@@ -357,6 +419,7 @@ Belongs in a mammon-side follow-up spec. Captured here only so readers of ix kno
- Kafka transport (§15)
- Config Server (§9.1 in full spec, §10 here): use cases are in-repo for MVP
- `use_case` as URL or Base64-encoded definition (MVP accepts only registered-name strings)
- Azure DI / Computer Vision OCR backends
- OpenAI, Anthropic, AWS Bedrock GenAI backends
- S3 adapter
@@ -369,6 +432,10 @@ Belongs in a mammon-side follow-up spec. Captured here only so readers of ix kno
- Second-opinion multi-model ensembling
- Schema `version` field
- Per-request rate limiting
- Callback retries (one-shot only for MVP; callers poll as fallback)
- Multi-container workers (single worker in MVP; the `FOR UPDATE SKIP LOCKED` claim pattern is ready for horizontal scale when needed)
The `quality_metrics` shape retains the reference-spec counters (`fields_with_provenance`, `total_fields`, `coverage_rate`, `invalid_references`) and adds the two MVP counters (`verified_fields`, `text_agreement_fields`).
Every deferred item is additive: the `OCRClient` / `GenAIClient` / transport-adapter interfaces already leave the plug points, and the pipeline core is unaware of which implementation is in use.