Lands the async-friendly Alembic env (NullPool, reads IX_POSTGRES_URL), the
hand-written 001 migration matching the spec's table layout exactly
(CHECK on status, partial index on pending rows, UNIQUE on
(client_id, request_id)), the SQLAlchemy 2.0 ORM mapping, and a lazy
engine/session factory. The factory reads the URL through ix.config when
available; Task 3.2 makes that the only path.
Smoke-tested: alembic upgrade head + downgrade base against a live
postgres:16 produce the expected table shape and tear down cleanly.
Unit tests assert the migration source contains every required column/index,
so the migration can't drift from the spec unnoticed.
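
A minimal sketch of that source-level check, assuming the test simply searches
the migration text for required fragments. `REQUIRED_FRAGMENTS`,
`missing_fragments` and the example source are hypothetical stand-ins, not the
contents of the real 001 migration:

```python
# Hypothetical fragments; the real list would be derived from the spec's table layout.
REQUIRED_FRAGMENTS = [
    "client_id",
    "request_id",
    "status",
    "CheckConstraint",
    "UniqueConstraint",
]


def missing_fragments(migration_source: str, required: list[str]) -> list[str]:
    """Return every required fragment that does not appear in the source text."""
    return [frag for frag in required if frag not in migration_source]


# Illustrative stand-in for the text of the migration file.
# The status values in the CHECK are assumed, not copied from the spec.
EXAMPLE_SOURCE = '''
op.create_table(
    "ix_jobs",
    sa.Column("client_id", sa.Text, nullable=False),
    sa.Column("request_id", sa.Text, nullable=False),
    sa.Column("status", sa.Text, nullable=False),
    sa.CheckConstraint("status IN ('pending', 'running', 'done', 'error')"),
    sa.UniqueConstraint("client_id", "request_id"),
)
'''
```

Because the check only reads text, it needs no database and no Alembic
runtime, which is what keeps it in the unit suite.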
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the five pipeline steps together with FakeOCRClient +
FakeGenAIClient, feeds the committed synthetic_giro.pdf fixture via
file:// URL, and asserts the full response shape.
- scripts/create_fixture_pdf.py: PyMuPDF-based builder. One-page A4 PDF
with six known header strings (bank name, IBAN, period, balances,
statement date). Re-runnable on demand; the committed PDF is what CI
consumes.
- tests/fixtures/synthetic_giro.pdf: committed output.
- tests/unit/test_pipeline_end_to_end.py: 5 tests covering
  * ix_result.result fields populated from the fake LLM
  * provenance.fields["result.closing_balance"].provenance_verified True
  * text_agreement True when Paperless-style texts match the value
  * metadata.timings has one entry per step in the right order
  * response.error is None and context is not serialised
197 tests total; ruff clean. No integration tests, no real clients,
no network.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final pipeline step. Three mechanical transforms:
1. include_ocr_text -> concatenate non-tag line texts, pages joined
with \n\n, write to ocr_result.result.text.
2. include_geometries=False (default) -> strip ocr_result.result.pages
+ ocr_result.meta_data. Geometries are heavy; callers opt in.
3. Delete response.context so the internal accumulator never leaks to
the caller (belt-and-braces; Field(exclude=True) already does this).
validate() always returns True per spec.
7 unit tests in tests/unit/test_response_handler_step.py cover all
three branches + context-not-in-model_dump check.
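
A rough sketch of the include_ocr_text transform, under stated assumptions:
pages are modelled as plain lists of line strings, lines within a page are
joined with a single newline (the message above only fixes the page separator),
and `PAGE_TAG_RE` / `flatten_ocr_text` are hypothetical names:

```python
import re

# Assumed tag shape, based on the <page file="..." number="..."> lines OCRStep injects.
PAGE_TAG_RE = re.compile(r"^\s*</?page\b[^>]*>\s*$")


def flatten_ocr_text(pages: list[list[str]]) -> str:
    """Drop page-tag lines, join lines with a newline and pages with a blank line."""
    page_texts = []
    for lines in pages:
        kept = [line for line in lines if not PAGE_TAG_RE.match(line)]
        page_texts.append("\n".join(kept))
    return "\n\n".join(page_texts)
```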
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thin wrapper around ix.provenance.apply_reliability_flags. Validate
skips entirely when include_provenance is off OR when no provenance
data was built (text-only request, etc.). Process reads
context.texts + context.use_case_response and lets the verifier mutate
the FieldProvenance entries + fill quality_metrics counters in place.
11 unit tests in tests/unit/test_reliability_step.py cover: validate
skips on flag off / missing provenance, runs otherwise; per-type
flag behaviour (string verified + text_agreement, Literal -> None,
None value -> None, short numeric -> text_agreement None, date with
both sides parsed, IBAN whitespace-insensitive, disagreement -> False);
quality_metrics verified_fields / text_agreement_fields counters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Assembles the prompt, picks the structured-output schema, calls the
injected GenAIClient, and maps any emitted segment_citations into
response.provenance. Reliability flags stay None here; ReliabilityStep
fills them in Task 2.7.
- System prompt = use_case.system_prompt, plus (when provenance is on) the
verbatim citation instruction from spec §9.2.
- User text = SegmentIndex.to_prompt_text() output ([p1_l0]-style tagged
lines) when provenance is on, else the plain flat OCR text joined with
context.texts.
- Response schema = UseCaseResponse directly, or a runtime
create_model("ProvenanceWrappedResponse", result=(UCR, ...),
segment_citations=(list[SegmentCitation], Field(default_factory=list)))
when provenance is on.
- Model = request override if set, else the use-case default.
- Failure modes: httpx / connection / timeout errors -> IX_002_000;
pydantic.ValidationError -> IX_002_001.
- Writes ix_result.result + ix_result.meta_data (model_name +
token_usage); builds response.provenance via
map_segment_refs_to_provenance when provenance is on.
17 unit tests in tests/unit/test_genai_step.py cover validate
(ocr_only skip, empty -> IX_001_000, text-only, ocr-text path), process
happy path, system-prompt shape with/without citation instruction, user
text tagged vs. plain, response schema plain vs. wrapped, provenance
mapping, error mapping (IX_002_000 + IX_002_001), and model selection
(request override + use-case default).
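
The error mapping and model selection can be sketched as follows. This is a
stdlib-only illustration: `SchemaViolation` stands in for
pydantic.ValidationError, `IXError` stands in for the project's exception type,
and `invoke_mapped` / `select_model` are hypothetical helper names:

```python
import asyncio


class SchemaViolation(ValueError):
    """Stand-in for pydantic.ValidationError (keeps the sketch stdlib-only)."""


class IXError(Exception):
    """Stand-in for the project's exception type; only the code matters here."""

    def __init__(self, code: str, detail: str = "") -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail


def select_model(request_model, use_case_default: str) -> str:
    """Request override wins; otherwise fall back to the use-case default."""
    return request_model or use_case_default


async def invoke_mapped(call):
    """Await call() and translate failures into the two GenAI error codes."""
    try:
        return await call()
    except (ConnectionError, TimeoutError, asyncio.TimeoutError) as exc:
        # Transport-level problems: backend unreachable or too slow.
        raise IXError("IX_002_000", str(exc)) from exc
    except SchemaViolation as exc:
        # The model answered, but not in the requested schema.
        raise IXError("IX_002_001", str(exc)) from exc
```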
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs after SetupStep. Dispatches the flat page list to the injected
OCRClient, writes the raw OCRResult onto response.ocr_result, injects
<page file="..." number="..."> open/close tag lines around each page's
content, and builds a SegmentIndex over the non-tag lines when
provenance is on.
Validate follows the spec triad rule:
- include_geometries/include_ocr_text/ocr_only + no files -> IX_000_004
- no files -> False (skip)
- files + (use_ocr or triad) -> True
9 unit tests in tests/unit/test_ocr_step.py cover all three validate
branches, OCRResult written, page tags injected (format + file_index),
SegmentIndex built iff provenance on.
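
The three validate branches, sketched as a pure function. `OCROptions`,
`IXError` and `should_run_ocr` are hypothetical names, the `use_ocr=True`
default is assumed, and the final fall-through (files present, use_ocr off, no
triad flag) returning False is also an assumption, since the rule above does
not spell that case out:

```python
from dataclasses import dataclass


class IXError(Exception):
    """Stand-in for the project's exception type."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


@dataclass
class OCROptions:
    use_ocr: bool = True
    ocr_only: bool = False
    include_ocr_text: bool = False
    include_geometries: bool = False


def should_run_ocr(options: OCROptions, has_files: bool) -> bool:
    """Apply the triad rule: error, skip, or run."""
    triad = options.ocr_only or options.include_ocr_text or options.include_geometries
    if not has_files:
        if triad:
            # The caller asked for OCR output but supplied nothing to OCR.
            raise IXError("IX_000_004")
        return False
    return options.use_ocr or triad
```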
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First pipeline step. Validates the request (IX_000_002 on empty context),
normalises every Context.files entry to a FileRef, downloads them in
parallel via asyncio.gather, byte-sniffs MIMEs (IX_000_005 for
unsupported), loads the use-case pair from REGISTRY (IX_001_001 on
miss), and builds the flat pages + page_metadata list on
response_ix.context.
Fetcher / ingestor / MIME detector / tmp_dir / fetch_config are all injected
via the constructor so unit tests stay hermetic; production wires the
real ix.ingestion defaults via the app factory.
7 unit tests in tests/unit/test_setup_step.py cover validate errors,
happy path (fetcher + ingestor invoked correctly, context populated,
use_case_name echoed), FileRef headers pass through, unsupported MIME
-> IX_000_005, unknown use case -> IX_001_001, text-only request, and
the _InternalContext type assertion.
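
A small sketch of the normalise-then-fetch-in-parallel shape with an injected
fetcher. The `FileRef` fields shown (`url`, `headers`), `to_file_ref` and
`fetch_all` are assumptions for illustration; the real models are Pydantic and
carry more fields:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class FileRef:
    url: str
    headers: dict[str, str] = field(default_factory=dict)


def to_file_ref(entry) -> FileRef:
    """Accept a plain URL string, a dict, or an existing FileRef."""
    if isinstance(entry, FileRef):
        return entry
    if isinstance(entry, str):
        return FileRef(url=entry)
    return FileRef(url=entry["url"], headers=dict(entry.get("headers", {})))


Fetcher = Callable[[FileRef], Awaitable[bytes]]


async def fetch_all(entries: list, fetcher: Fetcher) -> list[bytes]:
    """Normalise every entry, then download all of them concurrently."""
    refs = [to_file_ref(entry) for entry in entries]
    # gather preserves input order, so result i belongs to refs[i].
    return list(await asyncio.gather(*(fetcher(ref) for ref in refs)))
```

Passing the fetcher in as an argument is what lets a unit test substitute an
in-memory coroutine for real network access.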
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the two Protocol-based client contracts the pipeline steps depend on,
plus test-oriented fakes. Real engines (Surya, Ollama) land in Chunk 4.
- ix.ocr.client.OCRClient — runtime_checkable Protocol with async ocr().
- ix.genai.client.GenAIClient — runtime_checkable Protocol with async
invoke(); GenAIInvocationResult + GenAIUsage dataclasses carry the
parsed model, token usage, and model name.
- FakeOCRClient / FakeGenAIClient: return canned results; both expose a
raise_on_call hook for error-path tests.
8 unit tests across tests/unit/test_ocr_fake.py + test_genai_fake.py
confirm protocol conformance, canned-return behaviour, usage/model-name
defaults, and raise_on_call propagation.
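
A minimal sketch of the Protocol-plus-fake pattern. The `ocr()` parameter and
return types are placeholders (a list in, a dict out); the real signature and
result types are not shown in this message:

```python
import asyncio
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class OCRClient(Protocol):
    async def ocr(self, pages: list) -> dict: ...


class FakeOCRClient:
    """Returns a canned result, or raises the configured exception."""

    def __init__(self, canned: dict, raise_on_call: Optional[Exception] = None) -> None:
        self.canned = canned
        self.raise_on_call = raise_on_call

    async def ocr(self, pages: list) -> dict:
        if self.raise_on_call is not None:
            raise self.raise_on_call
        return self.canned
```

Note that isinstance() against a runtime_checkable Protocol only checks that
the method exists, not its signature, so conformance tests of this kind are a
smoke check rather than a type check.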
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the transport-agnostic pipeline orchestrator. Each step implements
async validate + process; the runner wraps both in a Timer, writes
per-step entries to response.metadata.timings, and aborts on the first
IXException by writing response.error.
- Step exposes a step_name property (defaults to class name) so tests and
logs label steps consistently.
- Timer is a plain context manager that appends one {step, elapsed_seconds}
entry on exit regardless of whether the body raised, so the timeline
stays reconstructable for failed steps.
- 9 unit tests cover ordering, skip-on-false, IXException in validate vs.
process, timings populated for every executed step, and shared-response
mutation across steps. Non-IX exceptions propagate.
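
The runner and Timer described above can be sketched like this. The response
is modelled as a plain dict, `IXError` stands in for IXException, and timing a
step whose validate returns False is an assumption of this sketch:

```python
import asyncio
import time


class IXError(Exception):
    """Stand-in for IXException."""


class Timer:
    """Appends one {step, elapsed_seconds} entry on exit, even if the body raised."""

    def __init__(self, step: str, timings: list) -> None:
        self.step = step
        self.timings = timings

    def __enter__(self) -> "Timer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        elapsed = time.perf_counter() - self._start
        self.timings.append({"step": self.step, "elapsed_seconds": elapsed})
        return False  # never swallow the exception


class Step:
    @property
    def step_name(self) -> str:
        return type(self).__name__

    async def validate(self, response: dict) -> bool:
        return True

    async def process(self, response: dict) -> None:
        raise NotImplementedError


async def run_pipeline(steps: list, response: dict) -> dict:
    for step in steps:
        try:
            with Timer(step.step_name, response["timings"]):
                if await step.validate(response):
                    await step.process(response)
        except IXError as exc:
            # First IX error aborts the run; other exceptions propagate.
            response["error"] = str(exc)
            break
    return response
```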
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the two remaining provenance-subsystem pieces:
mapper.py — map_segment_refs_to_provenance:
- For each LLM SegmentCitation, pick seg-ids per source_type
(`value` vs `value_and_context`), cap at max_sources_per_field,
resolve each via SegmentIndex, track invalid references.
- Resolve field values by dot-path (`result.items[0].name` supported —
`[N]` bracket notation is normalised to `.N` before traversal).
- Skip fields that resolve to zero valid sources (spec §9.4).
- Write quality_metrics with fields_with_provenance / total_fields /
coverage_rate / invalid_references.
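
A sketch of the dot-path resolution with `[N]` normalised to `.N`. The helper
name `resolve_path` and the mixed dict/list/attribute traversal are
assumptions; the real mapper may walk Pydantic models differently:

```python
import re

_INDEX_RE = re.compile(r"\[(\d+)\]")


def resolve_path(root, path: str):
    """Resolve e.g. 'result.items[0].name' against nested dicts, lists, or objects."""
    normalised = _INDEX_RE.sub(r".\1", path)  # items[0] -> items.0
    current = root
    for part in normalised.split("."):
        if isinstance(current, (list, tuple)):
            current = current[int(part)]
        elif isinstance(current, dict):
            current = current[part]
        else:
            current = getattr(current, part)
    return current
```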
verify.py — verify_field + apply_reliability_flags:
- Dispatches per Pydantic field type: date → parse-both-sides compare;
int/float/Decimal → normalize + whole-snippet / numeric-token scan;
IBAN (detected via `iban` in field name) → upper+strip compare;
Literal / None → flags stay None; else string substring.
- _unwrap_optional handles BOTH typing.Union AND types.UnionType so
`Decimal | None` (PEP 604, what get_type_hints emits on 3.12+) resolves
correctly — caught by the integration-style test_writes_flags_and_counters.
- Number comparator scans numeric tokens in the snippet so labels
("Closing balance CHF 1'234.56") don't mask the match.
- apply_reliability_flags mutates the passed ProvenanceData in place and
writes verified_fields / text_agreement_fields to quality_metrics.
Tests cover each comparator, Literal/None skip, short-value skip (strings
and numerics), Decimal via optional union, and end-to-end flag+counter
writing against a Pydantic use-case schema that mirrors bank_statement_header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure functions the ReliabilityStep will compose to compare extracted values
against OCR snippets (and context.texts). Kept in one module so every rule
is directly unit-testable without pulling in the step ABC.
Highlights:
- `normalize_string`: NFKC + casefold + strip common punctuation (. , : ; !
? () [] {} / \\ ' " `) + collapse whitespace. Substring-compatible.
- `normalize_number`: returns the canonical "[-]DDD.DD" form (always 2dp)
after stripping currency symbols. Heuristic separator detection handles
Swiss-German apostrophes ("1'234.56"), de-DE commas ("1.234,56"), and
plain ASCII ("1234.56" / "1234.5" / "1234"). Accepts native int/float/
Decimal as well as str.
- `normalize_date`: dateutil parse with dayfirst=True → ISO YYYY-MM-DD.
Date and datetime objects short-circuit to their isoformat().
- `normalize_iban`: uppercase + strip whitespace. Format validation is the
call site's job; this is pure canonicalisation.
- `should_skip_text_agreement`: dispatches on type + value. Literal → skip,
None → skip, numeric |v|<10 → skip, len(str) ≤ 2 → skip. The numeric check
runs first, so `10` (len("10")==2) is judged by the numeric rule (and not
skipped) instead of tripping the string-length rule.
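
A simplified sketch of three of these rules. The separator heuristic here is
an illustration, not the module's actual implementation, and it does not handle
every input (for example abbreviations with a trailing dot such as "Fr.").
The `is_literal` parameter is a hypothetical way of passing the type
information:

```python
import re
from decimal import Decimal, ROUND_HALF_UP


def normalize_number(value) -> str:
    """Return a canonical '[-]DDD.DD' string with exactly two decimals."""
    if isinstance(value, (int, float, Decimal)):
        number = Decimal(str(value))
    else:
        # Keep digits, separators and sign; this drops currency symbols and spaces.
        text = re.sub(r"[^0-9.,'\-]", "", str(value)).replace("'", "")
        if "," in text and "." in text:
            # Whichever separator comes last is treated as the decimal mark.
            if text.rfind(",") > text.rfind("."):
                text = text.replace(".", "").replace(",", ".")
            else:
                text = text.replace(",", "")
        elif "," in text:
            head, _, tail = text.rpartition(",")
            # Exactly three trailing digits reads as a thousands comma.
            joined_tail = tail if len(tail) == 3 else "." + tail
            text = head.replace(",", "") + joined_tail
        number = Decimal(text)
    return str(number.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def normalize_iban(value: str) -> str:
    """Uppercase and remove all whitespace; no format validation."""
    return "".join(value.split()).upper()


def should_skip_text_agreement(value, is_literal: bool = False) -> bool:
    if is_literal or value is None:
        return True
    # Numeric rule first, so 10 is not caught by the string-length rule.
    if isinstance(value, (int, float, Decimal)) and not isinstance(value, bool):
        return abs(value) < 10
    return len(str(value)) <= 2
```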
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds the ID <-> on-page-anchor map used by both the GenAIStep (to emit the
segment-tagged user message) and the provenance mapper (to resolve LLM-cited
IDs back to bbox/text/file_index).
Design notes:
- `build()` is a classmethod so the pipeline constructs the index in one
place (OCRStep) and passes the constructed instance along in the internal
context. No mutable global state; tests build indexes inline from fake
OCR fixtures.
- Per-page metadata (file_index) arrives via a parallel `list[PageMetadata]`
rather than being smuggled into OCRResult. Keeps segmentation decoupled
from ingestion — the OCR engine legitimately doesn't know which file a
page came from.
- Page-tag lines (`<page …>` / `</page>`) are filtered via a regex so the
LLM can never cite them as provenance. `line_idx_in_page` increments only
for real lines so the IDs stay dense (p1_l0, p1_l1, ...).
- Bounding-box normalisation divides x-coords by page width, y-coords by
page height. Zero dimensions (defensive) pass through unchanged.
- `to_prompt_text(context_texts=[...])` appends paperless-style texts
untagged, separated from the tagged body by a blank line (spec §7.2b).
Deterministic for prompt caching.
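
A compact sketch of the ID assignment and bounding-box normalisation. Pages
and lines are plain dicts here, bounding boxes are assumed to be
`[x0, y0, x1, y1]`, and a zero width or height leaves the whole box untouched;
the real OCRResult and PageMetadata shapes may differ:

```python
import re

# Assumed tag shape for the injected <page ...> / </page> lines.
PAGE_TAG_RE = re.compile(r"^\s*</?page\b[^>]*>\s*$")


def normalise_bbox(bbox, width, height):
    """Scale [x0, y0, x1, y1] by page size; zero dimensions pass through unchanged."""
    if width == 0 or height == 0:
        return list(bbox)
    x0, y0, x1, y1 = bbox
    return [x0 / width, y0 / height, x1 / width, y1 / height]


def build_segments(pages: list, page_metadata: list) -> dict:
    """Map dense IDs (p1_l0, p1_l1, ...) to text, normalised bbox and file_index."""
    segments = {}
    for page_no, (page, meta) in enumerate(zip(pages, page_metadata), start=1):
        line_idx = 0  # only advances for real lines, so IDs stay dense
        for line in page["lines"]:
            if PAGE_TAG_RE.match(line["text"]):
                continue
            segments[f"p{page_no}_l{line_idx}"] = {
                "text": line["text"],
                "bbox": normalise_bbox(line["bbox"], page["width"], page["height"]),
                "file_index": meta["file_index"],
            }
            line_idx += 1
    return segments
```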
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First use case lands. The schema is intentionally flat — nine scalar fields,
no nested arrays — because Ollama's structured-output guidance stays most
reliable when the top level has only scalars, and every field we care about
(bank_name, IBAN, period, opening/closing balance) can be rendered as one.
Registration is explicit in `use_cases/__init__.py`, not a side effect of
importing the use-case module. That keeps load order obvious and lets tests
patch the registry without having to reload modules.
`get_use_case(name)` is the one-liner adapters use; it raises
`IX_001_001` with the offending name in `detail` when the lookup misses,
which keeps log-scrape simple.
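
The explicit-registration pattern, sketched with stand-in types. `IXError`,
`register_use_case` and the placeholder request/response classes are
hypothetical; only `REGISTRY`, `get_use_case` and the `IX_001_001` code come
from the messages above:

```python
class IXError(Exception):
    """Stand-in for the project's exception type."""

    def __init__(self, code: str, detail: str = "") -> None:
        super().__init__(f"{code} (detail={detail})")
        self.code = code
        self.detail = detail


# name -> (request class, response class)
REGISTRY: dict[str, tuple[type, type]] = {}


def register_use_case(name: str, request_cls: type, response_cls: type) -> None:
    """Called explicitly from the package __init__, never as an import side effect."""
    REGISTRY[name] = (request_cls, response_cls)


def get_use_case(name: str) -> tuple[type, type]:
    try:
        return REGISTRY[name]
    except KeyError:
        # Put the offending name in detail so logs show what was asked for.
        raise IXError("IX_001_001", detail=name) from None


class BankStatementHeaderRequest:  # placeholder for the real request model
    pass


class BankStatementHeader:  # placeholder for the real response schema
    pass


register_use_case("bank_statement_header", BankStatementHeaderRequest, BankStatementHeader)
```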
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the data-contract layer. Highlights:
- `ResponseIX.context` is an internal mutable accumulator used by pipeline
steps (pages, files, texts, use_case classes, segment index). It MUST NOT
leak into the serialised response, so we mark the field with
`Field(exclude=True)` and carry the shape in a small `_InternalContext`
sub-model with `extra="allow"` so steps can stash arbitrary state without
schema churn. Tested: `model_dump()` and `model_dump_json()` both drop it.
- `FieldProvenance` gains `provenance_verified: bool | None` and
`text_agreement: bool | None` — the two MVP reliability flags written by
the new ReliabilityStep. Both default None so rows predating the
ReliabilityStep (empty LLM output, cloud-import replay) parse cleanly.
- `quality_metrics` stays a free-form `dict[str, Any]` — the MVP adds
`verified_fields` and `text_agreement_fields` counters without carving
them into the schema, which keeps future metric additions free.
- `Job.status` and `Job.callback_status` are `Literal[...]` so Pydantic
rejects unknown states at the edge. The invariant
(`status='done' iff response.error is None`) stays worker-enforced:
callers sometimes hydrate in-flight rows and we do not want validation
to reject them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the incoming-request data contracts as Pydantic v2 models. Matches the
MVP spec §3 exactly — fields dropped from the reference spec (use_vision,
reasoning_effort, version, ...) stay out, and `extra="forbid"` catches any
caller that sends them so drift surfaces immediately instead of silently.
Context.files is `list[str | FileRef]`: plain URLs stay str, dict entries
parse as FileRef. This keeps the common case (public URL) one-liner while
still supporting Paperless-style auth headers and per-file size caps.
ix_id stays optional with a docstring warning that callers MUST NOT set it —
the transport layer assigns the 16-char hex handle on insert. The field is
present so `Job` round-trips out of the store.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the single exception type used throughout the pipeline. Every failure
maps to one of the ten IX_* codes from the MVP spec §8 with a stable
machine-readable code and an optional free-form detail. The `str()` form is
log-scrapable with a single regex (`IX_xxx_xxx: <msg> (detail=...)`), so
mammon-side reliability UX can classify failures without brittle string
parsing.
Enum values equal names so callers can serialise either.
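
A sketch of the code enum, the `str()` form and a matching log regex. Only
three of the codes are listed, the constructor signature and messages are
assumed, and `LOG_RE` is a hypothetical example of the single-regex scrape:

```python
import re
from enum import Enum
from typing import Optional


class IXErrorCode(str, Enum):
    # Values equal names, so either can be serialised.
    IX_000_002 = "IX_000_002"
    IX_001_001 = "IX_001_001"
    IX_002_000 = "IX_002_000"


class IXException(Exception):
    def __init__(self, code: IXErrorCode, message: str, detail: Optional[str] = None) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.detail = detail

    def __str__(self) -> str:
        text = f"{self.code.value}: {self.message}"
        if self.detail is not None:
            text += f" (detail={self.detail})"
        return text


# One regex recovers code, message and optional detail from a log line.
LOG_RE = re.compile(r"^(IX_\d{3}_\d{3}): (.*?)(?: \(detail=(.*)\))?$")
```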
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands Task 1.1 from the MVP plan: empty-project skeleton so later tasks have somewhere to land. Local tests + ruff pass. CI trigger fix included so feat branches get runs going forward.
The previous python:3.12-slim container lacked node, which actions/checkout@v4
requires. The Forgejo runner's default image includes node + apt + curl, so
we can bootstrap python + uv the same way mammon does.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- pyproject.toml: runtime deps (FastAPI, SQLAlchemy async, Pydantic, PyMuPDF,
python-magic, Pillow, dateutil), dev group (pytest, pytest-asyncio,
pytest-httpx, ruff, mypy), optional `ocr` extra that pulls surya-ocr + torch
(kept optional so CI without GPU can run the base package).
- pytest config: asyncio_mode=auto; `live` marker for tests that need a real
Ollama/Surya (gated on IX_TEST_OLLAMA=1).
- Single smoke test (tests/unit/test_scaffolding.py) verifies the package
imports and exposes __version__ — keeps CI green until the real test
modules land in later chunks.
- .forgejo/workflows/ci.yml: runs ruff + pytest against a Postgres 16 service
container. Explicit IX_TEST_MODE=fake keeps real-client tests out.
- .env.example: every IX_* var from spec §9 with on-prem-friendly defaults.
- uv.lock committed for reproducible builds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Detailed, TDD-structured plan with 5 chunks covering ~30 feature-branch
tasks from foundation scaffolding through first live deploy + E2E smoke.
Each task is one PR; pipeline core comes hermetic-first, real Surya/Ollama
clients in Chunk 4, containerization + first deploy in Chunk 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- FileRef type added so callers (mammon/Paperless) can pass Authorization
headers alongside URLs. context.files is now list[str | FileRef].
- Job lifecycle state machine pinned down, including worker-startup sweep
for rows stuck in 'running' after a crash.
- Explicit IX_002_000 / IX_002_001 codes for Ollama unreachable and
structured-output schema violations, with per-call timeout
IX_GENAI_CALL_TIMEOUT_SECONDS distinct from the per-job timeout.
- IX_000_007 code for file-fetch failures; per-file size, connect, and
read timeouts configurable via env.
- ReliabilityStep: Literal-typed fields and None values explicitly skipped
from provenance verification (with reason); dates parse both sides
before ISO comparison.
- /healthz semantics pinned down (CUDA + Surya loaded; Ollama reachable
AND model available). /metrics window is last 24h.
- (client_id, request_id) is UNIQUE in ix_jobs, matching the idempotency
claim.
- Deploy-failure workflow uses `git revert` forward commit, not
force-push — aligned with AGENTS.md habits.
- Dockerfile / compose require --gpus all. Pre-deploy requires
`ollama pull gpt-oss:20b`; /healthz verifies before deploy completes.
- CI clarified: Forgejo Actions runners are GPU-less and LAN-disconnected;
all inference is stubbed there. Real-Ollama tests behind IX_TEST_OLLAMA=1.
- Fixture redaction stance: synthetic-template PDF committed; real
redacted fixtures live out-of-repo.
- Deferred list picks up use_case URL/Base64, callback retries,
multi-container workers. quality_metrics retains reference-spec counters
plus the two new MVP ones.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Establishes ix as an async, on-prem, LLM-powered structured extraction
microservice. Full reference spec stays in docs/spec-core-pipeline.md;
MVP spec (strict subset — Ollama only, Surya OCR, REST + Postgres-queue
transports in parallel, in-repo use cases, provenance-based reliability
signals) lives at docs/superpowers/specs/2026-04-18-ix-mvp-design.md.
First use case: bank_statement_header (feeds mammon's needs_parser flow).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>