Commit graph

57 commits

Author SHA1 Message Date
b737ed7b21 Merge pull request 'feat(ocr): SuryaOCRClient real OCR backend (spec 6.2)' (#25) from feat/surya-client into main
All checks were successful
tests / test (push) Successful in 1m10s
2026-04-18 10:04:41 +00:00
322f6b2b1b feat(ocr): SuryaOCRClient — real OCR backend (spec §6.2)
All checks were successful
tests / test (push) Successful in 1m14s
tests / test (pull_request) Successful in 1m14s
Runs Surya's detection + recognition over PIL images rendered from each
Page's source file (PDFs via PyMuPDF, images via Pillow). Lazy warm_up
so FastAPI lifespan start stays predictable. Deferred Surya/torch
imports keep the base install slim — the heavy deps stay under [ocr].

Extends OCRClient Protocol with optional files + page_metadata kwargs
so the engine can resolve each page back to its on-disk source;
FakeOCRClient accepts and ignores both kwargs, so the hermetic tests stay unchanged.

selfcheck() runs the predictors on a 1x1 PIL image — wired into /healthz
by Task 4.3.

Tests: 6 hermetic unit tests (Surya predictors mocked, no model
download); 2 live tests gated on IX_TEST_OLLAMA=1 (never run in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:04:19 +02:00
0f045f814a Merge pull request 'feat(genai): OllamaClient structured-output /api/chat backend (spec 6)' (#24) from feat/ollama-client into main
All checks were successful
tests / test (push) Successful in 1m15s
2026-04-18 09:58:38 +00:00
90e46b707d feat(genai): OllamaClient — structured-output /api/chat backend (spec §6)
All checks were successful
tests / test (push) Successful in 1m10s
tests / test (pull_request) Successful in 1m5s
Real GenAIClient for the production pipeline. Sends `format=<pydantic JSON
schema>`, `stream=false`, and mapped options (`temperature`; drops
`reasoning_effort`). Content-parts lists joined to a single string since
MVP models don't speak native content-parts. Error mapping per spec:
connection/timeout/5xx → IX_002_000, schema violations → IX_002_001.
`selfcheck()` probes /api/tags with a fixed 5 s timeout for /healthz.
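A minimal sketch of the request body described above. It assumes Ollama's documented /api/chat fields (`model`, `messages`, `format`, `stream`, `options`). The helper names `build_chat_payload` and `_flatten_content`, the content-part shape `{"type": "text", "text": ...}`, and the newline used to join parts are all hypothetical and for illustration only.

```python
from __future__ import annotations

from typing import Any


def _flatten_content(content: str | list[dict[str, Any]]) -> str:
    """Join a content-parts list into one string (hypothetical part shape)."""
    if isinstance(content, str):
        return content
    # Assumption: text parts look like {"type": "text", "text": "..."}.
    return "\n".join(part["text"] for part in content if part.get("type") == "text")


def build_chat_payload(
    model: str,
    messages: list[dict[str, Any]],
    json_schema: dict[str, Any],
    options: dict[str, Any],
) -> dict[str, Any]:
    """Build an Ollama /api/chat body: structured output, no streaming."""
    return {
        "model": model,
        "messages": [
            {"role": m["role"], "content": _flatten_content(m["content"])}
            for m in messages
        ],
        "format": json_schema,  # e.g. a Pydantic model's model_json_schema()
        "stream": False,
        # In this sketch only temperature is forwarded; reasoning_effort is dropped.
        "options": {k: v for k, v in options.items() if k == "temperature"},
    }
```

Flattening runs before the payload is built, so the model only ever receives a single string per message.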

Tests: 10 hermetic pytest-httpx unit tests; 2 live tests gated on
IX_TEST_OLLAMA=1 (never run in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:58:15 +02:00
6183b9c886 Merge pull request 'feat(pg-queue): LISTEN ix_jobs_new + 10s fallback poll' (#23) from feat/pg-queue-adapter into main
All checks were successful
tests / test (push) Successful in 1m11s
2026-04-18 09:52:45 +00:00
050f80dcd7 feat(pg-queue): LISTEN ix_jobs_new + 10s fallback poll (spec §4)
All checks were successful
tests / test (push) Successful in 1m8s
tests / test (pull_request) Successful in 1m9s
PgQueueListener:
- Dedicated asyncpg connection outside the SQLAlchemy pool (LISTEN
  needs a persistent connection; pooled connections check in/out).
- Exposes wait_for_work(timeout) — resolves on NOTIFY or timeout,
  whichever fires first. The worker treats both wakes identically.
- asyncpg_dsn_from_sqlalchemy_url strips the +asyncpg driver segment
  and percent-decodes the password so the same URL in IX_POSTGRES_URL
  works for both SQLAlchemy and raw asyncpg.
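The wake-up contract can be sketched with plain asyncio. The NOTIFY handler uses asyncpg's listener callback signature `(connection, pid, channel, payload)`. `WakeSignal` and `asyncpg_kwargs_from_sqlalchemy_url` are hypothetical names. The URL helper returns `asyncpg.connect()` keyword arguments rather than a URL string, so a decoded password containing `@` or `:` cannot corrupt a DSN. That return shape is an assumption and may differ from what `asyncpg_dsn_from_sqlalchemy_url` actually returns.

```python
from __future__ import annotations

import asyncio
from typing import Any
from urllib.parse import unquote, urlsplit


class WakeSignal:
    """wait_for_work() returns on NOTIFY or timeout, whichever fires first."""

    def __init__(self) -> None:
        self._event = asyncio.Event()

    def on_notify(self, connection: Any, pid: int, channel: str, payload: str) -> None:
        # Would be registered via `await conn.add_listener("ix_jobs_new", self.on_notify)`.
        self._event.set()

    async def wait_for_work(self, timeout: float) -> None:
        try:
            await asyncio.wait_for(self._event.wait(), timeout)
        except asyncio.TimeoutError:
            pass  # fallback poll: a timeout wakes the worker just like a NOTIFY
        finally:
            self._event.clear()


def asyncpg_kwargs_from_sqlalchemy_url(url: str) -> dict[str, Any]:
    """'postgresql+asyncpg://u:p%40ss@h:5432/db' -> asyncpg.connect() kwargs."""
    parts = urlsplit(url)
    scheme = parts.scheme.split("+", 1)[0]  # drop the +asyncpg driver segment
    if scheme not in ("postgresql", "postgres"):
        raise ValueError(f"not a postgres URL: {scheme}")
    return {
        "host": parts.hostname,
        "port": parts.port or 5432,
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),  # urlsplit keeps %-escapes
        "database": parts.path.lstrip("/"),
    }
```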

app.py lifespan now also spawns the listener alongside the worker;
both are gated on spawn_worker=True so REST-only tests stay fast.

2 new integration tests: NOTIFY path (wake within 2 s despite 60 s
poll) + missed-NOTIFY path (fallback poll recovers within 5 s). 33
integration tests total, 209 unit. Forgejo Actions trigger is flaky;
local verification is the gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:52:26 +02:00
415e03fba1 Merge pull request 'feat(worker): async worker loop + one-shot callback delivery' (#22) from feat/worker-loop into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:50:11 +00:00
406a7ea2fd feat(worker): async worker loop + one-shot callback delivery (spec §5)
All checks were successful
tests / test (push) Successful in 1m15s
tests / test (pull_request) Successful in 1m8s
Worker:
- Startup: sweep_orphans(now, max_running_seconds) rescues rows stuck
  in 'running' from a crashed prior process.
- Loop: claim_next_pending → build pipeline via injected factory → run
  → mark_done/mark_error → deliver callback if set → record outcome.
- Non-IX exceptions from the pipeline collapse to IX_002_000 so callers
  see a stable error code.
- Sleep loop uses a cancellable wait so the stop event reacts
  immediately; the wait_for_work hook is ready for Task 3.6 to plug in
  the LISTEN-driven event without the worker knowing about NOTIFY.
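A minimal asyncio sketch of the loop. It assumes the claim, pipeline, and record operations are injected as plain callables. `worker_loop`, `run_one`, and the stand-in `IXException` are hypothetical names. The idle sleep is cancellable because `stop.wait()` races `wait_for_work()`: whichever finishes first wakes the loop.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class IXException(Exception):
    """Stand-in for the project's exception type (minimal hypothetical shape)."""

    def __init__(self, code: str, detail: str | None = None) -> None:
        super().__init__(f"{code} (detail={detail})")
        self.code = code


async def run_one(job: Any, pipeline: Callable[[Any], Awaitable[None]]) -> tuple[str, str | None]:
    try:
        await pipeline(job)
        return "done", None
    except IXException as exc:
        return "error", exc.code
    except Exception:
        return "error", "IX_002_000"  # non-IX failures collapse to a stable code


async def worker_loop(
    claim_next: Callable[[], Awaitable[Any | None]],
    pipeline: Callable[[Any], Awaitable[None]],
    record: Callable[[Any, str, str | None], None],
    wait_for_work: Callable[[float], Awaitable[None]],
    stop: asyncio.Event,
    poll_seconds: float = 10.0,
) -> None:
    while not stop.is_set():
        job = await claim_next()
        if job is not None:
            status, code = await run_one(job, pipeline)
            record(job, status, code)
            continue
        # Idle: wake on new work, the poll timeout, or stop, whichever comes first.
        stop_task = asyncio.create_task(stop.wait())
        work_task = asyncio.create_task(wait_for_work(poll_seconds))
        _, pending = await asyncio.wait(
            {stop_task, work_task}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
```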

Callback:
- One-shot POST, 2xx → delivered, anything else (incl. connect/timeout
  exceptions) → failed. No retries.
- Callback record never reverts the job's terminal state — GET /jobs/{id}
  stays the authoritative fallback.

7 integration tests: happy path, pipeline-raise → error, callback 2xx,
callback 5xx, orphan sweep on startup, no-callback rows stay
callback_status=None (x2 via parametrize). Unit suite still 209.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:49:54 +02:00
ee023d6e34 Merge pull request 'feat(rest): FastAPI adapter + /jobs /healthz /metrics (spec 5)' (#21) from feat/rest-adapter into main
Some checks failed
tests / test (push) Has been cancelled
2026-04-18 09:47:35 +00:00
e46c44f1e0 feat(rest): FastAPI adapter + /jobs, /healthz, /metrics routes (spec §5)
All checks were successful
tests / test (push) Successful in 1m7s
tests / test (pull_request) Successful in 1m5s
Routes:
- POST /jobs: 201 on first insert, 200 on idempotent re-submit.
- GET /jobs/{id}: full Job envelope or 404.
- GET /jobs?client_id=&request_id=: correlation lookup or 404.
- GET /healthz: {postgres, ollama, ocr}; 200 iff all ok (degraded counts
  as non-200 per spec). Postgres probe guarded by a 2 s wait_for.
- GET /metrics: pending/running counts + 24h done/error counters +
  per-use-case avg seconds. Plain JSON, no Prometheus.
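A sketch of the /healthz aggregation. It assumes each probe in the DI bundle is an async callable that returns `True` when healthy. The `"ok"`/`"fail"` strings and the 503 status are assumptions; the description only says that anything other than all-ok is non-200. Only the Postgres probe gets the 2 s `asyncio.wait_for` guard.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

Probe = Callable[[], Awaitable[bool]]


async def _run(probe: Probe, timeout: float | None) -> str:
    try:
        coro = probe()
        ok = await (asyncio.wait_for(coro, timeout) if timeout else coro)
    except Exception:  # includes asyncio.TimeoutError raised by the guard
        return "fail"
    return "ok" if ok else "fail"


async def healthz(probes: dict[str, Probe]) -> tuple[int, dict[str, str]]:
    names = ["postgres", "ollama", "ocr"]
    results = await asyncio.gather(
        *(_run(probes[n], 2.0 if n == "postgres" else None) for n in names)
    )
    body = dict(zip(names, results))
    status = 200 if all(v == "ok" for v in body.values()) else 503
    return status, body
```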

create_app(spawn_worker=bool) parameterises worker spawning so tests that
only need REST pass False. Worker spawn is tolerant of the loop module not
being importable yet (Task 3.5 fills it in).

Probes are a DI bundle — production wiring swaps them in at startup
(Chunk 4); tests inject canned ok/fail callables. Session factory is also
DI'd so tests can point at a per-loop engine and sidestep the asyncpg
cross-event-loop Future issue that bit the jobs_repo fixture.

9 new integration tests; unit suite unchanged. Forgejo Actions trigger is
flaky; local verification is the gate (unit + integration green locally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:47:04 +02:00
04a415a191 Merge pull request 'feat(store): JobsRepo CRUD over ix_jobs + integration fixtures' (#20) from feat/jobs-repo into main
All checks were successful
tests / test (push) Successful in 1m14s
2026-04-18 09:43:28 +00:00
141153ffa7 feat(store): JobsRepo CRUD over ix_jobs + integration fixtures (spec §4)
All checks were successful
tests / test (push) Successful in 1m10s
tests / test (pull_request) Successful in 1m10s
JobsRepo covers the full job-lifecycle surface:

- insert_pending: idempotent on (client_id, request_id) via ON CONFLICT
  DO NOTHING + re-select; assigns a 16-hex-character ix_id.
- claim_next_pending: FOR UPDATE SKIP LOCKED so concurrent workers never
  double-dispatch a row.
- get / get_by_correlation: hydrates JSONB back through Pydantic.
- mark_done: done iff response.error is None, else error.
- mark_error: explicit convenience wrapper.
- update_callback_status: delivered | failed (no status transition).
- sweep_orphans: time-based rescue of stuck running rows; attempts++.
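The idempotent insert can be demonstrated with SQLite as a stand-in, because SQLite 3.24+ also supports `INSERT ... ON CONFLICT (...) DO NOTHING`. The table is a cut-down, hypothetical version of `ix_jobs`. The `FOR UPDATE SKIP LOCKED` claim only works on Postgres, so it appears here as a query string that this sketch never executes. Its `created_at` column is an assumption.

```python
import secrets
import sqlite3

# Postgres-only claim query (illustrative; not executed by this sketch).
CLAIM_SQL = """
UPDATE ix_jobs SET status = 'running'
WHERE ix_id = (
    SELECT ix_id FROM ix_jobs
    WHERE status = 'pending'
    ORDER BY created_at          -- hypothetical ordering column
    LIMIT 1
    FOR UPDATE SKIP LOCKED       -- concurrent workers skip rows already locked
)
RETURNING ix_id
"""


def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ix_jobs ("
        " ix_id TEXT PRIMARY KEY, client_id TEXT NOT NULL, request_id TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " UNIQUE (client_id, request_id))"
    )
    return db


def insert_pending(db: sqlite3.Connection, client_id: str, request_id: str) -> tuple[str, bool]:
    """Return (ix_id, created); a re-submit returns the existing row's ix_id."""
    cur = db.execute(
        "INSERT INTO ix_jobs (ix_id, client_id, request_id) VALUES (?, ?, ?)"
        " ON CONFLICT (client_id, request_id) DO NOTHING",
        (secrets.token_hex(8), client_id, request_id),  # 8 bytes -> 16 hex chars
    )
    created = cur.rowcount == 1
    (ix_id,) = db.execute(
        "SELECT ix_id FROM ix_jobs WHERE client_id = ? AND request_id = ?",
        (client_id, request_id),
    ).fetchone()
    return ix_id, created
```

The `created` flag is what would let the REST layer choose between 201 on first insert and 200 on an idempotent re-submit.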

Integration fixtures (tests/integration/conftest.py):
- Skip cleanly when neither IX_TEST_DATABASE_URL nor IX_POSTGRES_URL is
  set (unit suite stays runnable on a bare laptop).
- Alembic upgrade/downgrade runs in a subprocess so its internal
  asyncio.run() doesn't collide with pytest-asyncio's loop.
- Per-test engine + truncate so loops never cross and tests start clean.

15 integration tests against a live postgres:16, including SKIP LOCKED
concurrency + orphan sweep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:43:11 +02:00
8bb220ae43 Merge pull request 'feat(config): AppConfig + cached get_config()' (#19) from feat/config into main
All checks were successful
tests / test (push) Successful in 59s
2026-04-18 09:39:00 +00:00
95728accbf feat(config): AppConfig + cached get_config() (spec §9)
All checks were successful
tests / test (push) Successful in 1m1s
tests / test (pull_request) Successful in 58s
Typed pydantic-settings view over every IX_* env var, defaults matching
spec §9 exactly. @lru_cache-wrapped accessor so parsing/validation happens
once per process; tests clear the cache via get_config.cache_clear().

extra="ignore" keeps the container robust against typo'd env vars in
production .env files. engine.py's URL resolver now goes through
get_config() when ix.config is importable (bootstrap fallback remains so
hypothetical early-import callers don't crash).
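The caching pattern can be shown without pydantic-settings. This stdlib sketch substitutes a frozen dataclass for the settings model. It carries only one illustrative field (`postgres_url` from `IX_POSTGRES_URL`) and does not reproduce the spec §9 field list or defaults.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class AppConfig:
    # One field for illustration; the real model covers every IX_* variable.
    postgres_url: str | None = None

    @classmethod
    def from_env(cls) -> "AppConfig":
        # Variables that are not fields are never read, similar in effect to extra="ignore".
        return cls(postgres_url=os.environ.get("IX_POSTGRES_URL"))


@lru_cache(maxsize=1)
def get_config() -> AppConfig:
    """Parse once per process; tests call get_config.cache_clear() to re-read."""
    return AppConfig.from_env()
```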

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:38:44 +02:00
dc6d28bda1 Merge pull request 'feat(store): Alembic scaffolding + initial ix_jobs migration' (#18) from feat/alembic-init into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:37:37 +00:00
1c60c30084 feat(store): Alembic scaffolding + initial ix_jobs migration (spec §4)
All checks were successful
tests / test (push) Successful in 1m15s
tests / test (pull_request) Successful in 1m2s
Lands the async-friendly Alembic env (NullPool, reads IX_POSTGRES_URL), the
hand-written 001 migration matching the spec's table layout exactly
(CHECK on status, partial index on pending rows, UNIQUE on
(client_id, request_id)), the SQLAlchemy 2.0 ORM mapping, and a lazy
engine/session factory. The factory reads the URL through ix.config when
available; Task 3.2 makes that the only path.

Smoke-tested: alembic upgrade head + downgrade base against a live
postgres:16 produce the expected table shape and tear down cleanly.
Unit tests assert the migration source contains every required column/index
so the migration can't drift from spec at import time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:37:21 +02:00
a54a968313 Merge pull request 'test(pipeline): end-to-end hermetic test with fakes + synthetic fixture' (#17) from feat/pipeline-e2e-fakes into main
Some checks failed
tests / test (push) Has been cancelled
2026-04-18 09:24:51 +00:00
b109bba873 test(pipeline): end-to-end hermetic test with fakes + synthetic fixture
All checks were successful
tests / test (push) Successful in 59s
tests / test (pull_request) Successful in 57s
Wires the five pipeline steps together with FakeOCRClient +
FakeGenAIClient, feeds the committed synthetic_giro.pdf fixture via
file:// URL, and asserts the full response shape.

- scripts/create_fixture_pdf.py: PyMuPDF-based builder. One-page A4 PDF
  with six known header strings (bank name, IBAN, period, balances,
  statement date). Re-runnable on demand; the committed PDF is what CI
  consumes.
- tests/fixtures/synthetic_giro.pdf: committed output.
- tests/unit/test_pipeline_end_to_end.py: 5 tests covering
  * ix_result.result fields populated from the fake LLM
  * provenance.fields["result.closing_balance"].provenance_verified True
  * text_agreement True when Paperless-style texts match the value
  * metadata.timings has one entry per step in the right order
  * response.error is None and context is not serialised

197 tests total; ruff clean. No integration tests, no real clients,
no network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:24:29 +02:00
118d77c428 Merge pull request 'feat(pipeline): ResponseHandlerStep (spec §8)' (#16) from feat/step-response-handler into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:21:50 +00:00
565d8d0676 feat(pipeline): ResponseHandlerStep — shape-up final payload (spec §8)
All checks were successful
tests / test (push) Successful in 1m0s
tests / test (pull_request) Successful in 1m2s
Final pipeline step. Three mechanical transforms:

1. include_ocr_text -> concatenate non-tag line texts, pages joined
   with \n\n, write to ocr_result.result.text.
2. include_geometries=False (default) -> strip ocr_result.result.pages
   + ocr_result.meta_data. Geometries are heavy; callers opt in.
3. Delete response.context so the internal accumulator never leaks to
   the caller (belt-and-braces; Field(exclude=True) already does this).
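Transform 1 reduces to a small pure function. This sketch assumes each page is a list of line strings, that lines within a page are joined with a single newline, and a page-tag regex matching the `<page ...>` / `</page>` lines. `build_ocr_text` is a hypothetical name.

```python
import re

PAGE_TAG = re.compile(r"^\s*</?page\b[^>]*>\s*$")


def build_ocr_text(pages: list[list[str]]) -> str:
    """Drop page-tag lines; join lines with newlines and pages with a blank line."""
    return "\n\n".join(
        "\n".join(line for line in page if not PAGE_TAG.match(line)) for page in pages
    )
```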

validate() always returns True per spec.

7 unit tests in tests/unit/test_response_handler_step.py cover all
three branches + context-not-in-model_dump check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:21:36 +02:00
83c1996702 Merge pull request 'feat(pipeline): ReliabilityStep (spec §6)' (#15) from feat/step-reliability into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:20:38 +00:00
132f110463 feat(pipeline): ReliabilityStep — writes reliability flags (spec §6)
All checks were successful
tests / test (push) Successful in 1m3s
tests / test (pull_request) Successful in 1m1s
Thin wrapper around ix.provenance.apply_reliability_flags. Validate
skips entirely when include_provenance is off OR when no provenance
data was built (text-only request, etc.). Process reads
context.texts + context.use_case_response and lets the verifier mutate
the FieldProvenance entries + fill quality_metrics counters in place.

11 unit tests in tests/unit/test_reliability_step.py cover: validate
skips on flag off / missing provenance, runs otherwise; per-type
flag behaviour (string verified + text_agreement, Literal -> None,
None value -> None, short numeric -> text_agreement None, date with
both sides parsed, IBAN whitespace-insensitive, disagreement -> False);
quality_metrics verified_fields / text_agreement_fields counters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:20:18 +02:00
6d9c239e82 Merge pull request 'feat(pipeline): GenAIStep (spec §6.3, §7, §9.2)' (#14) from feat/step-genai into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:18:59 +00:00
abee9cea7b feat(pipeline): GenAIStep — LLM call + provenance mapping (spec §6.3, §7, §9.2)
All checks were successful
tests / test (push) Successful in 1m14s
tests / test (pull_request) Successful in 1m10s
Assembles the prompt, picks the structured-output schema, calls the
injected GenAIClient, and maps any emitted segment_citations into
response.provenance. Reliability flags stay None here; ReliabilityStep
fills them in Task 2.7.

- System prompt = use_case.system_prompt + (provenance-on) the verbatim
  citation instruction from spec §9.2.
- User text = SegmentIndex.to_prompt_text([p1_l0] style) when provenance
  is on, else plain OCR flat text + texts joined.
- Response schema = UseCaseResponse directly, or a runtime
  create_model("ProvenanceWrappedResponse", result=(UCR, ...),
  segment_citations=(list[SegmentCitation], Field(default_factory=list)))
  when provenance is on.
- Model = request override -> use-case default.
- Failure modes: httpx / connection / timeout errors -> IX_002_000;
  pydantic.ValidationError -> IX_002_001.
- Writes ix_result.result + ix_result.meta_data (model_name +
  token_usage); builds response.provenance via
  map_segment_refs_to_provenance when provenance is on.

17 unit tests in tests/unit/test_genai_step.py cover validate
(ocr_only skip, empty -> IX_001_000, text-only, ocr-text path), process
happy path, system-prompt shape with/without citation instruction, user
text tagged vs. plain, response schema plain vs. wrapped, provenance
mapping, error mapping (IX_002_000 + IX_002_001), and model selection
(request override + use-case default).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:18:44 +02:00
acb2d55ce3 Merge pull request 'feat(pipeline): OCRStep (spec §6.2)' (#13) from feat/step-ocr into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:16:04 +00:00
81054baa06 feat(pipeline): OCRStep — run OCR + page tags + SegmentIndex (spec §6.2)
All checks were successful
tests / test (push) Successful in 1m11s
tests / test (pull_request) Successful in 1m13s
Runs after SetupStep. Dispatches the flat page list to the injected
OCRClient, writes the raw OCRResult onto response.ocr_result, injects
<page file="..." number="..."> open/close tag lines around each page's
content, and builds a SegmentIndex over the non-tag lines when
provenance is on.

Validate follows the spec triad rule:
- include_geometries/include_ocr_text/ocr_only + no files -> IX_000_004
- no files -> False (skip)
- files + (use_ocr or triad) -> True
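The validate rule can be sketched as one function. The parameter names follow the request flags listed above. `ocr_should_run` and the stand-in exception are hypothetical. Returning `False` when files are present but no OCR flag is set is an assumption, since the description only lists the `True` case.

```python
class IXException(Exception):
    """Minimal stand-in carrying an IX_* code (hypothetical shape)."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def ocr_should_run(
    files: list[str],
    use_ocr: bool,
    include_geometries: bool = False,
    include_ocr_text: bool = False,
    ocr_only: bool = False,
) -> bool:
    triad = include_geometries or include_ocr_text or ocr_only
    if triad and not files:
        raise IXException("IX_000_004")  # OCR output requested but nothing to OCR
    if not files:
        return False  # text-only request: skip the step
    return use_ocr or triad
```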

9 unit tests in tests/unit/test_ocr_step.py cover all three validate
branches, OCRResult written, page tags injected (format + file_index),
SegmentIndex built iff provenance on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:15:46 +02:00
632acdcd26 Merge pull request 'feat(pipeline): SetupStep (spec §6.1)' (#12) from feat/step-setup into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:14:19 +00:00
97aa24f478 feat(pipeline): SetupStep — validate + fetch + MIME + pages (spec §6.1)
All checks were successful
tests / test (push) Successful in 1m13s
tests / test (pull_request) Successful in 1m19s
First pipeline step. Validates the request (IX_000_002 on empty context),
normalises every Context.files entry to a FileRef, downloads them in
parallel via asyncio.gather, byte-sniffs MIMEs (IX_000_005 for
unsupported), loads the use-case pair from REGISTRY (IX_001_001 on
miss), and builds the flat pages + page_metadata list on
response_ix.context.

Fetcher / ingestor / MIME detector / tmp_dir / fetch_config all inject
via the constructor so unit tests stay hermetic — production wires the
real ix.ingestion defaults via the app factory.

7 unit tests in tests/unit/test_setup_step.py cover validate errors,
happy path (fetcher + ingestor invoked correctly, context populated,
use_case_name echoed), FileRef headers pass through, unsupported MIME
-> IX_000_005, unknown use case -> IX_001_001, text-only request, and
the _InternalContext type assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:14:04 +02:00
d801038c74 Merge pull request 'feat(ingestion): fetch_file + MIME sniff + DocumentIngestor' (#11) from feat/ingestion into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:12:19 +00:00
290e51416f feat(ingestion): fetch_file + MIME sniff + DocumentIngestor (spec §6.1)
All checks were successful
tests / test (push) Successful in 57s
tests / test (pull_request) Successful in 1m12s
Three layered modules the SetupStep will wire together in Task 2.4.

- fetch.py: async httpx fetch with configurable timeouts + incremental
  size cap (stream=True, accumulate bytes, raise IX_000_007 when
  exceeded). file:// URLs read locally. Auth headers pass through. The
  caller injects a FetchConfig — env reads happen in ix.config (Chunk 3).
- mime.py: python-magic byte-sniff + SUPPORTED_MIMES frozenset +
  require_supported(mime) helper that raises IX_000_005.
- pages.py: DocumentIngestor.build_pages(files, texts) ->
  (list[Page], list[PageMetadata]). PDFs via PyMuPDF (hard 100 pg/PDF
  cap -> IX_000_006), images via Pillow (multi-frame TIFFs yield
  multiple Pages), texts as zero-dim Pages so GenAIStep can still cite
  them.
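The incremental size cap works over any async byte stream. With httpx, that stream would be `response.aiter_bytes()` inside an `AsyncClient.stream(...)` block. `read_capped` and the stand-in exception are hypothetical names.

```python
from __future__ import annotations

from typing import AsyncIterator


class IXException(Exception):
    """Minimal stand-in carrying an IX_* code (hypothetical shape)."""

    def __init__(self, code: str, detail: str | None = None) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code


async def read_capped(chunks: AsyncIterator[bytes], max_bytes: int) -> bytes:
    """Accumulate chunks, failing as soon as the running total exceeds max_bytes."""
    buf = bytearray()
    async for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Abort mid-download rather than buffering the whole body first.
            raise IXException("IX_000_007", f"exceeded {max_bytes} bytes")
    return bytes(buf)
```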

21 new unit tests (141 total) cover: fetch success with headers, 4xx/5xx
mapping, timeout -> IX_000_007, size cap enforced globally + per-file,
file:// happy path + missing file, MIME detection for PDF/PNG/JPEG/TIFF,
require_supported gate, PDF/TIFF/text page counts, 101-page PDF ->
IX_000_006, multi-file file_index assignment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:12:00 +02:00
2709fb8d6b Merge pull request 'feat(clients): OCRClient + GenAIClient protocols + fakes' (#10) from feat/client-protocols into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:08:38 +00:00
118a9abd09 feat(clients): OCRClient + GenAIClient protocols + fakes (spec §6.2, §6.3)
All checks were successful
tests / test (push) Successful in 1m0s
tests / test (pull_request) Successful in 1m1s
Adds the two Protocol-based client contracts the pipeline steps depend on,
plus test-oriented fakes. Real engines (Surya, Ollama) land in Chunk 4.

- ix.ocr.client.OCRClient — runtime_checkable Protocol with async ocr().
- ix.genai.client.GenAIClient — runtime_checkable Protocol with async
  invoke(); GenAIInvocationResult + GenAIUsage dataclasses carry the
  parsed model, token usage, and model name.
- FakeOCRClient / FakeGenAIClient: return canned results; both expose a
  raise_on_call hook for error-path tests.

8 unit tests across tests/unit/test_ocr_fake.py + test_genai_fake.py
confirm protocol conformance, canned-return behaviour, usage/model-name
defaults, and raise_on_call propagation.
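A minimal sketch of the Protocol-plus-fake pattern. It assumes a single `ocr(pages, **kwargs)` method; the real signature is not shown in this log, and the `canned`/`calls` attributes are hypothetical. `runtime_checkable` makes `isinstance` check only that the method exists, which is what the protocol-conformance tests can rely on.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class OCRClient(Protocol):
    async def ocr(self, pages: list[Any], **kwargs: Any) -> Any: ...


class FakeOCRClient:
    """Returns a canned result; raise_on_call drives error-path tests."""

    def __init__(self, canned: Any, raise_on_call: Exception | None = None) -> None:
        self.canned = canned
        self.raise_on_call = raise_on_call
        self.calls = 0

    async def ocr(self, pages: list[Any], **kwargs: Any) -> Any:
        self.calls += 1
        if self.raise_on_call is not None:
            raise self.raise_on_call
        return self.canned  # extra kwargs are accepted and ignored
```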

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:08:24 +02:00
1344b9ddb4 Merge pull request 'feat(pipeline): Step ABC + Pipeline runner + Timer' (#9) from feat/pipeline-core into main
Some checks are pending
tests / test (push) Waiting to run
2026-04-18 09:07:09 +00:00
dcd1bc764a feat(pipeline): Step ABC + Pipeline runner + Timer (spec §3, §4)
All checks were successful
tests / test (push) Successful in 56s
tests / test (pull_request) Successful in 1m7s
Adds the transport-agnostic pipeline orchestrator. Each step implements
async validate + process; the runner wraps both in a Timer, writes
per-step entries to response.metadata.timings, and aborts on the first
IXException by writing response.error.

- Step exposes a step_name property (defaults to class name) so tests and
  logs label steps consistently.
- Timer is a plain context manager that appends one {step, elapsed_seconds}
  entry on exit regardless of whether the body raised, so the timeline
  stays reconstructable for failed steps.
- 9 unit tests cover ordering, skip-on-false, IXException in validate vs.
  process, timings populated for every executed step, and shared-response
  mutation across steps. Non-IX exceptions propagate.
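A condensed sketch of the runner. The response here is a stand-in with a `metadata["timings"]` list and an `error` attribute, not the project's `ResponseIX`. The class bodies are illustrative. One `Timer` wraps each step's validate and process, so a step that raises still leaves a timing entry.

```python
from __future__ import annotations

import time
from abc import ABC, abstractmethod
from typing import Any


class IXException(Exception):
    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


class Timer:
    """Appends {step, elapsed_seconds} on exit, even when the body raised."""

    def __init__(self, step: str, timings: list[dict[str, Any]]) -> None:
        self.step, self.timings = step, timings

    def __enter__(self) -> "Timer":
        self._start = time.perf_counter()
        return self

    def __exit__(self, *exc_info: object) -> bool:
        self.timings.append(
            {"step": self.step, "elapsed_seconds": time.perf_counter() - self._start}
        )
        return False  # never swallow the exception


class Step(ABC):
    @property
    def step_name(self) -> str:
        return type(self).__name__

    @abstractmethod
    async def validate(self, request: Any, response: Any) -> bool: ...

    @abstractmethod
    async def process(self, request: Any, response: Any) -> None: ...


class Pipeline:
    def __init__(self, steps: list[Step]) -> None:
        self.steps = steps

    async def run(self, request: Any, response: Any) -> Any:
        for step in self.steps:
            try:
                with Timer(step.step_name, response.metadata["timings"]):
                    if await step.validate(request, response):
                        await step.process(request, response)
            except IXException as exc:
                response.error = exc.code
                break  # abort on the first IX failure; other exceptions propagate
        return response
```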

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:06:46 +02:00
b397a80c0b feat(provenance): mapper + verifier (spec §9.4, §6) (#8)
Some checks are pending
tests / test (push) Waiting to run
Provenance mapper and reliability verifier land.
2026-04-18 09:01:35 +00:00
1e340c82fa feat(provenance): mapper + verifier for ReliabilityStep (spec §9.4, §6)
All checks were successful
tests / test (pull_request) Successful in 1m10s
tests / test (push) Successful in 1m11s
Lands the two remaining provenance-subsystem pieces:

mapper.py — map_segment_refs_to_provenance:
- For each LLM SegmentCitation, pick seg-ids per source_type
  (`value` vs `value_and_context`), cap at max_sources_per_field,
  resolve each via SegmentIndex, track invalid references.
- Resolve field values by dot-path (`result.items[0].name` supported —
  `[N]` bracket notation is normalised to `.N` before traversal).
- Skip fields that resolve to zero valid sources (spec §9.4).
- Write quality_metrics with fields_with_provenance / total_fields /
  coverage_rate / invalid_references.
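The dot-path resolution can be sketched in a few lines: normalise `[N]` to `.N`, then walk lists, dicts, or attributes. `resolve_path` is a hypothetical name. In this sketch a missing key, index, or attribute raises, and a caller would treat that as an invalid reference.

```python
import re
from typing import Any

_INDEX = re.compile(r"\[(\d+)\]")


def resolve_path(root: Any, path: str) -> Any:
    """Resolve 'result.items[0].name' by rewriting '[N]' to '.N' first."""
    current = root
    for part in _INDEX.sub(r".\1", path).split("."):
        if isinstance(current, (list, tuple)):
            current = current[int(part)]
        elif isinstance(current, dict):
            current = current[part]
        else:
            current = getattr(current, part)  # e.g. a Pydantic model attribute
    return current
```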

verify.py — verify_field + apply_reliability_flags:
- Dispatches per Pydantic field type: date → parse-both-sides compare;
  int/float/Decimal → normalize + whole-snippet / numeric-token scan;
  IBAN (detected via `iban` in field name) → upper+strip compare;
  Literal / None → flags stay None; else string substring.
- _unwrap_optional handles BOTH typing.Union AND types.UnionType so
  `Decimal | None` (PEP 604, what get_type_hints emits on 3.12+) resolves
  correctly — caught by the integration-style test_writes_flags_and_counters.
- Number comparator scans numeric tokens in the snippet so labels
  ("Closing balance CHF 1'234.56") don't mask the match.
- apply_reliability_flags mutates the passed ProvenanceData in place and
  writes verified_fields / text_agreement_fields to quality_metrics.

Tests cover each comparator, Literal/None skip, short-value skip (strings
and numerics), Decimal via optional union, and end-to-end flag+counter
writing against a Pydantic use-case schema that mirrors bank_statement_header.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:01:19 +02:00
2d22115893 feat(provenance): normalisers + short-value skip rule (spec §6) (#7)
Some checks are pending
tests / test (push) Waiting to run
Normaliser primitives land.
2026-04-18 08:56:45 +00:00
527fc620fe feat(provenance): normalisers + short-value skip rule (spec §6)
All checks were successful
tests / test (pull_request) Successful in 1m0s
tests / test (push) Successful in 1m28s
Pure functions the ReliabilityStep will compose to compare extracted values
against OCR snippets (and context.texts). Kept in one module so every rule
is directly unit-testable without pulling in the step ABC.

Highlights:

- `normalize_string`: NFKC + casefold + strip common punctuation (. , : ; !
  ? () [] {} / \\ ' " `) + collapse whitespace. Substring-compatible.

- `normalize_number`: returns the canonical "[-]DDD.DD" form (always 2dp)
  after stripping currency symbols. Heuristic separator detection handles
  Swiss-German apostrophes ("1'234.56"), de-DE commas ("1.234,56"), and
  plain ASCII ("1234.56" / "1234.5" / "1234"). Accepts native int/float/
  Decimal as well as str.

- `normalize_date`: dateutil parse with dayfirst=True → ISO YYYY-MM-DD.
  Date and datetime objects short-circuit to their isoformat().

- `normalize_iban`: uppercase + strip whitespace. Format validation is the
  call site's job; this is pure canonicalisation.

- `should_skip_text_agreement`: dispatches on type + value. Literal → skip,
  None → skip, numeric |v|<10 → skip, len(str) ≤ 2 → skip. Numeric check
  runs first so `10` (len("10")==2) is treated on the numeric side
  (not skipped) instead of tripping the string length rule.
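A sketch of the two numeric rules under an assumed separator heuristic:

- apostrophes are thousands marks;
- when both `.` and `,` appear, the rightmost one is the decimal point;
- a lone `,` followed by at most two digits is a decimal comma.

Both the heuristic and the `is_literal` flag are assumptions. The real functions dispatch on the Pydantic field type.

```python
import re
from decimal import ROUND_HALF_UP, Decimal


def normalize_number(value: "str | int | float | Decimal") -> str:
    """Canonical '[-]DDD.DD' string."""
    if isinstance(value, (int, float, Decimal)):
        number = Decimal(str(value))
    else:
        s = re.sub(r"[^0-9,.'\-]", "", value)   # drop currency symbols, letters, spaces
        s = s.replace("'", "").strip(".,")       # Swiss thousands marks, stray punctuation
        if "," in s and "." in s:
            if s.rfind(",") > s.rfind("."):      # de-DE: 1.234,56
                s = s.replace(".", "").replace(",", ".")
            else:                                # en: 1,234.56
                s = s.replace(",", "")
        elif "," in s:
            head, _, tail = s.rpartition(",")
            s = f"{head.replace(',', '')}.{tail}" if len(tail) <= 2 else s.replace(",", "")
        elif s.count(".") > 1:                   # 1.234.567: grouping dots only
            s = s.replace(".", "")
        number = Decimal(s)
    return str(number.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


def should_skip_text_agreement(value: object, is_literal: bool = False) -> bool:
    if is_literal or value is None:
        return True
    if isinstance(value, (int, float, Decimal)) and not isinstance(value, bool):
        return abs(value) < 10                   # numeric rule runs before the length rule
    return len(str(value)) <= 2
```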

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:56:31 +02:00
b2ff27c1ca feat(segmentation): SegmentIndex + prompt-text formatter (spec §9.1) (#6)
Some checks are pending
tests / test (push) Waiting to run
SegmentIndex lands.
2026-04-18 08:54:02 +00:00
1321d57354 feat(segmentation): SegmentIndex + prompt-text formatter (spec §9.1)
All checks were successful
tests / test (push) Successful in 58s
tests / test (pull_request) Successful in 56s
Builds the ID <-> on-page-anchor map used by both the GenAIStep (to emit the
segment-tagged user message) and the provenance mapper (to resolve LLM-cited
IDs back to bbox/text/file_index).

Design notes:

- `build()` is a classmethod so the pipeline constructs the index in one
  place (OCRStep) and passes the constructed instance along in the internal
  context. No mutable global state; tests build indexes inline from fake
  OCR fixtures.

- Per-page metadata (file_index) arrives via a parallel `list[PageMetadata]`
  rather than being smuggled into OCRResult. Keeps segmentation decoupled
  from ingestion — the OCR engine legitimately doesn't know which file a
  page came from.

- Page-tag lines (`<page …>` / `</page>`) are filtered via a regex so the
  LLM can never cite them as provenance. `line_idx_in_page` increments only
  for real lines so the IDs stay dense (p1_l0, p1_l1, ...).

- Bounding-box normalisation divides x-coords by page width, y-coords by
  page height. Zero dimensions (defensive) pass through unchanged.

- `to_prompt_text(context_texts=[...])` appends paperless-style texts
  untagged, separated from the tagged body by a blank line (spec §7.2b).
  Deterministic for prompt caching.
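A compact sketch of the indexing rules above. The tuple shapes for pages and lines, the `Segment` dataclass, the 1-based page numbers in IDs, the `[p1_l0] text` prompt lines, and the blank-line separator between context texts are assumptions. They are inferred from the ID examples and do not reproduce the project's actual types.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

PAGE_TAG = re.compile(r"^\s*</?page\b[^>]*>\s*$")

BBox = tuple[float, float, float, float]       # x0, y0, x1, y1
OCRLine = tuple[str, BBox]
OCRPage = tuple[float, float, list[OCRLine]]   # width, height, lines


@dataclass
class Segment:
    text: str
    bbox: BBox  # scaled by page width/height
    file_index: int


def build_index(pages: list[OCRPage], file_indexes: list[int]) -> dict[str, Segment]:
    """Map dense IDs like 'p1_l0' to segments; file_indexes runs parallel to pages."""
    index: dict[str, Segment] = {}
    for page_no, ((width, height, lines), file_index) in enumerate(
        zip(pages, file_indexes), start=1
    ):
        sx = width or 1.0   # zero dimension: leave coordinates unscaled
        sy = height or 1.0
        line_idx = 0
        for text, (x0, y0, x1, y1) in lines:
            if PAGE_TAG.match(text):
                continue  # tag lines are not citable and do not consume an ID
            bbox = (x0 / sx, y0 / sy, x1 / sx, y1 / sy)
            index[f"p{page_no}_l{line_idx}"] = Segment(text, bbox, file_index)
            line_idx += 1
    return index


def to_prompt_text(index: dict[str, Segment], context_texts: list[str] | None = None) -> str:
    """Tagged OCR lines first, then untagged context texts after a blank line."""
    body = "\n".join(f"[{seg_id}] {seg.text}" for seg_id, seg in index.items())
    if context_texts:
        body += "\n\n" + "\n\n".join(context_texts)
    return body
```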

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:53:46 +02:00
810979e416 feat(use_cases): registry + bank_statement_header (spec §7) (#5)
Some checks are pending
tests / test (push) Waiting to run
First use case lands.
2026-04-18 08:51:58 +00:00
b80c7952f7 feat(use_cases): registry + bank_statement_header (spec §7)
All checks were successful
tests / test (pull_request) Successful in 1m0s
tests / test (push) Successful in 58s
First use case lands. The schema is intentionally flat — nine scalar fields,
no nested arrays — because Ollama's structured-output guidance stays most
reliable when the top level has only scalars, and every field we care about
(bank_name, IBAN, period, opening/closing balance) can be rendered as a scalar.

Registration is explicit in `use_cases/__init__.py`, not a side effect of
importing the use-case module. That keeps load order obvious and lets tests
patch the registry without having to reload modules.

`get_use_case(name)` is the one-liner adapters use; it raises
`IX_001_001` with the offending name in `detail` when the lookup misses,
which keeps log scraping simple.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:51:43 +02:00
230068e484 feat(contracts): ResponseIX + Provenance + Job (spec §3, §9.3) (#4)
Some checks are pending
tests / test (push) Waiting to run
Lands the outgoing-response data contracts.
2026-04-18 08:50:37 +00:00
02db3b05cc feat(contracts): ResponseIX + Provenance + Job envelope (spec §3, §9.3)
All checks were successful
tests / test (push) Successful in 1m2s
tests / test (pull_request) Successful in 1m0s
Completes the data-contract layer. Highlights:

- `ResponseIX.context` is an internal mutable accumulator used by pipeline
  steps (pages, files, texts, use_case classes, segment index). It MUST NOT
  leak into the serialised response, so we mark the field with
  `Field(exclude=True)` and carry the shape in a small `_InternalContext`
  sub-model with `extra="allow"` so steps can stash arbitrary state without
  schema churn. Tested: `model_dump()` and `model_dump_json()` both drop it.

- `FieldProvenance` gains `provenance_verified: bool | None` and
  `text_agreement: bool | None` — the two MVP reliability flags written by
  the new ReliabilityStep. Both default None so rows predating the
  ReliabilityStep (empty LLM output, cloud-import replay) parse cleanly.

- `quality_metrics` stays a free-form `dict[str, Any]` — the MVP adds
  `verified_fields` and `text_agreement_fields` counters without carving
  them into the schema, which keeps future metric additions free.

- `Job.status` and `Job.callback_status` are `Literal[...]` so Pydantic
  rejects unknown states at the edge. Invariant
  (`status='done' iff response.error is None`) stays worker-enforced —
  callers sometimes hydrate in-flight rows and we do not want validation
  to reject them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:50:22 +02:00
5990218172 feat(contracts): RequestIX + Context + Options (spec §3) (#3)
Some checks are pending
tests / test (push) Waiting to run
Lands the incoming-request Pydantic v2 contracts.
2026-04-18 08:47:47 +00:00
181cc0fbea feat(contracts): RequestIX + Context + Options per spec §3
All checks were successful
tests / test (push) Successful in 1m2s
tests / test (pull_request) Successful in 1m6s
Adds the incoming-request data contracts as Pydantic v2 models. Matches the
MVP spec §3 exactly — fields dropped from the reference spec (use_vision,
reasoning_effort, version, ...) stay out, and `extra="forbid"` catches any
caller that sends them so drift surfaces immediately instead of silently.

Context.files is `list[str | FileRef]`: plain URLs stay str, dict entries
parse as FileRef. This keeps the common case (public URL) one-liner while
still supporting Paperless-style auth headers and per-file size caps.

ix_id stays optional with a docstring warning that callers MUST NOT set it —
the transport layer assigns the 16-char hex handle on insert. The field is
present so `Job` round-trips out of the store.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:47:31 +02:00
ebdba99d9f feat(errors): IXException + IXErrorCode (spec §8) (#2)
Some checks are pending
tests / test (push) Waiting to run
Lands the single exception type and ten IX_* codes used throughout the pipeline.
2026-04-18 08:46:19 +00:00
ae595c937a feat(errors): add IXException + IXErrorCode per spec §8
All checks were successful
tests / test (push) Successful in 1m2s
tests / test (pull_request) Successful in 59s
Adds the single exception type used throughout the pipeline. Every failure
maps to one of the ten IX_* codes from the MVP spec §8 with a stable
machine-readable code and an optional free-form detail. The `str()` form is
log-scrapable with a single regex (`IX_xxx_xxx: <msg> (detail=...)`), so
mammon-side reliability UX can classify failures without brittle string
parsing.

Enum values equal names so callers can serialise either.
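A sketch of the shape described: a `str`-valued enum whose values equal its names, and a `str()` form that a single regex can parse. It lists only the nine codes that appear elsewhere in this log, out of the ten. Omitting the `(detail=...)` suffix when `detail` is `None` is an assumption.

```python
from __future__ import annotations

import re
from enum import Enum


class IXErrorCode(str, Enum):
    IX_000_002 = "IX_000_002"
    IX_000_004 = "IX_000_004"
    IX_000_005 = "IX_000_005"
    IX_000_006 = "IX_000_006"
    IX_000_007 = "IX_000_007"
    IX_001_000 = "IX_001_000"
    IX_001_001 = "IX_001_001"
    IX_002_000 = "IX_002_000"
    IX_002_001 = "IX_002_001"


class IXException(Exception):
    def __init__(self, code: IXErrorCode, message: str, detail: str | None = None) -> None:
        self.code, self.message, self.detail = code, message, detail
        super().__init__(str(self))

    def __str__(self) -> str:
        suffix = f" (detail={self.detail})" if self.detail is not None else ""
        return f"{self.code.value}: {self.message}{suffix}"


# One regex recovers (code, message, detail) from a log line.
LOG_PATTERN = re.compile(r"(IX_\d{3}_\d{3}): (.*?)(?: \(detail=(.*)\))?$")
```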

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:46:01 +02:00
663cb4ae10 feat(scaffold): project skeleton with uv + pytest + forgejo CI (#1)
Some checks are pending
tests / test (push) Waiting to run
Lands Task 1.1 from the MVP plan: empty-project skeleton so later tasks have somewhere to land. Local tests + ruff pass. CI trigger fix included so feat branches get runs going forward.
2026-04-18 08:42:56 +00:00
4120d106aa ci: trigger re-run
All checks were successful
tests / test (push) Successful in 1m0s
tests / test (pull_request) Successful in 57s
2026-04-18 10:41:57 +02:00