infoxtractor

Author	SHA1	Message	Date
Dirk Riemann	81e3b9a7d0	fix(genai): drop Ollama format flag; extract trailing JSON from response. qwen3:14b (and deepseek-r1 and other reasoning models) wrap their output in a <think>…</think> chain of thought before emitting the real output. With format=json the constrained sampler terminated immediately at `{}` because the thinking block wasn't valid JSON; without format the model thinks normally and appends the actual JSON at the end. OllamaClient now omits the format flag and extracts the outermost balanced `{…}` block from the response (brace-depth counter, string-literal aware). This works for reasoning models, ```json``` code-fenced outputs, and plain JSON alike. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:05:28 +02:00
Dirk Riemann	34f8268cd5	fix(genai): inject JSON schema into Ollama system prompt. format=json loose mode gives valid JSON but no shape: models default to emitting {} when the system prompt doesn't list fields. Prepend a schema-guidance system message with the full Pydantic schema (after the existing null-branch sanitiser) so the model sees exactly what shape to produce. Pydantic still validates on parse. Unit tests updated to check that the schema message is prepended without disturbing the caller's own messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:02:25 +02:00
Dirk Riemann	2efc4d1088	fix(genai): send format="json" (loose mode) to Ollama. Ollama 0.11.8 segfaults on any Pydantic-shaped structured-output schema with $ref, anyOf, or pattern; this was confirmed on the deploy host with the simplest MVP case (BankStatementHeader alone). The earlier null-stripping sanitiser wasn't enough. Switch to format="json", which is "emit valid JSON" mode. We're already describing the exact JSON shape in the system prompt (via GenAIStep + the use case's citation instruction appendix) and validating the response body through Pydantic on parse, which raises IX_002_001 on schema mismatch, exactly as before. Stronger guarantees can come back later via a newer Ollama, an API fix, or a different GenAIClient impl. None of that is needed for the MVP to work end to end. Unit tests: the sanitiser is left in place (harmless, still tested). The "happy path" test now asserts format == "json". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:59:04 +02:00
Dirk Riemann	9cb62d69af	fix(genai): strip null branches from anyOf before sending to Ollama. Ollama 0.11.8's llama.cpp structured-output implementation segfaults on Pydantic v2's standard Optional pattern: {"anyOf": [{"type": "string"}, {"type": "null"}]}. Confirmed on the deploy host: an /api/chat request with the MVP's ProvenanceWrappedResponse schema crashed Ollama with SIGSEGV; the client saw httpx RemoteProtocolError → IX_002_000. The new _sanitise_schema_for_ollama walks the schema recursively and drops "type: null" branches from every anyOf. Single-branch unions are inlined so sibling keys (default, title) survive. This only narrows what the LLM is told it may emit; Pydantic still validates the real response body against the original schema and accepts None for Optional fields if they were absent or explicitly null. Existing unit tests updated: the "happy path" test no longer pins the format to `_Schema.model_json_schema()` verbatim; instead it asserts the sanitisation effect on a known-Optional field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:48:26 +02:00
Dirk Riemann	c7dc40c51e	fix(deploy): switch to network_mode: host — reach postgis + ollama on loopback. The shared postgis container is bound to 127.0.0.1 on the host (security hardening, infrastructure §T12). Ollama is similarly LAN-hardened. The previous `host.docker.internal + extra_hosts: host-gateway` approach points at the bridge gateway IP, not loopback, so the container couldn't reach either service. Switch to `network_mode: host` (same pattern goldstein uses) and update the default IX_POSTGRES_URL / IX_OLLAMA_URL to 127.0.0.1. Keep the GPU reservation block; drop the now-meaningless ports: declaration (host mode publishes directly). AppConfig defaults + .env.example + test_config assertions + inline docstring examples all follow. Caught on the fourth deploy attempt. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:00:02 +02:00
Dirk Riemann	5ee74f367c	chore(model): switch default IX_DEFAULT_MODEL to qwen3:14b (already on host). The home server's Ollama doesn't have gpt-oss:20b pulled; qwen3:14b is already there and is what mammon's chat agent uses. The default switches now so the first deploy passes the /healthz ollama probe without an extra `ollama pull` step. The spec lists gpt-oss:20b as a concrete example; qwen3:14b is equally on-prem and Ollama-structured-output-compatible. Touched: AppConfig default, BankStatementHeader Request.default_model, .env.example, setup_server.sh ollama-list check, AGENTS.md, deployment.md, live tests. Unit tests that hard-coded the old model string but don't assert the default were left alone. Also: replace an en-dash with ASCII in e2e_smoke.py's Paperless-style text (ruff RUF001). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 12:20:23 +02:00
Dirk Riemann	ebefee4184	feat(app): production wiring — factories, pipeline, /healthz real probes. Task 4.3 closes the loop on Chunk 4: the FastAPI lifespan now selects fake vs real clients via IX_TEST_MODE (new AppConfig field), wires /healthz probes to the live selfcheck() on OllamaClient / SuryaOCRClient, and spawns the worker with a production Pipeline factory that builds SetupStep -> OCRStep -> GenAIStep -> ReliabilityStep -> ResponseHandler over the injected clients. Factories: make_genai_client(cfg) -> FakeGenAIClient | OllamaClient; make_ocr_client(cfg) -> FakeOCRClient | SuryaOCRClient (spec §6.2). Probes run the async selfcheck on a fresh event loop in a short-lived thread so they're safe to call from either sync callers or a live FastAPI handler without stalling the request loop. Drops the worker-loop spawn_worker_task stub; the app module owns the production spawn directly. Tests: +11 unit tests (5 factories + 6 app-wiring / probe adapter / pipeline build). Full suite: 236 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 12:09:11 +02:00
Dirk Riemann	322f6b2b1b	feat(ocr): SuryaOCRClient — real OCR backend (spec §6.2). Runs Surya's detection + recognition over PIL images rendered from each Page's source file (PDFs via PyMuPDF, images via Pillow). warm_up is lazy so FastAPI lifespan startup stays predictable. Deferred Surya/torch imports keep the base install slim; the heavy deps stay under [ocr]. Extends the OCRClient Protocol with optional files + page_metadata kwargs so the engine can resolve each page back to its on-disk source; the Fake accepts and ignores them to keep hermetic tests unchanged. selfcheck() runs the predictors on a 1x1 PIL image and is wired into /healthz by Task 4.3. Tests: 6 hermetic unit tests (Surya predictors mocked, no model download); 2 live tests gated on IX_TEST_OLLAMA=1 (never run in CI). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 12:04:19 +02:00
Dirk Riemann	90e46b707d	feat(genai): OllamaClient — structured-output /api/chat backend (spec §6). Real GenAIClient for the production pipeline. Sends `format=<pydantic JSON schema>`, `stream=false`, and mapped options (`temperature`; drops `reasoning_effort`). Content-parts lists are joined into a single string because MVP models don't speak native content-parts. Error mapping per spec: connection/timeout/5xx → IX_002_000, schema violations → IX_002_001. `selfcheck()` probes /api/tags with a fixed 5 s timeout for /healthz. Tests: 10 hermetic pytest-httpx unit tests; 2 live tests gated on IX_TEST_OLLAMA=1 (never run in CI). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:58:15 +02:00
Dirk Riemann	050f80dcd7	feat(pg-queue): LISTEN ix_jobs_new + 10s fallback poll (spec §4). PgQueueListener: - Dedicated asyncpg connection outside the SQLAlchemy pool (LISTEN needs a persistent connection; pooled connections check in/out). - Exposes wait_for_work(timeout), which resolves on NOTIFY or timeout, whichever fires first. The worker treats both wakes identically. - asyncpg_dsn_from_sqlalchemy_url strips the +asyncpg driver segment and percent-decodes the password so the same URL in IX_POSTGRES_URL works for both SQLAlchemy and raw asyncpg. The app.py lifespan now also spawns the listener alongside the worker; both are gated on spawn_worker=True so REST-only tests stay fast. 2 new integration tests: NOTIFY path (wake within 2 s despite 60 s poll) + missed-NOTIFY path (fallback poll recovers within 5 s). 33 integration tests total, 209 unit. Forgejo Actions trigger is flaky; local verification is the gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:52:26 +02:00
Dirk Riemann	406a7ea2fd	feat(worker): async worker loop + one-shot callback delivery (spec §5). Worker: - Startup: sweep_orphans(now, max_running_seconds) rescues rows stuck in 'running' from a crashed prior process. - Loop: claim_next_pending → build pipeline via injected factory → run → mark_done/mark_error → deliver callback if set → record outcome. - Non-IX exceptions from the pipeline collapse to IX_002_000 so callers see a stable error code. - The sleep loop uses a cancellable wait so the stop event takes effect immediately; the wait_for_work hook is ready for Task 3.6 to plug in the LISTEN-driven event without the worker knowing about NOTIFY. Callback: - One-shot POST, 2xx → delivered, anything else (incl. connect/timeout exceptions) → failed. No retries. - The callback record never reverts the job's terminal state; GET /jobs/{id} stays the authoritative fallback. 7 integration tests: happy path, pipeline-raise → error, callback 2xx, callback 5xx, orphan sweep on startup, no-callback rows stay callback_status=None (x2 via parametrize). Unit suite still 209. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:49:54 +02:00
Dirk Riemann	e46c44f1e0	feat(rest): FastAPI adapter + /jobs, /healthz, /metrics routes (spec §5). Routes: - POST /jobs: 201 on first insert, 200 on idempotent re-submit. - GET /jobs/{id}: full Job envelope or 404. - GET /jobs?client_id=&request_id=: correlation lookup or 404. - GET /healthz: {postgres, ollama, ocr}; 200 iff all ok (degraded counts as non-200 per spec). Postgres probe guarded by a 2 s wait_for. - GET /metrics: pending/running counts + 24h done/error counters + per-use-case avg seconds. Plain JSON, no Prometheus. create_app(spawn_worker=bool) parameterises worker spawning so tests that only need REST pass False. Worker spawn is tolerant of the loop module not being importable yet (Task 3.5 fills it in). Probes are a DI bundle: production wiring swaps them in at startup (Chunk 4); tests inject canned ok/fail callables. Session factory is also DI'd so tests can point at a per-loop engine and sidestep the async-pg cross-loop future issue that bit the jobs_repo fixture. 9 new integration tests; unit suite unchanged. Forgejo Actions trigger is flaky; local verification is the gate (unit + integration green locally). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:47:04 +02:00
Dirk Riemann	141153ffa7	feat(store): JobsRepo CRUD over ix_jobs + integration fixtures (spec §4). JobsRepo covers the full job-lifecycle surface: - insert_pending: idempotent on (client_id, request_id) via ON CONFLICT DO NOTHING + re-select; assigns a 16-hex ix_id. - claim_next_pending: FOR UPDATE SKIP LOCKED so concurrent workers never double-dispatch a row. - get / get_by_correlation: hydrates JSONB back through Pydantic. - mark_done: done iff response.error is None, else error. - mark_error: explicit convenience wrapper. - update_callback_status: delivered | failed (no status transition). - sweep_orphans: time-based rescue of stuck running rows; attempts++. Integration fixtures (tests/integration/conftest.py): - Skip cleanly when neither IX_TEST_DATABASE_URL nor IX_POSTGRES_URL is set (unit suite stays runnable on a bare laptop). - Alembic upgrade/downgrade runs in a subprocess so its internal asyncio.run() doesn't collide with pytest-asyncio's loop. - Per-test engine + truncate so loops never cross and tests start clean. 15 integration tests against a live postgres:16, including SKIP LOCKED concurrency + orphan sweep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:43:11 +02:00
Dirk Riemann	95728accbf	feat(config): AppConfig + cached get_config() (spec §9). Typed pydantic-settings view over every IX_* env var, defaults matching spec §9 exactly. @lru_cache-wrapped accessor so parsing/validation happens once per process; tests clear the cache via get_config.cache_clear(). extra="ignore" keeps the container robust against typo'd env vars in production .env files. engine.py's URL resolver now goes through get_config() when ix.config is importable (bootstrap fallback remains so hypothetical early-import callers don't crash). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:38:44 +02:00
Dirk Riemann	1c60c30084	feat(store): Alembic scaffolding + initial ix_jobs migration (spec §4). Lands the async-friendly Alembic env (NullPool, reads IX_POSTGRES_URL), the hand-written 001 migration matching the spec's table layout exactly (CHECK on status, partial index on pending rows, UNIQUE on (client_id, request_id)), the SQLAlchemy 2.0 ORM mapping, and a lazy engine/session factory. The factory reads the URL through ix.config when available; Task 3.2 makes that the only path. Smoke-tested: alembic upgrade head + downgrade base against a live postgres:16 produce the expected table shape and tear down cleanly. Unit tests assert the migration source contains every required column/index so the migration can't drift from spec at import time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:37:21 +02:00
Dirk Riemann	b109bba873	test(pipeline): end-to-end hermetic test with fakes + synthetic fixture. Wires the five pipeline steps together with FakeOCRClient + FakeGenAIClient, feeds the committed synthetic_giro.pdf fixture via file:// URL, and asserts the full response shape. - scripts/create_fixture_pdf.py: PyMuPDF-based builder. One-page A4 PDF with six known header strings (bank name, IBAN, period, balances, statement date). Re-runnable on demand; the committed PDF is what CI consumes. - tests/fixtures/synthetic_giro.pdf: committed output. - tests/unit/test_pipeline_end_to_end.py: 5 tests covering: ix_result.result fields populated from the fake LLM; provenance.fields["result.closing_balance"].provenance_verified True; text_agreement True when Paperless-style texts match the value; metadata.timings has one entry per step in the right order; response.error is None and context is not serialised. 197 tests total; ruff clean. No integration tests, no real clients, no network. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:24:29 +02:00
Dirk Riemann	565d8d0676	feat(pipeline): ResponseHandlerStep — shape-up final payload (spec §8). Final pipeline step. Three mechanical transforms: 1. include_ocr_text -> concatenate non-tag line texts, pages joined with \n\n, write to ocr_result.result.text. 2. include_geometries=False (default) -> strip ocr_result.result.pages + ocr_result.meta_data. Geometries are heavy; callers opt in. 3. Delete response.context so the internal accumulator never leaks to the caller (belt-and-braces; Field(exclude=True) already does this). validate() always returns True per spec. 7 unit tests in tests/unit/test_response_handler_step.py cover all three branches + context-not-in-model_dump check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:21:36 +02:00
Dirk Riemann	132f110463	feat(pipeline): ReliabilityStep — writes reliability flags (spec §6). Thin wrapper around ix.provenance.apply_reliability_flags. Validate skips entirely when include_provenance is off OR when no provenance data was built (text-only request, etc.). Process reads context.texts + context.use_case_response and lets the verifier mutate the FieldProvenance entries + fill quality_metrics counters in place. 11 unit tests in tests/unit/test_reliability_step.py cover: validate skips on flag off / missing provenance, runs otherwise; per-type flag behaviour (string verified + text_agreement, Literal -> None, None value -> None, short numeric -> text_agreement None, date with both sides parsed, IBAN whitespace-insensitive, disagreement -> False); quality_metrics verified_fields / text_agreement_fields counters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:20:18 +02:00
Dirk Riemann	abee9cea7b	feat(pipeline): GenAIStep — LLM call + provenance mapping (spec §6.3, §7, §9.2). Assembles the prompt, picks the structured-output schema, calls the injected GenAIClient, and maps any emitted segment_citations into response.provenance. Reliability flags stay None here; ReliabilityStep fills them in Task 2.7. - System prompt = use_case.system_prompt + (provenance-on) the verbatim citation instruction from spec §9.2. - User text = SegmentIndex.to_prompt_text([p1_l0] style) when provenance is on, else plain OCR flat text + texts joined. - Response schema = UseCaseResponse directly, or a runtime create_model("ProvenanceWrappedResponse", result=(UCR, ...), segment_citations=(list[SegmentCitation], Field(default_factory=list))) when provenance is on. - Model = request override -> use-case default. - Failure modes: httpx / connection / timeout errors -> IX_002_000; pydantic.ValidationError -> IX_002_001. - Writes ix_result.result + ix_result.meta_data (model_name + token_usage); builds response.provenance via map_segment_refs_to_provenance when provenance is on. 17 unit tests in tests/unit/test_genai_step.py cover validate (ocr_only skip, empty -> IX_001_000, text-only, ocr-text path), process happy path, system-prompt shape with/without citation instruction, user text tagged vs. plain, response schema plain vs. wrapped, provenance mapping, error mapping (IX_002_000 + IX_002_001), and model selection (request override + use-case default). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:18:44 +02:00
Dirk Riemann	81054baa06	feat(pipeline): OCRStep — run OCR + page tags + SegmentIndex (spec §6.2). Runs after SetupStep. Dispatches the flat page list to the injected OCRClient, writes the raw OCRResult onto response.ocr_result, injects <page file="..." number="..."> open/close tag lines around each page's content, and builds a SegmentIndex over the non-tag lines when provenance is on. Validate follows the spec triad rule: include_geometries/include_ocr_text/ocr_only + no files -> IX_000_004; no files -> False (skip); files + (use_ocr or triad) -> True. 9 unit tests in tests/unit/test_ocr_step.py cover all three validate branches, OCRResult written, page tags injected (format + file_index), SegmentIndex built iff provenance on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:15:46 +02:00
Dirk Riemann	97aa24f478	feat(pipeline): SetupStep — validate + fetch + MIME + pages (spec §6.1). First pipeline step. Validates the request (IX_000_002 on empty context), normalises every Context.files entry to a FileRef, downloads them in parallel via asyncio.gather, byte-sniffs MIMEs (IX_000_005 for unsupported), loads the use-case pair from REGISTRY (IX_001_001 on miss), and builds the flat pages + page_metadata list on response_ix.context. Fetcher / ingestor / MIME detector / tmp_dir / fetch_config are all injected via the constructor so unit tests stay hermetic; production wires the real ix.ingestion defaults via the app factory. 7 unit tests in tests/unit/test_setup_step.py cover validate errors, happy path (fetcher + ingestor invoked correctly, context populated, use_case_name echoed), FileRef headers pass through, unsupported MIME -> IX_000_005, unknown use case -> IX_001_001, text-only request, and the _InternalContext type assertion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:14:04 +02:00
Dirk Riemann	290e51416f	feat(ingestion): fetch_file + MIME sniff + DocumentIngestor (spec §6.1). Three layered modules the SetupStep will wire together in Task 2.4. - fetch.py: async httpx fetch with configurable timeouts + incremental size cap (stream=True, accumulate bytes, raise IX_000_007 when exceeded). file:// URLs are read locally. Auth headers pass through. The caller injects a FetchConfig; env reads happen in ix.config (Chunk 3). - mime.py: python-magic byte-sniff + SUPPORTED_MIMES frozenset + require_supported(mime) helper that raises IX_000_005. - pages.py: DocumentIngestor.build_pages(files, texts) -> (list[Page], list[PageMetadata]). PDFs via PyMuPDF (hard 100 pg/PDF cap -> IX_000_006), images via Pillow (multi-frame TIFFs yield multiple Pages), texts as zero-dim Pages so GenAIStep can still cite them. 21 new unit tests (141 total) cover: fetch success with headers, 4xx/5xx mapping, timeout -> IX_000_007, size cap enforced globally + per-file, file:// happy path + missing file, MIME detection for PDF/PNG/JPEG/TIFF, require_supported gate, PDF/TIFF/text page counts, 101-page PDF -> IX_000_006, multi-file file_index assignment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:12:00 +02:00
Dirk Riemann	118a9abd09	feat(clients): OCRClient + GenAIClient protocols + fakes (spec §6.2, §6.3). Adds the two Protocol-based client contracts the pipeline steps depend on, plus test-oriented fakes. Real engines (Surya, Ollama) land in Chunk 4. - ix.ocr.client.OCRClient: runtime_checkable Protocol with async ocr(). - ix.genai.client.GenAIClient: runtime_checkable Protocol with async invoke(); GenAIInvocationResult + GenAIUsage dataclasses carry the parsed model, token usage, and model name. - FakeOCRClient / FakeGenAIClient: return canned results; both expose a raise_on_call hook for error-path tests. 8 unit tests across tests/unit/test_ocr_fake.py + test_genai_fake.py confirm protocol conformance, canned-return behaviour, usage/model-name defaults, and raise_on_call propagation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:08:24 +02:00
Dirk Riemann	dcd1bc764a	feat(pipeline): Step ABC + Pipeline runner + Timer (spec §3, §4). Adds the transport-agnostic pipeline orchestrator. Each step implements async validate + process; the runner wraps both in a Timer, writes per-step entries to response.metadata.timings, and aborts on the first IXException by writing response.error. - Step exposes a step_name property (defaults to class name) so tests and logs label steps consistently. - Timer is a plain context manager that appends one {step, elapsed_seconds} entry on exit regardless of whether the body raised, so the timeline stays reconstructable for failed steps. - 9 unit tests cover ordering, skip-on-false, IXException in validate vs. process, timings populated for every executed step, and shared-response mutation across steps. Non-IX exceptions propagate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:06:46 +02:00
Dirk Riemann	1e340c82fa	feat(provenance): mapper + verifier for ReliabilityStep (spec §9.4, §6). Lands the two remaining provenance-subsystem pieces. mapper.py (map_segment_refs_to_provenance): - For each LLM SegmentCitation, pick seg-ids per source_type (`value` vs `value_and_context`), cap at max_sources_per_field, resolve each via SegmentIndex, track invalid references. - Resolve field values by dot-path (`result.items[0].name` supported; `[N]` bracket notation is normalised to `.N` before traversal). - Skip fields that resolve to zero valid sources (spec §9.4). - Write quality_metrics with fields_with_provenance / total_fields / coverage_rate / invalid_references. verify.py (verify_field + apply_reliability_flags): - Dispatches per Pydantic field type: date → parse-both-sides compare; int/float/Decimal → normalize + whole-snippet / numeric-token scan; IBAN (detected via `iban` in field name) → upper+strip compare; Literal / None → flags stay None; else string substring. - _unwrap_optional handles BOTH typing.Union AND types.UnionType so `Decimal | None` (PEP 604, what get_type_hints emits on 3.12+) resolves correctly; this was caught by the integration-style test_writes_flags_and_counters. - Number comparator scans numeric tokens in the snippet so labels ("Closing balance CHF 1'234.56") don't mask the match. - apply_reliability_flags mutates the passed ProvenanceData in place and writes verified_fields / text_agreement_fields to quality_metrics. Tests cover each comparator, Literal/None skip, short-value skip (strings and numerics), Decimal via optional union, and end-to-end flag+counter writing against a Pydantic use-case schema that mirrors bank_statement_header. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 11:01:19 +02:00
Dirk Riemann	527fc620fe	feat(provenance): normalisers + short-value skip rule (spec §6). Pure functions the ReliabilityStep will compose to compare extracted values against OCR snippets (and context.texts). Kept in one module so every rule is directly unit-testable without pulling in the step ABC. Highlights: - `normalize_string`: NFKC + casefold + strip common punctuation (. , : ; ! ? () [] {} / \\ ' " `) + collapse whitespace. Substring-compatible. - `normalize_number`: returns the canonical "[-]DDD.DD" form (always 2dp) after stripping currency symbols. Heuristic separator detection handles Swiss-German apostrophes ("1'234.56"), de-DE commas ("1.234,56"), and plain ASCII ("1234.56" / "1234.5" / "1234"). Accepts native int/float/Decimal as well as str. - `normalize_date`: dateutil parse with dayfirst=True → ISO YYYY-MM-DD. Date and datetime objects short-circuit to their isoformat(). - `normalize_iban`: uppercase + strip whitespace. Format validation is the call site's job; this is pure canonicalisation. - `should_skip_text_agreement`: dispatches on type + value. Literal → skip, None → skip, numeric |v| < 10 → skip, len(str) ≤ 2 → skip. Numeric check runs first so `10` (len("10")==2) is treated on the numeric side (not skipped) instead of tripping the string length rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:56:31 +02:00
Dirk Riemann	1321d57354	feat(segmentation): SegmentIndex + prompt-text formatter (spec §9.1). Builds the ID <-> on-page-anchor map used by both the GenAIStep (to emit the segment-tagged user message) and the provenance mapper (to resolve LLM-cited IDs back to bbox/text/file_index). Design notes: - `build()` is a classmethod so the pipeline constructs the index in one place (OCRStep) and passes the constructed instance along in the internal context. No mutable global state; tests build indexes inline from fake OCR fixtures. - Per-page metadata (file_index) arrives via a parallel `list[PageMetadata]` rather than being smuggled into OCRResult. This keeps segmentation decoupled from ingestion: the OCR engine legitimately doesn't know which file a page came from. - Page-tag lines (`<page …>` / `</page>`) are filtered via a regex so the LLM can never cite them as provenance. `line_idx_in_page` increments only for real lines so the IDs stay dense (p1_l0, p1_l1, ...). - Bounding-box normalisation divides x-coords by page width, y-coords by page height. Zero dimensions (defensive) pass through unchanged. - `to_prompt_text(context_texts=[...])` appends paperless-style texts untagged, separated from the tagged body by a blank line (spec §7.2b). Deterministic for prompt caching. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:53:46 +02:00
Dirk Riemann	b80c7952f7	feat(use_cases): registry + bank_statement_header (spec §7). First use case lands. The schema is intentionally flat (nine scalar fields, no nested arrays) because Ollama's structured-output guidance stays most reliable when the top level has only scalars, and every field we care about (bank_name, IBAN, period, opening/closing balance) can be rendered as one. Registration is explicit in `use_cases/__init__.py`, not a side effect of importing the use-case module. That keeps load order obvious and lets tests patch the registry without having to reload modules. `get_use_case(name)` is the one-liner adapters use; it raises `IX_001_001` with the offending name in `detail` when the lookup misses, which keeps log scraping simple. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:51:43 +02:00
Dirk Riemann	02db3b05cc	feat(contracts): ResponseIX + Provenance + Job envelope (spec §3, §9.3) [All checks were successful: tests / test (push) Successful in 1m2s; tests / test (pull_request) Successful in 1m0s] Completes the data-contract layer. Highlights: - `ResponseIX.context` is an internal, mutable accumulator used by pipeline steps (pages, files, texts, use_case classes, segment index). It MUST NOT leak into the serialised response, so the field is marked `Field(exclude=True)` and its shape lives in a small `_InternalContext` sub-model with `extra="allow"`, letting steps stash arbitrary state without schema churn. Tested: `model_dump()` and `model_dump_json()` both drop it. - `FieldProvenance` gains `provenance_verified: bool | None` and `text_agreement: bool | None`, the two MVP reliability flags written by the new ReliabilityStep. Both default to None so rows predating the ReliabilityStep (empty LLM output, cloud-import replay) still parse cleanly. - `quality_metrics` stays a free-form `dict[str, Any]`: the MVP adds `verified_fields` and `text_agreement_fields` counters without carving them into the schema, so future metrics can be added without schema changes. - `Job.status` and `Job.callback_status` are `Literal[...]`, so Pydantic rejects unknown states at the edge. The invariant `status='done' iff response.error is None` stays worker-enforced rather than model-validated, because callers sometimes hydrate in-flight rows and validation must not reject them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:50:22 +02:00
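
A minimal Pydantic v2 sketch of the excluded accumulator, the None-defaulting reliability flags, and the `Literal` status, assuming Pydantic v2 is installed. The `texts`, `segment_id`, and `response` fields and the four status names are hypothetical; `Optional[bool]` is used as the pre-3.10-compatible spelling of `bool | None`.

```python
from typing import Any, Literal, Optional

from pydantic import BaseModel, ConfigDict, Field


class _InternalContext(BaseModel):
    # extra="allow": steps may stash arbitrary state without schema churn.
    model_config = ConfigDict(extra="allow")

    texts: list[str] = Field(default_factory=list)


class FieldProvenance(BaseModel):
    segment_id: str  # hypothetical field
    # Reliability flags default to None so rows without them still parse.
    provenance_verified: Optional[bool] = None
    text_agreement: Optional[bool] = None


class ResponseIX(BaseModel):
    quality_metrics: dict[str, Any] = Field(default_factory=dict)
    # exclude=True keeps the accumulator out of model_dump() / model_dump_json().
    context: _InternalContext = Field(default_factory=_InternalContext, exclude=True)


class Job(BaseModel):
    # State names are hypothetical; Literal rejects anything else at validation time.
    status: Literal["pending", "running", "done", "error"]
    response: Optional[ResponseIX] = None
```

Note that the `status='done' iff response.error is None` invariant is deliberately absent here; per the commit, the worker enforces it, so in-flight rows still validate.
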
Dirk Riemann	181cc0fbea	feat(contracts): RequestIX + Context + Options per spec §3 [All checks were successful: tests / test (push) Successful in 1m2s; tests / test (pull_request) Successful in 1m6s] Adds the incoming-request data contracts as Pydantic v2 models. They match MVP spec §3 exactly: fields dropped from the reference spec (use_vision, reasoning_effort, version, ...) stay out, and `extra="forbid"` rejects any caller that still sends them, so drift surfaces immediately instead of being silently ignored. Context.files is `list[str | FileRef]`: plain URLs stay str, and dict entries parse as FileRef. This keeps the common case (a public URL) a one-liner while still supporting Paperless-style auth headers and per-file size caps. ix_id stays optional, with a docstring warning that callers MUST NOT set it; the transport layer assigns the 16-char hex handle on insert. The field exists so that `Job` round-trips out of the store. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:47:31 +02:00
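
A minimal Pydantic v2 sketch of `extra="forbid"` plus the `str | FileRef` union, assuming Pydantic v2 is installed. The `FileRef` field names (`url`, `headers`, `max_bytes`) and `RequestIX.use_case` are hypothetical; only the forbid behaviour, the union shape, and the optional `ix_id` come from the commit message.

```python
from typing import Optional, Union

from pydantic import BaseModel, ConfigDict, Field


class FileRef(BaseModel):
    model_config = ConfigDict(extra="forbid")

    url: str
    headers: dict[str, str] = Field(default_factory=dict)  # e.g. Paperless auth
    max_bytes: Optional[int] = None  # per-file size cap


class Context(BaseModel):
    model_config = ConfigDict(extra="forbid")

    # Plain strings stay str; dicts are validated as FileRef.
    files: list[Union[str, FileRef]] = Field(default_factory=list)


class RequestIX(BaseModel):
    model_config = ConfigDict(extra="forbid")

    use_case: str  # hypothetical field name
    context: Context
    ix_id: Optional[str] = None
    """Assigned by the transport layer on insert; callers MUST NOT set it."""
```

Pydantic v2's default "smart" union mode keeps a string input as `str` and validates a dict as `FileRef`, which gives the one-liner public-URL case without extra parsing code.
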
Dirk Riemann	ae595c937a	feat(errors): add IXException + IXErrorCode per spec §8 [All checks were successful: tests / test (push) Successful in 1m2s; tests / test (pull_request) Successful in 59s] Adds the single exception type used throughout the pipeline. Every failure maps to one of the ten IX_* codes from MVP spec §8, giving a stable machine-readable code plus an optional free-form detail. The `str()` form can be log-scraped with a single regex (`IX_xxx_xxx: <msg> (detail=...)`), so mammon-side reliability UX can classify failures without brittle string parsing. Enum values equal their names, so callers can serialise either. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:46:01 +02:00
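
A minimal sketch of the exception, its enum, and the scraping regex, assuming a `str`-mixin `Enum`. Only the two codes that appear in this log are included (spec §8 defines ten), and the regex is one plausible reading of the `IX_xxx_xxx: <msg> (detail=...)` format rather than the project's actual pattern.

```python
import enum
import re
from typing import Optional


class IXErrorCode(str, enum.Enum):
    # Values equal names, so either form round-trips. Only the two codes seen
    # in this log are shown; spec §8 defines ten.
    IX_001_001 = "IX_001_001"  # unknown use case
    IX_002_001 = "IX_002_001"  # LLM response failed schema validation


class IXException(Exception):
    def __init__(self, code: IXErrorCode, msg: str, detail: Optional[str] = None) -> None:
        super().__init__(msg)
        self.code = code
        self.msg = msg
        self.detail = detail

    def __str__(self) -> str:
        return f"{self.code.value}: {self.msg} (detail={self.detail})"


# One regex recovers code, message and detail from a log line.
LOG_PATTERN = re.compile(r"(IX_\d{3}_\d{3}): (.*) \(detail=(.*)\)")
```

Because value and name are identical, `IXErrorCode("IX_002_001")` and `IXErrorCode["IX_002_001"]` resolve to the same member, so callers can store either form.
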
Dirk Riemann	7e141829ac	fix(ci): create empty tests/integration so pytest doesn't error on missing dir [All checks were successful: tests / test (pull_request) Successful in 1m4s] Integration tests land in Chunk 3; until then, CI needs the directory to exist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:39:26 +02:00
Dirk Riemann	57cdfd73fb	feat(scaffold): project skeleton with uv + pytest + forgejo CI [Some checks failed: CI / test (pull_request) Failing after 4s] - pyproject.toml: runtime deps (FastAPI, SQLAlchemy async, Pydantic, PyMuPDF, python-magic, Pillow, dateutil), a dev group (pytest, pytest-asyncio, pytest-httpx, ruff, mypy), and an optional `ocr` extra that pulls in surya-ocr + torch (kept optional so CI runners without a GPU can run the base package). - pytest config: asyncio_mode=auto; a `live` marker for tests that need a real Ollama/Surya (gated on IX_TEST_OLLAMA=1). - A single smoke test (tests/unit/test_scaffolding.py) verifies that the package imports and exposes __version__; it keeps CI green until the real test modules land in later chunks. - .forgejo/workflows/ci.yml: runs ruff + pytest against a Postgres 16 service container. An explicit IX_TEST_MODE=fake keeps real-client tests from running. - .env.example: every IX_* var from spec §9, with on-prem-friendly defaults. - uv.lock is committed for reproducible builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 10:36:43 +02:00
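
A minimal conftest.py sketch of how the `live` gating could work. The commit only says the marker is gated on IX_TEST_OLLAMA=1; the skip-during-collection approach below is an assumption, and `live_tests_enabled` is a hypothetical helper. The `live` marker itself would still be declared in the pytest config next to asyncio_mode=auto.

```python
# conftest.py (sketch)
import os
from typing import Mapping

import pytest


def live_tests_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Live tests (real Ollama/Surya) run only when IX_TEST_OLLAMA=1."""
    return environ.get("IX_TEST_OLLAMA") == "1"


def pytest_collection_modifyitems(config, items):
    # Standard pytest hook: runs after collection, before any test executes.
    if live_tests_enabled():
        return
    skip_live = pytest.mark.skip(reason="set IX_TEST_OLLAMA=1 to run live tests")
    for item in items:
        if "live" in item.keywords:
            item.add_marker(skip_live)
```

Skipping at collection time (rather than failing inside each test) keeps a default `pytest` run green on machines without Ollama or a GPU, while `IX_TEST_OLLAMA=1 pytest` opts in.
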

33 commits