qwen3:14b (and deepseek-r1, other reasoning models) wrap their output
in <think>…</think> chains-of-thought before emitting real output.
With format=json the constrained sampler terminated immediately at
`{}` because the thinking block wasn't valid JSON; without format the
model thinks normally and appends the actual JSON at the end.
OllamaClient now omits the format flag and extracts the outermost
balanced `{…}` block from the response (brace depth counter, string-
literal aware). Works for reasoning models, ```json``` code-fenced
outputs, and plain JSON alike.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
format=json loose mode gives valid JSON but no shape — models default
to emitting {} when the system prompt doesn't list fields. Prepend a
schema-guidance system message with the full Pydantic schema (after
the existing null-branch sanitiser) so the model sees exactly what
shape to produce. Pydantic still validates on parse.
Unit tests updated to check the schema message is prepended without
disturbing the caller's own messages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ollama 0.11.8 segfaults on any Pydantic-shaped structured-output schema
with $ref, anyOf, or pattern — confirmed on the deploy host with the
simplest MVP case (BankStatementHeader alone). The earlier null-stripping
sanitiser wasn't enough.
Switch to format="json", which is "emit valid JSON" mode. We're already
describing the exact JSON shape in the system prompt (via GenAIStep +
the use case's citation instruction appendix) and validating the
response body through Pydantic on parse — which raises IX_002_001 on
schema mismatch, exactly as before.
Stronger guarantees can come back later via a newer Ollama, an API
fix, or a different GenAIClient impl. None of that is needed for the
MVP to work end to end.
Unit tests: the sanitiser left in place (harmless, still tested). The
"happy path" test now asserts format == "json".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ollama 0.11.8's llama.cpp structured-output implementation segfaults on
Pydantic v2's standard Optional pattern:
{"anyOf": [{"type": "string"}, {"type": "null"}]}
Confirmed on the deploy host: /api/chat request with the MVP's
ProvenanceWrappedResponse schema crashed Ollama with SIGSEGV; the client
saw httpx RemoteProtocolError → IX_002_000.
New _sanitise_schema_for_ollama walks the schema recursively and drops
"type: null" branches from every anyOf. Single-branch unions are
inlined so sibling keys (default, title) survive. This only narrows
what the LLM is *told* it may emit; Pydantic still validates the real
response body against the original schema and accepts None for
Optional fields if they were absent or explicitly null.
Existing unit tests updated: the "happy path" test no longer pins the
format to `_Schema.model_json_schema()` verbatim — instead it asserts
the sanitisation effect on a known-Optional field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared postgis container is bound to 127.0.0.1 on the host (security
hardening, infrastructure §T12). Ollama is similarly LAN-hardened. The
previous `host.docker.internal + extra_hosts: host-gateway` approach
points at the bridge gateway IP, not loopback, so the container couldn't
reach either service.
Switch to `network_mode: host` (same pattern goldstein uses) and update
the default IX_POSTGRES_URL / IX_OLLAMA_URL to 127.0.0.1. Keep the GPU
reservation block; drop the now-meaningless ports: declaration (host mode
publishes directly).
AppConfig defaults + .env.example + test_config assertions + inline
docstring examples all follow.
Caught on fourth deploy attempt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The home server's Ollama doesn't have gpt-oss:20b pulled; qwen3:14b is
already there and is what mammon's chat agent uses. Switching the default
now so the first deploy passes the /healthz ollama probe without an extra
`ollama pull` step. The spec lists gpt-oss:20b as a concrete example;
qwen3:14b is equally on-prem and Ollama-structured-output-compatible.
Touched: AppConfig default, BankStatementHeader Request.default_model,
.env.example, setup_server.sh ollama-list check, AGENTS.md, deployment.md,
live tests. Unit tests that hard-coded the old model string but don't
assert the default were left alone.
Also: ASCII en-dash in e2e_smoke.py Paperless-style text (ruff RUF001).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 4.3 closes the loop on Chunk 4: the FastAPI lifespan now selects
fake vs real clients via IX_TEST_MODE (new AppConfig field), wires
/healthz probes to the live selfcheck() on OllamaClient / SuryaOCRClient,
and spawns the worker with a production Pipeline factory that builds
SetupStep -> OCRStep -> GenAIStep -> ReliabilityStep -> ResponseHandler
over the injected clients.
Factories:
- make_genai_client(cfg) -> FakeGenAIClient | OllamaClient
- make_ocr_client(cfg) -> FakeOCRClient | SuryaOCRClient (spec §6.2)
Probes run the async selfcheck on a fresh event loop in a short-lived
thread so they're safe to call from either sync callers or a live
FastAPI handler without stalling the request loop.
Drops the worker-loop spawn_worker_task stub — the app module owns the
production spawn directly.
Tests: +11 unit tests (5 factories + 6 app-wiring / probe adapter /
pipeline build). Full suite: 236 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs Surya's detection + recognition over PIL images rendered from each
Page's source file (PDFs via PyMuPDF, images via Pillow). Lazy warm_up
so FastAPI lifespan start stays predictable. Deferred Surya/torch
imports keep the base install slim — the heavy deps stay under [ocr].
Extends OCRClient Protocol with optional files + page_metadata kwargs
so the engine can resolve each page back to its on-disk source; Fake
accepts-and-ignores to keep hermetic tests unchanged.
selfcheck() runs the predictors on a 1x1 PIL image — wired into /healthz
by Task 4.3.
Tests: 6 hermetic unit tests (Surya predictors mocked, no model
download); 2 live tests gated on IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real GenAIClient for the production pipeline. Sends `format=<pydantic JSON
schema>`, `stream=false`, and mapped options (`temperature`; drops
`reasoning_effort`). Content-parts lists joined to a single string since
MVP models don't speak native content-parts. Error mapping per spec:
connection/timeout/5xx → IX_002_000, schema violations → IX_002_001.
`selfcheck()` probes /api/tags with a fixed 5 s timeout for /healthz.
Tests: 10 hermetic pytest-httpx unit tests; 2 live tests gated on
IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PgQueueListener:
- Dedicated asyncpg connection outside the SQLAlchemy pool (LISTEN
needs a persistent connection; pooled connections check in/out).
- Exposes wait_for_work(timeout) — resolves on NOTIFY or timeout,
whichever fires first. The worker treats both wakes identically.
- asyncpg_dsn_from_sqlalchemy_url strips the +asyncpg driver segment
and percent-decodes the password so the same URL in IX_POSTGRES_URL
works for both SQLAlchemy and raw asyncpg.
app.py lifespan now also spawns the listener alongside the worker;
both are gated on spawn_worker=True so REST-only tests stay fast.
2 new integration tests: NOTIFY path (wake within 2 s despite 60 s
poll) + missed-NOTIFY path (fallback poll recovers within 5 s). 33
integration tests total, 209 unit. Forgejo Actions trigger is flaky;
local verification is the gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker:
- Startup: sweep_orphans(now, max_running_seconds) rescues rows stuck
in 'running' from a crashed prior process.
- Loop: claim_next_pending → build pipeline via injected factory → run
→ mark_done/mark_error → deliver callback if set → record outcome.
- Non-IX exceptions from the pipeline collapse to IX_002_000 so callers
see a stable error code.
- Sleep loop uses a cancellable wait so the stop event reacts
immediately; the wait_for_work hook is ready for Task 3.6 to plug in
the LISTEN-driven event without the worker knowing about NOTIFY.
Callback:
- One-shot POST, 2xx → delivered, anything else (incl. connect/timeout
exceptions) → failed. No retries.
- Callback record never reverts the job's terminal state — GET /jobs/{id}
stays the authoritative fallback.
7 integration tests: happy path, pipeline-raise → error, callback 2xx,
callback 5xx, orphan sweep on startup, no-callback rows stay
callback_status=None (x2 via parametrize). Unit suite still 209.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes:
- POST /jobs: 201 on first insert, 200 on idempotent re-submit.
- GET /jobs/{id}: full Job envelope or 404.
- GET /jobs?client_id=&request_id=: correlation lookup or 404.
- GET /healthz: {postgres, ollama, ocr}; 200 iff all ok (degraded counts
as non-200 per spec). Postgres probe guarded by a 2 s wait_for.
- GET /metrics: pending/running counts + 24h done/error counters +
per-use-case avg seconds. Plain JSON, no Prometheus.
create_app(spawn_worker=bool) parameterises worker spawning so tests that
only need REST pass False. Worker spawn is tolerant of the loop module not
being importable yet (Task 3.5 fills it in).
Probes are a DI bundle — production wiring swaps them in at startup
(Chunk 4); tests inject canned ok/fail callables. Session factory is also
DI'd so tests can point at a per-loop engine and sidestep the async-pg
cross-loop future issue that bit the jobs_repo fixture.
9 new integration tests; unit suite unchanged. Forgejo Actions trigger is
flaky; local verification is the gate (unit + integration green locally).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JobsRepo covers the full job-lifecycle surface:
- insert_pending: idempotent on (client_id, request_id) via ON CONFLICT
DO NOTHING + re-select; assigns a 16-hex ix_id.
- claim_next_pending: FOR UPDATE SKIP LOCKED so concurrent workers never
double-dispatch a row.
- get / get_by_correlation: hydrates JSONB back through Pydantic.
- mark_done: done iff response.error is None, else error.
- mark_error: explicit convenience wrapper.
- update_callback_status: delivered | failed (no status transition).
- sweep_orphans: time-based rescue of stuck running rows; attempts++.
Integration fixtures (tests/integration/conftest.py):
- Skip cleanly when neither IX_TEST_DATABASE_URL nor IX_POSTGRES_URL is
set (unit suite stays runnable on a bare laptop).
- Alembic upgrade/downgrade runs in a subprocess so its internal
asyncio.run() doesn't collide with pytest-asyncio's loop.
- Per-test engine + truncate so loops never cross and tests start clean.
15 integration tests against a live postgres:16, including SKIP LOCKED
concurrency + orphan sweep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Typed pydantic-settings view over every IX_* env var, defaults matching
spec §9 exactly. @lru_cache-wrapped accessor so parsing/validation happens
once per process; tests clear the cache via get_config.cache_clear().
extra="ignore" keeps the container robust against typo'd env vars in
production .env files. engine.py's URL resolver now goes through
get_config() when ix.config is importable (bootstrap fallback remains so
hypothetical early-import callers don't crash).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the async-friendly Alembic env (NullPool, reads IX_POSTGRES_URL), the
hand-written 001 migration matching the spec's table layout exactly
(CHECK on status, partial index on pending rows, UNIQUE on
(client_id, request_id)), the SQLAlchemy 2.0 ORM mapping, and a lazy
engine/session factory. The factory reads the URL through ix.config when
available; Task 3.2 makes that the only path.
Smoke-tested: alembic upgrade head + downgrade base against a live
postgres:16 produce the expected table shape and tear down cleanly.
Unit tests assert the migration source contains every required column/index
so the migration can't drift from spec at import time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the five pipeline steps together with FakeOCRClient +
FakeGenAIClient, feeds the committed synthetic_giro.pdf fixture via
file:// URL, and asserts the full response shape.
- scripts/create_fixture_pdf.py: PyMuPDF-based builder. One-page A4 PDF
with six known header strings (bank name, IBAN, period, balances,
statement date). Re-runnable on demand; the committed PDF is what CI
consumes.
- tests/fixtures/synthetic_giro.pdf: committed output.
- tests/unit/test_pipeline_end_to_end.py: 5 tests covering
* ix_result.result fields populated from the fake LLM
* provenance.fields["result.closing_balance"].provenance_verified True
* text_agreement True when Paperless-style texts match the value
* metadata.timings has one entry per step in the right order
* response.error is None and context is not serialised
197 tests total; ruff clean. No integration tests, no real clients,
no network.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final pipeline step. Three mechanical transforms:
1. include_ocr_text -> concatenate non-tag line texts, pages joined
with \n\n, write to ocr_result.result.text.
2. include_geometries=False (default) -> strip ocr_result.result.pages
+ ocr_result.meta_data. Geometries are heavy; callers opt in.
3. Delete response.context so the internal accumulator never leaks to
the caller (belt-and-braces; Field(exclude=True) already does this).
validate() always returns True per spec.
7 unit tests in tests/unit/test_response_handler_step.py cover all
three branches + context-not-in-model_dump check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thin wrapper around ix.provenance.apply_reliability_flags. Validate
skips entirely when include_provenance is off OR when no provenance
data was built (text-only request, etc.). Process reads
context.texts + context.use_case_response and lets the verifier mutate
the FieldProvenance entries + fill quality_metrics counters in place.
11 unit tests in tests/unit/test_reliability_step.py cover: validate
skips on flag off / missing provenance, runs otherwise; per-type
flag behaviour (string verified + text_agreement, Literal -> None,
None value -> None, short numeric -> text_agreement None, date with
both sides parsed, IBAN whitespace-insensitive, disagreement -> False);
quality_metrics verified_fields / text_agreement_fields counters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Assembles the prompt, picks the structured-output schema, calls the
injected GenAIClient, and maps any emitted segment_citations into
response.provenance. Reliability flags stay None here; ReliabilityStep
fills them in Task 2.7.
- System prompt = use_case.system_prompt + (provenance-on) the verbatim
citation instruction from spec §9.2.
- User text = SegmentIndex.to_prompt_text([p1_l0] style) when provenance
is on, else plain OCR flat text + texts joined.
- Response schema = UseCaseResponse directly, or a runtime
create_model("ProvenanceWrappedResponse", result=(UCR, ...),
segment_citations=(list[SegmentCitation], Field(default_factory=list)))
when provenance is on.
- Model = request override -> use-case default.
- Failure modes: httpx / connection / timeout errors -> IX_002_000;
pydantic.ValidationError -> IX_002_001.
- Writes ix_result.result + ix_result.meta_data (model_name +
token_usage); builds response.provenance via
map_segment_refs_to_provenance when provenance is on.
17 unit tests in tests/unit/test_genai_step.py cover validate
(ocr_only skip, empty -> IX_001_000, text-only, ocr-text path), process
happy path, system-prompt shape with/without citation instruction, user
text tagged vs. plain, response schema plain vs. wrapped, provenance
mapping, error mapping (IX_002_000 + IX_002_001), and model selection
(request override + use-case default).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs after SetupStep. Dispatches the flat page list to the injected
OCRClient, writes the raw OCRResult onto response.ocr_result, injects
<page file="..." number="..."> open/close tag lines around each page's
content, and builds a SegmentIndex over the non-tag lines when
provenance is on.
Validate follows the spec triad rule:
- include_geometries/include_ocr_text/ocr_only + no files -> IX_000_004
- no files -> False (skip)
- files + (use_ocr or triad) -> True
9 unit tests in tests/unit/test_ocr_step.py cover all three validate
branches, OCRResult written, page tags injected (format + file_index),
SegmentIndex built iff provenance on.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First pipeline step. Validates the request (IX_000_002 on empty context),
normalises every Context.files entry to a FileRef, downloads them in
parallel via asyncio.gather, byte-sniffs MIMEs (IX_000_005 for
unsupported), loads the use-case pair from REGISTRY (IX_001_001 on
miss), and builds the flat pages + page_metadata list on
response_ix.context.
Fetcher / ingestor / MIME detector / tmp_dir / fetch_config all inject
via the constructor so unit tests stay hermetic — production wires the
real ix.ingestion defaults via the app factory.
7 unit tests in tests/unit/test_setup_step.py cover validate errors,
happy path (fetcher + ingestor invoked correctly, context populated,
use_case_name echoed), FileRef headers pass through, unsupported MIME
-> IX_000_005, unknown use case -> IX_001_001, text-only request, and
the _InternalContext type assertion.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the two Protocol-based client contracts the pipeline steps depend on,
plus test-oriented fakes. Real engines (Surya, Ollama) land in Chunk 4.
- ix.ocr.client.OCRClient — runtime_checkable Protocol with async ocr().
- ix.genai.client.GenAIClient — runtime_checkable Protocol with async
invoke(); GenAIInvocationResult + GenAIUsage dataclasses carry the
parsed model, token usage, and model name.
- FakeOCRClient / FakeGenAIClient: return canned results; both expose a
raise_on_call hook for error-path tests.
8 unit tests across tests/unit/test_ocr_fake.py + test_genai_fake.py
confirm protocol conformance, canned-return behaviour, usage/model-name
defaults, and raise_on_call propagation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the transport-agnostic pipeline orchestrator. Each step implements
async validate + process; the runner wraps both in a Timer, writes
per-step entries to response.metadata.timings, and aborts on the first
IXException by writing response.error.
- Step exposes a step_name property (defaults to class name) so tests and
logs label steps consistently.
- Timer is a plain context manager that appends one {step, elapsed_seconds}
entry on exit regardless of whether the body raised, so the timeline
stays reconstructable for failed steps.
- 9 unit tests cover ordering, skip-on-false, IXException in validate vs.
process, timings populated for every executed step, and shared-response
mutation across steps. Non-IX exceptions propagate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the two remaining provenance-subsystem pieces:
mapper.py — map_segment_refs_to_provenance:
- For each LLM SegmentCitation, pick seg-ids per source_type
(`value` vs `value_and_context`), cap at max_sources_per_field,
resolve each via SegmentIndex, track invalid references.
- Resolve field values by dot-path (`result.items[0].name` supported —
`[N]` bracket notation is normalised to `.N` before traversal).
- Skip fields that resolve to zero valid sources (spec §9.4).
- Write quality_metrics with fields_with_provenance / total_fields /
coverage_rate / invalid_references.
verify.py — verify_field + apply_reliability_flags:
- Dispatches per Pydantic field type: date → parse-both-sides compare;
int/float/Decimal → normalize + whole-snippet / numeric-token scan;
IBAN (detected via `iban` in field name) → upper+strip compare;
Literal / None → flags stay None; else string substring.
- _unwrap_optional handles BOTH typing.Union AND types.UnionType so
`Decimal | None` (PEP 604, what get_type_hints emits on 3.12+) resolves
correctly — caught by the integration-style test_writes_flags_and_counters.
- Number comparator scans numeric tokens in the snippet so labels
("Closing balance CHF 1'234.56") don't mask the match.
- apply_reliability_flags mutates the passed ProvenanceData in place and
writes verified_fields / text_agreement_fields to quality_metrics.
Tests cover each comparator, Literal/None skip, short-value skip (strings
and numerics), Decimal via optional union, and end-to-end flag+counter
writing against a Pydantic use-case schema that mirrors bank_statement_header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure functions the ReliabilityStep will compose to compare extracted values
against OCR snippets (and context.texts). Kept in one module so every rule
is directly unit-testable without pulling in the step ABC.
Highlights:
- `normalize_string`: NFKC + casefold + strip common punctuation (. , : ; !
? () [] {} / \\ ' " `) + collapse whitespace. Substring-compatible.
- `normalize_number`: returns the canonical "[-]DDD.DD" form (always 2dp)
after stripping currency symbols. Heuristic separator detection handles
Swiss-German apostrophes ("1'234.56"), de-DE commas ("1.234,56"), and
plain ASCII ("1234.56" / "1234.5" / "1234"). Accepts native int/float/
Decimal as well as str.
- `normalize_date`: dateutil parse with dayfirst=True → ISO YYYY-MM-DD.
Date and datetime objects short-circuit to their isoformat().
- `normalize_iban`: uppercase + strip whitespace. Format validation is the
call site's job; this is pure canonicalisation.
- `should_skip_text_agreement`: dispatches on type + value. Literal → skip,
None → skip, numeric |v|<10 → skip, len(str) ≤ 2 → skip. Numeric check
runs first so `10` (len("10")==2) is treated on the numeric side
(not skipped) instead of tripping the string length rule.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Builds the ID <-> on-page-anchor map used by both the GenAIStep (to emit the
segment-tagged user message) and the provenance mapper (to resolve LLM-cited
IDs back to bbox/text/file_index).
Design notes:
- `build()` is a classmethod so the pipeline constructs the index in one
place (OCRStep) and passes the constructed instance along in the internal
context. No mutable global state; tests build indexes inline from fake
OCR fixtures.
- Per-page metadata (file_index) arrives via a parallel `list[PageMetadata]`
rather than being smuggled into OCRResult. Keeps segmentation decoupled
from ingestion — the OCR engine legitimately doesn't know which file a
page came from.
- Page-tag lines (`<page …>` / `</page>`) are filtered via a regex so the
LLM can never cite them as provenance. `line_idx_in_page` increments only
for real lines so the IDs stay dense (p1_l0, p1_l1, ...).
- Bounding-box normalisation divides x-coords by page width, y-coords by
page height. Zero dimensions (defensive) pass through unchanged.
- `to_prompt_text(context_texts=[...])` appends paperless-style texts
untagged, separated from the tagged body by a blank line (spec §7.2b).
Deterministic for prompt caching.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First use case lands. The schema is intentionally flat — nine scalar fields,
no nested arrays — because Ollama's structured-output guidance stays most
reliable when the top level has only scalars, and every field we care about
(bank_name, IBAN, period, opening/closing balance) can be rendered as one.
Registration is explicit in `use_cases/__init__.py`, not a side effect of
importing the use-case module. That keeps load order obvious and lets tests
patch the registry without having to reload modules.
`get_use_case(name)` is the one-liner adapters use; it raises
`IX_001_001` with the offending name in `detail` when the lookup misses,
which keeps log-scrape simple.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Completes the data-contract layer. Highlights:
- `ResponseIX.context` is an internal mutable accumulator used by pipeline
steps (pages, files, texts, use_case classes, segment index). It MUST NOT
leak into the serialised response, so we mark the field with
`Field(exclude=True)` and carry the shape in a small `_InternalContext`
sub-model with `extra="allow"` so steps can stash arbitrary state without
schema churn. Tested: `model_dump()` and `model_dump_json()` both drop it.
- `FieldProvenance` gains `provenance_verified: bool | None` and
`text_agreement: bool | None` — the two MVP reliability flags written by
the new ReliabilityStep. Both default None so rows predating the
ReliabilityStep (empty LLM output, cloud-import replay) parse cleanly.
- `quality_metrics` stays a free-form `dict[str, Any]` — the MVP adds
`verified_fields` and `text_agreement_fields` counters without carving
them into the schema, which keeps future metric additions free.
- `Job.status` and `Job.callback_status` are `Literal[...]` so Pydantic
rejects unknown states at the edge. Invariant
(`status='done' iff response.error is None`) stays worker-enforced —
callers sometimes hydrate in-flight rows and we do not want validation
to reject them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the incoming-request data contracts as Pydantic v2 models. Matches the
MVP spec §3 exactly — fields dropped from the reference spec (use_vision,
reasoning_effort, version, ...) stay out, and `extra="forbid"` catches any
caller that sends them so drift surfaces immediately instead of silently.
Context.files is `list[str | FileRef]`: plain URLs stay str, dict entries
parse as FileRef. This keeps the common case (public URL) one-liner while
still supporting Paperless-style auth headers and per-file size caps.
ix_id stays optional with a docstring warning that callers MUST NOT set it —
the transport layer assigns the 16-char hex handle on insert. The field is
present so `Job` round-trips out of the store.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the single exception type used throughout the pipeline. Every failure
maps to one of the ten IX_* codes from the MVP spec §8 with a stable
machine-readable code and an optional free-form detail. The `str()` form is
log-scrapable with a single regex (`IX_xxx_xxx: <msg> (detail=...)`), so
mammon-side reliability UX can classify failures without brittle string
parsing.
Enum values equal names so callers can serialise either.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- pyproject.toml: runtime deps (FastAPI, SQLAlchemy async, Pydantic, PyMuPDF,
python-magic, Pillow, dateutil), dev group (pytest, pytest-asyncio,
pytest-httpx, ruff, mypy), optional `ocr` extra that pulls surya-ocr + torch
(kept optional so CI without GPU can run the base package).
- pytest config: asyncio_mode=auto; `live` marker for tests that need a real
Ollama/Surya (gated on IX_TEST_OLLAMA=1).
- Single smoke test (tests/unit/test_scaffolding.py) verifies the package
imports and exposes __version__ — keeps CI green until the real test
modules land in later chunks.
- .forgejo/workflows/ci.yml: runs ruff + pytest against a Postgres 16 service
container. Explicit IX_TEST_MODE=fake keeps real-client tests out.
- .env.example: every IX_* var from spec §9 with on-prem-friendly defaults.
- uv.lock committed for reproducible builds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>