Ollama 0.11.8 segfaults on any Pydantic-shaped structured-output schema
with $ref, anyOf, or pattern — confirmed on the deploy host with the
simplest MVP case (BankStatementHeader alone). The earlier null-stripping
sanitiser wasn't enough.
Switch to format="json", which is "emit valid JSON" mode. We're already
describing the exact JSON shape in the system prompt (via GenAIStep +
the use case's citation instruction appendix) and validating the
response body through Pydantic on parse — which raises IX_002_001 on
schema mismatch, exactly as before.
Stronger guarantees can come back later via a newer Ollama, an API
fix, or a different GenAIClient impl. None of that is needed for the
MVP to work end to end.
Unit tests: the sanitiser left in place (harmless, still tested). The
"happy path" test now asserts format == "json".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First /healthz call on a fresh container triggers Surya to fetch the
text-recognition (1.34 GB) and detection (73 MB) models from HuggingFace.
Without a volume they land in the container fs and vanish on every
rebuild, which is every deploy.
Mount named volumes for /root/.cache/datalab (Surya) and
/root/.cache/huggingface. Rebuild now keeps /healthz warm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ollama 0.11.8's llama.cpp structured-output implementation segfaults on
Pydantic v2's standard Optional pattern:
{"anyOf": [{"type": "string"}, {"type": "null"}]}
Confirmed on the deploy host: /api/chat request with the MVP's
ProvenanceWrappedResponse schema crashed Ollama with SIGSEGV; the client
saw httpx RemoteProtocolError → IX_002_000.
New _sanitise_schema_for_ollama walks the schema recursively and drops
"type: null" branches from every anyOf. Single-branch unions are
inlined so sibling keys (default, title) survive. This only narrows
what the LLM is *told* it may emit; Pydantic still validates the real
response body against the original schema and accepts None for
Optional fields if they were absent or explicitly null.
Existing unit tests updated: the "happy path" test no longer pins the
format to `_Schema.model_json_schema()` verbatim — instead it asserts
the sanitisation effect on a known-Optional field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Our client code imports surya.foundation (added in 0.17). The earlier
cu124 torch pin forced uv to downgrade surya to 0.14.1, which doesn't
have that module and depends on a transformers version that lacks
QuantizedCacheConfig. Net: ocr: fail at /healthz.
Drop the cu124 index pin. surya 0.17.1 needs torch >= 2.7, which the
default pypi torch (2.11) satisfies. The deploy host's CUDA 12.4
driver doesn't match torch 2.11's cu13 wheels, so CUDA init warns and
the GPU isn't available — torch + Surya transparently fall back to CPU.
Slower than GPU but correct for MVP. A host driver upgrade later will
unlock GPU with no code changes.
Unit suite stays green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The default pypi torch (2.11 as of lockfile) ships cu13 wheels, which
refuse to initialise against the deploy host's NVIDIA 12.4 driver
(UserWarning: "driver on your system is too old (found version 12040)").
/healthz reported ocr: fail because Surya couldn't pick up the GPU.
Use `tool.uv.sources` to route torch through PyTorch's cu124 index.
That pulls torch 2.6.0+cu124 (still satisfies surya-ocr >= 0.9). Lock
updated. transformers downgraded to 4.57.6, triton to 3.2.0 — all
compatible with surya and each other.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared postgis container is bound to 127.0.0.1 on the host (security
hardening, infrastructure §T12). Ollama is similarly LAN-hardened. The
previous `host.docker.internal + extra_hosts: host-gateway` approach
points at the bridge gateway IP, not loopback, so the container couldn't
reach either service.
Switch to `network_mode: host` (same pattern goldstein uses) and update
the default IX_POSTGRES_URL / IX_OLLAMA_URL to 127.0.0.1. Keep the GPU
reservation block; drop the now-meaningless ports: declaration (host mode
publishes directly).
AppConfig defaults + .env.example + test_config assertions + inline
docstring examples all follow.
Caught on fourth deploy attempt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Docker on the deploy host doesn't register 'nvidia' as a named runtime
(modern nvidia-container-toolkit hooks via --gpus all / resources.devices
instead). Immich-ml on the same host uses only deploy.resources.devices
with driver: nvidia, which is enough. Drop the legacy runtime line.
Caught on third deploy attempt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pyproject.toml names README.md as the readme; hatchling validates that
file exists when uv sync resolves the editable install of the project
itself. Caught on second deploy build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python 3.12 from deadsnakes on Ubuntu 22.04 drops `distutils` from the
stdlib, and Ubuntu's system pip still imports from it — so `pip install`
fails immediately with ModuleNotFoundError: distutils. Switch to the uv
standalone installer, which doesn't need pip at all.
Caught during the first deploy build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The home server's Ollama doesn't have gpt-oss:20b pulled; qwen3:14b is
already there and is what mammon's chat agent uses. Switching the default
now so the first deploy passes the /healthz ollama probe without an extra
`ollama pull` step. The spec lists gpt-oss:20b as a concrete example;
qwen3:14b is equally on-prem and Ollama-structured-output-compatible.
Touched: AppConfig default, BankStatementHeader Request.default_model,
.env.example, setup_server.sh ollama-list check, AGENTS.md, deployment.md,
live tests. Unit tests that hard-coded the old model string but don't
assert the default were left alone.
Also: ASCII en-dash in e2e_smoke.py Paperless-style text (ruff RUF001).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs from the Mac after every `git push server main`.
Flow: starts a tiny HTTP server on the Mac's LAN IP serving
tests/fixtures/synthetic_giro.pdf → POST /jobs with bank_statement_header
+ Paperless-style texts so text_agreement has something to check against →
poll GET /jobs/{id} until terminal → assert status=done, bank_name
non-empty, closing_balance.provenance_verified=True, text_agreement=True,
elapsed < 60 s. Non-zero exit blocks the deploy.
Uses only stdlib (http.server, urllib) — no extra deps on the Mac-side,
no test framework overhead.
Task 5.4 of MVP plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scripts/setup_server.sh: idempotent one-shot. Creates bare repo,
post-receive hook (which rebuilds docker compose + gates on /healthz),
infoxtractor Postgres role + DB on the shared postgis container, .env
(0600) from .env.example with the password substituted in, verifies
gpt-oss:20b is pulled.
- docs/deployment.md: topology, one-time setup command, normal deploy
workflow, rollback-via-revert pattern (never force-push main),
operational checklists for the common /healthz degraded states.
- First deploy section reserved; filled in after Task 5.3 runs.
Task 5.2 of MVP plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 4.3 closes the loop on Chunk 4: the FastAPI lifespan now selects
fake vs real clients via IX_TEST_MODE (new AppConfig field), wires
/healthz probes to the live selfcheck() on OllamaClient / SuryaOCRClient,
and spawns the worker with a production Pipeline factory that builds
SetupStep -> OCRStep -> GenAIStep -> ReliabilityStep -> ResponseHandler
over the injected clients.
Factories:
- make_genai_client(cfg) -> FakeGenAIClient | OllamaClient
- make_ocr_client(cfg) -> FakeOCRClient | SuryaOCRClient (spec §6.2)
Probes run the async selfcheck on a fresh event loop in a short-lived
thread so they're safe to call from either sync callers or a live
FastAPI handler without stalling the request loop.
Drops the worker-loop spawn_worker_task stub — the app module owns the
production spawn directly.
Tests: +11 unit tests (5 factories + 6 app-wiring / probe adapter /
pipeline build). Full suite: 236 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs Surya's detection + recognition over PIL images rendered from each
Page's source file (PDFs via PyMuPDF, images via Pillow). Lazy warm_up
so FastAPI lifespan start stays predictable. Deferred Surya/torch
imports keep the base install slim — the heavy deps stay under [ocr].
Extends OCRClient Protocol with optional files + page_metadata kwargs
so the engine can resolve each page back to its on-disk source; Fake
accepts-and-ignores to keep hermetic tests unchanged.
selfcheck() runs the predictors on a 1x1 PIL image — wired into /healthz
by Task 4.3.
Tests: 6 hermetic unit tests (Surya predictors mocked, no model
download); 2 live tests gated on IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real GenAIClient for the production pipeline. Sends `format=<pydantic JSON
schema>`, `stream=false`, and mapped options (`temperature`; drops
`reasoning_effort`). Content-parts lists joined to a single string since
MVP models don't speak native content-parts. Error mapping per spec:
connection/timeout/5xx → IX_002_000, schema violations → IX_002_001.
`selfcheck()` probes /api/tags with a fixed 5 s timeout for /healthz.
Tests: 10 hermetic pytest-httpx unit tests; 2 live tests gated on
IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PgQueueListener:
- Dedicated asyncpg connection outside the SQLAlchemy pool (LISTEN
needs a persistent connection; pooled connections check in/out).
- Exposes wait_for_work(timeout) — resolves on NOTIFY or timeout,
whichever fires first. The worker treats both wakes identically.
- asyncpg_dsn_from_sqlalchemy_url strips the +asyncpg driver segment
and percent-decodes the password so the same URL in IX_POSTGRES_URL
works for both SQLAlchemy and raw asyncpg.
app.py lifespan now also spawns the listener alongside the worker;
both are gated on spawn_worker=True so REST-only tests stay fast.
2 new integration tests: NOTIFY path (wake within 2 s despite 60 s
poll) + missed-NOTIFY path (fallback poll recovers within 5 s). 33
integration tests total, 209 unit. Forgejo Actions trigger is flaky;
local verification is the gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker:
- Startup: sweep_orphans(now, max_running_seconds) rescues rows stuck
in 'running' from a crashed prior process.
- Loop: claim_next_pending → build pipeline via injected factory → run
→ mark_done/mark_error → deliver callback if set → record outcome.
- Non-IX exceptions from the pipeline collapse to IX_002_000 so callers
see a stable error code.
- Sleep loop uses a cancellable wait so the stop event reacts
immediately; the wait_for_work hook is ready for Task 3.6 to plug in
the LISTEN-driven event without the worker knowing about NOTIFY.
Callback:
- One-shot POST, 2xx → delivered, anything else (incl. connect/timeout
exceptions) → failed. No retries.
- Callback record never reverts the job's terminal state — GET /jobs/{id}
stays the authoritative fallback.
7 integration tests: happy path, pipeline-raise → error, callback 2xx,
callback 5xx, orphan sweep on startup, no-callback rows stay
callback_status=None (x2 via parametrize). Unit suite still 209.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes:
- POST /jobs: 201 on first insert, 200 on idempotent re-submit.
- GET /jobs/{id}: full Job envelope or 404.
- GET /jobs?client_id=&request_id=: correlation lookup or 404.
- GET /healthz: {postgres, ollama, ocr}; 200 iff all ok (degraded counts
as non-200 per spec). Postgres probe guarded by a 2 s wait_for.
- GET /metrics: pending/running counts + 24h done/error counters +
per-use-case avg seconds. Plain JSON, no Prometheus.
create_app(spawn_worker=bool) parameterises worker spawning so tests that
only need REST pass False. Worker spawn is tolerant of the loop module not
being importable yet (Task 3.5 fills it in).
Probes are a DI bundle — production wiring swaps them in at startup
(Chunk 4); tests inject canned ok/fail callables. Session factory is also
DI'd so tests can point at a per-loop engine and sidestep the async-pg
cross-loop future issue that bit the jobs_repo fixture.
9 new integration tests; unit suite unchanged. Forgejo Actions trigger is
flaky; local verification is the gate (unit + integration green locally).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JobsRepo covers the full job-lifecycle surface:
- insert_pending: idempotent on (client_id, request_id) via ON CONFLICT
DO NOTHING + re-select; assigns a 16-hex ix_id.
- claim_next_pending: FOR UPDATE SKIP LOCKED so concurrent workers never
double-dispatch a row.
- get / get_by_correlation: hydrates JSONB back through Pydantic.
- mark_done: done iff response.error is None, else error.
- mark_error: explicit convenience wrapper.
- update_callback_status: delivered | failed (no status transition).
- sweep_orphans: time-based rescue of stuck running rows; attempts++.
Integration fixtures (tests/integration/conftest.py):
- Skip cleanly when neither IX_TEST_DATABASE_URL nor IX_POSTGRES_URL is
set (unit suite stays runnable on a bare laptop).
- Alembic upgrade/downgrade runs in a subprocess so its internal
asyncio.run() doesn't collide with pytest-asyncio's loop.
- Per-test engine + truncate so loops never cross and tests start clean.
15 integration tests against a live postgres:16, including SKIP LOCKED
concurrency + orphan sweep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Typed pydantic-settings view over every IX_* env var, defaults matching
spec §9 exactly. @lru_cache-wrapped accessor so parsing/validation happens
once per process; tests clear the cache via get_config.cache_clear().
extra="ignore" keeps the container robust against typo'd env vars in
production .env files. engine.py's URL resolver now goes through
get_config() when ix.config is importable (bootstrap fallback remains so
hypothetical early-import callers don't crash).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the async-friendly Alembic env (NullPool, reads IX_POSTGRES_URL), the
hand-written 001 migration matching the spec's table layout exactly
(CHECK on status, partial index on pending rows, UNIQUE on
(client_id, request_id)), the SQLAlchemy 2.0 ORM mapping, and a lazy
engine/session factory. The factory reads the URL through ix.config when
available; Task 3.2 makes that the only path.
Smoke-tested: alembic upgrade head + downgrade base against a live
postgres:16 produce the expected table shape and tear down cleanly.
Unit tests assert the migration source contains every required column/index
so the migration can't drift from spec at import time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the five pipeline steps together with FakeOCRClient +
FakeGenAIClient, feeds the committed synthetic_giro.pdf fixture via
file:// URL, and asserts the full response shape.
- scripts/create_fixture_pdf.py: PyMuPDF-based builder. One-page A4 PDF
with six known header strings (bank name, IBAN, period, balances,
statement date). Re-runnable on demand; the committed PDF is what CI
consumes.
- tests/fixtures/synthetic_giro.pdf: committed output.
- tests/unit/test_pipeline_end_to_end.py: 5 tests covering
* ix_result.result fields populated from the fake LLM
* provenance.fields["result.closing_balance"].provenance_verified True
* text_agreement True when Paperless-style texts match the value
* metadata.timings has one entry per step in the right order
* response.error is None and context is not serialised
197 tests total; ruff clean. No integration tests, no real clients,
no network.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final pipeline step. Three mechanical transforms:
1. include_ocr_text -> concatenate non-tag line texts, pages joined
with \n\n, write to ocr_result.result.text.
2. include_geometries=False (default) -> strip ocr_result.result.pages
+ ocr_result.meta_data. Geometries are heavy; callers opt in.
3. Delete response.context so the internal accumulator never leaks to
the caller (belt-and-braces; Field(exclude=True) already does this).
validate() always returns True per spec.
7 unit tests in tests/unit/test_response_handler_step.py cover all
three branches + context-not-in-model_dump check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thin wrapper around ix.provenance.apply_reliability_flags. Validate
skips entirely when include_provenance is off OR when no provenance
data was built (text-only request, etc.). Process reads
context.texts + context.use_case_response and lets the verifier mutate
the FieldProvenance entries + fill quality_metrics counters in place.
11 unit tests in tests/unit/test_reliability_step.py cover: validate
skips on flag off / missing provenance, runs otherwise; per-type
flag behaviour (string verified + text_agreement, Literal -> None,
None value -> None, short numeric -> text_agreement None, date with
both sides parsed, IBAN whitespace-insensitive, disagreement -> False);
quality_metrics verified_fields / text_agreement_fields counters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>