First deploy done 2026-04-18. E2E extraction of the bank_statement_header
use case completes in 35 s against the live service, with 7 of 9 header
fields provenance-verified + text-agreement-green. closing_balance
asserts from spec §12 all pass.
Updates:
- README.md: status -> "MVP deployed"; worked example curl snippet;
pointers to deployment runbook + spec + plan.
- AGENTS.md: status line updated with the live URL + date.
- pyproject.toml: version comment referencing the first deploy.
- docs/deployment.md: "First deploy" section filled in with times,
field-level extraction result, plus a log of every small Docker/ops
follow-up PR that had to land to make the first deploy healthy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
qwen3:14b (and deepseek-r1, other reasoning models) wrap their output
in <think>…</think> chains-of-thought before emitting real output.
With format=json the constrained sampler terminated immediately at
`{}` because the thinking block wasn't valid JSON; without format the
model thinks normally and appends the actual JSON at the end.
OllamaClient now omits the format flag and extracts the outermost
balanced `{…}` block from the response (brace depth counter, string-
literal aware). Works for reasoning models, ```json``` code-fenced
outputs, and plain JSON alike.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
format=json loose mode gives valid JSON but no shape — models default
to emitting {} when the system prompt doesn't list fields. Prepend a
schema-guidance system message with the full Pydantic schema (after
the existing null-branch sanitiser) so the model sees exactly what
shape to produce. Pydantic still validates on parse.
Unit tests updated to check the schema message is prepended without
disturbing the caller's own messages.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ollama 0.11.8 segfaults on any Pydantic-shaped structured-output schema
with $ref, anyOf, or pattern — confirmed on the deploy host with the
simplest MVP case (BankStatementHeader alone). The earlier null-stripping
sanitiser wasn't enough.
Switch to format="json", which is "emit valid JSON" mode. We're already
describing the exact JSON shape in the system prompt (via GenAIStep +
the use case's citation instruction appendix) and validating the
response body through Pydantic on parse — which raises IX_002_001 on
schema mismatch, exactly as before.
Stronger guarantees can come back later via a newer Ollama, an API
fix, or a different GenAIClient impl. None of that is needed for the
MVP to work end to end.
Unit tests: the sanitiser left in place (harmless, still tested). The
"happy path" test now asserts format == "json".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First /healthz call on a fresh container triggers Surya to fetch the
text-recognition (1.34 GB) and detection (73 MB) models from HuggingFace.
Without a volume they land in the container fs and vanish on every
rebuild, which is every deploy.
Mount named volumes for /root/.cache/datalab (Surya) and
/root/.cache/huggingface. Rebuild now keeps /healthz warm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ollama 0.11.8's llama.cpp structured-output implementation segfaults on
Pydantic v2's standard Optional pattern:
{"anyOf": [{"type": "string"}, {"type": "null"}]}
Confirmed on the deploy host: /api/chat request with the MVP's
ProvenanceWrappedResponse schema crashed Ollama with SIGSEGV; the client
saw httpx RemoteProtocolError → IX_002_000.
New _sanitise_schema_for_ollama walks the schema recursively and drops
"type: null" branches from every anyOf. Single-branch unions are
inlined so sibling keys (default, title) survive. This only narrows
what the LLM is *told* it may emit; Pydantic still validates the real
response body against the original schema and accepts None for
Optional fields if they were absent or explicitly null.
Existing unit tests updated: the "happy path" test no longer pins the
format to `_Schema.model_json_schema()` verbatim — instead it asserts
the sanitisation effect on a known-Optional field.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Our client code imports surya.foundation (added in 0.17). The earlier
cu124 torch pin forced uv to downgrade surya to 0.14.1, which doesn't
have that module and depends on a transformers version that lacks
QuantizedCacheConfig. Net: ocr: fail at /healthz.
Drop the cu124 index pin. surya 0.17.1 needs torch >= 2.7, which the
default pypi torch (2.11) satisfies. The deploy host's CUDA 12.4
driver doesn't match torch 2.11's cu13 wheels, so CUDA init warns and
the GPU isn't available — torch + Surya transparently fall back to CPU.
Slower than GPU but correct for MVP. A host driver upgrade later will
unlock GPU with no code changes.
Unit suite stays green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The default pypi torch (2.11 as of lockfile) ships cu13 wheels, which
refuse to initialise against the deploy host's NVIDIA 12.4 driver
(UserWarning: "driver on your system is too old (found version 12040)").
/healthz reported ocr: fail because Surya couldn't pick up the GPU.
Use `tool.uv.sources` to route torch through PyTorch's cu124 index.
That pulls torch 2.6.0+cu124 (still satisfies surya-ocr >= 0.9). Lock
updated. transformers downgraded to 4.57.6, triton to 3.2.0 — all
compatible with surya and each other.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The shared postgis container is bound to 127.0.0.1 on the host (security
hardening, infrastructure §T12). Ollama is similarly LAN-hardened. The
previous `host.docker.internal + extra_hosts: host-gateway` approach
points at the bridge gateway IP, not loopback, so the container couldn't
reach either service.
Switch to `network_mode: host` (same pattern goldstein uses) and update
the default IX_POSTGRES_URL / IX_OLLAMA_URL to 127.0.0.1. Keep the GPU
reservation block; drop the now-meaningless ports: declaration (host mode
publishes directly).
AppConfig defaults + .env.example + test_config assertions + inline
docstring examples all follow.
Caught on fourth deploy attempt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Docker on the deploy host doesn't register 'nvidia' as a named runtime
(modern nvidia-container-toolkit hooks via --gpus all / resources.devices
instead). Immich-ml on the same host uses only deploy.resources.devices
with driver: nvidia, which is enough. Drop the legacy runtime line.
Caught on third deploy attempt.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
pyproject.toml names README.md as the readme; hatchling validates that
file exists when uv sync resolves the editable install of the project
itself. Caught on second deploy build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Python 3.12 from deadsnakes on Ubuntu 22.04 drops `distutils` from the
stdlib, and Ubuntu's system pip still imports from it — so `pip install`
fails immediately with ModuleNotFoundError: distutils. Switch to the uv
standalone installer, which doesn't need pip at all.
Caught during the first deploy build.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The home server's Ollama doesn't have gpt-oss:20b pulled; qwen3:14b is
already there and is what mammon's chat agent uses. Switching the default
now so the first deploy passes the /healthz ollama probe without an extra
`ollama pull` step. The spec lists gpt-oss:20b as a concrete example;
qwen3:14b is equally on-prem and Ollama-structured-output-compatible.
Touched: AppConfig default, BankStatementHeader Request.default_model,
.env.example, setup_server.sh ollama-list check, AGENTS.md, deployment.md,
live tests. Unit tests that hard-coded the old model string but don't
assert the default were left alone.
Also: ASCII en-dash in e2e_smoke.py Paperless-style text (ruff RUF001).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs from the Mac after every `git push server main`.
Flow: starts a tiny HTTP server on the Mac's LAN IP serving
tests/fixtures/synthetic_giro.pdf → POST /jobs with bank_statement_header
+ Paperless-style texts so text_agreement has something to check against →
poll GET /jobs/{id} until terminal → assert status=done, bank_name
non-empty, closing_balance.provenance_verified=True, text_agreement=True,
elapsed < 60 s. Non-zero exit blocks the deploy.
Uses only stdlib (http.server, urllib) — no extra deps on the Mac-side,
no test framework overhead.
Task 5.4 of MVP plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- scripts/setup_server.sh: idempotent one-shot. Creates bare repo,
post-receive hook (which rebuilds docker compose + gates on /healthz),
infoxtractor Postgres role + DB on the shared postgis container, .env
(0600) from .env.example with the password substituted in, verifies
gpt-oss:20b is pulled.
- docs/deployment.md: topology, one-time setup command, normal deploy
workflow, rollback-via-revert pattern (never force-push main),
operational checklists for the common /healthz degraded states.
- First deploy section reserved; filled in after Task 5.3 runs.
Task 5.2 of MVP plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Task 4.3 closes the loop on Chunk 4: the FastAPI lifespan now selects
fake vs real clients via IX_TEST_MODE (new AppConfig field), wires
/healthz probes to the live selfcheck() on OllamaClient / SuryaOCRClient,
and spawns the worker with a production Pipeline factory that builds
SetupStep -> OCRStep -> GenAIStep -> ReliabilityStep -> ResponseHandler
over the injected clients.
Factories:
- make_genai_client(cfg) -> FakeGenAIClient | OllamaClient
- make_ocr_client(cfg) -> FakeOCRClient | SuryaOCRClient (spec §6.2)
Probes run the async selfcheck on a fresh event loop in a short-lived
thread so they're safe to call from either sync callers or a live
FastAPI handler without stalling the request loop.
Drops the worker-loop spawn_worker_task stub — the app module owns the
production spawn directly.
Tests: +11 unit tests (5 factories + 6 app-wiring / probe adapter /
pipeline build). Full suite: 236 passed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Runs Surya's detection + recognition over PIL images rendered from each
Page's source file (PDFs via PyMuPDF, images via Pillow). Lazy warm_up
so FastAPI lifespan start stays predictable. Deferred Surya/torch
imports keep the base install slim — the heavy deps stay under [ocr].
Extends OCRClient Protocol with optional files + page_metadata kwargs
so the engine can resolve each page back to its on-disk source; Fake
accepts-and-ignores to keep hermetic tests unchanged.
selfcheck() runs the predictors on a 1x1 PIL image — wired into /healthz
by Task 4.3.
Tests: 6 hermetic unit tests (Surya predictors mocked, no model
download); 2 live tests gated on IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real GenAIClient for the production pipeline. Sends `format=<pydantic JSON
schema>`, `stream=false`, and mapped options (`temperature`; drops
`reasoning_effort`). Content-parts lists joined to a single string since
MVP models don't speak native content-parts. Error mapping per spec:
connection/timeout/5xx → IX_002_000, schema violations → IX_002_001.
`selfcheck()` probes /api/tags with a fixed 5 s timeout for /healthz.
Tests: 10 hermetic pytest-httpx unit tests; 2 live tests gated on
IX_TEST_OLLAMA=1 (never run in CI).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PgQueueListener:
- Dedicated asyncpg connection outside the SQLAlchemy pool (LISTEN
needs a persistent connection; pooled connections check in/out).
- Exposes wait_for_work(timeout) — resolves on NOTIFY or timeout,
whichever fires first. The worker treats both wakes identically.
- asyncpg_dsn_from_sqlalchemy_url strips the +asyncpg driver segment
and percent-decodes the password so the same URL in IX_POSTGRES_URL
works for both SQLAlchemy and raw asyncpg.
app.py lifespan now also spawns the listener alongside the worker;
both are gated on spawn_worker=True so REST-only tests stay fast.
2 new integration tests: NOTIFY path (wake within 2 s despite 60 s
poll) + missed-NOTIFY path (fallback poll recovers within 5 s). 33
integration tests total, 209 unit. Forgejo Actions trigger is flaky;
local verification is the gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker:
- Startup: sweep_orphans(now, max_running_seconds) rescues rows stuck
in 'running' from a crashed prior process.
- Loop: claim_next_pending → build pipeline via injected factory → run
→ mark_done/mark_error → deliver callback if set → record outcome.
- Non-IX exceptions from the pipeline collapse to IX_002_000 so callers
see a stable error code.
- Sleep loop uses a cancellable wait so the stop event reacts
immediately; the wait_for_work hook is ready for Task 3.6 to plug in
the LISTEN-driven event without the worker knowing about NOTIFY.
Callback:
- One-shot POST, 2xx → delivered, anything else (incl. connect/timeout
exceptions) → failed. No retries.
- Callback record never reverts the job's terminal state — GET /jobs/{id}
stays the authoritative fallback.
7 integration tests: happy path, pipeline-raise → error, callback 2xx,
callback 5xx, orphan sweep on startup, no-callback rows stay
callback_status=None (x2 via parametrize). Unit suite still 209.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Routes:
- POST /jobs: 201 on first insert, 200 on idempotent re-submit.
- GET /jobs/{id}: full Job envelope or 404.
- GET /jobs?client_id=&request_id=: correlation lookup or 404.
- GET /healthz: {postgres, ollama, ocr}; 200 iff all ok (degraded counts
as non-200 per spec). Postgres probe guarded by a 2 s wait_for.
- GET /metrics: pending/running counts + 24h done/error counters +
per-use-case avg seconds. Plain JSON, no Prometheus.
create_app(spawn_worker=bool) parameterises worker spawning so tests that
only need REST pass False. Worker spawn is tolerant of the loop module not
being importable yet (Task 3.5 fills it in).
Probes are a DI bundle — production wiring swaps them in at startup
(Chunk 4); tests inject canned ok/fail callables. Session factory is also
DI'd so tests can point at a per-loop engine and sidestep the async-pg
cross-loop future issue that bit the jobs_repo fixture.
9 new integration tests; unit suite unchanged. Forgejo Actions trigger is
flaky; local verification is the gate (unit + integration green locally).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JobsRepo covers the full job-lifecycle surface:
- insert_pending: idempotent on (client_id, request_id) via ON CONFLICT
DO NOTHING + re-select; assigns a 16-hex ix_id.
- claim_next_pending: FOR UPDATE SKIP LOCKED so concurrent workers never
double-dispatch a row.
- get / get_by_correlation: hydrates JSONB back through Pydantic.
- mark_done: done iff response.error is None, else error.
- mark_error: explicit convenience wrapper.
- update_callback_status: delivered | failed (no status transition).
- sweep_orphans: time-based rescue of stuck running rows; attempts++.
Integration fixtures (tests/integration/conftest.py):
- Skip cleanly when neither IX_TEST_DATABASE_URL nor IX_POSTGRES_URL is
set (unit suite stays runnable on a bare laptop).
- Alembic upgrade/downgrade runs in a subprocess so its internal
asyncio.run() doesn't collide with pytest-asyncio's loop.
- Per-test engine + truncate so loops never cross and tests start clean.
15 integration tests against a live postgres:16, including SKIP LOCKED
concurrency + orphan sweep.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Typed pydantic-settings view over every IX_* env var, defaults matching
spec §9 exactly. @lru_cache-wrapped accessor so parsing/validation happens
once per process; tests clear the cache via get_config.cache_clear().
extra="ignore" keeps the container robust against typo'd env vars in
production .env files. engine.py's URL resolver now goes through
get_config() when ix.config is importable (bootstrap fallback remains so
hypothetical early-import callers don't crash).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the async-friendly Alembic env (NullPool, reads IX_POSTGRES_URL), the
hand-written 001 migration matching the spec's table layout exactly
(CHECK on status, partial index on pending rows, UNIQUE on
(client_id, request_id)), the SQLAlchemy 2.0 ORM mapping, and a lazy
engine/session factory. The factory reads the URL through ix.config when
available; Task 3.2 makes that the only path.
Smoke-tested: alembic upgrade head + downgrade base against a live
postgres:16 produce the expected table shape and tear down cleanly.
Unit tests assert the migration source contains every required column/index
so the migration can't drift from spec at import time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>