From 842c4da90cecd0aa43eec237fe772bac3d9035e6 Mon Sep 17 00:00:00 2001 From: Dirk Riemann Date: Sat, 18 Apr 2026 14:08:07 +0200 Subject: [PATCH] =?UTF-8?q?chore:=20MVP=20deployed=20=E2=80=94=20readme,?= =?UTF-8?q?=20AGENTS.md=20status,=20deploy=20runbook=20filled=20in?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First deploy done 2026-04-18. E2E extraction of the bank_statement_header use case completes in 35 s against the live service, with 7 of 9 header fields provenance-verified + text-agreement-green. closing_balance asserts from spec §12 all pass. Updates: - README.md: status -> "MVP deployed"; worked example curl snippet; pointers to deployment runbook + spec + plan. - AGENTS.md: status line updated with the live URL + date. - pyproject.toml: version comment referencing the first deploy. - docs/deployment.md: "First deploy" section filled in with times, field-level extraction result, plus a log of every small Docker/ops follow-up PR that had to land to make the first deploy healthy. Co-Authored-By: Claude Opus 4.7 (1M context) --- AGENTS.md | 2 +- README.md | 45 ++++++++++++++++++++++++++++++++++++++++++++- docs/deployment.md | 29 +++++++++++++++++++++++------ pyproject.toml | 2 ++ 4 files changed, 70 insertions(+), 8 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index 7a417f3..ef55c1b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -4,7 +4,7 @@ Async, on-prem, LLM-powered structured information extraction microservice. Give Designed to be used by other on-prem services (e.g. mammon) as a reliable fallback / second opinion for format-specific deterministic parsers. -Status: design phase. Full reference spec at `docs/spec-core-pipeline.md`. MVP spec will live at `docs/superpowers/specs/`. +Status: MVP deployed (2026-04-18) at `http://192.168.68.42:8994` — LAN only. 
Full reference spec at `docs/spec-core-pipeline.md`; MVP spec at `docs/superpowers/specs/2026-04-18-ix-mvp-design.md`; deploy runbook at `docs/deployment.md`. ## Guiding Principles diff --git a/README.md b/README.md index 71652b4..1906459 100644 --- a/README.md +++ b/README.md @@ -4,10 +4,12 @@ Async, on-prem, LLM-powered structured information extraction microservice. Given a document (PDF, image, text) and a named *use case*, ix returns a structured JSON result whose shape matches the use-case schema — together with per-field provenance (OCR segment IDs, bounding boxes, cross-OCR agreement flags) that let the caller decide how much to trust each extracted value. -**Status:** design phase. Implementation about to start. +**Status:** MVP deployed. Live on the home LAN at `http://192.168.68.42:8994`. - Full reference spec: [`docs/spec-core-pipeline.md`](docs/spec-core-pipeline.md) (aspirational; MVP is a strict subset) - **MVP design:** [`docs/superpowers/specs/2026-04-18-ix-mvp-design.md`](docs/superpowers/specs/2026-04-18-ix-mvp-design.md) +- **Implementation plan:** [`docs/superpowers/plans/2026-04-18-ix-mvp-implementation.md`](docs/superpowers/plans/2026-04-18-ix-mvp-implementation.md) +- **Deployment runbook:** [`docs/deployment.md`](docs/deployment.md) - Agent / development notes: [`AGENTS.md`](AGENTS.md) ## Principles @@ -15,3 +17,44 @@ Given a document (PDF, image, text) and a named *use case*, ix returns a structu - **On-prem always.** LLM = Ollama, OCR = local engines (Surya first). No OpenAI / Anthropic / Azure / AWS / cloud. - **Grounded extraction, not DB truth.** ix returns best-effort fields + provenance; the caller decides what to trust. - **Transport-agnostic pipeline core.** REST + Postgres-queue adapters in parallel on one job store. 
+ +## Submitting a job + +```bash +curl -X POST http://192.168.68.42:8994/jobs \ + -H "Content-Type: application/json" \ + -d '{ + "use_case": "bank_statement_header", + "ix_client_id": "mammon", + "request_id": "some-correlation-id", + "context": { + "files": [{ + "url": "http://paperless.local/api/documents/42/download/", + "headers": {"Authorization": "Token …"} + }], + "texts": [""] + } + }' +# → {"job_id":"…","ix_id":"…","status":"pending"} +``` + +Poll `GET /jobs/{job_id}` until `status` is `done` or `error`. Optionally pass `callback_url` to receive a webhook on completion (one-shot, no retry; polling stays authoritative). + +Full REST surface + provenance response shape documented in the MVP design spec. + +## Running locally + +```bash +uv sync --extra dev +uv run pytest tests/unit -v # hermetic unit + integration suite +IX_TEST_OLLAMA=1 uv run pytest tests/live -v # needs LAN access to Ollama + GPU +``` + +## Deploying + +```bash +git push server main # rebuilds Docker image, restarts container, /healthz deploy gate +python scripts/e2e_smoke.py # E2E acceptance against the live service +``` + +See [`docs/deployment.md`](docs/deployment.md) for full runbook + rollback. diff --git a/docs/deployment.md b/docs/deployment.md index 3de24c5..547b3c4 100644 --- a/docs/deployment.md +++ b/docs/deployment.md @@ -71,13 +71,30 @@ git push server main ## First deploy -_(fill in after running — timestamps, commit sha, e2e_smoke output)_ +- **Date:** 2026-04-18 +- **Commit:** `fix/ollama-extract-json` (#36, the last of several Docker/ops follow-ups after PR #27 shipped the initial Dockerfile) +- **`/healthz`:** all three probes (`postgres`, `ollama`, `ocr`) green. First-pass took ~7 min for the fresh container because Surya's recognition (1.34 GB) + detection (73 MB) models download from HuggingFace on first run; subsequent rebuilds reuse the named volumes declared in `docker-compose.yml` and come up in <30 s. 
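+
+  A compose-level sketch of that model-weight caching (volume names `ix_surya_cache` / `ix_hf_cache` and cache paths as noted in the follow-ups log; assumptions, not a copy of the shipped `docker-compose.yml`):
+
+  ```yaml
+  # Sketch only: named volumes that persist the Surya and HuggingFace
+  # model caches across container rebuilds. Names/paths assumed from
+  # this runbook, not copied from the real compose file.
+  services:
+    ix:
+      volumes:
+        - ix_surya_cache:/root/.cache/datalab
+        - ix_hf_cache:/root/.cache/huggingface
+  volumes:
+    ix_surya_cache:
+    ix_hf_cache:
+  ```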
+- **E2E extraction:** `bank_statement_header` against `tests/fixtures/synthetic_giro.pdf` with Paperless-style texts:
+  - Pipeline completes in **35 s**.
+  - Extracted: `bank_name=DKB`, `account_iban=DE89370400440532013000`, `currency=EUR`, `opening_balance=1234.56`, `closing_balance=1450.22`, `statement_date=2026-03-31`, `statement_period_end=2026-03-31`, `statement_period_start=2026-03-01`, `account_type=null`.
+  - Provenance: 8 / 9 leaf fields have sources; for 7 of those 8, `provenance_verified` and `text_agreement` are both True. `statement_period_start` shows up in the OCR but fails normalisation (dateutil picks a different interpretation of the cited day); to be chased in a follow-up.
-- **Date:** TBD
-- **Commit:** TBD
-- **`/healthz` first-ok time:** TBD
-- **`e2e_smoke.py` status:** TBD
-- **Notes:** —
+### Docker-ops follow-ups that landed during the first deploy
+
+All small, each merged as its own PR. In commit order after the scaffold (#27):
+
+- **#31** `fix(docker): uv via standalone installer` — Python 3.12 on Ubuntu 22.04 drops `distutils`; Ubuntu's pip needed it. Switched to the `uv` standalone installer, which has no pip dependency.
+- **#32** `fix(docker): include README.md in the uv sync COPY` — `hatchling` validates that the readme file exists when resolving the editable project install.
+- **#33** `fix(compose): drop runtime: nvidia` — the deploy host's Docker daemon doesn't register a named `nvidia` runtime; `deploy.resources.devices` is sufficient and matches immich-ml.
+- **#34** `fix(deploy): network_mode: host` — `postgis` is bound to `127.0.0.1` on the host (security hardening T12). `host.docker.internal` points at the bridge gateway, not loopback, so the container couldn't reach postgis. Goldstein uses the same pattern.
+- **#35** `fix(deps): pin surya-ocr ^0.17` — the earlier cu124 torch pin had forced surya down to 0.14.1, which breaks our `surya.foundation` import and requires a transformers version that lacks `QuantizedCacheConfig`.
+- **#36** `fix(genai): drop Ollama format flag; extract trailing JSON` — Ollama 0.11.8 segfaults on Pydantic JSON Schemas (`$ref`, `anyOf`, `pattern`), and `format="json"` terminates reasoning models (qwen3) at `{}` because their `<think>` chain-of-thought isn't valid JSON. Omit the flag, inject the schema into the system prompt, extract the outermost `{…}` balanced block from the response.
+- **volumes** — named `ix_surya_cache` + `ix_hf_cache` mount `/root/.cache/datalab` + `/root/.cache/huggingface` so rebuilds don't re-download ~1.5 GB of model weights.
+
+Production notes:
+
+- `IX_DEFAULT_MODEL=qwen3:14b` (already pulled on the host). The spec listed `gpt-oss:20b` as a concrete example; swapped to keep the deploy on-prem without an extra `ollama pull`.
+- Torch 2.11's default cu13 wheels fall back to CPU against the host's CUDA 12.4 driver — Surya runs on CPU. Expected inference times: seconds per page. Upgrading the NVIDIA driver (or pinning a cu12-compatible torch wheel newer than 2.7) will unlock GPU with no code changes.

## E2E smoke test (`scripts/e2e_smoke.py`)

diff --git a/pyproject.toml b/pyproject.toml
index 898f0d0..fbf6c30 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,8 @@
[project]
name = "infoxtractor"
version = "0.1.0"
+# Released 2026-04-18 with the first live deploy of the MVP. See
+# docs/deployment.md §"First deploy" for the commit + /healthz times.
description = "Async on-prem LLM-powered structured information extraction microservice"
readme = "README.md"
requires-python = ">=3.12"
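The trailing-JSON extraction described in #36 ("extract the outermost `{…}` balanced block") can be sketched roughly like this; `extract_trailing_json` is a hypothetical name for illustration, not the shipped helper:

```python
import json


def extract_trailing_json(text: str) -> dict:
    """Return the first balanced {...} block that parses as JSON.

    Sketch of the #36 technique: without an Ollama format flag, the reply
    may contain chain-of-thought prose around the JSON. Scan brace depth
    (ignoring braces inside JSON strings) and try json.loads on each
    balanced candidate; prose blocks like "{informally}" fail to parse
    and are skipped.
    """
    start = text.find("{")
    while start != -1:
        depth, in_str, esc = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_str:
                if esc:
                    esc = False
                elif ch == "\\":
                    esc = True
                elif ch == '"':
                    in_str = False
            elif ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start : i + 1])
                    except json.JSONDecodeError:
                        break  # balanced but not JSON; try the next '{'
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model response")
```

A stray `{...}` in the model's reasoning text is tolerated because it fails `json.loads` and the scan simply advances to the next opening brace.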