
# Deployment

On-prem deploy to 192.168.68.42. Push-to-deploy via a bare git repo + post-receive hook that rebuilds the Docker Compose stack. Pattern mirrors mammon and unified_messaging.

## Topology

```text
Mac (dev)
  │  git push server main
  ▼
192.168.68.42:/home/server/Public/infoxtractor/repos.git   (bare)
  │  post-receive → GIT_WORK_TREE=/…/app git checkout -f main
  │                 docker compose up -d --build
  │                 curl /healthz (60 s gate)
  ▼
Docker container `infoxtractor` (port 8994)
  ├─ 127.0.0.1:11434  →  Ollama (qwen3:14b; host-network mode)
  └─ 127.0.0.1:5431   →  postgis (database `infoxtractor`; host-network mode)
```

## One-time server setup

Run once from the Mac. Idempotent.

```sh
export IX_POSTGRES_PASSWORD=<generate-a-strong-one>
./scripts/setup_server.sh
```

The script:

  1. Creates /home/server/Public/infoxtractor/repos.git (bare) + /home/server/Public/infoxtractor/app/ (worktree).
  2. Installs the post-receive hook (see scripts/setup_server.sh for the template).
  3. Creates the infoxtractor Postgres role + database on the shared postgis container.
  4. Writes /home/server/Public/infoxtractor/app/.env (mode 0600) from .env.example with the password substituted in.
  5. Verifies qwen3:14b is pulled in Ollama.
  6. Prints a hint to open UFW for port 8994 on the LAN subnet if it's missing.
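The push-to-deploy flow hinges on the post-receive hook installed in step 2. A minimal, non-runnable sketch of what such a hook could look like, assuming the paths and the 60 s health gate from the topology above (the authoritative template lives in scripts/setup_server.sh):

```sh
#!/bin/sh
# Sketch of a push-to-deploy post-receive hook; the real template is in
# scripts/setup_server.sh. Paths assume the bare repo / worktree layout above.
set -e
APP_DIR=/home/server/Public/infoxtractor/app
LOG=/tmp/infoxtractor-deploy.log

{
  echo "deploy: $(date)"
  # Check out the pushed main branch into the worktree
  GIT_WORK_TREE="$APP_DIR" git checkout -f main
  # Rebuild the image and swap the compose stack
  docker compose --project-directory "$APP_DIR" up -d --build
  # Health gate: wait up to 60 s for /healthz to answer 200
  for i in $(seq 1 60); do
    if curl -fsS http://127.0.0.1:8994/healthz >/dev/null; then
      echo "deploy: healthy"
      exit 0
    fi
    sleep 1
  done
  echo "deploy: /healthz never went green"
  exit 1
} >>"$LOG" 2>&1
```

A non-zero exit from the hook is what marks the deploy failed in the workflow below.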

After the script finishes, add the deploy remote to the local repo:

```sh
git remote add server ssh://server@192.168.68.42/home/server/Public/infoxtractor/repos.git
```

## Normal deploy workflow

```sh
# after merging a feat branch into main
git push server main

# tail the server's deploy log
ssh server@192.168.68.42 "tail -f /tmp/infoxtractor-deploy.log"

# healthz gate (the post-receive hook also waits up to 60 s for this)
curl http://192.168.68.42:8994/healthz

# end-to-end smoke — this IS the real acceptance test
python scripts/e2e_smoke.py
```

If the post-receive hook exits non-zero (`/healthz` never reaches 200), the deploy is considered failed. A failed build leaves the previous container running: `docker compose up -d --build` builds the new image first and only swaps containers once the build succeeds. If the new container starts but then fails `/healthz`, it stays up but broken. Investigate with `docker compose logs --tail 200` in `${APP_DIR}` and either fix forward or revert (see below).

## Rollback

Never force-push main. Rollbacks happen as forward commits via git revert:

```sh
git revert HEAD     # creates a revert commit for the last change
git push forgejo main
git push server main
```

## First deploy

- Date: 2026-04-18
- Commit: fix/ollama-extract-json (#36, the last of several Docker/ops follow-ups after PR #27 shipped the initial Dockerfile)
- /healthz: all three probes (postgres, ollama, ocr) green. The first pass took ~7 min for the fresh container because Surya's recognition (1.34 GB) + detection (73 MB) models download from HuggingFace on first run; subsequent rebuilds reuse the named volumes declared in docker-compose.yml and come up in <30 s.
- E2E extraction: bank_statement_header against tests/fixtures/synthetic_giro.pdf with Paperless-style texts:
  - Pipeline completes in 35 s.
  - Extracted: bank_name=DKB, account_iban=DE89370400440532013000, currency=EUR, opening_balance=1234.56, closing_balance=1450.22, statement_date=2026-03-31, statement_period_end=2026-03-31, statement_period_start=2026-03-01, account_type=null.
  - Provenance: 8 / 9 leaf fields have sources; 7 / 8 have both provenance_verified and text_agreement True. statement_period_start shows up in the OCR but normalisation fails (dateutil picks a different interpretation of the cited day); to be chased in a follow-up.
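The statement_period_start failure is the classic day-first ambiguity: dateutil defaults to month-first parsing, so a German-style dotted date flips day and month unless `dayfirst` is passed. A minimal reproduction (the date string here is illustrative, not the actual OCR output):

```python
from dateutil import parser

raw = "01.03.2026"  # German convention: 1 March 2026

# Default parse is month-first: interpreted as January 3rd
assert parser.parse(raw).date().isoformat() == "2026-01-03"

# dayfirst=True recovers the intended reading
assert parser.parse(raw, dayfirst=True).date().isoformat() == "2026-03-01"
```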

## Docker-ops follow-ups that landed during the first deploy

All small, each merged as its own PR. In commit order after the scaffold (#27):

- #31 fix(docker): uv via standalone installer — Python 3.12 on Ubuntu 22.04 drops distutils; Ubuntu's pip needed it. Switched to the uv standalone installer, which has no pip dependency.
- #32 fix(docker): include README.md in the uv sync COPY — hatchling validates that the readme file exists when resolving the editable project install.
- #33 fix(compose): drop runtime: nvidia — the deploy host's Docker daemon doesn't register a named nvidia runtime; deploy.resources.devices is sufficient and matches immich-ml.
- #34 fix(deploy): network_mode: host — postgis is bound to 127.0.0.1 on the host (security hardening T12). host.docker.internal points at the bridge gateway, not loopback, so the container couldn't reach postgis. Goldstein uses the same pattern.
- #35 fix(deps): pin surya-ocr ^0.17 — the earlier cu124 torch pin had forced surya to 0.14.1, which breaks our surya.foundation import and needs a transformers version that lacks QuantizedCacheConfig.
- #36 fix(genai): drop Ollama format flag; extract trailing JSON — Ollama 0.11.8 segfaults on Pydantic JSON Schemas ($ref, anyOf, pattern), and format="json" terminates reasoning models (qwen3) at {} because their <think>…</think> chain-of-thought isn't valid JSON. Omit the flag, inject the schema into the system prompt, extract the outermost {…} balanced block from the response.
- volumes — named ix_surya_cache + ix_hf_cache mount /root/.cache/datalab + /root/.cache/huggingface so rebuilds don't re-download ~1.5 GB of model weights.
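The balanced-block extraction from #36 can be sketched as follows. This is an illustrative reimplementation, not the shipped code: it scans backward from the last closing brace and deliberately ignores the braces-inside-string-literals edge case.

```python
import json


def extract_trailing_json(text: str) -> dict:
    """Pull the last balanced {...} block out of a model response.

    Sketch only: assumes the JSON payload contains no unbalanced
    braces inside string values.
    """
    end = text.rfind("}")
    if end == -1:
        raise ValueError("no JSON object in response")
    depth = 0
    # Walk backward from the final '}' to its matching '{'
    for i in range(end, -1, -1):
        if text[i] == "}":
            depth += 1
        elif text[i] == "{":
            depth -= 1
            if depth == 0:
                return json.loads(text[i:end + 1])
    raise ValueError("unbalanced braces in response")


# A qwen3-style reply: chain-of-thought first, JSON at the end
reply = (
    "<think>Check {the schema} first...</think>\n"
    'Here you go: {"bank_name": "DKB", "currency": "EUR"}'
)
extract_trailing_json(reply)  # → {"bank_name": "DKB", "currency": "EUR"}
```

Because the scan starts at the *last* `}`, brace noise inside the `<think>` block is never considered.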

## Production notes

- IX_DEFAULT_MODEL=qwen3:14b (already pulled on the host). Spec listed gpt-oss:20b as a concrete example; swapped to keep the deploy on-prem without an extra ollama pull.
- Torch 2.11 default cu13 wheels fall back to CPU against the host's CUDA 12.4 driver — Surya runs on CPU. Expected inference times: seconds per page. Upgrading the NVIDIA driver (or pinning a cu12-compatible torch wheel newer than 2.7) will unlock GPU with no code changes.

## E2E smoke test (scripts/e2e_smoke.py)

What it does (from the Mac):

  1. Checks /healthz.
  2. Starts a tiny HTTP server on the Mac's LAN IP serving tests/fixtures/synthetic_giro.pdf.
  3. Submits a POST /jobs with use_case=bank_statement_header, the fixture URL in context.files, and a Paperless-style OCR text in context.texts (to exercise the text_agreement cross-check).
  4. Polls GET /jobs/{id} every 2 s until terminal or 120 s timeout.
  5. Asserts: status=="done", bank_name non-empty, provenance.fields["result.closing_balance"].provenance_verified=True, text_agreement=True, total elapsed < 60s.
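The acceptance criteria in step 5 boil down to a small checker over the GET /jobs/{id} payload. A sketch of the shape, with the field paths taken from the assertions above but the exact JSON layout of the response assumed, not confirmed:

```python
def check_job(job: dict, elapsed_s: float) -> None:
    """Apply the smoke-test acceptance criteria to a job payload (sketch)."""
    assert job["status"] == "done"
    assert job["result"]["bank_name"]  # non-empty
    prov = job["provenance"]["fields"]["result.closing_balance"]
    assert prov["provenance_verified"] is True
    assert prov["text_agreement"] is True
    assert elapsed_s < 60


# Hypothetical payload carrying only the checked fields
job = {
    "status": "done",
    "result": {"bank_name": "DKB"},
    "provenance": {"fields": {"result.closing_balance": {
        "provenance_verified": True, "text_agreement": True}}},
}
check_job(job, elapsed_s=35.0)  # passes silently
```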

Non-zero exit means the deploy is not healthy. Roll back via git revert HEAD.

## Operational checklists

### After ollama pull on the host

The IX_DEFAULT_MODEL env var in the server's .env must match a model that appears in ollama list. Changing the default means:

  1. Edit /home/server/Public/infoxtractor/app/.env and set IX_DEFAULT_MODEL=<new>.
  2. docker compose --project-directory /home/server/Public/infoxtractor/app restart.
  3. curl http://192.168.68.42:8994/healthz → confirm ollama: ok.

### If /healthz shows ollama: degraded

qwen3:14b (or the configured default) is not pulled. On the host:

```sh
ssh server@192.168.68.42 "docker exec ollama ollama pull qwen3:14b"
```

### If /healthz shows ocr: fail

Surya couldn't initialize (model missing, CUDA unavailable, OOM). First run can be slow — models download on first call. Check container logs:

```sh
ssh server@192.168.68.42 "docker logs infoxtractor --tail 200"
```

### If the container fails to start

```sh
ssh server@192.168.68.42 "tail -100 /tmp/infoxtractor-deploy.log"
ssh server@192.168.68.42 "docker compose -f /home/server/Public/infoxtractor/app/docker-compose.yml logs --tail 200"
```

## Monitoring

- Monitoring dashboard auto-discovers via the infrastructure.web_url label on the container: http://192.168.68.42:8001 → "infoxtractor" card.
- Backup opt-in via backup.enable=true + backup.type=postgres + backup.name=infoxtractor labels. The daily backup script picks up the infoxtractor Postgres database automatically.
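In docker-compose.yml terms, the two label sets above amount to something like the fragment below. The label keys and values are taken from the bullets; the surrounding service definition and the web_url value are illustrative assumptions, not copied from the shipped compose file:

```yaml
services:
  infoxtractor:
    # ... image, network_mode: host, env_file, volumes, etc.
    labels:
      # Monitoring dashboard auto-discovery (assumed to point at this service)
      infrastructure.web_url: "http://192.168.68.42:8994"
      # Daily Postgres backup opt-in
      backup.enable: "true"
      backup.type: "postgres"
      backup.name: "infoxtractor"
```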

## Ports

| Port | Direction | Source | Service |
| --- | --- | --- | --- |
| 8994/tcp | ALLOW | 192.168.68.0/24 | ix REST + healthz (LAN only; not publicly exposed) |

No VPS Caddy entry; no infrastructure.docs_url label — this is an internal service.