# Deployment

On-prem deploy to `192.168.68.42`. Push-to-deploy via a bare git repo + `post-receive` hook that rebuilds the Docker Compose stack. Pattern mirrors mammon and unified_messaging.

## Topology

```
Mac (dev)
   │  git push server main
   ▼
192.168.68.42:/home/server/Public/infoxtractor/repos.git (bare)
   │  post-receive → GIT_WORK_TREE=/…/app git checkout -f main
   │                 docker compose up -d --build
   │                 curl /healthz (60 s gate)
   ▼
Docker container `infoxtractor` (port 8994)
   ├─ 127.0.0.1:11434 → Ollama (qwen3:14b; host-network mode)
   └─ 127.0.0.1:5431 → postgis (database `infoxtractor`; host-network mode)
```

## One-time server setup

Run **once** from the Mac. Idempotent.

```bash
export IX_POSTGRES_PASSWORD=<generate-a-strong-one>
./scripts/setup_server.sh
```

The script:
1. Creates `/home/server/Public/infoxtractor/repos.git` (bare) + `/home/server/Public/infoxtractor/app/` (worktree).
2. Installs the `post-receive` hook (see `scripts/setup_server.sh` for the template).
3. Creates the `infoxtractor` Postgres role + database on the shared `postgis` container.
4. Writes `/home/server/Public/infoxtractor/app/.env` (mode 0600) from `.env.example` with the password substituted in.
5. Verifies `qwen3:14b` is pulled in Ollama.
6. Prints a hint to open UFW for port 8994 on the LAN subnet if it's missing.
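
Step 4 can be pictured with a minimal Python sketch. The real logic lives in `scripts/setup_server.sh`; the helper names `render_env` and `write_private` are hypothetical, and the sketch assumes `.env.example` contains a line starting with `IX_POSTGRES_PASSWORD=`, which the source does not confirm.

```python
import os
from pathlib import Path


def render_env(template: str, password: str, key: str = "IX_POSTGRES_PASSWORD") -> str:
    """Return the template text with the value of `key` replaced by `password`.

    The key name is an assumption; adjust it to whatever .env.example uses.
    """
    out = []
    replaced = False
    for line in template.splitlines():
        if line.startswith(f"{key}="):
            out.append(f"{key}={password}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError(f"{key} not found in template")
    return "\n".join(out) + "\n"


def write_private(path: Path, content: str) -> None:
    """Create or truncate `path` so that only the owner can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(content)
    # The mode passed to os.open only applies to new files, so tighten
    # the permissions of a pre-existing file explicitly.
    os.chmod(path, 0o600)
```

Creating the file with mode 0600 up front, rather than writing it and restricting it afterwards, avoids a window in which the password is readable by other users.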

After the script finishes, add the deploy remote to the local repo:

```bash
git remote add server ssh://server@192.168.68.42/home/server/Public/infoxtractor/repos.git
```

## Normal deploy workflow

```bash
# after merging a feat branch into main
git push server main

# tail the server's deploy log
ssh server@192.168.68.42 "tail -f /tmp/infoxtractor-deploy.log"

# healthz gate (the post-receive hook also waits up to 60 s for this)
curl http://192.168.68.42:8994/healthz

# end-to-end smoke: this IS the real acceptance test
python scripts/e2e_smoke.py
```

If the post-receive hook exits non-zero (`/healthz` never reaches 200), the deploy is considered failed. What is left running depends on where it failed: `docker compose up -d --build` builds the new image first and only swaps containers if the build succeeds, so after a build failure the previous container keeps running. If the build succeeds but the new container fails `/healthz`, the new container is up but broken. Investigate with `docker compose logs --tail 200` in `${APP_DIR}` and either fix forward or revert (see below).
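
The 60 s gate amounts to polling `/healthz` until it returns 200 or the deadline passes. The following is a minimal Python sketch of that idea, not the hook itself (which is a shell script). The function names and the 2 s polling interval are assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for `url`, or 0 if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure, ...


def wait_healthy(
    probe: Callable[[], int],
    deadline_s: float = 60.0,
    interval_s: float = 2.0,  # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns 200; give up once `deadline_s` has elapsed."""
    start = clock()
    while True:
        if probe() == 200:
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

A caller could pass `lambda: http_status("http://192.168.68.42:8994/healthz")` as the probe and exit non-zero when `wait_healthy` returns `False`. Injecting `sleep` and `clock` keeps the loop testable without real waiting.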

## Rollback

Never force-push `main`. Rollbacks happen as **forward commits** via `git revert`:

```bash
git revert HEAD                   # creates a revert commit for the last change
git push forgejo main
git push server main
```

## First deploy

- **Date:** 2026-04-18
- **Commit:** `fix/ollama-extract-json` (#36, the last of several Docker/ops follow-ups after PR #27 shipped the initial Dockerfile)
- **`/healthz`:** all three probes (`postgres`, `ollama`, `ocr`) green. The first pass took ~7 min for the fresh container because Surya's recognition (1.34 GB) + detection (73 MB) models download from HuggingFace on first run; subsequent rebuilds reuse the named volumes declared in `docker-compose.yml` and come up in <30 s.
- **E2E extraction:** `bank_statement_header` against `tests/fixtures/synthetic_giro.pdf` with Paperless-style texts:
  - Pipeline completes in **35 s**.
  - Extracted: `bank_name=DKB`, `account_iban=DE89370400440532013000`, `currency=EUR`, `opening_balance=1234.56`, `closing_balance=1450.22`, `statement_date=2026-03-31`, `statement_period_end=2026-03-31`, `statement_period_start=2026-03-01`, `account_type=null`.
  - Provenance: 8 / 9 leaf fields have sources; for 7 of those 8, `provenance_verified` and `text_agreement` are both True. The exception is `statement_period_start`: the value shows up in the OCR but normalisation fails (dateutil picks a different interpretation of the cited day); to be chased in a follow-up.

### Docker-ops follow-ups that landed during the first deploy

All small, each merged as its own PR. In commit order after the scaffold (#27):

- **#31** `fix(docker): uv via standalone installer`: Python 3.12 on Ubuntu 22.04 drops `distutils`; Ubuntu's pip needed it. Switched to the `uv` standalone installer, which has no pip dependency.
- **#32** `fix(docker): include README.md in the uv sync COPY`: `hatchling` validates the readme file exists when resolving the editable project install.
- **#33** `fix(compose): drop runtime: nvidia`: the deploy host's Docker daemon doesn't register a named `nvidia` runtime; `deploy.resources.devices` is sufficient and matches immich-ml.
- **#34** `fix(deploy): network_mode: host`: `postgis` is bound to `127.0.0.1` on the host (security hardening T12). `host.docker.internal` points at the bridge gateway, not loopback, so the container couldn't reach postgis. Goldstein uses the same pattern.
- **#35** `fix(deps): pin surya-ocr ^0.17`: earlier cu124 torch pin had forced surya to 0.14.1, which breaks our `surya.foundation` import and needs a transformers version that lacks `QuantizedCacheConfig`.
- **#36** `fix(genai): drop Ollama format flag; extract trailing JSON`: Ollama 0.11.8 segfaults on Pydantic JSON Schemas (`$ref`, `anyOf`, `pattern`), and `format="json"` terminates reasoning models (qwen3) at `{}` because their `<think>…</think>` chain-of-thought isn't valid JSON. Omit the flag, inject the schema into the system prompt, extract the outermost `{…}` balanced block from the response.
- **volumes**: named volumes `ix_surya_cache` + `ix_hf_cache` mount `/root/.cache/datalab` + `/root/.cache/huggingface` so rebuilds don't re-download ~1.5 GB of model weights.
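
The extraction step from #36 can be sketched as follows. This is an illustration of the technique and not the code that shipped: the function names are hypothetical, and stripping the `<think>…</think>` span before scanning is an extra precaution assumed here. The sketch collects every top-level balanced `{…}` block, ignores braces inside JSON strings, and returns the last block that parses as a JSON object.

```python
import json
import re
from typing import Any

_THINK = re.compile(r"<think>.*?</think>", re.DOTALL)


def balanced_blocks(text: str) -> list[str]:
    """Return every top-level `{...}` span, ignoring braces inside JSON strings."""
    blocks: list[str] = []
    depth = 0
    start = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a JSON string: only track escapes and the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif depth and ch == '"':
            in_string = True  # quotes only matter once we are inside a block
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth:
            depth -= 1
            if depth == 0:
                blocks.append(text[start:i + 1])
    return blocks


def extract_trailing_json(response: str) -> dict[str, Any]:
    """Return the last balanced block in `response` that parses as a JSON object."""
    visible = _THINK.sub("", response)  # drop the chain-of-thought span
    for candidate in reversed(balanced_blocks(visible)):
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    raise ValueError("no JSON object found in model response")
```

Walking the candidates from the end favours the model's final answer over any JSON-looking fragments it may have written earlier in the reply.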

Production notes:

- `IX_DEFAULT_MODEL=qwen3:14b` (already pulled on the host). Spec listed `gpt-oss:20b` as a concrete example; swapped to keep the deploy on-prem without an extra `ollama pull`.
- Torch 2.11 default cu13 wheels fall back to CPU against the host's CUDA 12.4 driver, so Surya runs on CPU. Expected inference times: seconds per page. Upgrading the NVIDIA driver (or pinning a cu12-compatible torch wheel newer than 2.7) will unlock GPU with no code changes.

## E2E smoke test (`scripts/e2e_smoke.py`)

What it does (from the Mac):

1. Checks `/healthz`.
2. Starts a tiny HTTP server on the Mac's LAN IP serving `tests/fixtures/synthetic_giro.pdf`.
3. Submits a `POST /jobs` with `use_case=bank_statement_header`, the fixture URL in `context.files`, and a Paperless-style OCR text in `context.texts` (to exercise the `text_agreement` cross-check).
4. Polls `GET /jobs/{id}` every 2 s until terminal or 120 s timeout.
5. Asserts: `status=="done"`, `bank_name` non-empty, `provenance.fields["result.closing_balance"].provenance_verified=True`, `text_agreement=True`, total elapsed `< 60s`.

Non-zero exit means the deploy is not healthy. Roll back via `git revert HEAD`.
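
Steps 4 and 5 reduce to a poll loop plus a handful of checks on the final job document. The sketch below shows that shape under stated assumptions and is not a copy of `scripts/e2e_smoke.py`: the `"error"` terminal status, the `result.bank_name` location, and `text_agreement` sitting next to `provenance_verified` on the same provenance entry are all guesses at the response layout.

```python
import time
from typing import Any, Callable

TERMINAL = {"done", "error"}  # "error" is an assumed failure status


def poll_job(
    fetch: Callable[[], dict[str, Any]],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call `fetch` every `interval_s` seconds until the job is terminal."""
    start = clock()
    while True:
        job = fetch()
        if job.get("status") in TERMINAL:
            return job
        if clock() - start >= timeout_s:
            raise TimeoutError(f"job not terminal after {timeout_s} s")
        sleep(interval_s)


def check_job(job: dict[str, Any], elapsed_s: float) -> list[str]:
    """Return the list of failed checks; an empty list means the smoke passed."""
    failures = []
    if job.get("status") != "done":
        failures.append("status != done")
    if not (job.get("result") or {}).get("bank_name"):
        failures.append("bank_name empty")
    fields = (job.get("provenance") or {}).get("fields") or {}
    closing = fields.get("result.closing_balance") or {}
    if closing.get("provenance_verified") is not True:
        failures.append("closing_balance not provenance_verified")
    if closing.get("text_agreement") is not True:
        failures.append("closing_balance text_agreement not True")
    if elapsed_s >= 60:
        failures.append("elapsed >= 60 s")
    return failures
```

In a real run, `fetch` would wrap a `GET /jobs/{id}` request, and the script would exit non-zero whenever `check_job` returns a non-empty list.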

## Operational checklists

### After `ollama pull` on the host

The `IX_DEFAULT_MODEL` env var in the server's `.env` must match something in `ollama list`. Changing the default means:

1. Edit `/home/server/Public/infoxtractor/app/.env` → `IX_DEFAULT_MODEL=<new>`.
2. `docker compose --project-directory /home/server/Public/infoxtractor/app restart`.
3. `curl http://192.168.68.42:8994/healthz` → confirm `ollama: ok`.
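
Before editing `.env`, the "must match something in `ollama list`" condition can be checked against Ollama's `GET /api/tags` endpoint, which lists the locally available models. This is a minimal sketch with hypothetical helper names; it is not necessarily how the service's own `ollama` probe works.

```python
import json
import urllib.request


def pulled_models(tags_payload: dict) -> set[str]:
    """Collect model names from an Ollama `/api/tags` response body."""
    return {m["name"] for m in tags_payload.get("models", []) if "name" in m}


def model_is_pulled(model: str, tags_payload: dict) -> bool:
    """True if `model` is available locally according to `/api/tags`."""
    # Ollama lists a model pulled without a tag as "<name>:latest".
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in pulled_models(tags_payload)


def fetch_tags(base_url: str = "http://127.0.0.1:11434") -> dict:
    """Fetch the model list; the base URL matches the topology section."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)
```

Run on the host, `model_is_pulled("qwen3:14b", fetch_tags())` would answer whether the configured default is present before the container is restarted.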

### If `/healthz` shows `ollama: degraded`

`qwen3:14b` (or the configured default) is not pulled. On the host:
```bash
ssh server@192.168.68.42 "docker exec ollama ollama pull qwen3:14b"
```

### If `/healthz` shows `ocr: fail`

Surya couldn't initialize (model missing, CUDA unavailable, OOM). First run can be slow because models download on first call. Check container logs:
```bash
ssh server@192.168.68.42 "docker logs infoxtractor --tail 200"
```

### If the container fails to start

```bash
ssh server@192.168.68.42 "tail -100 /tmp/infoxtractor-deploy.log"
ssh server@192.168.68.42 "docker compose -f /home/server/Public/infoxtractor/app/docker-compose.yml logs --tail 200"
```

## Monitoring

- Monitoring dashboard auto-discovers via the `infrastructure.web_url` label on the container: `http://192.168.68.42:8001` → "infoxtractor" card.
- Backup opt-in via `backup.enable=true` + `backup.type=postgres` + `backup.name=infoxtractor` labels. The daily backup script picks up the `infoxtractor` Postgres database automatically.

## Ports

| Port | Direction | Source | Service |
|------|-----------|--------|---------|
| 8994/tcp | ALLOW | 192.168.68.0/24 | ix REST + healthz (LAN only; not publicly exposed) |

No VPS Caddy entry and no `infrastructure.docs_url` label, since this is an internal service.