chore(compose): pin project name to `infoxtractor`

Without `name:`, Compose infers the project from the parent directory (`app/` on the server), so containers show up under an "app" stack in the infra monitoring dashboard instead of "infoxtractor".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chore: MVP deployed (#42)
2026-04-18 19:57:16 +02:00 · 2026-04-18 12:08:21 +00:00 · 2026-04-18 14:08:07 +02:00 · 2026-04-18 12:05:46 +00:00 · 2026-04-18 14:05:28 +02:00 · 2026-04-18 12:02:38 +00:00
7 changed files with 236 additions and 14 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -4,7 +4,7 @@ Async, on-prem, LLM-powered structured information extraction microservice. Give
 Designed to be used by other on-prem services (e.g. mammon) as a reliable fallback / second opinion for format-specific deterministic parsers.
-Status: design phase. Full reference spec at `docs/spec-core-pipeline.md`. MVP spec will live at `docs/superpowers/specs/`.
+Status: MVP deployed (2026-04-18) at `http://192.168.68.42:8994` — LAN only. Full reference spec at `docs/spec-core-pipeline.md`; MVP spec at `docs/superpowers/specs/2026-04-18-ix-mvp-design.md`; deploy runbook at `docs/deployment.md`.
 ## Guiding Principles
--- a/README.md
+++ b/README.md
@@ -4,10 +4,12 @@ Async, on-prem, LLM-powered structured information extraction microservice.
 Given a document (PDF, image, text) and a named *use case*, ix returns a structured JSON result whose shape matches the use-case schema — together with per-field provenance (OCR segment IDs, bounding boxes, cross-OCR agreement flags) that let the caller decide how much to trust each extracted value.
-**Status:** design phase. Implementation about to start.
+**Status:** MVP deployed. Live on the home LAN at `http://192.168.68.42:8994`.
 - Full reference spec: [`docs/spec-core-pipeline.md`](docs/spec-core-pipeline.md) (aspirational; MVP is a strict subset)
 - **MVP design:** [`docs/superpowers/specs/2026-04-18-ix-mvp-design.md`](docs/superpowers/specs/2026-04-18-ix-mvp-design.md)
 - **Implementation plan:** [`docs/superpowers/plans/2026-04-18-ix-mvp-implementation.md`](docs/superpowers/plans/2026-04-18-ix-mvp-implementation.md)
 - **Deployment runbook:** [`docs/deployment.md`](docs/deployment.md)
 - Agent / development notes: [`AGENTS.md`](AGENTS.md)
 ## Principles
@@ -15,3 +17,44 @@ Given a document (PDF, image, text) and a named *use case*, ix returns a structu
 - **On-prem always.** LLM = Ollama, OCR = local engines (Surya first). No OpenAI / Anthropic / Azure / AWS / cloud.
 - **Grounded extraction, not DB truth.** ix returns best-effort fields + provenance; the caller decides what to trust.
 - **Transport-agnostic pipeline core.** REST + Postgres-queue adapters in parallel on one job store.
 ## Submitting a job
 ```bash
 curl -X POST http://192.168.68.42:8994/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "use_case": "bank_statement_header",
    "ix_client_id": "mammon",
    "request_id": "some-correlation-id",
    "context": {
      "files": [{
        "url": "http://paperless.local/api/documents/42/download/",
        "headers": {"Authorization": "Token …"}
      }],
      "texts": ["<Paperless Tesseract OCR content>"]
    }
  }'
 # → {"job_id":"…","ix_id":"…","status":"pending"}
 ```
 Poll `GET /jobs/{job_id}` until `status` is `done` or `error`. Optionally pass `callback_url` to receive a webhook on completion (one-shot, no retry; polling stays authoritative).
 Full REST surface + provenance response shape documented in the MVP design spec.
 ## Running locally
 ```bash
 uv sync --extra dev
 uv run pytest tests/unit -v                    # hermetic unit + integration suite
 IX_TEST_OLLAMA=1 uv run pytest tests/live -v    # needs LAN access to Ollama + GPU
 ```
 ## Deploying
 ```bash
 git push server main      # rebuilds Docker image, restarts container, /healthz deploy gate
 python scripts/e2e_smoke.py   # E2E acceptance against the live service
 ```
 See [`docs/deployment.md`](docs/deployment.md) for full runbook + rollback.
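The README's polling contract (poll `GET /jobs/{job_id}` until `status` is `done` or `error`) fits into a small client-side helper. The following is a minimal sketch. It assumes only what the README states about the response: a JSON body with a `status` field. `wait_for_job`, `fetch_job`, and the interval and timeout defaults are hypothetical and not part of ix. The transport is injected so the loop can be exercised without the LAN service.

```python
import json
import time
import urllib.request
from typing import Any, Callable

# Terminal states named in the README; anything else means "keep polling".
TERMINAL_STATUSES = {"done", "error"}


def fetch_job(job_id: str, base_url: str = "http://192.168.68.42:8994") -> dict[str, Any]:
    """Hypothetical transport: GET /jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_job(
    job_id: str,
    fetch: Callable[[str], dict[str, Any]] = fetch_job,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout_s} s"
            )
        sleep(interval_s)
```

Polling stays authoritative even when `callback_url` is set, because the webhook is one-shot with no retry. A caller can use the webhook as a hint and still confirm the result with a loop like this one.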
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -10,6 +10,8 @@
 # The GPU reservation block matches immich-ml / the shape Docker Compose
 # expects for GPU allocation on this host.
 name: infoxtractor
 services:
  infoxtractor:
    build: .
@@ -24,8 +26,17 @@ services:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      # Persist Surya (datalab) + HuggingFace model caches so rebuilds don't
      # re-download ~1.5 GB of weights every time.
      - ix_surya_cache:/root/.cache/datalab
      - ix_hf_cache:/root/.cache/huggingface
    labels:
      infrastructure.web_url: "http://192.168.68.42:8994"
      backup.enable: "true"
      backup.type: "postgres"
      backup.name: "infoxtractor"
 volumes:
  ix_surya_cache:
  ix_hf_cache:
--- a/docs/deployment.md
+++ b/docs/deployment.md
@@ -71,13 +71,30 @@ git push server main
 ## First deploy
-_(fill in after running — timestamps, commit sha, e2e_smoke output)_
+- **Date:** 2026-04-18
 - **Commit:** `fix/ollama-extract-json` (#36, the last of several Docker/ops follow-ups after PR #27 shipped the initial Dockerfile)
 - **`/healthz`:** all three probes (`postgres`, `ollama`, `ocr`) green. The first pass took ~7 min for the fresh container because Surya's recognition (1.34 GB) + detection (73 MB) models download from HuggingFace on first run; subsequent rebuilds reuse the named volumes declared in `docker-compose.yml` and come up in <30 s.
 - **E2E extraction:** `bank_statement_header` against `tests/fixtures/synthetic_giro.pdf` with Paperless-style texts:
  - Pipeline completes in **35 s**.
  - Extracted: `bank_name=DKB`, `account_iban=DE89370400440532013000`, `currency=EUR`, `opening_balance=1234.56`, `closing_balance=1450.22`, `statement_date=2026-03-31`, `statement_period_end=2026-03-31`, `statement_period_start=2026-03-01`, `account_type=null`.
  - Provenance: 8 / 9 leaf fields have sources; of those 8, 7 have both `provenance_verified` and `text_agreement` True. `statement_period_start` shows up in the OCR text, but normalisation fails (dateutil picks a different interpretation of the cited day); to be chased in a follow-up.
-- **Date:** TBD
-- **Commit:** TBD
-- **`/healthz` first-ok time:** TBD
-- **`e2e_smoke.py` status:** TBD
-- **Notes:** —
+### Docker-ops follow-ups that landed during the first deploy
+
+All small, each merged as its own PR. In commit order after the scaffold (#27):
+
+- **#31** `fix(docker): uv via standalone installer` — Python 3.12 on Ubuntu 22.04 drops `distutils`; Ubuntu's pip needed it. Switched to the `uv` standalone installer, which has no pip dependency.
 - **#32** `fix(docker): include README.md in the uv sync COPY` — `hatchling` validates the readme file exists when resolving the editable project install.
 - **#33** `fix(compose): drop runtime: nvidia` — the deploy host's Docker daemon doesn't register a named `nvidia` runtime; `deploy.resources.devices` is sufficient and matches immich-ml.
 - **#34** `fix(deploy): network_mode: host` — `postgis` is bound to `127.0.0.1` on the host (security hardening T12). `host.docker.internal` points at the bridge gateway, not loopback, so the container couldn't reach postgis. Goldstein uses the same pattern.
 - **#35** `fix(deps): pin surya-ocr ^0.17` — earlier cu124 torch pin had forced surya to 0.14.1, which breaks our `surya.foundation` import and needs a transformers version that lacks `QuantizedCacheConfig`.
 - **#36** `fix(genai): drop Ollama format flag; extract trailing JSON` — Ollama 0.11.8 segfaults on Pydantic JSON Schemas (`$ref`, `anyOf`, `pattern`), and `format="json"` terminates reasoning models (qwen3) at `{}` because their `<think>…</think>` chain-of-thought isn't valid JSON. Omit the flag, inject the schema into the system prompt, extract the outermost `{…}` balanced block from the response.
 - **volumes** — named `ix_surya_cache` + `ix_hf_cache` mount `/root/.cache/datalab` + `/root/.cache/huggingface` so rebuilds don't re-download ~1.5 GB of model weights.
 Production notes:
 - `IX_DEFAULT_MODEL=qwen3:14b` (already pulled on the host). The spec listed `gpt-oss:20b` as a concrete example; swapped so the deploy needs no extra `ollama pull`.
 - Torch 2.11 default cu13 wheels fall back to CPU against the host's CUDA 12.4 driver — Surya runs on CPU. Expected inference times: seconds per page. Upgrading the NVIDIA driver (or pinning a cu12-compatible torch wheel newer than 2.7) will unlock GPU with no code changes.
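The first-deploy notes attribute the `statement_period_start` failure to dateutil choosing a different interpretation of the cited day. One plausible cause is day/month ambiguity. This is an assumption that the runbook does not confirm. `dateutil.parser.parse` reads an ambiguous numeric date month-first unless `dayfirst=True` is passed, so a German-style `01.03.2026` can come out as 3 January instead of 1 March. The stdlib-only sketch below makes both readings explicit. `parse_day_first` and `parse_month_first` are hypothetical helpers, not ix code, and the cited string is an invented example.

```python
from datetime import date, datetime


def parse_day_first(text: str) -> date:
    """Read DD.MM.YYYY, the usual German bank-statement convention."""
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()


def parse_month_first(text: str) -> date:
    """Read MM.DD.YYYY, the reading a month-first parser falls back to."""
    return datetime.strptime(text.strip(), "%m.%d.%Y").date()


cited = "01.03.2026"  # hypothetical OCR snippet for statement_period_start
print(parse_day_first(cited))    # 2026-03-01
print(parse_month_first(cited))  # 2026-01-03
```

A date such as `31.03.2026` is unambiguous because 31 cannot be a month. That would be consistent with `statement_date` and `statement_period_end` verifying while `statement_period_start` does not, but it does not prove the diagnosis.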
 ## E2E smoke test (`scripts/e2e_smoke.py`)
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,8 @@
 [project]
 name = "infoxtractor"
 version = "0.1.0"
 # Released 2026-04-18 with the first live deploy of the MVP. See
 # docs/deployment.md §"First deploy" for the commit + /healthz times.
 description = "Async on-prem LLM-powered structured information extraction microservice"
 readme = "README.md"
 requires-python = ">=3.12"
--- a/src/ix/genai/ollama_client.py
+++ b/src/ix/genai/ollama_client.py
@@ -96,8 +96,9 @@ class OllamaClient:
            ) from exc
        content = (payload.get("message") or {}).get("content") or ""
        json_blob = _extract_json_blob(content)
        try:
-            parsed = response_schema.model_validate_json(content)
+            parsed = response_schema.model_validate_json(json_blob)
        except ValidationError as exc:
            raise IXException(
                IXErrorCode.IX_002_001,
@@ -159,16 +160,39 @@ class OllamaClient:
        request_kwargs: dict[str, Any],
        response_schema: type[BaseModel],
    ) -> dict[str, Any]:
-        """Map provider-neutral kwargs to Ollama's /api/chat body."""
+        """Map provider-neutral kwargs to Ollama's /api/chat body.
        Schema strategy for Ollama 0.11.8: we send no ``format`` at all and
        bake the Pydantic schema into a system message ahead of the
        caller's own system prompt. Rationale:
        * The full Pydantic schema as ``format=<schema>`` crashes llama.cpp's
          structured-output implementation (SIGSEGV) on every non-trivial
          shape: ``anyOf`` / ``$ref`` / ``pattern`` all trigger it.
        * ``format="json"`` alone guarantees valid JSON but not the shape,
          and it makes reasoning models (qwen3) stop at ``{}`` because their
          ``<think>`` chain-of-thought is not valid JSON.
        * Injecting the schema into the prompt is the cheapest way to
          get the right shape: the model sees it explicitly, the JSON blob
          is extracted from the free-form reply, and Pydantic validates it
          at parse time (IX_002_001 on mismatch).
        Non-Ollama ``GenAIClient`` impls can ignore this behaviour and use
        native structured-output (``response_format`` on OpenAI, etc.).
        """
        messages = self._translate_messages(
            list(request_kwargs.get("messages") or [])
        )
        messages = _inject_schema_system_message(messages, response_schema)
        body: dict[str, Any] = {
            "model": request_kwargs.get("model"),
            "messages": messages,
            "stream": False,
-            "format": response_schema.model_json_schema(),
+            # NOTE: format is deliberately omitted. `format="json"` made
            # reasoning models (qwen3) abort after emitting `{}` because the
            # constrained sampler terminated before the chain-of-thought
            # finished; `format=<schema>` segfaulted Ollama 0.11.8. Letting
            # the model stream freely and then extracting the trailing JSON
            # blob works for both reasoning and non-reasoning models.
        }
        options: dict[str, Any] = {}
@@ -200,4 +224,117 @@ class OllamaClient:
        return out
 def _extract_json_blob(text: str) -> str:
    """Return the outermost balanced JSON object in ``text``.
    Reasoning models (qwen3, deepseek-r1) wrap their real answer in
    ``<think>…</think>`` blocks. Other models sometimes prefix prose or
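    fence the JSON in ```json``` code blocks. Scanning from the first
    ``{`` to its matching ``}`` (string-literal aware) is the cheapest
    parse that works for all three shapes, as long as the reasoning block
    itself contains no ``{``. With no ``{`` the full text is returned; with
    no matching ``}`` everything from the first ``{`` is returned. Either
    way Pydantic catches a malformed result downstream as ``IX_002_001``.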
    """
    start = text.find("{")
    if start < 0:
        return text
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return text[start:]
 def _inject_schema_system_message(
    messages: list[dict[str, Any]],
    response_schema: type[BaseModel],
 ) -> list[dict[str, Any]]:
    """Prepend a system message that pins the expected JSON shape.
    We send no ``format`` constraint to Ollama (and even ``format="json"``
    would only guarantee valid JSON, not the field set or names). We emit
    the Pydantic schema as JSON and
    instruct the model to match it. If the caller already provides a
    system message, we prepend ours; otherwise ours becomes the first
    system turn.
    """
    import json as _json
    schema_json = _json.dumps(
        _sanitise_schema_for_ollama(response_schema.model_json_schema()),
        indent=2,
    )
    guidance = (
        "Respond ONLY with a single JSON object matching this JSON Schema "
        "exactly. No prose, no code fences, no explanations. All top-level "
        "properties listed in `required` MUST be present. Use null for "
        "fields you cannot confidently extract. The JSON Schema:\n"
        f"{schema_json}"
    )
    return [{"role": "system", "content": guidance}, *messages]
 def _sanitise_schema_for_ollama(schema: Any) -> Any:
    """Strip null branches from ``anyOf`` unions.
    Ollama 0.11.8's llama.cpp structured-output implementation segfaults on
    Pydantic v2's standard Optional pattern::
        {"anyOf": [{"type": "string"}, {"type": "null"}]}
    We collapse any ``anyOf`` that includes a ``{"type": "null"}`` entry to
    its non-null branch — single branch becomes that branch inline; multiple
    branches keep the union without null. This only narrows what the LLM is
    *told* it may emit; Pydantic still validates the real response and can
    accept ``None`` at parse time if the field is ``Optional``.
    Walk is recursive and structure-preserving. Other ``anyOf`` shapes (e.g.
    polymorphic unions without null) are left alone.
    """
    if isinstance(schema, dict):
        cleaned: dict[str, Any] = {}
        for key, value in schema.items():
            if key == "anyOf" and isinstance(value, list):
                non_null = [
                    _sanitise_schema_for_ollama(branch)
                    for branch in value
                    if not (isinstance(branch, dict) and branch.get("type") == "null")
                ]
                if len(non_null) == 1:
                    # Inline the single remaining branch; merge its keys into the
                    # parent so siblings like ``default``/``title`` are preserved.
                    only = non_null[0]
                    if isinstance(only, dict):
                        for ok, ov in only.items():
                            cleaned.setdefault(ok, ov)
                    else:
                        cleaned[key] = non_null
                elif len(non_null) == 0:
                    # Pathological: nothing left. Fall back to a permissive type.
                    cleaned["type"] = "string"
                else:
                    cleaned[key] = non_null
            else:
                cleaned[key] = _sanitise_schema_for_ollama(value)
        return cleaned
    if isinstance(schema, list):
        return [_sanitise_schema_for_ollama(item) for item in schema]
    return schema
 __all__ = ["OllamaClient"]
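`_extract_json_blob` starts at the *first* `{`. If a reasoning model's `<think>` block contains a balanced `{…}`, that fragment is returned instead of the final answer. The hedged alternative below matches the "trailing JSON" wording of the commit message. It keeps the same string-literal-aware depth counter but remembers the *last* complete top-level block. `extract_trailing_json_blob` is a hypothetical name; this is a sketch, not what the commit ships.

```python
def extract_trailing_json_blob(text: str) -> str:
    """Return the last complete top-level ``{...}`` block in ``text``.

    Same brace-depth counter as ``_extract_json_blob``, but it keeps
    scanning and remembers the most recent balanced block, so a balanced
    fragment inside an earlier ``<think>`` section does not win. An
    unbalanced ``{`` earlier in the text still throws the count off.
    Falls back to the full text when no balanced block exists.
    """
    last: str | None = None
    depth = 0
    start = -1
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        # Quotes outside any block are prose, so they do not open a string.
        if ch == '"' and depth > 0:
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                last = text[start : i + 1]
    return last if last is not None else text
```

The trade-off is a full pass over the reply instead of stopping at the first balanced block. For LLM responses of a few kilobytes this costs little.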
--- a/tests/unit/test_ollama_client.py
+++ b/tests/unit/test_ollama_client.py
@@ -79,10 +79,19 @@ class TestInvokeHappyPath:
        body_json = json.loads(body)
        assert body_json["model"] == "gpt-oss:20b"
        assert body_json["stream"] is False
-        assert body_json["format"] == _Schema.model_json_schema()
+        # No `format` is sent: Ollama 0.11.8 segfaults on full schemas and
        # aborts to `{}` with `format=json` on reasoning models. Schema is
        # injected into the system prompt instead; we extract the trailing
        # JSON blob from the response and validate via Pydantic.
        assert "format" not in body_json
        assert body_json["options"]["temperature"] == 0.2
        assert "reasoning_effort" not in body_json
-        assert body_json["messages"] == [
+        # A schema-guidance system message is prepended to the caller's
        # messages so the model emits the right shape without a `format` constraint.
        msgs = body_json["messages"]
        assert msgs[0]["role"] == "system"
        assert "JSON Schema" in msgs[0]["content"]
        assert msgs[1:] == [
            {"role": "system", "content": "You extract."},
            {"role": "user", "content": "Doc body"},
        ]
@@ -116,7 +125,10 @@ class TestInvokeHappyPath:
        import json
        request_body = json.loads(httpx_mock.get_requests()[0].read())
-        assert request_body["messages"] == [
+        # First message is the auto-injected schema guidance; after that
        # the caller's user message has its text parts joined.
        assert request_body["messages"][0]["role"] == "system"
        assert request_body["messages"][1:] == [
            {"role": "user", "content": "part-a\npart-b"}
        ]
Author	SHA1	Message	Date
Dirk Riemann	f6934bdf2a	chore(compose): pin project name to `infoxtractor` Without `name:`, Compose infers the project from the parent directory (`app/` on the server), so containers show up under an "app" stack in the infra monitoring dashboard instead of "infoxtractor". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 19:57:16 +02:00
goldstein	ce33aff174	chore: MVP deployed (#42)	2026-04-18 12:08:21 +00:00
Dirk Riemann	842c4da90c	chore: MVP deployed — readme, AGENTS.md status, deploy runbook filled in First deploy done 2026-04-18. E2E extraction of the bank_statement_header use case completes in 35 s against the live service, with 7 of 9 header fields provenance-verified + text-agreement-green. closing_balance asserts from spec §12 all pass. Updates: - README.md: status -> "MVP deployed"; worked example curl snippet; pointers to deployment runbook + spec + plan. - AGENTS.md: status line updated with the live URL + date. - pyproject.toml: version comment referencing the first deploy. - docs/deployment.md: "First deploy" section filled in with times, field-level extraction result, plus a log of every small Docker/ops follow-up PR that had to land to make the first deploy healthy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:08:07 +02:00
goldstein	95a576f744	fix(genai): extract trailing JSON (#41)	2026-04-18 12:05:46 +00:00
Dirk Riemann	81e3b9a7d0	fix(genai): drop Ollama format flag; extract trailing JSON from response qwen3:14b (and deepseek-r1, other reasoning models) wrap their output in <think>…</think> chains-of-thought before emitting real output. With format=json the constrained sampler terminated immediately at `{}` because the thinking block wasn't valid JSON; without format the model thinks normally and appends the actual JSON at the end. OllamaClient now omits the format flag and extracts the outermost balanced `{…}` block from the response (brace depth counter, string-literal aware). Works for reasoning models, ```json``` code-fenced outputs, and plain JSON alike. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:05:28 +02:00
goldstein	763407ba1c	fix(genai): schema in prompt (#40)	2026-04-18 12:02:38 +00:00
Dirk Riemann	34f8268cd5	fix(genai): inject JSON schema into Ollama system prompt format=json loose mode gives valid JSON but no shape — models default to emitting {} when the system prompt doesn't list fields. Prepend a schema-guidance system message with the full Pydantic schema (after the existing null-branch sanitiser) so the model sees exactly what shape to produce. Pydantic still validates on parse. Unit tests updated to check the schema message is prepended without disturbing the caller's own messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 14:02:25 +02:00
goldstein	9c73895318	fix(genai): ollama loose JSON (#39)	2026-04-18 11:59:18 +00:00
Dirk Riemann	2efc4d1088	fix(genai): send format="json" (loose mode) to Ollama Ollama 0.11.8 segfaults on any Pydantic-shaped structured-output schema with $ref, anyOf, or pattern — confirmed on the deploy host with the simplest MVP case (BankStatementHeader alone). The earlier null-stripping sanitiser wasn't enough. Switch to format="json", which is "emit valid JSON" mode. We're already describing the exact JSON shape in the system prompt (via GenAIStep + the use case's citation instruction appendix) and validating the response body through Pydantic on parse — which raises IX_002_001 on schema mismatch, exactly as before. Stronger guarantees can come back later via a newer Ollama, an API fix, or a different GenAIClient impl. None of that is needed for the MVP to work end to end. Unit tests: the sanitiser left in place (harmless, still tested). The "happy path" test now asserts format == "json". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:59:04 +02:00
goldstein	f6ce97d7fd	fix(compose): persist model caches (#38)	2026-04-18 11:49:24 +00:00
Dirk Riemann	9e33923f71	fix(compose): persist Surya + HF caches so rebuilds don't redownload models First /healthz call on a fresh container triggers Surya to fetch the text-recognition (1.34 GB) and detection (73 MB) models from HuggingFace. Without a volume they land in the container fs and vanish on every rebuild, which is every deploy. Mount named volumes for /root/.cache/datalab (Surya) and /root/.cache/huggingface. Rebuild now keeps /healthz warm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:49:09 +02:00
goldstein	65670af78f	fix(genai): sanitise Optional for Ollama (#37)	2026-04-18 11:48:43 +00:00
Dirk Riemann	9cb62d69af	fix(genai): strip null branches from anyOf before sending to Ollama Ollama 0.11.8's llama.cpp structured-output implementation segfaults on Pydantic v2's standard Optional pattern: {"anyOf": [{"type": "string"}, {"type": "null"}]} Confirmed on the deploy host: /api/chat request with the MVP's ProvenanceWrappedResponse schema crashed Ollama with SIGSEGV; the client saw httpx RemoteProtocolError → IX_002_000. New _sanitise_schema_for_ollama walks the schema recursively and drops "type: null" branches from every anyOf. Single-branch unions are inlined so sibling keys (default, title) survive. This only narrows what the LLM is told it may emit; Pydantic still validates the real response body against the original schema and accepts None for Optional fields if they were absent or explicitly null. Existing unit tests updated: the "happy path" test no longer pins the format to `_Schema.model_json_schema()` verbatim — instead it asserts the sanitisation effect on a known-Optional field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:48:26 +02:00
goldstein	4c0746950e	fix(deps): surya ^0.17 (#36)	2026-04-18 11:21:54 +00:00