# InfoXtractor (ix)
Async, on-prem, LLM-powered structured information extraction microservice.
Given a document (PDF, image, or text) and a named use case, ix returns a structured JSON result whose shape matches the use-case schema, together with per-field provenance (OCR segment IDs, bounding boxes, cross-OCR agreement flags) that lets the caller decide how much to trust each extracted value.
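To make the result-plus-provenance contract concrete, here is a minimal sketch of what a response might look like and how a caller could filter on it. The field names (`segment_ids`, `bbox`, `agreement`) are illustrative assumptions, not the actual ix schema.

```python
# Hypothetical ix result for an "invoice" use case — the exact schema is
# defined per use case; these key names are assumptions for illustration.
result = {
    "use_case": "invoice",
    "fields": {
        "total_amount": {
            "value": "1432.50",
            "provenance": {
                "segment_ids": ["ocr-7", "ocr-8"],  # OCR segments the value came from
                "bbox": [112, 640, 298, 668],       # location of the source text on the page
                "agreement": True,                  # did independent OCR passes agree?
            },
        },
    },
}

# The caller decides what to trust — e.g. keep only fields where
# cross-OCR passes agreed on the underlying text:
trusted = {
    name: f["value"]
    for name, f in result["fields"].items()
    if f["provenance"]["agreement"]
}
print(trusted)  # {'total_amount': '1432.50'}
```

The point of the shape is that trust policy lives in the caller, not in ix: the service reports evidence, the consumer applies thresholds.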
Status: design phase. Implementation about to start.
- Full reference spec: `docs/spec-core-pipeline.md` (aspirational; MVP is a strict subset)
- MVP design: `docs/superpowers/specs/2026-04-18-ix-mvp-design.md`
- Agent / development notes: `AGENTS.md`
## Principles
- On-prem always. LLM = Ollama, OCR = local engines (Surya first). No OpenAI / Anthropic / Azure / AWS / cloud.
- Grounded extraction, not DB truth. ix returns best-effort fields + provenance; the caller decides what to trust.
- Transport-agnostic pipeline core. REST and Postgres-queue adapters run in parallel on a single job store.