Dirk Riemann 86538ee8de Implementation plan for ix MVP
Detailed, TDD-structured plan with 5 chunks covering ~30 feature-branch
tasks from foundation scaffolding through first live deploy + E2E smoke.
Each task is one PR; pipeline core comes hermetic-first, real Surya/Ollama
clients in Chunk 4, containerization + first deploy in Chunk 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:34:30 +02:00

InfoXtractor MVP — Implementation Plan

For agentic workers: REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Ship an on-prem, async, LLM-powered structured-extraction microservice with one end-to-end use case (bank_statement_header) reachable from mammon via REST.

Architecture: FastAPI container + single asyncio worker + shared postgis job store. Transport-agnostic pipeline core with pluggable OCRClient (Surya) and GenAIClient (Ollama). Provenance-based reliability signals per field. No cloud services.

Tech Stack: Python 3.12, FastAPI, uvicorn, SQLAlchemy 2.0 async, asyncpg, Alembic, Pydantic v2, pydantic-settings, httpx, PyMuPDF, python-magic, Pillow, surya-ocr (CUDA), Ollama REST API, pytest (+ pytest-asyncio, pytest-postgresql fixture), uv for dep management.

Spec reference: docs/superpowers/specs/2026-04-18-ix-mvp-design.md (approved 2026-04-18 after 2 review rounds).

Habits (inherited from mammon; see AGENTS.md):

  • Every task lands as its own feature branch: feat/<task-slug>.
  • TDD: write failing test → minimal implementation → green → refactor → commit.
  • Every commit updates code + tests + docs (AGENTS.md, README.md, section in this plan) in one shot.
  • Push: git push forgejo feat/<task-slug> → PR via Forgejo API → wait for CI green → merge → git push server main to deploy.
  • Never skip hooks, never force-push main, never amend merged commits.
  • After each deploy, run scripts/e2e_smoke.py against the live service.

File structure (target)

infoxtractor/
├── AGENTS.md                     # Guiding principles, habits, stack, deploy
├── README.md                     # One-paragraph + pointers
├── Dockerfile                    # nvidia/cuda:12.4 + python 3.12 + surya
├── docker-compose.yml            # GPU reservation, env_file, monitoring labels
├── pyproject.toml                # deps via uv; no setup.py
├── uv.lock
├── alembic.ini
├── alembic/
│   ├── env.py
│   └── versions/
│       └── 001_initial_ix_jobs.py
├── src/ix/
│   ├── __init__.py
│   ├── app.py                    # FastAPI create_app factory + lifespan
│   ├── config.py                 # AppConfig (pydantic-settings)
│   ├── errors.py                 # IXException + IX_* error codes
│   ├── logging.py                # JSON formatter, ix_id context
│   ├── contracts/                # Pydantic data contracts
│   │   ├── __init__.py
│   │   ├── request.py            # RequestIX, Context, FileRef, Options, *Options
│   │   ├── response.py           # ResponseIX, IXResult, OCRResult, OCRDetails, Page, Line, Metadata
│   │   ├── provenance.py         # ProvenanceData, FieldProvenance, ExtractionSource, BoundingBox, SegmentCitation
│   │   └── job.py                # Job envelope (status enum, lifecycle)
│   ├── use_cases/
│   │   ├── __init__.py           # REGISTRY
│   │   └── bank_statement_header.py
│   ├── pipeline/
│   │   ├── __init__.py
│   │   ├── step.py               # Step ABC
│   │   ├── pipeline.py           # Pipeline orchestrator + Timer
│   │   ├── setup_step.py
│   │   ├── ocr_step.py
│   │   ├── genai_step.py
│   │   ├── reliability_step.py
│   │   └── response_handler_step.py
│   ├── segmentation/
│   │   ├── __init__.py
│   │   └── segment_index.py      # SegmentIndex + prompt formatting
│   ├── provenance/
│   │   ├── __init__.py
│   │   ├── mapper.py             # map_segment_refs_to_provenance
│   │   ├── normalize.py          # normalizers (string, number, date, iban)
│   │   └── verify.py             # provenance_verified + text_agreement logic
│   ├── ocr/
│   │   ├── __init__.py
│   │   ├── client.py             # OCRClient Protocol
│   │   ├── fake.py               # FakeOCRClient (for tests)
│   │   └── surya_client.py       # SuryaOCRClient (real)
│   ├── genai/
│   │   ├── __init__.py
│   │   ├── client.py             # GenAIClient Protocol, GenAIInvocationResult, GenAIUsage
│   │   ├── fake.py               # FakeGenAIClient (for tests)
│   │   └── ollama_client.py      # OllamaClient (real)
│   ├── ingestion/
│   │   ├── __init__.py
│   │   ├── fetch.py              # file downloader w/ auth headers, timeouts, size cap
│   │   ├── mime.py               # python-magic wrapper
│   │   └── pages.py              # DocumentIngestor: PDF/image/text → Page list
│   ├── store/
│   │   ├── __init__.py
│   │   ├── models.py             # SQLAlchemy ORM for ix_jobs
│   │   ├── engine.py             # lazy async engine, session factory
│   │   └── jobs_repo.py          # claim_next, insert, get, list_by_correlation, sweep_orphans, update
│   ├── worker/
│   │   ├── __init__.py
│   │   ├── loop.py               # worker task: claim → run pipeline → deliver callback
│   │   └── callback.py           # one-shot webhook delivery
│   ├── adapters/
│   │   ├── __init__.py
│   │   ├── rest/
│   │   │   ├── __init__.py
│   │   │   ├── routes.py         # /jobs, /jobs/{id}, /jobs (list), /healthz, /metrics
│   │   │   └── schemas.py        # request/response bodies
│   │   └── pg_queue/
│   │       ├── __init__.py
│   │       └── listener.py       # LISTEN ix_jobs_new + 10s fallback poll
│   └── metrics/
│       ├── __init__.py
│       └── counters.py           # plain-JSON counter queries
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── fixtures/
│   │   ├── synthetic_giro.pdf    # generated from template
│   │   └── ocr_canned/           # canned Surya outputs for integration tests
│   ├── unit/
│   │   ├── test_contracts.py
│   │   ├── test_errors.py
│   │   ├── test_segment_index.py
│   │   ├── test_provenance_normalize.py
│   │   ├── test_provenance_verify.py
│   │   ├── test_provenance_mapper.py
│   │   ├── test_setup_step.py
│   │   ├── test_ocr_step.py
│   │   ├── test_genai_step.py
│   │   ├── test_reliability_step.py
│   │   ├── test_response_handler_step.py
│   │   ├── test_pipeline.py
│   │   ├── test_use_case_registry.py
│   │   ├── test_ingestion_fetch.py
│   │   ├── test_ingestion_pages.py
│   │   └── test_use_case_bank_statement_header.py
│   ├── integration/
│   │   ├── test_jobs_repo.py
│   │   ├── test_rest_adapter.py
│   │   ├── test_pg_queue_adapter.py
│   │   ├── test_worker_loop.py
│   │   └── test_pipeline_end_to_end.py  # fakes, real DB
│   └── live/
│       └── test_ollama_surya_smoke.py   # gated on IX_TEST_OLLAMA=1
├── scripts/
│   ├── e2e_smoke.py                 # post-deploy gate (Mac → :8994)
│   ├── create_fixture_pdf.py        # builds synthetic_giro.pdf deterministically
│   └── forgejo_pr.py                # wrapper: create branch → PR → merge
├── .env.example
├── .gitignore
└── docs/
    ├── spec-core-pipeline.md
    └── superpowers/
        ├── specs/2026-04-18-ix-mvp-design.md
        └── plans/2026-04-18-ix-mvp-implementation.md   (this file)

Boundary rules:

  • pipeline/ knows about contracts/, segmentation/, provenance/, ocr.client, genai.client. NOT store/, adapters/, worker/.
  • adapters/ knows about store/ and contracts/, NOT pipeline/ directly (it hands requests to the job store; the worker pulls from the store).
  • worker/ knows about store/, pipeline/, contracts/. Bridges the two.
  • store/ knows only about contracts/ (for JSONB serialization).
  • ocr/surya_client.py and genai/ollama_client.py are the only files that import external libraries beyond stdlib/FastAPI/SQLAlchemy — all other modules stay hermetic.

Chunk 1: Foundation (scaffolding + contracts + use case + SegmentIndex)

Purpose: Set up the project skeleton and land the data contracts, error model, use-case registry with the first use case, and SegmentIndex. No pipeline, no transport, no LLM — just the Pydantic/types core plus the scaffolding to run tests and CI.

Task 1.1: Project scaffolding

Branch: feat/scaffold

Files:

  • Create: pyproject.toml, .python-version, uv.lock, .env.example

  • Create: src/ix/__init__.py

  • Create: tests/__init__.py, tests/conftest.py

  • Create: .forgejo/workflows/ci.yml (pytest on push + PR)

  • Create: pytest.ini with asyncio_mode=auto

  • Write pyproject.toml with deps: fastapi, uvicorn[standard], sqlalchemy[asyncio]>=2, asyncpg, alembic, pydantic>=2, pydantic-settings, httpx, pymupdf, python-magic, pillow, python-dateutil. Dev: pytest, pytest-asyncio, pytest-httpx, ruff, mypy.

  • .env.example: every var from spec §9, all placeholders obvious (<password>, <host-internal-url>).

  • pytest.ini: asyncio_mode = auto, collect from tests/.

  • .forgejo/workflows/ci.yml: runs uv sync && uv run pytest tests/unit tests/integration -v; service container postgres:16; env IX_POSTGRES_URL points at service. Excludes tests/live/ (real Ollama).

  • Commit, push branch, create PR, merge. No deploy yet (nothing to deploy); stop at merge.

Task 1.2: Error model

Branch: feat/errors

Files:

  • Create: src/ix/errors.py

  • Create: tests/unit/test_errors.py

  • Write failing tests for IXException and every IX_* code in spec §8. Each code is a class attribute: IXError.IX_000_000 = "IX_000_000: request_ix is None". Exception carries code + detail.

  • Implement enum-like class with __str__ producing "IX_000_000: message (detail=...)".

  • Green, commit, PR, merge.

Task 1.3: Data contracts — RequestIX / Options / Context / FileRef

Branch: feat/contracts-request

Files:

  • Create: src/ix/contracts/request.py

  • Create: src/ix/contracts/__init__.py

  • Create: tests/unit/test_contracts.py

  • Failing tests: round-trip RequestIX.model_validate_json(...) with each shape in spec §3. String-or-FileRef union. Defaults: include_provenance=True, service="surya", etc. Validation errors on unknown fields.

  • Pydantic models per spec §3. Options, OCROptions, GenAIOptions, ProvenanceOptions, Context, FileRef, RequestIX.

  • Green, commit, PR, merge.

Task 1.4: Data contracts — ResponseIX / Provenance / IXResult / OCRResult / Metadata / Job

Branch: feat/contracts-response

Files:

  • Create: src/ix/contracts/response.py

  • Create: src/ix/contracts/provenance.py

  • Create: src/ix/contracts/job.py

  • Modify: src/ix/contracts/__init__.py (export)

  • Modify: tests/unit/test_contracts.py

  • Failing tests:

    • FieldProvenance with new provenance_verified / text_agreement fields; None-allowed.
    • quality_metrics keys: fields_with_provenance, total_fields, coverage_rate, invalid_references, verified_fields, text_agreement_fields.
    • ResponseIX.context excluded from model_dump (use exclude).
    • Job envelope, status is a Literal, callback_status starts as None.
  • Implement per spec §3 + §9.3. ResponseIX.context uses Field(exclude=True); a sibling internal model _InternalContext holds pages, files, texts, use_case_request, use_case_response, segment_index. Keep it simple: one class, Field(exclude=True) on the attribute.

  • Green, commit, PR, merge.

Task 1.5: Use-case registry and first use case

Branch: feat/use-case-bank-statement-header

Files:

  • Create: src/ix/use_cases/__init__.py (REGISTRY)

  • Create: src/ix/use_cases/bank_statement_header.py

  • Create: tests/unit/test_use_case_registry.py

  • Create: tests/unit/test_use_case_bank_statement_header.py

  • Failing tests: REGISTRY["bank_statement_header"] returns (Request, BankStatementHeader); unknown name raises IX_001_001; Request().system_prompt contains "extract header metadata" substring.

  • Implement per spec §7. Pydantic models. Register on import of the module (side-effect registration, or explicit registry assembly in __init__.py). Prefer explicit — REGISTRY = {"bank_statement_header": (Request, BankStatementHeader)} — no import-time side effects.

  • Green, commit, PR, merge.

Task 1.6: SegmentIndex

Branch: feat/segment-index

Files:

  • Create: src/ix/segmentation/__init__.py

  • Create: src/ix/segmentation/segment_index.py

  • Create: tests/unit/test_segment_index.py

  • Failing tests (take from spec §9.1):

    • build() assigns IDs p1_l0, p1_l1, … across the flat page list.
    • <page> tag lines are excluded from IDs.
    • lookup_segment("p1_l0") returns {page, bbox, text, file_index}; unknown → None.
    • to_prompt_text() emits "[p1_l0] text\n…" and appends raw context.texts untagged at the end.
    • BoundingBox normalization divides by page width/height.
  • Implement. SegmentIndex is built from an OCRResult + pages metadata; holds _id_to_position: dict[str, dict] and _ordered_ids: list[str].

  • Green, commit, PR, merge.

Task 1.7: Provenance normalizers

Branch: feat/provenance-normalize

Files:

  • Create: src/ix/provenance/__init__.py

  • Create: src/ix/provenance/normalize.py

  • Create: tests/unit/test_provenance_normalize.py

  • Failing tests for each normalizer in spec §6 ReliabilityStep:

    • String: " FOO bar!!! " → "foo bar" (after NFKC + casefold + whitespace collapse + punctuation strip).
    • Number: "CHF 1'234.56" → Decimal("1234.56"), same canonical form.
    • Date: "31.03.2026" → date(2026,3,31) → "2026-03-31" via dateutil(dayfirst=True).
    • IBAN: "de 89 3704 0044 0532 0130 00" → "DE89370400440532013000".
    • Short-value rule: _should_skip_text_agreement("0", field_type=int) → True; "AB" for str → True.
  • Implement. Pure functions, no external state, fully unit-testable.

  • Green, commit, PR, merge.
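The normalizers above are pure functions; a sketch of three of them (the number normalizer deliberately ignores locale-ambiguous formats like "1.234,56", which the real implementation would have to decide on):

```python
import re
import string
import unicodedata
from decimal import Decimal

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_string(value: str) -> str:
    """NFKC, casefold, strip punctuation, collapse whitespace."""
    s = unicodedata.normalize("NFKC", value).casefold()
    s = s.translate(_PUNCT)
    return re.sub(r"\s+", " ", s).strip()


def normalize_number(value: str) -> Decimal:
    """Drop currency symbols and grouping marks, keep digits and decimal point."""
    digits = re.sub(r"[^0-9.,-]", "", value)
    return Decimal(digits.replace(",", ""))


def normalize_iban(value: str) -> str:
    """Uppercase and drop all whitespace."""
    return re.sub(r"\s+", "", value).upper()
```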

Task 1.8: Provenance mapper + verifier

Branch: feat/provenance-mapper-verifier

Files:

  • Create: src/ix/provenance/mapper.py (map_segment_refs_to_provenance per spec §9.4)

  • Create: src/ix/provenance/verify.py (verify_field_value + text_agreement_for_field)

  • Create: tests/unit/test_provenance_mapper.py

  • Create: tests/unit/test_provenance_verify.py

  • Failing tests for mapper: given fake SegmentIndex + fake segment_citations → correct FieldProvenance.sources; invalid_references count; value resolution via dot-path ("result.invoice_number", "items.0.name"); max_sources_per_field cap.

  • Failing tests for verifier: provenance_verified true/false per field type; text_agreement with and without context.texts; Literal → None; None value → None; short value → text_agreement None; date parses both sides.

  • Implement; pure functions.

  • Green, commit, PR, merge.

Chunk 1 end state: pytest tests/unit runs green locally and in Forgejo Actions. No runtime service yet. ~8 merged PRs to main. Time estimate: one focused afternoon.


Chunk 2: Pipeline core

Purpose: Wire up the Step ABC + Pipeline orchestrator + all five steps + fake OCR/GenAI clients. At end of chunk, pipeline runs end-to-end with fakes and produces a full ResponseIX for bank_statement_header, entirely hermetic.

Task 2.1: Step ABC + Pipeline orchestrator + Timer

Branch: feat/pipeline-core

Files:

  • Create: src/ix/pipeline/__init__.py

  • Create: src/ix/pipeline/step.py

  • Create: src/ix/pipeline/pipeline.py

  • Create: tests/unit/test_pipeline.py

  • Failing tests using synthetic steps: order preserved; validate=False skips step; validate raise → error written + abort; process raise → error written + abort; each step's elapsed seconds added to metadata.timings.

  • Implement per spec §3/§4. Pipeline(steps=[...]). _execute_step wraps in timer + try/except, sets response_ix.error on raise.

  • Green, commit, PR, merge.
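The orchestration semantics the tests above pin down (skip on validate=False, abort on raise, per-step timings) can be sketched like this; a plain dict stands in for ResponseIX:

```python
import time
from abc import ABC, abstractmethod


class Step(ABC):
    """Sketch of the Step ABC: validate() may skip or abort, process() does work."""

    name = "step"

    def validate(self, response: dict) -> bool:
        # Return False to skip this step; raise to abort the pipeline.
        return True

    @abstractmethod
    def process(self, response: dict) -> None: ...


class Pipeline:
    def __init__(self, steps: list[Step]):
        self.steps = steps

    def run(self, response: dict) -> None:
        for step in self.steps:
            start = time.perf_counter()
            try:
                if not step.validate(response):
                    continue
                step.process(response)
            except Exception as exc:
                # Real code sets response_ix.error with an IX_* code.
                response["error"] = str(exc)
                return  # abort remaining steps
            finally:
                # Each step's elapsed seconds lands in metadata.timings.
                response.setdefault("timings", {})[step.name] = time.perf_counter() - start
```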

Task 2.2: OCRClient and GenAIClient protocols + fakes

Branch: feat/client-protocols

Files:

  • Create: src/ix/ocr/__init__.py, src/ix/ocr/client.py, src/ix/ocr/fake.py

  • Create: src/ix/genai/__init__.py, src/ix/genai/client.py, src/ix/genai/fake.py

  • Create: tests/unit/test_ocr_fake.py, tests/unit/test_genai_fake.py

  • Failing tests: FakeOCRClient(canned=OCRResult(...)) returns the canned result; FakeGenAIClient(parsed=MyModel(...)) returns a GenAIInvocationResult with that parsed instance + stubbed usage.

  • Implement Protocols + fakes. Protocols are @runtime_checkable.

  • Green, commit, PR, merge.
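The Protocol-plus-fake pattern, sketched for the OCR side (the method name and signature here are illustrative, not the spec's):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class OCRClient(Protocol):
    """Structural type: anything with a matching async recognize() qualifies."""

    async def recognize(self, pages: list) -> dict: ...


class FakeOCRClient:
    """Returns a canned result regardless of input, for hermetic tests."""

    def __init__(self, canned: dict):
        self.canned = canned

    async def recognize(self, pages: list) -> dict:
        return self.canned
```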

Task 2.3: Ingestion — fetch + MIME + pages

Branch: feat/ingestion

Files:

  • Create: src/ix/ingestion/__init__.py

  • Create: src/ix/ingestion/fetch.py

  • Create: src/ix/ingestion/mime.py

  • Create: src/ix/ingestion/pages.py

  • Create: tests/unit/test_ingestion_fetch.py (pytest-httpx mocks)

  • Create: tests/unit/test_ingestion_pages.py (fixture PDFs/images)

  • Failing tests:

    • fetch_file(FileRef, …) passes headers; size cap raises IX_000_007; timeout raises IX_000_007; non-2xx raises IX_000_007.
    • detect_mime(bytes) classifies PDF/PNG/JPEG/TIFF correctly; unknown raises IX_000_005.
    • DocumentIngestor.build_pages(files, texts): PDF with 3 pages → 3 Page objects with page_no/width/height; multi-frame TIFF → multiple Pages; plain text entry → one Page; >100 PDF pages raises IX_000_006.
  • Implement. fetch_file uses httpx AsyncClient with timeouts from config, stream=True to enforce size cap incrementally. DocumentIngestor uses PyMuPDF for PDFs, PIL for images.

  • Green, commit, PR, merge.
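The incremental size-cap idea can be isolated as a pure helper; in fetch_file this would consume httpx's streamed byte chunks, and IX_000_007 is the plan's code for fetch failures:

```python
from typing import Iterable


def read_with_cap(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Accumulate streamed chunks, failing fast once the cap is crossed."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Real code raises IXException with this code.
            raise ValueError("IX_000_007: file exceeds size cap")
    return bytes(buf)
```

Checking the cap per chunk means an oversized download is aborted early instead of being buffered whole.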

Task 2.4: SetupStep

Branch: feat/step-setup

Files:

  • Create: src/ix/pipeline/setup_step.py

  • Create: tests/unit/test_setup_step.py

  • Failing tests:

    • validate raises IX_000_000 if request None; IX_000_002 if no files+no texts.
    • process downloads files (pytest-httpx mocks), assembles response_ix.context.pages, loads use case; unknown use case → IX_001_001.
  • Implement per spec §6. Use ingestion.fetch_file + DocumentIngestor.

  • Green, commit, PR, merge.

Task 2.5: OCRStep

Branch: feat/step-ocr

Files:

  • Create: src/ix/pipeline/ocr_step.py

  • Create: tests/unit/test_ocr_step.py

  • Failing tests:

    • validate raises IX_000_004 when geometries/text/ocr_only set but no files.
    • validate returns False for pure-text requests.
    • process runs FakeOCRClient, injects page tags, builds SegmentIndex when provenance on.
  • Implement per spec §6.

  • Green, commit, PR, merge.

Task 2.6: GenAIStep

Branch: feat/step-genai

Files:

  • Create: src/ix/pipeline/genai_step.py

  • Create: tests/unit/test_genai_step.py

  • Failing tests:

    • System prompt concat with citation instruction when provenance on.
    • Text content format: [p1_l0] foo\n[p1_l1] bar.
    • Response schema wrapped in ProvenanceWrappedResponse when provenance on.
    • FakeGenAIClient returns parsed result → written to ix_result.result.
    • IX_002_000 / IX_002_001 surfaced on client raise.
    • Provenance mapping produces ProvenanceData with the expected field paths.
  • Implement per spec §6. Use provenance.mapper.map_segment_refs_to_provenance.

  • Green, commit, PR, merge.

Task 2.7: ReliabilityStep

Branch: feat/step-reliability

Files:

  • Create: src/ix/pipeline/reliability_step.py

  • Create: tests/unit/test_reliability_step.py

  • Failing tests:

    • Skipped when include_provenance=False.
    • Per-type dispatch: Literal field → None flags; None value → None flags; short value → text_agreement=None.
    • Dates parse both sides before comparison.
    • Counters verified_fields and text_agreement_fields written.
    • Tests using BankStatementHeader + concrete ProvenanceData → exact flag values.
  • Implement using provenance.verify + provenance.normalize; dispatch via type hints on the use-case response schema (introspect via get_type_hints).

  • Green, commit, PR, merge.
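The get_type_hints dispatch can be sketched as a field classifier; the category names and the example Header model are illustrative (Optional unwrapping is omitted for brevity):

```python
import typing
from dataclasses import dataclass
from datetime import date


def field_kinds(model: type) -> dict[str, str]:
    """Classify each field of a response schema for reliability dispatch."""
    kinds: dict[str, str] = {}
    for name, hint in typing.get_type_hints(model).items():
        if typing.get_origin(hint) is typing.Literal:
            kinds[name] = "literal"  # Literal fields get no reliability flags
        elif hint is date:
            kinds[name] = "date"
        elif hint in (int, float):
            kinds[name] = "number"
        else:
            kinds[name] = "string"
    return kinds


@dataclass
class Header:
    bank_name: str
    closing_balance: float
    period_end: date
    currency: typing.Literal["CHF", "EUR"]
```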

Task 2.8: ResponseHandlerStep

Branch: feat/step-response-handler

Files:

  • Create: src/ix/pipeline/response_handler_step.py

  • Create: tests/unit/test_response_handler_step.py

  • Failing tests per spec §8: attach OCR text; strip geometries when not requested; delete context.

  • Implement.

  • Green, commit, PR, merge.

Task 2.9: End-to-end pipeline test with fakes

Branch: feat/pipeline-e2e-fakes

Files:

  • Create: tests/unit/test_pipeline_end_to_end.py

  • Create: tests/fixtures/synthetic_giro.pdf (generated)

  • Create: scripts/create_fixture_pdf.py

  • scripts/create_fixture_pdf.py builds a deterministic PDF with known header fields (bank name, IBAN, period, balances) using reportlab or PyMuPDF. Script re-runs on demand; output is committed.

  • Failing test: feed the fixture + canned OCR + canned LLM response through the full Pipeline([Setup, OCR, GenAI, Reliability, ResponseHandler]) and assert response_ix.ix_result.result == expected, provenance_verified[closing_balance] is True, timings populated.

  • Implement — only wiring; no new logic.

  • Green, commit, PR, merge.

Chunk 2 end state: Full pipeline runs in tests with fakes. No DB, no transport, no real clients. Running pytest tests/unit -v goes green end-to-end. ~9 merged PRs.


Chunk 3: Job store + REST adapter + worker loop

Purpose: Persist jobs in Postgres, expose REST endpoints, run the worker task in the FastAPI lifespan, deliver callbacks. At chunk end, the container (locally or on the server) accepts POST /jobs, runs the fake-backed pipeline against a real DB, and returns results via polling or callback.

Task 3.1: Alembic scaffolding + initial migration

Branch: feat/alembic-init

Files:

  • Create: alembic.ini, alembic/env.py, alembic/versions/001_initial_ix_jobs.py

  • Create: src/ix/store/__init__.py, src/ix/store/models.py, src/ix/store/engine.py

  • alembic/env.py: async engine, NullPool, reads IX_POSTGRES_URL.

  • models.py: IxJob ORM mapping to ix_jobs table per spec §4.

  • 001_initial_ix_jobs.py: CREATE TABLE + indexes (including UNIQUE on (client_id, request_id)). No NOTIFY trigger (NOTIFY is fired by writers, not DDL).

  • Smoke test: alembic upgrade head against a disposable postgres (via docker) creates the table and indexes. No unit test here — verified in integration tests.

  • Commit, PR, merge.

Task 3.2: Config module (AppConfig)

Branch: feat/config

Files:

  • Create: src/ix/config.py

  • Create: tests/unit/test_config.py

  • Failing tests: every env var in spec §9 loads from IX_* env; defaults match spec.

  • Implement AppConfig via pydantic-settings; no .env auto-load in tests — use monkeypatch.

  • Green, commit, PR, merge.

Task 3.3: JobsRepo (store CRUD)

Branch: feat/jobs-repo

Files:

  • Create: src/ix/store/jobs_repo.py

  • Create: tests/integration/test_jobs_repo.py

  • Create: tests/conftest.py fixtures — postgres_url, engine, session_factory (Forgejo CI service container).

  • Failing tests (integration, need real DB):

    • insert_pending(request, callback_url) creates row; returns job_id + ix_id.
    • Inserting with existing (client_id, request_id) returns the existing job_id (idempotency), status unchanged.
    • claim_next_pending() → locks a pending row and updates to running; returns None if none available; concurrent callers each claim distinct rows (SKIP LOCKED).
    • get(job_id) returns the full Job with nested request/response parsed.
    • mark_done(job_id, response) / mark_error(job_id, response_with_error) / update_callback_status(...).
    • sweep_orphans(now, max_running_age) → returns list of rescued job IDs; their status goes back to pending, attempts++.
  • Implement using SQLAlchemy 2.0 async. Each method is a single transaction.

  • Green, commit, PR, merge.
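The SKIP LOCKED pattern behind claim_next_pending() looks roughly like this in raw SQL (column names here are hypothetical; the real schema is whatever 001_initial_ix_jobs.py creates):

```python
# Concurrent workers each lock a distinct pending row; locked rows are skipped
# rather than waited on, so claimers never block each other.
CLAIM_NEXT_SQL = """
UPDATE ix_jobs
   SET status = 'running', started_at = now()
 WHERE id = (
       SELECT id
         FROM ix_jobs
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 1
          FOR UPDATE SKIP LOCKED
       )
RETURNING id
"""
```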

Task 3.4: FastAPI app + REST routes

Branch: feat/rest-adapter

Files:

  • Create: src/ix/app.py

  • Create: src/ix/adapters/__init__.py, src/ix/adapters/rest/__init__.py, src/ix/adapters/rest/routes.py, src/ix/adapters/rest/schemas.py

  • Create: tests/integration/test_rest_adapter.py

  • Failing tests (integration, FastAPI TestClient + real DB):

    • POST /jobs with valid body → 201, returns {job_id, ix_id, status: "pending"}.
    • POST /jobs idempotent on (client_id, request_id) — second call returns same job_id with 200.
    • GET /jobs/{id} returns the Job shape; 404 on unknown.
    • GET /jobs?client_id=…&request_id=… returns the row or 404.
    • GET /healthz returns JSON with postgres/ollama/ocr keys. In tests, ollama and ocr are mocked via dependency-injection hook.
    • GET /metrics returns 24h counters.
  • Implement create_app() factory. Lifespan: create engine, run alembic upgrade head, spawn worker task (Chunk 3.5), tear down on shutdown.

  • Green, commit, PR, merge.

Task 3.5: Worker loop + callback delivery

Branch: feat/worker-loop

Files:

  • Create: src/ix/worker/__init__.py, src/ix/worker/loop.py, src/ix/worker/callback.py

  • Modify: src/ix/app.py (lifespan spawns worker task)

  • Create: tests/integration/test_worker_loop.py

  • Failing tests (integration):

    • Worker claims a pending job, runs a fake pipeline, writes response, updates status to done.
    • On pipeline exception: status → error, response carries the error code.
    • On callback_url set and 200 response: callback_status → delivered.
    • On callback 500 or timeout: callback_status → failed; status stays done/error.
    • Worker startup orphan sweep: job left in running with started_at < now - 2 * per_job_timeout → reset to pending, attempts++.
  • Implement. Worker pipeline factory injected — tests pass a stub; production wiring builds the real Pipeline with FakeOCRClient / FakeGenAIClient for now (Chunk 4 swaps them).

  • Green, commit, PR, merge.

Task 3.6: Postgres queue adapter

Branch: feat/pg-queue-adapter

Files:

  • Create: src/ix/adapters/pg_queue/__init__.py, src/ix/adapters/pg_queue/listener.py

  • Modify: src/ix/app.py (lifespan spawns listener task if enabled)

  • Create: tests/integration/test_pg_queue_adapter.py

  • Failing tests (integration):

    • Caller inserts a row directly and NOTIFY ix_jobs_new, '<job_id>' → worker picks it up within 1 s.
    • Missed NOTIFY (e.g., listener not started yet) → 10 s fallback poll finds the row.
  • Implement. listener.py opens a dedicated asyncpg connection (outside the SQLAlchemy pool) to run LISTEN; on notify, emits an asyncio event the worker wait_for_notify_or_poll(10s) reacts to.

  • Green, commit, PR, merge.
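The wait_for_notify_or_poll(10s) behavior can be sketched around an asyncio.Event that the LISTEN callback sets:

```python
import asyncio


async def wait_for_notify_or_poll(notify: asyncio.Event, poll_interval: float = 10.0) -> str:
    """Wake immediately on a NOTIFY-driven event, else fall back to the poll cadence."""
    try:
        await asyncio.wait_for(notify.wait(), timeout=poll_interval)
        notify.clear()
        return "notify"
    except asyncio.TimeoutError:
        return "poll"
```

Either return value leads the worker to the same claim_next_pending() call, so a missed NOTIFY only costs latency, never correctness.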

Chunk 3 end state: FastAPI container serves the REST API, backed by a real Postgres. Pipeline still uses fakes under the hood (real Surya + Ollama land in Chunk 4). ~6 PRs.


Chunk 4: Real OCR + real LLM clients

Purpose: Wire SuryaOCRClient and OllamaClient into production. Tests gated on IX_TEST_OLLAMA=1. Pipeline factory switches from fakes to real clients based on env.

Task 4.1: OllamaClient (real)

Branch: feat/ollama-client

Files:

  • Create: src/ix/genai/ollama_client.py

  • Create: tests/unit/test_ollama_client.py (uses pytest-httpx to mock)

  • Create: tests/live/test_ollama_client_live.py (gated on IX_TEST_OLLAMA=1)

  • Failing unit tests: invoke POSTs to /api/chat with format=<schema>; parses response into the Pydantic schema; surfaces IX_002_000 on connection error / timeout; surfaces IX_002_001 on schema-parse failure.

  • Live test: real call to host.docker.internal:11434 (or 192.168.68.42:11434 from Mac) with gpt-oss:20b against a tiny BankStatementHeader-shaped schema; skipped unless IX_TEST_OLLAMA=1.

  • Implement. httpx AsyncClient with per-call timeout from config.

  • Green, commit, PR, merge.
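The request body the client POSTs can be sketched as a pure builder; that `format` accepts a JSON schema for structured output is an assumption about Ollama's /api/chat carried over from the plan:

```python
def build_chat_payload(model: str, system_prompt: str, user_content: str, schema: dict) -> dict:
    """Non-streaming /api/chat body with the response schema in `format`."""
    return {
        "model": model,
        "stream": False,
        "format": schema,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    }
```

Keeping the payload construction pure makes the unit tests trivial; only the httpx call itself needs pytest-httpx mocking.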

Task 4.2: SuryaOCRClient (real)

Branch: feat/surya-client

Files:

  • Create: src/ix/ocr/surya_client.py

  • Create: tests/unit/test_surya_client.py (mocked surya.recognition.RecognitionPredictor)

  • Create: tests/live/test_surya_client_live.py (gated on IX_TEST_OLLAMA=1 — reuses the flag; rename to IX_TEST_LIVE=1 if that collides)

  • Failing unit tests with Surya API mocked: given 3 Pages, returns an OCRResult with 3 matching pages, each with lines + 8-coord polygons.

  • Live test: runs Surya against tests/fixtures/synthetic_giro.pdf; asserts extracted text contains the known IBAN substring.

  • Implement. selfcheck() loads the model at startup and runs a 1-page sanity OCR on a blank page; used by /healthz.

  • Add surya to pyproject.toml: surya-ocr + torch>=2.2 (CUDA 12.4 wheels).

  • Green, commit, PR, merge.

Task 4.3: Pipeline factory + /healthz wiring

Branch: feat/production-wiring

Files:

  • Modify: src/ix/app.py (build production pipeline in lifespan, not fakes)

  • Create: src/ix/genai/__init__.py (factory: make_client(config) -> GenAIClient)

  • Create: src/ix/ocr/__init__.py (factory: make_client(config) -> OCRClient)

  • Modify: src/ix/adapters/rest/routes.py (/healthz probes real clients)

  • Failing tests: factory returns OllamaClient / SuryaOCRClient in production mode; FakeOCRClient / FakeGenAIClient when IX_TEST_MODE=fake env is set (used by integration tests).

  • Implement.

  • Green, commit, PR, merge.

Chunk 4 end state: Running container can handle a real PDF end-to-end with real OCR and real LLM. Unit tests stay hermetic; live tests run on the Mac against the home server. ~3 PRs.


Chunk 5: Containerization + deployment + E2E

Purpose: Dockerize, configure the server, push-to-deploy, run the first live smoke test.

Task 5.1: Dockerfile + docker-compose

Branch: feat/dockerize

Files:

  • Create: Dockerfile

  • Create: docker-compose.yml

  • Modify: .env.example (final list of vars)

  • Dockerfile: base nvidia/cuda:12.4.0-runtime-ubuntu22.04, install Python 3.12 via deadsnakes, install uv, copy source, uv sync --no-dev, CMD alembic upgrade head && uvicorn ix.app:create_app --factory --host 0.0.0.0 --port 8994.

  • docker-compose.yml: single service infoxtractor, port 8994, runtime: nvidia, GPU reservation, env_file .env, monitoring labels, backup labels, extra_hosts: host.docker.internal:host-gateway.

  • Build locally (docker compose build) to verify.

  • Commit, PR, merge (no deploy yet — see next task).

Task 5.2: Server setup + post-receive hook

Branch: feat/deploy-setup

Files:

  • Create: docs/deployment.md

  • Create: scripts/setup_server.sh (one-shot: creates bare repo + post-receive hook + infoxtractor DB on postgis + .env on server)

  • setup_server.sh (run manually once): SSH to server@192.168.68.42, create /home/server/Public/infoxtractor/repos.git bare repo with post-receive hook that checks out to /home/server/Public/infoxtractor/app/, runs docker compose up -d --build, polls /healthz for 60 s, exits non-zero on failure.

  • Creates infoxtractor DB + role on the postgis container.

  • Writes /home/server/Public/infoxtractor/app/.env with real passwords (user provides via environment or prompt).

  • Commit, PR, merge. Run the script manually; doc the run in deployment.md.

Task 5.3: Add server git remote + first deploy

Branch: feat/first-deploy

  • Local: git remote add server ssh://server@192.168.68.42/home/server/Public/infoxtractor/repos.git.
  • Verify ollama pull gpt-oss:20b is done on the host (check docker exec ollama ollama list).
  • git push server main. Hook rebuilds. /healthz check. Smoke: curl http://192.168.68.42:8994/healthz.
  • Document remote setup in deployment.md.
  • No code PR — this task is ops.

Task 5.4: E2E smoke test script

Branch: feat/e2e-smoke

Files:

  • Create: scripts/e2e_smoke.py

  • Submits tests/fixtures/synthetic_giro.pdf via POST http://192.168.68.42:8994/jobs (from Mac), polls, asserts per spec §12. Exits non-zero on failure. Prints timings.

  • Runs from Mac after every git push server main (documented as part of deploy habit in AGENTS.md).

  • Commit, PR, merge, deploy. Run smoke script; paste output into the PR description.
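The smoke script's poll loop can be written with injectable sleep/clock so it stays testable without a live service (names here are mine, not the script's):

```python
import time
from typing import Callable


def poll_until_terminal(fetch_status: Callable[[], str], timeout_s: float = 120.0,
                        interval_s: float = 1.0, *,
                        sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll a job until it leaves pending/running; raise if the deadline passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = fetch_status()
        if status not in ("pending", "running"):
            return status
        sleep(interval_s)
    raise TimeoutError("job did not reach a terminal status before the deadline")
```

In the real script fetch_status would GET /jobs/{id} via httpx and return the status field; a non-"done" terminal status makes the script exit non-zero.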

Chunk 5 end state: Service live on http://192.168.68.42:8994, deploy gated by /healthz + E2E smoke. First consumer (mammon) can start building its integration.


Out of scope for this plan (owned by mammon or future ix work)

  • Mammon integration — owned by mammon repo; spec'd separately.
  • Second use case (receipt/invoice) — after bank_statement_header is proven live.
  • Multi-container worker, Prometheus exporter, OpenTelemetry exporter, vision path, Config Server, Kafka transport, Azure/AWS/OpenAI adapters — all in spec §14.

Review / handoff

After all chunks merged and deployed:

  1. Run scripts/e2e_smoke.py against live service; screenshot / paste output.
  2. Ensure monitoring dashboard shows infoxtractor healthy at http://192.168.68.42:8001.
  3. Confirm /healthz returns 200 for 5 minutes straight (no Surya OOMs, no Ollama missing model).
  4. Tag release: git tag v0.1.0 && git push forgejo v0.1.0.
  5. Open follow-up in mammon: "plan ix integration for needs_parser docs" referencing this spec + service URL.