Dirk Riemann 86538ee8de Implementation plan for ix MVP
Detailed, TDD-structured plan with 5 chunks covering ~30 feature-branch
tasks from foundation scaffolding through first live deploy + E2E smoke.
Each task is one PR; pipeline core comes hermetic-first, real Surya/Ollama
clients in Chunk 4, containerization + first deploy in Chunk 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:34:30 +02:00

InfoXtractor MVP — Implementation Plan

For agentic workers: REQUIRED: Use superpowers:subagent-driven-development (if subagents available) or superpowers:executing-plans to implement this plan. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Ship an on-prem, async, LLM-powered structured-extraction microservice with one end-to-end use case (bank_statement_header) reachable from mammon via REST.

Architecture: FastAPI container + single asyncio worker + shared postgis job store. Transport-agnostic pipeline core with pluggable OCRClient (Surya) and GenAIClient (Ollama). Provenance-based reliability signals per field. No cloud services.

Tech Stack: Python 3.12, FastAPI, uvicorn, SQLAlchemy 2.0 async, asyncpg, Alembic, Pydantic v2, pydantic-settings, httpx, PyMuPDF, python-magic, Pillow, surya-ocr (CUDA), Ollama REST API, pytest (+ pytest-asyncio, pytest-postgresql fixture), uv for dep management.

Spec reference: docs/superpowers/specs/2026-04-18-ix-mvp-design.md (approved 2026-04-18 after 2 review rounds).

Habits (inherited from mammon; see AGENTS.md):

  • Every task lands as its own feature branch: feat/<task-slug>.
  • TDD: write failing test → minimal implementation → green → refactor → commit.
  • Every commit updates code + tests + docs (AGENTS.md, README.md, section in this plan) in one shot.
  • Push: git push forgejo feat/<task-slug> → PR via Forgejo API → wait for CI green → merge → git push server main to deploy.
  • Never skip hooks, never force-push main, never amend merged commits.
  • After each deploy, run scripts/e2e_smoke.py against the live service.

File structure (target)

infoxtractor/
├── AGENTS.md                     # Guiding principles, habits, stack, deploy
├── README.md                     # One-paragraph + pointers
├── Dockerfile                    # nvidia/cuda:12.4 + python 3.12 + surya
├── docker-compose.yml            # GPU reservation, env_file, monitoring labels
├── pyproject.toml                # deps via uv; no setup.py
├── uv.lock
├── alembic.ini
├── alembic/
│   ├── env.py
│   └── versions/
│       └── 001_initial_ix_jobs.py
├── src/ix/
│   ├── __init__.py
│   ├── app.py                    # FastAPI create_app factory + lifespan
│   ├── config.py                 # AppConfig (pydantic-settings)
│   ├── errors.py                 # IXException + IX_* error codes
│   ├── logging.py                # JSON formatter, ix_id context
│   ├── contracts/                # Pydantic data contracts
│   │   ├── __init__.py
│   │   ├── request.py            # RequestIX, Context, FileRef, Options, *Options
│   │   ├── response.py           # ResponseIX, IXResult, OCRResult, OCRDetails, Page, Line, Metadata
│   │   ├── provenance.py         # ProvenanceData, FieldProvenance, ExtractionSource, BoundingBox, SegmentCitation
│   │   └── job.py                # Job envelope (status enum, lifecycle)
│   ├── use_cases/
│   │   ├── __init__.py           # REGISTRY
│   │   └── bank_statement_header.py
│   ├── pipeline/
│   │   ├── __init__.py
│   │   ├── step.py               # Step ABC
│   │   ├── pipeline.py           # Pipeline orchestrator + Timer
│   │   ├── setup_step.py
│   │   ├── ocr_step.py
│   │   ├── genai_step.py
│   │   ├── reliability_step.py
│   │   └── response_handler_step.py
│   ├── segmentation/
│   │   ├── __init__.py
│   │   └── segment_index.py      # SegmentIndex + prompt formatting
│   ├── provenance/
│   │   ├── __init__.py
│   │   ├── mapper.py             # map_segment_refs_to_provenance
│   │   ├── normalize.py          # normalizers (string, number, date, iban)
│   │   └── verify.py             # provenance_verified + text_agreement logic
│   ├── ocr/
│   │   ├── __init__.py
│   │   ├── client.py             # OCRClient Protocol
│   │   ├── fake.py               # FakeOCRClient (for tests)
│   │   └── surya_client.py       # SuryaOCRClient (real)
│   ├── genai/
│   │   ├── __init__.py
│   │   ├── client.py             # GenAIClient Protocol, GenAIInvocationResult, GenAIUsage
│   │   ├── fake.py               # FakeGenAIClient (for tests)
│   │   └── ollama_client.py      # OllamaClient (real)
│   ├── ingestion/
│   │   ├── __init__.py
│   │   ├── fetch.py              # file downloader w/ auth headers, timeouts, size cap
│   │   ├── mime.py               # python-magic wrapper
│   │   └── pages.py              # DocumentIngestor: PDF/image/text → Page list
│   ├── store/
│   │   ├── __init__.py
│   │   ├── models.py             # SQLAlchemy ORM for ix_jobs
│   │   ├── engine.py             # lazy async engine, session factory
│   │   └── jobs_repo.py          # claim_next, insert, get, list_by_correlation, sweep_orphans, update
│   ├── worker/
│   │   ├── __init__.py
│   │   ├── loop.py               # worker task: claim → run pipeline → deliver callback
│   │   └── callback.py           # one-shot webhook delivery
│   ├── adapters/
│   │   ├── __init__.py
│   │   ├── rest/
│   │   │   ├── __init__.py
│   │   │   ├── routes.py         # /jobs, /jobs/{id}, /jobs (list), /healthz, /metrics
│   │   │   └── schemas.py        # request/response bodies
│   │   └── pg_queue/
│   │       ├── __init__.py
│   │       └── listener.py       # LISTEN ix_jobs_new + 10s fallback poll
│   └── metrics/
│       ├── __init__.py
│       └── counters.py           # plain-JSON counter queries
├── tests/
│   ├── __init__.py
│   ├── conftest.py
│   ├── fixtures/
│   │   ├── synthetic_giro.pdf    # generated from template
│   │   └── ocr_canned/           # canned Surya outputs for integration tests
│   ├── unit/
│   │   ├── test_contracts.py
│   │   ├── test_errors.py
│   │   ├── test_segment_index.py
│   │   ├── test_provenance_normalize.py
│   │   ├── test_provenance_verify.py
│   │   ├── test_provenance_mapper.py
│   │   ├── test_setup_step.py
│   │   ├── test_ocr_step.py
│   │   ├── test_genai_step.py
│   │   ├── test_reliability_step.py
│   │   ├── test_response_handler_step.py
│   │   ├── test_pipeline.py
│   │   ├── test_use_case_registry.py
│   │   ├── test_ingestion_fetch.py
│   │   ├── test_ingestion_pages.py
│   │   └── test_use_case_bank_statement_header.py
│   ├── integration/
│   │   ├── test_jobs_repo.py
│   │   ├── test_rest_adapter.py
│   │   ├── test_pg_queue_adapter.py
│   │   ├── test_worker_loop.py
│   │   └── test_pipeline_end_to_end.py  # fakes, real DB
│   └── live/
│       └── test_ollama_surya_smoke.py   # gated on IX_TEST_OLLAMA=1
├── scripts/
│   ├── e2e_smoke.py                 # post-deploy gate (Mac → :8994)
│   ├── create_fixture_pdf.py        # builds synthetic_giro.pdf deterministically
│   └── forgejo_pr.py                # wrapper: create branch → PR → merge
├── .env.example
├── .gitignore
└── docs/
    ├── spec-core-pipeline.md
    └── superpowers/
        ├── specs/2026-04-18-ix-mvp-design.md
        └── plans/2026-04-18-ix-mvp-implementation.md   (this file)

Boundary rules:

  • pipeline/ knows about contracts/, segmentation/, provenance/, ocr.client, genai.client. NOT store/, adapters/, worker/.
  • adapters/ knows about store/ and contracts/, NOT pipeline/ directly (it hands requests to the job store; the worker pulls from the store).
  • worker/ knows about store/, pipeline/, contracts/. Bridges the two.
  • store/ knows only about contracts/ (for JSONB serialization).
  • ocr/surya_client.py and genai/ollama_client.py are the only files that import external libraries beyond stdlib/FastAPI/SQLAlchemy — all other modules stay hermetic.

Chunk 1: Foundation (scaffolding + contracts + use case + SegmentIndex)

Purpose: Set up the project skeleton and land the data contracts, error model, use-case registry with the first use case, and SegmentIndex. No pipeline, no transport, no LLM — just the Pydantic/types core plus the scaffolding to run tests and CI.

Task 1.1: Project scaffolding

Branch: feat/scaffold

Files:

  • Create: pyproject.toml, .python-version, uv.lock, .env.example

  • Create: src/ix/__init__.py

  • Create: tests/__init__.py, tests/conftest.py

  • Create: .forgejo/workflows/ci.yml (pytest on push + PR)

  • Create: pytest.ini with asyncio_mode=auto

  • Write pyproject.toml with deps: fastapi, uvicorn[standard], sqlalchemy[asyncio]>=2, asyncpg, alembic, pydantic>=2, pydantic-settings, httpx, pymupdf, python-magic, pillow, python-dateutil. Dev: pytest, pytest-asyncio, pytest-httpx, ruff, mypy.

  • .env.example: every var from spec §9, all placeholders obvious (<password>, <host-internal-url>).

  • pytest.ini: asyncio_mode = auto, collect from tests/.

  • .forgejo/workflows/ci.yml: runs uv sync && uv run pytest tests/unit tests/integration -v; service container postgres:16; env IX_POSTGRES_URL points at service. Excludes tests/live/ (real Ollama).

  • Commit, push branch, create PR, merge. No deploy yet (nothing to deploy); stop at merge.

Task 1.2: Error model

Branch: feat/errors

Files:

  • Create: src/ix/errors.py

  • Create: tests/unit/test_errors.py

  • Write failing tests for IXException and every IX_* code in spec §8. Each code is a class attribute: IXError.IX_000_000 = "IX_000_000: request_ix is None". Exception carries code + detail.

  • Implement enum-like class with __str__ producing "IX_000_000: message (detail=...)".

  • Green, commit, PR, merge.

Task 1.3: Data contracts — RequestIX / Options / Context / FileRef

Branch: feat/contracts-request

Files:

  • Create: src/ix/contracts/request.py

  • Create: src/ix/contracts/__init__.py

  • Create: tests/unit/test_contracts.py

  • Failing tests: round-trip RequestIX.model_validate_json(...) with each shape in spec §3. String-or-FileRef union. Defaults: include_provenance=True, service="surya", etc. Validation errors on unknown fields.

  • Pydantic models per spec §3. Options, OCROptions, GenAIOptions, ProvenanceOptions, Context, FileRef, RequestIX.

  • Green, commit, PR, merge.

Task 1.4: Data contracts — ResponseIX / Provenance / IXResult / OCRResult / Metadata / Job

Branch: feat/contracts-response

Files:

  • Create: src/ix/contracts/response.py

  • Create: src/ix/contracts/provenance.py

  • Create: src/ix/contracts/job.py

  • Modify: src/ix/contracts/__init__.py (export)

  • Modify: tests/unit/test_contracts.py

  • Failing tests:

    • FieldProvenance with new provenance_verified / text_agreement fields; None-allowed.
    • quality_metrics keys: fields_with_provenance, total_fields, coverage_rate, invalid_references, verified_fields, text_agreement_fields.
    • ResponseIX.context excluded from model_dump (use exclude).
    • Job envelope, status is a Literal, callback_status starts as None.
  • Implement per spec §3 + §9.3. ResponseIX.context uses Field(exclude=True); a sibling internal model _InternalContext holds pages, files, texts, use_case_request, use_case_response, segment_index. Keep it simple: one class, Field(exclude=True) on the attribute.

  • Green, commit, PR, merge.

Task 1.5: Use-case registry and first use case

Branch: feat/use-case-bank-statement-header

Files:

  • Create: src/ix/use_cases/__init__.py (REGISTRY)

  • Create: src/ix/use_cases/bank_statement_header.py

  • Create: tests/unit/test_use_case_registry.py

  • Create: tests/unit/test_use_case_bank_statement_header.py

  • Failing tests: REGISTRY["bank_statement_header"] returns (Request, BankStatementHeader); unknown name raises IX_001_001; Request().system_prompt contains "extract header metadata" substring.

  • Implement per spec §7. Pydantic models. Register on import of the module (side-effect registration, or explicit registry assembly in __init__.py). Prefer explicit — REGISTRY = {"bank_statement_header": (Request, BankStatementHeader)} — no import-time side effects.

  • Green, commit, PR, merge.

Task 1.6: SegmentIndex

Branch: feat/segment-index

Files:

  • Create: src/ix/segmentation/__init__.py

  • Create: src/ix/segmentation/segment_index.py

  • Create: tests/unit/test_segment_index.py

  • Failing tests (take from spec §9.1):

    • build() assigns IDs p1_l0, p1_l1, … across the flat page list.
    • <page> tag lines are excluded from IDs.
    • lookup_segment("p1_l0") returns {page, bbox, text, file_index}; unknown → None.
    • to_prompt_text() emits "[p1_l0] text\n…" and appends raw context.texts untagged at the end.
    • BoundingBox normalization divides by page width/height.
  • Implement. SegmentIndex is built from an OCRResult + pages metadata; holds _id_to_position: dict[str, dict] and _ordered_ids: list[str].

  • Green, commit, PR, merge.

Task 1.7: Provenance normalizers

Branch: feat/provenance-normalize

Files:

  • Create: src/ix/provenance/__init__.py

  • Create: src/ix/provenance/normalize.py

  • Create: tests/unit/test_provenance_normalize.py

  • Failing tests for each normalizer in spec §6 ReliabilityStep:

    • String: " FOO bar!!! " → "foo bar" (after NFKC + casefold + whitespace collapse + punctuation strip).
    • Number: "CHF 1'234.56" → Decimal("1234.56"), same canonical form.
    • Date: "31.03.2026" → date(2026,3,31) → "2026-03-31" via dateutil(dayfirst=True).
    • IBAN: "de 89 3704 0044 0532 0130 00" → "DE89370400440532013000".
    • Short-value rule: _should_skip_text_agreement("0", field_type=int) → True; "AB" for str → True.
  • Implement. Pure functions, no external state, fully unit-testable.

  • Green, commit, PR, merge.
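The normalizers above are pure functions; a sketch of three of them (the number normalizer deliberately ignores locale-ambiguous formats like "1.234,56", which the real implementation would have to decide on):

```python
import re
import string
import unicodedata
from decimal import Decimal

_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_string(value: str) -> str:
    """NFKC, casefold, strip punctuation, collapse whitespace."""
    s = unicodedata.normalize("NFKC", value).casefold()
    s = s.translate(_PUNCT)
    return re.sub(r"\s+", " ", s).strip()


def normalize_number(value: str) -> Decimal:
    """Drop currency symbols and grouping marks, keep digits and decimal point."""
    digits = re.sub(r"[^0-9.,-]", "", value)
    return Decimal(digits.replace(",", ""))


def normalize_iban(value: str) -> str:
    """Uppercase and drop all whitespace."""
    return re.sub(r"\s+", "", value).upper()
```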

Task 1.8: Provenance mapper + verifier

Branch: feat/provenance-mapper-verifier

Files:

  • Create: src/ix/provenance/mapper.py (map_segment_refs_to_provenance per spec §9.4)

  • Create: src/ix/provenance/verify.py (verify_field_value + text_agreement_for_field)

  • Create: tests/unit/test_provenance_mapper.py

  • Create: tests/unit/test_provenance_verify.py

  • Failing tests for mapper: given fake SegmentIndex + fake segment_citations → correct FieldProvenance.sources; invalid_references count; value resolution via dot-path ("result.invoice_number", "items.0.name"); max_sources_per_field cap.

  • Failing tests for verifier: provenance_verified true/false per field type; text_agreement with and without context.texts; Literal → None; None value → None; short value → text_agreement None; date parses both sides.

  • Implement; pure functions.

  • Green, commit, PR, merge.

Chunk 1 end state: pytest tests/unit runs green locally and in Forgejo Actions. No runtime service yet. ~8 merged PRs to main. Time estimate: one focused afternoon.


Chunk 2: Pipeline core

Purpose: Wire up the Step ABC + Pipeline orchestrator + all five steps + fake OCR/GenAI clients. At end of chunk, pipeline runs end-to-end with fakes and produces a full ResponseIX for bank_statement_header, entirely hermetic.

Task 2.1: Step ABC + Pipeline orchestrator + Timer

Branch: feat/pipeline-core

Files:

  • Create: src/ix/pipeline/__init__.py

  • Create: src/ix/pipeline/step.py

  • Create: src/ix/pipeline/pipeline.py

  • Create: tests/unit/test_pipeline.py

  • Failing tests using synthetic steps: order preserved; validate=False skips step; validate raise → error written + abort; process raise → error written + abort; each step's elapsed seconds added to metadata.timings.

  • Implement per spec §3/§4. Pipeline(steps=[...]). _execute_step wraps in timer + try/except, sets response_ix.error on raise.

  • Green, commit, PR, merge.
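The orchestration semantics the tests above pin down (skip on validate=False, abort on raise, per-step timings) can be sketched like this; a plain dict stands in for ResponseIX:

```python
import time
from abc import ABC, abstractmethod


class Step(ABC):
    """Sketch of the Step ABC: validate() may skip or abort, process() does work."""

    name = "step"

    def validate(self, response: dict) -> bool:
        # Return False to skip this step; raise to abort the pipeline.
        return True

    @abstractmethod
    def process(self, response: dict) -> None: ...


class Pipeline:
    def __init__(self, steps: list[Step]):
        self.steps = steps

    def run(self, response: dict) -> None:
        for step in self.steps:
            start = time.perf_counter()
            try:
                if not step.validate(response):
                    continue
                step.process(response)
            except Exception as exc:
                # Real code sets response_ix.error with an IX_* code.
                response["error"] = str(exc)
                return  # abort remaining steps
            finally:
                # Each step's elapsed seconds lands in metadata.timings.
                response.setdefault("timings", {})[step.name] = time.perf_counter() - start
```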

Task 2.2: OCRClient and GenAIClient protocols + fakes

Branch: feat/client-protocols

Files:

  • Create: src/ix/ocr/__init__.py, src/ix/ocr/client.py, src/ix/ocr/fake.py

  • Create: src/ix/genai/__init__.py, src/ix/genai/client.py, src/ix/genai/fake.py

  • Create: tests/unit/test_ocr_fake.py, tests/unit/test_genai_fake.py

  • Failing tests: FakeOCRClient(canned=OCRResult(...)) returns the canned result; FakeGenAIClient(parsed=MyModel(...)) returns a GenAIInvocationResult with that parsed instance + stubbed usage.

  • Implement Protocols + fakes. Protocols are @runtime_checkable.

  • Green, commit, PR, merge.
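The Protocol-plus-fake pattern, sketched for the OCR side (the method name and signature here are illustrative, not the spec's):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class OCRClient(Protocol):
    """Structural type: anything with a matching async recognize() qualifies."""

    async def recognize(self, pages: list) -> dict: ...


class FakeOCRClient:
    """Returns a canned result regardless of input, for hermetic tests."""

    def __init__(self, canned: dict):
        self.canned = canned

    async def recognize(self, pages: list) -> dict:
        return self.canned
```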

Task 2.3: Ingestion — fetch + MIME + pages

Branch: feat/ingestion

Files:

  • Create: src/ix/ingestion/__init__.py

  • Create: src/ix/ingestion/fetch.py

  • Create: src/ix/ingestion/mime.py

  • Create: src/ix/ingestion/pages.py

  • Create: tests/unit/test_ingestion_fetch.py (pytest-httpx mocks)

  • Create: tests/unit/test_ingestion_pages.py (fixture PDFs/images)

  • Failing tests:

    • fetch_file(FileRef, …) passes headers; size cap raises IX_000_007; timeout raises IX_000_007; non-2xx raises IX_000_007.
    • detect_mime(bytes) classifies PDF/PNG/JPEG/TIFF correctly; unknown raises IX_000_005.
    • DocumentIngestor.build_pages(files, texts): PDF with 3 pages → 3 Page objects with page_no/width/height; multi-frame TIFF → multiple Pages; plain text entry → one Page; >100 PDF pages raises IX_000_006.
  • Implement. fetch_file uses httpx AsyncClient with timeouts from config, stream=True to enforce size cap incrementally. DocumentIngestor uses PyMuPDF for PDFs, PIL for images.

  • Green, commit, PR, merge.
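The incremental size-cap idea can be isolated as a pure helper; in fetch_file this would consume httpx's streamed byte chunks, and IX_000_007 is the plan's code for fetch failures:

```python
from typing import Iterable


def read_with_cap(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Accumulate streamed chunks, failing fast once the cap is crossed."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        if len(buf) > max_bytes:
            # Real code raises IXException with this code.
            raise ValueError("IX_000_007: file exceeds size cap")
    return bytes(buf)
```

Checking the cap per chunk means an oversized download is aborted early instead of being buffered whole.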

Task 2.4: SetupStep

Branch: feat/step-setup

Files:

  • Create: src/ix/pipeline/setup_step.py

  • Create: tests/unit/test_setup_step.py

  • Failing tests:

    • validate raises IX_000_000 if request None; IX_000_002 if no files+no texts.
    • process downloads files (pytest-httpx mocks), assembles response_ix.context.pages, loads use case; unknown use case → IX_001_001.
  • Implement per spec §6. Use ingestion.fetch_file + DocumentIngestor.

  • Green, commit, PR, merge.

Task 2.5: OCRStep

Branch: feat/step-ocr

Files:

  • Create: src/ix/pipeline/ocr_step.py

  • Create: tests/unit/test_ocr_step.py

  • Failing tests:

    • validate raises IX_000_004 when geometries/text/ocr_only set but no files.
    • validate returns False for pure-text requests.
    • process runs FakeOCRClient, injects page tags, builds SegmentIndex when provenance on.
  • Implement per spec §6.

  • Green, commit, PR, merge.

Task 2.6: GenAIStep

Branch: feat/step-genai

Files:

  • Create: src/ix/pipeline/genai_step.py

  • Create: tests/unit/test_genai_step.py

  • Failing tests:

    • System prompt concat with citation instruction when provenance on.
    • Text content format: [p1_l0] foo\n[p1_l1] bar.
    • Response schema wrapped in ProvenanceWrappedResponse when provenance on.
    • FakeGenAIClient returns parsed result → written to ix_result.result.
    • IX_002_000 / IX_002_001 surfaced on client raise.
    • Provenance mapping produces ProvenanceData with the expected field paths.
  • Implement per spec §6. Use provenance.mapper.map_segment_refs_to_provenance.

  • Green, commit, PR, merge.

Task 2.7: ReliabilityStep

Branch: feat/step-reliability

Files:

  • Create: src/ix/pipeline/reliability_step.py

  • Create: tests/unit/test_reliability_step.py

  • Failing tests:

    • Skipped when include_provenance=False.
    • Per-type dispatch: Literal field → None flags; None value → None flags; short value → text_agreement=None.
    • Dates parse both sides before comparison.
    • Counters verified_fields and text_agreement_fields written.
    • Tests using BankStatementHeader + concrete ProvenanceData → exact flag values.
  • Implement using provenance.verify + provenance.normalize; dispatch via type hints on the use-case response schema (introspect via get_type_hints).

  • Green, commit, PR, merge.
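The get_type_hints dispatch can be sketched as a field classifier; the category names and the example Header model are illustrative (Optional unwrapping is omitted for brevity):

```python
import typing
from dataclasses import dataclass
from datetime import date


def field_kinds(model: type) -> dict[str, str]:
    """Classify each field of a response schema for reliability dispatch."""
    kinds: dict[str, str] = {}
    for name, hint in typing.get_type_hints(model).items():
        if typing.get_origin(hint) is typing.Literal:
            kinds[name] = "literal"  # Literal fields get no reliability flags
        elif hint is date:
            kinds[name] = "date"
        elif hint in (int, float):
            kinds[name] = "number"
        else:
            kinds[name] = "string"
    return kinds


@dataclass
class Header:
    bank_name: str
    closing_balance: float
    period_end: date
    currency: typing.Literal["CHF", "EUR"]
```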

Task 2.8: ResponseHandlerStep

Branch: feat/step-response-handler

Files:

  • Create: src/ix/pipeline/response_handler_step.py

  • Create: tests/unit/test_response_handler_step.py

  • Failing tests per spec §8: attach OCR text; strip geometries when not requested; delete context.

  • Implement.

  • Green, commit, PR, merge.

Task 2.9: End-to-end pipeline test with fakes

Branch: feat/pipeline-e2e-fakes

Files:

  • Create: tests/unit/test_pipeline_end_to_end.py

  • Create: tests/fixtures/synthetic_giro.pdf (generated)

  • Create: scripts/create_fixture_pdf.py

  • scripts/create_fixture_pdf.py builds a deterministic PDF with known header fields (bank name, IBAN, period, balances) using reportlab or PyMuPDF. Script re-runs on demand; output is committed.

  • Failing test: feed the fixture + canned OCR + canned LLM response through the full Pipeline([Setup, OCR, GenAI, Reliability, ResponseHandler]) and assert response_ix.ix_result.result == expected, provenance_verified[closing_balance] is True, timings populated.

  • Implement — only wiring; no new logic.

  • Green, commit, PR, merge.

Chunk 2 end state: Full pipeline runs in tests with fakes. No DB, no transport, no real clients. Running pytest tests/unit -v goes green end-to-end. ~9 merged PRs.


Chunk 3: Job store + REST adapter + worker loop

Purpose: Persist jobs in Postgres, expose REST endpoints, run the worker task in the FastAPI lifespan, deliver callbacks. At chunk end, the container (locally or on the server) accepts POST /jobs, runs the fake-backed pipeline against a real DB, and returns results via polling or callback.

Task 3.1: Alembic scaffolding + initial migration

Branch: feat/alembic-init

Files:

  • Create: alembic.ini, alembic/env.py, alembic/versions/001_initial_ix_jobs.py

  • Create: src/ix/store/__init__.py, src/ix/store/models.py, src/ix/store/engine.py

  • alembic/env.py: async engine, NullPool, reads IX_POSTGRES_URL.

  • models.py: IxJob ORM mapping to ix_jobs table per spec §4.

  • 001_initial_ix_jobs.py: CREATE TABLE + indexes (including UNIQUE on (client_id, request_id)). No NOTIFY trigger (NOTIFY is fired by writers, not DDL).

  • Smoke test: alembic upgrade head against a disposable postgres (via docker) creates the table and indexes. No unit test here — verified in integration tests.

  • Commit, PR, merge.

Task 3.2: Config module (AppConfig)

Branch: feat/config

Files:

  • Create: src/ix/config.py

  • Create: tests/unit/test_config.py

  • Failing tests: every env var in spec §9 loads from IX_* env; defaults match spec.

  • Implement AppConfig via pydantic-settings; no .env auto-load in tests — use monkeypatch.

  • Green, commit, PR, merge.

Task 3.3: JobsRepo (store CRUD)

Branch: feat/jobs-repo

Files:

  • Create: src/ix/store/jobs_repo.py

  • Create: tests/integration/test_jobs_repo.py

  • Create: tests/conftest.py fixtures — postgres_url, engine, session_factory (Forgejo CI service container).

  • Failing tests (integration, need real DB):

    • insert_pending(request, callback_url) creates row; returns job_id + ix_id.
    • Inserting with existing (client_id, request_id) returns the existing job_id (idempotency), status unchanged.
    • claim_next_pending() → locks a pending row and updates to running; returns None if none available; concurrent callers each claim distinct rows (SKIP LOCKED).
    • get(job_id) returns the full Job with nested request/response parsed.
    • mark_done(job_id, response) / mark_error(job_id, response_with_error) / update_callback_status(...).
    • sweep_orphans(now, max_running_age) → returns list of rescued job IDs; their status goes back to pending, attempts++.
  • Implement using SQLAlchemy 2.0 async. Each method is a single transaction.

  • Green, commit, PR, merge.
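The SKIP LOCKED pattern behind claim_next_pending() looks roughly like this in raw SQL (column names here are hypothetical; the real schema is whatever 001_initial_ix_jobs.py creates):

```python
# Concurrent workers each lock a distinct pending row; locked rows are skipped
# rather than waited on, so claimers never block each other.
CLAIM_NEXT_SQL = """
UPDATE ix_jobs
   SET status = 'running', started_at = now()
 WHERE id = (
       SELECT id
         FROM ix_jobs
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 1
          FOR UPDATE SKIP LOCKED
       )
RETURNING id
"""
```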

Task 3.4: FastAPI app + REST routes

Branch: feat/rest-adapter

Files:

  • Create: src/ix/app.py

  • Create: src/ix/adapters/__init__.py, src/ix/adapters/rest/__init__.py, src/ix/adapters/rest/routes.py, src/ix/adapters/rest/schemas.py

  • Create: tests/integration/test_rest_adapter.py

  • Failing tests (integration, FastAPI TestClient + real DB):

    • POST /jobs with valid body → 201, returns {job_id, ix_id, status: "pending"}.
    • POST /jobs idempotent on (client_id, request_id) — second call returns same job_id with 200.
    • GET /jobs/{id} returns the Job shape; 404 on unknown.
    • GET /jobs?client_id=…&request_id=… returns the row or 404.
    • GET /healthz returns JSON with postgres/ollama/ocr keys. In tests, ollama and ocr are mocked via dependency-injection hook.
    • GET /metrics returns 24h counters.
  • Implement create_app() factory. Lifespan: create engine, run alembic upgrade head, spawn worker task (Chunk 3.5), tear down on shutdown.

  • Green, commit, PR, merge.

Task 3.5: Worker loop + callback delivery

Branch: feat/worker-loop

Files:

  • Create: src/ix/worker/__init__.py, src/ix/worker/loop.py, src/ix/worker/callback.py

  • Modify: src/ix/app.py (lifespan spawns worker task)

  • Create: tests/integration/test_worker_loop.py

  • Failing tests (integration):

    • Worker claims a pending job, runs a fake pipeline, writes response, updates status to done.
    • On pipeline exception: status → error, response carries the error code.
    • On callback_url set and 200 response: callback_status → delivered.
    • On callback 500 or timeout: callback_status → failed; status stays done/error.
    • Worker startup orphan sweep: job left in running with started_at < now - 2 * per_job_timeout → reset to pending, attempts++.
  • Implement. Worker pipeline factory injected — tests pass a stub; production wiring builds the real Pipeline with FakeOCRClient / FakeGenAIClient for now (Chunk 4 swaps them).

  • Green, commit, PR, merge.

Task 3.6: Postgres queue adapter

Branch: feat/pg-queue-adapter

Files:

  • Create: src/ix/adapters/pg_queue/__init__.py, src/ix/adapters/pg_queue/listener.py

  • Modify: src/ix/app.py (lifespan spawns listener task if enabled)

  • Create: tests/integration/test_pg_queue_adapter.py

  • Failing tests (integration):

    • Caller inserts a row directly and NOTIFY ix_jobs_new, '<job_id>' → worker picks it up within 1 s.
    • Missed NOTIFY (e.g., listener not started yet) → 10 s fallback poll finds the row.
  • Implement. listener.py opens a dedicated asyncpg connection (outside the SQLAlchemy pool) to run LISTEN; on notify, emits an asyncio event the worker wait_for_notify_or_poll(10s) reacts to.

  • Green, commit, PR, merge.
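The wait_for_notify_or_poll(10s) behavior can be sketched around an asyncio.Event that the LISTEN callback sets:

```python
import asyncio


async def wait_for_notify_or_poll(notify: asyncio.Event, poll_interval: float = 10.0) -> str:
    """Wake immediately on a NOTIFY-driven event, else fall back to the poll cadence."""
    try:
        await asyncio.wait_for(notify.wait(), timeout=poll_interval)
        notify.clear()
        return "notify"
    except asyncio.TimeoutError:
        return "poll"
```

Either return value leads the worker to the same claim_next_pending() call, so a missed NOTIFY only costs latency, never correctness.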

Chunk 3 end state: FastAPI container serves the REST API, backed by a real Postgres. Pipeline still uses fakes under the hood (real Surya + Ollama land in Chunk 4). ~6 PRs.


Chunk 4: Real OCR + real LLM clients

Purpose: Wire SuryaOCRClient and OllamaClient into production. Tests gated on IX_TEST_OLLAMA=1. Pipeline factory switches from fakes to real clients based on env.

Task 4.1: OllamaClient (real)

Branch: feat/ollama-client

Files:

  • Create: src/ix/genai/ollama_client.py

  • Create: tests/unit/test_ollama_client.py (uses pytest-httpx to mock)

  • Create: tests/live/test_ollama_client_live.py (gated on IX_TEST_OLLAMA=1)

  • Failing unit tests: invoke POSTs to /api/chat with format=<schema>; parses response into the Pydantic schema; surfaces IX_002_000 on connection error / timeout; surfaces IX_002_001 on schema-parse failure.

  • Live test: real call to host.docker.internal:11434 (or 192.168.68.42:11434 from Mac) with gpt-oss:20b against a tiny BankStatementHeader-shaped schema; skipped unless IX_TEST_OLLAMA=1.

  • Implement. httpx AsyncClient with per-call timeout from config.

  • Green, commit, PR, merge.
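The request body the client POSTs can be sketched as a pure builder; that `format` accepts a JSON schema for structured output is an assumption about Ollama's /api/chat carried over from the plan:

```python
def build_chat_payload(model: str, system_prompt: str, user_content: str, schema: dict) -> dict:
    """Non-streaming /api/chat body with the response schema in `format`."""
    return {
        "model": model,
        "stream": False,
        "format": schema,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    }
```

Keeping the payload construction pure makes the unit tests trivial; only the httpx call itself needs pytest-httpx mocking.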

Task 4.2: SuryaOCRClient (real)

Branch: feat/surya-client

Files:

  • Create: src/ix/ocr/surya_client.py

  • Create: tests/unit/test_surya_client.py (mocked surya.recognition.RecognitionPredictor)

  • Create: tests/live/test_surya_client_live.py (gated on IX_TEST_OLLAMA=1 — reuses the flag; rename to IX_TEST_LIVE=1 if that collides)

  • Failing unit tests with Surya API mocked: given 3 Pages, returns an OCRResult with 3 matching pages, each with lines + 8-coord polygons.

  • Live test: runs Surya against tests/fixtures/synthetic_giro.pdf; asserts extracted text contains the known IBAN substring.

  • Implement. selfcheck() loads the model at startup and runs a 1-page sanity OCR on a blank page; used by /healthz.

  • Add surya to pyproject.toml: surya-ocr + torch>=2.2 (CUDA 12.4 wheels).

  • Green, commit, PR, merge.

Task 4.3: Pipeline factory + /healthz wiring

Branch: feat/production-wiring

Files:

  • Modify: src/ix/app.py (build production pipeline in lifespan, not fakes)

  • Create: src/ix/genai/__init__.py (factory: make_client(config) -> GenAIClient)

  • Create: src/ix/ocr/__init__.py (factory: make_client(config) -> OCRClient)

  • Modify: src/ix/adapters/rest/routes.py (/healthz probes real clients)

  • Failing tests: factory returns OllamaClient / SuryaOCRClient in production mode; FakeOCRClient / FakeGenAIClient when IX_TEST_MODE=fake env is set (used by integration tests).

  • Implement.

  • Green, commit, PR, merge.

Chunk 4 end state: Running container can handle a real PDF end-to-end with real OCR and real LLM. Unit tests stay hermetic; live tests run on the Mac against the home server. ~3 PRs.


Chunk 5: Containerization + deployment + E2E

Purpose: Dockerize, configure the server, push-to-deploy, run the first live smoke test.

Task 5.1: Dockerfile + docker-compose

Branch: feat/dockerize

Files:

  • Create: Dockerfile

  • Create: docker-compose.yml

  • Modify: .env.example (final list of vars)

  • Dockerfile: base nvidia/cuda:12.4.0-runtime-ubuntu22.04, install Python 3.12 via deadsnakes, install uv, copy source, uv sync --no-dev, CMD alembic upgrade head && uvicorn ix.app:create_app --factory --host 0.0.0.0 --port 8994.

  • docker-compose.yml: single service infoxtractor, port 8994, runtime: nvidia, GPU reservation, env_file .env, monitoring labels, backup labels, extra_hosts: host.docker.internal:host-gateway.

  • Build locally (docker compose build) to verify.

  • Commit, PR, merge (no deploy yet — see next task).

Task 5.2: Server setup + post-receive hook

Branch: feat/deploy-setup

Files:

  • Create: docs/deployment.md

  • Create: scripts/setup_server.sh (one-shot: creates bare repo + post-receive hook + infoxtractor DB on postgis + .env on server)

  • setup_server.sh (run manually once): SSH to server@192.168.68.42, create /home/server/Public/infoxtractor/repos.git bare repo with post-receive hook that checks out to /home/server/Public/infoxtractor/app/, runs docker compose up -d --build, polls /healthz for 60 s, exits non-zero on failure.

  • Creates infoxtractor DB + role on the postgis container.

  • Writes /home/server/Public/infoxtractor/app/.env with real passwords (user provides via environment or prompt).

  • Commit, PR, merge. Run the script manually; doc the run in deployment.md.

Task 5.3: Add server git remote + first deploy

Branch: feat/first-deploy

  • Local: git remote add server ssh://server@192.168.68.42/home/server/Public/infoxtractor/repos.git.
  • Verify ollama pull gpt-oss:20b is done on the host (check docker exec ollama ollama list).
  • git push server main. Hook rebuilds. /healthz check. Smoke: curl http://192.168.68.42:8994/healthz.
  • Document remote setup in deployment.md.
  • No code PR — this task is ops.

Task 5.4: E2E smoke test script

Branch: feat/e2e-smoke

Files:

  • Create: scripts/e2e_smoke.py

  • Submits tests/fixtures/synthetic_giro.pdf via POST http://192.168.68.42:8994/jobs (from Mac), polls, asserts per spec §12. Exits non-zero on failure. Prints timings.

  • Runs from Mac after every git push server main (documented as part of deploy habit in AGENTS.md).

  • Commit, PR, merge, deploy. Run smoke script; paste output into the PR description.
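The smoke script's poll loop can be written with injectable sleep/clock so it stays testable without a live service (names here are mine, not the script's):

```python
import time
from typing import Callable


def poll_until_terminal(fetch_status: Callable[[], str], timeout_s: float = 120.0,
                        interval_s: float = 1.0, *,
                        sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll a job until it leaves pending/running; raise if the deadline passes."""
    deadline = clock() + timeout_s
    while clock() < deadline:
        status = fetch_status()
        if status not in ("pending", "running"):
            return status
        sleep(interval_s)
    raise TimeoutError("job did not reach a terminal status before the deadline")
```

In the real script fetch_status would GET /jobs/{id} via httpx and return the status field; a non-"done" terminal status makes the script exit non-zero.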

Chunk 5 end state: Service live on http://192.168.68.42:8994, deploy gated by /healthz + E2E smoke. First consumer (mammon) can start building its integration.


Out of scope for this plan (owned by mammon or future ix work)

  • Mammon integration — owned by mammon repo; spec'd separately.
  • Second use case (receipt/invoice) — after bank_statement_header is proven live.
  • Multi-container worker, Prometheus exporter, OpenTelemetry exporter, vision path, Config Server, Kafka transport, Azure/AWS/OpenAI adapters — all in spec §14.

Review / handoff

After all chunks merged and deployed:

  1. Run scripts/e2e_smoke.py against live service; screenshot / paste output.
  2. Ensure monitoring dashboard shows infoxtractor healthy at http://192.168.68.42:8001.
  3. Confirm /healthz returns 200 for 5 minutes straight (no Surya OOMs, no Ollama missing model).
  4. Tag release: git tag v0.1.0 && git push forgejo v0.1.0.
  5. Open follow-up in mammon: "plan ix integration for needs_parser docs" referencing this spec + service URL.