Establishes ix as an async, on-prem, LLM-powered structured extraction microservice. Full reference spec stays in docs/spec-core-pipeline.md; MVP spec (strict subset — Ollama only, Surya OCR, REST + Postgres-queue transports in parallel, in-repo use cases, provenance-based reliability signals) lives at docs/superpowers/specs/2026-04-18-ix-mvp-design.md. First use case: bank_statement_header (feeds mammon's needs_parser flow). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# InfoXtractor (ix) MVP — Design

Date: 2026-04-18
Reference: `docs/spec-core-pipeline.md` (full, aspirational spec — MVP is a strict subset)
Status: Design approved (sections 1–8 walked through and accepted 2026-04-18)

## 0. One-paragraph summary

ix is an on-prem, async, LLM-powered microservice that extracts structured JSON from documents (PDFs, images, text) given a named *use case* (a Pydantic schema + system prompt). It returns the extracted fields together with per-field provenance (OCR segment IDs, bounding boxes, extracted-value agreement flags) that lets calling services decide how much to trust each value. The MVP ships one use case (`bank_statement_header`), one OCR engine (Surya, pluggable), one LLM backend (Ollama, pluggable), and two transports in parallel (REST with optional webhook callback + a Postgres queue). Cloud services are explicitly forbidden. The first consumer is mammon, which uses ix as a fallback for `needs_parser` documents.

## 1. Guiding principles

- **On-prem always.** No OpenAI, Anthropic, Azure (DI/CV/OpenAI), AWS (Bedrock/Textract), Google Document AI, Mistral, etc. LLM = Ollama (:11434). OCR = local engines only. Secrets never leave the home server. The spec's cloud references are examples to replace, not inherit.
- **Grounded extraction, not DB truth.** ix returns best-effort fields with segment citations, provenance verification, and cross-OCR agreement flags. ix does *not* claim DB-grade truth. The reliability decision (trust / stage for review / reject) belongs to the caller.
- **Transport-agnostic pipeline core.** The pipeline (`RequestIX` → `ResponseIX`) knows nothing about HTTP, queues, or databases. Adapters (REST, Postgres queue) run alongside the core; both converge on one shared job store.
- **YAGNI for all spec features the MVP doesn't need.** Kafka, Config Server, Azure/AWS clients, vision, word-level provenance, reasoning-effort routing, Prometheus/OTEL exporters: all deferred. The architecture leaves the interfaces in place so they can be added without touching the pipeline core.

## 2. Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│ infoxtractor container (Docker on 192.168.68.42, port 8994)      │
│                                                                  │
│  ┌──────────────────┐        ┌──────────────────────────┐        │
│  │ rest_adapter     │        │ pg_queue_adapter         │        │
│  │ (FastAPI)        │        │ (asyncio worker)         │        │
│  │ POST /jobs       │        │ LISTEN ix_jobs_new +     │        │
│  │ GET /jobs/{id}   │        │ SELECT ... FOR UPDATE    │        │
│  │ + callback_url   │        │ SKIP LOCKED              │        │
│  └────────┬─────────┘        └────────┬─────────────────┘        │
│           │                           │                          │
│           └──────────┬────────────────┘                          │
│                      ▼                                           │
│             ┌────────────────┐                                   │
│             │ ix_jobs table  │ ── postgis :5431, DB=infoxtractor │
│             └────────┬───────┘                                   │
│                      ▼                                           │
│        ┌─────────────────────────────┐                           │
│        │ Pipeline core (spec §3–§4)  │                           │
│        │                             │                           │
│        │ SetupStep → OCRStep →       │                           │
│        │ GenAIStep → ReliabilityStep │                           │
│        │ → ResponseHandler           │                           │
│        │                             │                           │
│        │ Uses: OCRClient (Surya),    │                           │
│        │       GenAIClient (Ollama), │                           │
│        │       UseCaseRegistry       │                           │
│        └─────────────────────────────┘                           │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
           │                               ▲
           ▼                               │
host.docker.internal:11434        mammon or any on-prem caller —
Ollama (gpt-oss:20b default)      polls GET /jobs/{id} until done
```

**Key shapes:**
- The spec's four steps plus a new fifth: `ReliabilityStep` runs between `GenAIStep` and `ResponseHandlerStep` and computes the per-field `provenance_verified` and `text_agreement` flags. It is isolated so callers and tests can reason about reliability signals independently.
- Single worker at MVP (`PIPELINE_WORKER_CONCURRENCY=1`): Ollama and Surya contend for the same GPU, so jobs run serially.
- Two transports, one job store. REST is primary; the pg queue is scaffolded and uses the same table and the same lifecycle.
- The use case registry lives in-repo (`ix/use_cases/__init__.py`); adding a new use case = a new module + one registry line.

## 3. Data contracts

Subset of spec §2 / §9.3. Dropped fields are no-ops under the MVP's feature set.

### RequestIX

```python
from typing import Literal, Optional

from pydantic import BaseModel


class OCROptions(BaseModel):
    use_ocr: bool = True
    ocr_only: bool = False
    include_ocr_text: bool = False
    include_geometries: bool = False
    service: Literal["surya"] = "surya"  # kept so the adapter point is visible


class GenAIOptions(BaseModel):
    gen_ai_model_name: Optional[str] = None  # None → use-case default → IX_DEFAULT_MODEL


class ProvenanceOptions(BaseModel):
    include_provenance: bool = True  # default ON (reliability is the point)
    max_sources_per_field: int = 10


class Options(BaseModel):
    ocr: OCROptions = OCROptions()
    gen_ai: GenAIOptions = GenAIOptions()
    provenance: ProvenanceOptions = ProvenanceOptions()


class Context(BaseModel):
    files: list[str] = []  # URLs or file:// paths
    texts: list[str] = []  # extra text (e.g. Paperless OCR output)


class RequestIX(BaseModel):
    use_case: str                # registered name, e.g. "bank_statement_header"
    ix_client_id: str            # caller tag for logs/metrics, e.g. "mammon"
    request_id: str              # caller's correlation id; echoed back
    ix_id: Optional[str] = None  # transport-assigned short hex id
    context: Context
    options: Options = Options()
    callback_url: Optional[str] = None  # optional webhook delivery (one-shot, no retry)
```

**Dropped from spec (no-ops under MVP):** `OCROptions.computer_vision_scaling_factor`, `include_page_tags` (always on), `GenAIOptions.use_vision`/`vision_scaling_factor`/`vision_detail`/`reasoning_effort`, `ProvenanceOptions.granularity`/`include_bounding_boxes`/`source_type`/`min_confidence`, `RequestIX.version`.

### ResponseIX

Identical to spec §2.2 except `FieldProvenance` gains two fields:

```python
class FieldProvenance(BaseModel):
    field_name: str
    field_path: str
    value: Any
    sources: list[ExtractionSource]
    confidence: Optional[float] = None  # reserved; always None in MVP
    provenance_verified: bool           # NEW: cited segment actually contains the value (normalized)
    text_agreement: Optional[bool]      # NEW: value appears in RequestIX.context.texts; None if no texts
```

`quality_metrics` gains two counters: `verified_fields` and `text_agreement_fields`.

### Job envelope (in `ix_jobs` table; returned by REST)

```python
class Job(BaseModel):
    job_id: UUID
    ix_id: str
    client_id: str
    request_id: str
    status: Literal["pending", "running", "done", "error"]
    request: RequestIX
    response: Optional[ResponseIX]
    callback_url: Optional[str]
    callback_status: Optional[Literal["pending", "delivered", "failed"]]
    attempts: int = 0
    created_at: datetime
    started_at: Optional[datetime]
    finished_at: Optional[datetime]
```

## 4. Job store

```sql
CREATE DATABASE infoxtractor;  -- on the existing postgis container

CREATE TABLE ix_jobs (
    job_id          UUID PRIMARY KEY,
    ix_id           TEXT NOT NULL,
    client_id       TEXT NOT NULL,
    request_id      TEXT NOT NULL,
    status          TEXT NOT NULL,
    request         JSONB NOT NULL,
    response        JSONB,
    callback_url    TEXT,
    callback_status TEXT,
    attempts        INT NOT NULL DEFAULT 0,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at      TIMESTAMPTZ,
    finished_at     TIMESTAMPTZ
);
CREATE INDEX ix_jobs_status_created ON ix_jobs (status, created_at) WHERE status = 'pending';
CREATE INDEX ix_jobs_client_request ON ix_jobs (client_id, request_id);
-- Postgres NOTIFY channel used by the pg_queue_adapter: 'ix_jobs_new'
```

Callers that prefer direct SQL (the `pg_queue_adapter` contract) insert a row with `status='pending'`, then `NOTIFY ix_jobs_new, '<job_id>'`. The worker also falls back to a 10 s poll, so a missed notify or an ix restart doesn't strand a job.
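
Under that contract, the worker's claim step can be sketched as a single statement; the exact query is an assumption, only the table, columns, and `SKIP LOCKED` locking mode come from this section:

```sql
-- Claim the oldest pending job without blocking concurrent workers.
UPDATE ix_jobs
SET    status = 'running', started_at = now(), attempts = attempts + 1
WHERE  job_id = (
    SELECT job_id
    FROM   ix_jobs
    WHERE  status = 'pending'
    ORDER  BY created_at
    FOR UPDATE SKIP LOCKED
    LIMIT  1
)
RETURNING job_id, request;
```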

## 5. REST surface

| Method | Path | Purpose |
|---|---|---|
| `POST` | `/jobs` | Body = `RequestIX` (+ optional `callback_url`). → `201 {job_id, ix_id, status: "pending"}`. Idempotent on `(ix_client_id, request_id)` — the same pair returns the existing `job_id` with `200`. |
| `GET` | `/jobs/{job_id}` | → full `Job`. Source of truth regardless of submission path or callback outcome. |
| `GET` | `/jobs?client_id=…&request_id=…` | Lookup by correlation (caller idempotency helper). Returns the latest match or `404`. |
| `GET` | `/healthz` | `{ollama: ok/fail, postgres: ok/fail, ocr: ok/fail}`. Used by the `infrastructure` monitoring dashboard. |
| `GET` | `/metrics` | Counters: `jobs_pending`, `jobs_running`, `jobs_done_24h`, `jobs_error_24h`, per-use-case average seconds. Plain JSON, no Prometheus format for the MVP. |

**Callback delivery** (when `callback_url` is set): one POST of the full `Job` body with a 10 s timeout. 2xx → `callback_status='delivered'`; anything else → `'failed'`. No retry. Callers always have `GET /jobs/{id}` as the authoritative fallback.

## 6. Pipeline steps

Interface per spec §3 (`async validate` + `async process`). Pipeline orchestration per spec §4: the first error aborts, and each step is wrapped in a `Timer` whose result lands in `Metadata.timings`.
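
The per-step timing wrapper can be as small as a context manager like this sketch (`step_timer` is a hypothetical name; the spec only requires that elapsed seconds land in `Metadata.timings`):

```python
import time
from contextlib import contextmanager


@contextmanager
def step_timer(timings: dict[str, float], step_name: str):
    """Record the step's elapsed seconds under its name, even if the step raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step_name] = time.perf_counter() - start
```

Usage: `with step_timer(metadata_timings, "OCRStep"): ...` around each step's `process` call.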

### SetupStep

- **validate**: `request_ix` non-null; `context.files` or `context.texts` non-empty.
- **process**:
  - Copy `request_ix.context.texts` → `response_ix.context.texts`.
  - Download each URL in `context.files` to `/tmp/ix/<ix_id>/` in parallel. MIME detection via `python-magic`. Supported: PDF, PNG, JPEG, TIFF. Unsupported → `IX_000_005`.
  - Load the use case: `request_cls, response_cls = REGISTRY[request_ix.use_case]`. Store instances in `response_ix.context.use_case_request` / `use_case_response`. Echo `use_case_request.use_case_name` → `response_ix.use_case_name`.
  - Build a flat `response_ix.context.pages`: one entry per PDF page (via PyMuPDF), one per image frame, one per text entry. Hard cap of 100 pages per PDF → `IX_000_006` on violation.
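
The flattening with the page cap can be sketched as a pure function; `PageEntry` and the argument shapes are assumptions for illustration, not the real types:

```python
from dataclasses import dataclass


@dataclass
class PageEntry:
    item_index: int  # position within context.files + context.texts
    page_no: int     # 1-based page number within the item
    kind: str        # "pdf_page" | "image" | "text"


MAX_PDF_PAGES = 100  # hard cap -> IX_000_006 on violation


def build_pages(pdf_page_counts: list[int], image_count: int, text_count: int) -> list[PageEntry]:
    """Flatten files and texts into the ordered page list SetupStep produces."""
    pages: list[PageEntry] = []
    idx = 0
    for count in pdf_page_counts:
        if count > MAX_PDF_PAGES:
            raise ValueError("IX_000_006: PDF page-count cap exceeded")
        for page_no in range(1, count + 1):
            pages.append(PageEntry(idx, page_no, "pdf_page"))
        idx += 1
    for _ in range(image_count):
        pages.append(PageEntry(idx, 1, "image"))
        idx += 1
    for _ in range(text_count):
        pages.append(PageEntry(idx, 1, "text"))
        idx += 1
    return pages
```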

### OCRStep

- **validate**: returns `True` iff `(use_ocr or ocr_only or include_geometries or include_ocr_text) and context.files`; otherwise `False` and the step is skipped (text-only requests).
- **process**: `ocr_result = await OCRClient.ocr(context.pages)` → `response_ix.ocr_result`. Always inject `<page file="{item_index}" number="{page_no}">` tags (simplifies grounding). If `include_provenance`: build a `SegmentIndex` (line granularity, bboxes normalized to 0–1) and store it in `context.segment_index`.
- **OCRClient interface**:

```python
class OCRClient(Protocol):
    async def ocr(self, pages: list[Page]) -> OCRResult: ...
```

MVP implementation: `SuryaOCRClient` (GPU via the `surya-ocr` PyPI package, CUDA on the RTX 3090).
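
Downstream grounding relies on the `SegmentIndex`; a minimal sketch of line-granularity ids and 0–1 normalized boxes (the input shapes are assumptions, real Surya output differs):

```python
from dataclasses import dataclass


@dataclass
class Segment:
    segment_id: str    # e.g. "p1_l0" -> page 1, line 0
    text_snippet: str
    bbox: tuple[float, float, float, float]  # normalized 0-1 (x0, y0, x1, y1)


def build_segment_index(
    page_lines: list[list[tuple[str, tuple[float, float, float, float]]]],
    page_sizes: list[tuple[float, float]],
) -> dict[str, Segment]:
    """Line-granularity SegmentIndex: one id per OCR line, bbox scaled to 0-1."""
    index: dict[str, Segment] = {}
    for p, (lines, (width, height)) in enumerate(zip(page_lines, page_sizes), start=1):
        for l, (text, (x0, y0, x1, y1)) in enumerate(lines):
            seg_id = f"p{p}_l{l}"
            index[seg_id] = Segment(
                seg_id, text, (x0 / width, y0 / height, x1 / width, y1 / height)
            )
    return index
```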

### GenAIStep

- **validate**: `ocr_only` → `False` (skip). The use case must exist. OCR text or `context.texts` must be non-empty (else `IX_001_000`).
- **process**:
  - System prompt = `use_case_request.system_prompt`. If `include_provenance`: append the spec §9.2 citation instruction verbatim.
  - User text: segment-tagged (`[p1_l0] …`) when provenance is on; plain concatenated OCR + texts otherwise.
  - Response schema: `UseCaseResponse` directly, or the dynamic `ProvenanceWrappedResponse(result=..., segment_citations=...)` per spec §7.2e when provenance is on.
  - Model: `request_ix.options.gen_ai.gen_ai_model_name` → `use_case_request.default_model` → `IX_DEFAULT_MODEL`.
  - Call `GenAIClient.invoke(request_kwargs, response_schema)`; the parsed model goes to `ix_result.result`, usage + model_name to `ix_result.meta_data`.
  - If provenance: call `ProvenanceUtils.map_segment_refs_to_provenance(...)` per spec §9.4 and write `response_ix.provenance`.
- **GenAIClient interface**:

```python
class GenAIClient(Protocol):
    async def invoke(
        self, request_kwargs: dict, response_schema: type[BaseModel]
    ) -> GenAIInvocationResult: ...
```

MVP implementation: `OllamaClient` — `POST http://host.docker.internal:11434/api/chat` with `format = <JSON schema from Pydantic>` (Ollama structured outputs).

### ReliabilityStep (new; runs when `include_provenance` is True)

For each `FieldProvenance` in `response_ix.provenance.fields`:

- **`provenance_verified`**: for each cited segment, compare `text_snippet` to the extracted `value` after normalization (see below). If any cited segment agrees → `True`; else `False`.
- **`text_agreement`**: if `request_ix.context.texts` is empty → `None`. Otherwise run the same comparison against the concatenated texts → `True` / `False`.

**Normalization rules** (cheap, language-neutral, applied to both sides before the `in`-check):

- Strings: Unicode NFKC, casefold, collapse whitespace, strip common punctuation.
- Numbers (`int`, `float`, `Decimal` values): digits and sign only; strip currency symbols, thousands separators, and decimal-separator variants (`.`/`,`); require an exact match to 2 decimal places for amounts.
- Dates: parse to ISO `YYYY-MM-DD`; compare as strings. Accept common German / Swiss / US formats.
- IBANs: uppercase, strip spaces.
- Very short values (≤ 2 chars, or numeric |value| < 10): `text_agreement` is skipped (returns `None`) — too noisy to be a useful signal.
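
These rules are cheap enough to sketch directly; the function names are hypothetical and the date rule is omitted for brevity:

```python
import re
import unicodedata

_PUNCT = re.compile(r"[\"'.,;:!?()\[\]{}]")


def normalize_string(s: str) -> str:
    """NFKC, casefold, strip common punctuation, collapse whitespace."""
    s = unicodedata.normalize("NFKC", s).casefold()
    s = _PUNCT.sub("", s)
    return " ".join(s.split())


def normalize_number(s: str) -> str:
    """Digits and sign only; the last '.' or ',' counts as the decimal separator;
    always emit exactly 2 decimal places (the amounts rule). Ambiguous
    thousands-only forms like '1.234' read as decimals, a known limitation
    of this sketch."""
    s = re.sub(r"[^\d.,+-]", "", s)  # drop currency symbols, spaces, etc.
    sign = "-" if s.startswith("-") else ""
    last = max(s.rfind("."), s.rfind(","))
    if last == -1:
        int_part, frac = re.sub(r"\D", "", s), ""
    else:
        int_part = re.sub(r"\D", "", s[:last])
        frac = re.sub(r"\D", "", s[last:])
    return f"{sign}{int_part}.{frac[:2]:0<2}"


def normalize_iban(s: str) -> str:
    return s.upper().replace(" ", "")
```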

This step mutates the provenance structure only; it does **not** drop fields. The caller sees every extracted field plus the flags.

It also writes the `quality_metrics.verified_fields` and `quality_metrics.text_agreement_fields` summary counts.

### ResponseHandlerStep

Per spec §8, unchanged. Attach the flat OCR text when `include_ocr_text`; strip `ocr_result.pages` unless `include_geometries`; delete `context` before serialization.

## 7. Use case registry

```
ix/use_cases/
  __init__.py               # REGISTRY: dict[str, tuple[type[UseCaseRequest], type[UseCaseResponse]]]
  bank_statement_header.py
```

Adding a use case = a new module exporting a `Request(BaseModel)` (`use_case_name`, `default_model`, `system_prompt`) and a `UseCaseResponse(BaseModel)`, plus one `REGISTRY["<name>"] = (Request, UseCaseResponse)` line.
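
The convention is small enough to show end to end; in this sketch plain classes stand in for the Pydantic models so it stays self-contained:

```python
# REGISTRY maps use-case name -> (request class, response class), mirroring
# ix/use_cases/__init__.py. Plain classes stand in for the BaseModel subclasses.
REGISTRY: dict[str, tuple[type, type]] = {}


class BankStatementHeader:  # the UseCaseResponse stand-in
    pass


class Request:  # the UseCaseRequest stand-in
    use_case_name = "Bank Statement Header"
    default_model = "gpt-oss:20b"
    system_prompt = "You extract header metadata ..."


# The one registry line a new use case adds:
REGISTRY["bank_statement_header"] = (Request, BankStatementHeader)
```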

### First use case: `bank_statement_header`

```python
from datetime import date
from decimal import Decimal
from typing import Literal, Optional

from pydantic import BaseModel


class BankStatementHeader(BaseModel):
    bank_name: str
    account_iban: Optional[str]
    account_type: Optional[Literal["checking", "credit", "savings"]]
    currency: str
    statement_date: Optional[date]
    statement_period_start: Optional[date]
    statement_period_end: Optional[date]
    opening_balance: Optional[Decimal]
    closing_balance: Optional[Decimal]


class Request(BaseModel):
    use_case_name: str = "Bank Statement Header"
    default_model: str = "gpt-oss:20b"
    system_prompt: str = (
        "You extract header metadata from a single bank or credit-card statement. "
        "Return only facts that appear in the document; leave a field null if uncertain. "
        "Balances must use the document's numeric format (e.g. '1234.56' or '-123.45'); "
        "do not invent a currency symbol. Account type: 'checking' for current/Giro accounts, "
        "'credit' for credit-card statements, 'savings' otherwise. Always return the IBAN "
        "with spaces removed. Never fabricate a value to fill a required-looking field."
    )
```

**Why these fields:** each appears at most once per document (one citation per field → a strong `provenance_verified` signal); all reconcile against something mammon already stores (IBAN → `Account.iban`, period → verified-range chain, closing_balance → next month's opening_balance and `StatementBalance`); and the schema is flat (no nested arrays, where Ollama structured output tends to drift).

## 8. Errors and warnings

Error-code subset from spec §12.2 (reusing codes as-is where the meaning is identical):

| Code | Trigger |
|---|---|
| `IX_000_000` | `request_ix` is None |
| `IX_000_002` | No context (neither files nor texts) |
| `IX_000_004` | OCR required but no files provided |
| `IX_000_005` | File MIME type not supported |
| `IX_000_006` | PDF page-count cap exceeded |
| `IX_001_000` | `use_case` empty, or extraction context (OCR + texts) empty after setup |
| `IX_001_001` | Use case name not in `REGISTRY` |

Warnings (non-fatal, appended to `response_ix.warning`): empty OCR result, provenance requested with `use_ocr=False`, unknown model falling back to the default.

## 9. Configuration (`AppConfig` via `pydantic-settings`)

| Env var | Default | Meaning |
|---|---|---|
| `IX_POSTGRES_URL` | `postgresql+asyncpg://infoxtractor:…@host.docker.internal:5431/infoxtractor` | Job store |
| `IX_OLLAMA_URL` | `http://host.docker.internal:11434` | LLM backend |
| `IX_DEFAULT_MODEL` | `gpt-oss:20b` | Fallback model |
| `IX_OCR_ENGINE` | `surya` | Adapter selector (only value in MVP) |
| `IX_TMP_DIR` | `/tmp/ix` | Download scratch space |
| `IX_PIPELINE_WORKER_CONCURRENCY` | `1` | Worker semaphore cap |
| `IX_PIPELINE_REQUEST_TIMEOUT_SECONDS` | `2700` | Per-job timeout (45 min) |
| `IX_RENDER_MAX_PIXELS_PER_PAGE` | `75000000` | Per-page render cap |
| `IX_LOG_LEVEL` | `INFO` | Log verbosity |
| `IX_CALLBACK_TIMEOUT_SECONDS` | `10` | Webhook POST timeout |

No Azure, OpenAI, or AWS variables — those paths do not exist in the codebase.
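
A stdlib stand-in for the `AppConfig` pattern (the real implementation uses `pydantic-settings`; only the env names and defaults come from the table above):

```python
import os
from dataclasses import dataclass


def _env(name: str, default: str) -> str:
    return os.environ.get(name, default)


@dataclass(frozen=True)
class AppConfig:
    ollama_url: str
    default_model: str
    worker_concurrency: int
    callback_timeout_seconds: float

    @classmethod
    def from_env(cls) -> "AppConfig":
        return cls(
            ollama_url=_env("IX_OLLAMA_URL", "http://host.docker.internal:11434"),
            default_model=_env("IX_DEFAULT_MODEL", "gpt-oss:20b"),
            worker_concurrency=int(_env("IX_PIPELINE_WORKER_CONCURRENCY", "1")),
            callback_timeout_seconds=float(_env("IX_CALLBACK_TIMEOUT_SECONDS", "10")),
        )
```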

## 10. Observability (minimal)

- **Logs**: JSON-structured via `logging` + a custom formatter. Every line carries `ix_id`, `client_id`, `request_id`, `use_case`. Steps emit `step_start` / `step_end` events with elapsed ms.
- **Timings**: every step's elapsed seconds are recorded in `response_ix.metadata.timings` (same shape as spec §2).
- **Traces**: OpenTelemetry span scaffolding is present, but no exporter is wired. Drop-in later.
- **Prometheus**: deferred.

## 11. Deployment

- Repo: `goldstein/infoxtractor` on Forgejo, plus a `server` bare-repo remote with a `post-receive` hook mirroring mammon.
- Port 8994 (LAN-only via UFW; not exposed publicly — internal service).
- Postgres: new `infoxtractor` database on the existing postgis container.
- Ollama reached via `host.docker.internal:11434`.
- Monitoring label: `infrastructure.web_url=http://192.168.68.42:8994`.
- Backup: `backup.enable=true`, `backup.type=postgres`, `backup.name=infoxtractor`.
- Dockerfile: CUDA-enabled base (`nvidia/cuda:12.4-runtime-ubuntu22.04` + Python 3.12) so Surya can use the 3090. CMD: `alembic upgrade head && uvicorn ix.app:create_app --factory --host 0.0.0.0 --port 8994`.

## 12. Testing strategy

Strict TDD — each unit is written test-first.

1. **Unit tests** (fast, hermetic): every `Step`, `SegmentIndex`, the provenance-verification normalizers, the `OCRClient` contract, the `GenAIClient` contract, error mapping. No DB, no Ollama, no network.
2. **Integration tests** (DB + fakes): pipeline end-to-end with a stub `OCRClient` (replays canned OCR results) and a stub `GenAIClient` (replays canned LLM JSON). Covers step wiring + transports + job lifecycle + callback success/failure + pg queue notify. Runs against a real postgres service container in Forgejo Actions (the mammon CI pattern).
3. **E2E smoke against the deployed app**: `scripts/e2e_smoke.py` on the Mac calls `POST http://192.168.68.42:8994/jobs` with a redacted bank-statement fixture (`tests/fixtures/dkb_giro_2026_03.pdf`), polls `GET /jobs/{id}` until done, and asserts:
   - `status == "done"`
   - `provenance.fields["result.closing_balance"].provenance_verified is True`
   - `text_agreement is True` when Paperless-style texts are submitted
   - timings under 60 s

   It runs after every `git push server main` as the deploy gate. If it fails, the commit is reverted before merging the deploy PR.

## 13. Mammon integration (sketch — outside this spec's scope, owned by mammon)

This belongs in a mammon-side follow-up spec; it is captured here only so readers of ix know the MVP's first consumer.

- The Paperless poller keeps its current behavior for format-matched docs.
- For `needs_parser` docs: submit to ix (`use_case="bank_statement_header"`, `files=[paperless_download_url]`, `texts=[paperless_content]`).
- The ix job id is recorded on the `Import` row. A new poller on the mammon side checks `GET /jobs/{id}` until done.
- The result is staged (new `pending_headers` table, not `StatementBalance`). A new "Investigate" panel surfaces staged headers with the per-field `provenance_verified` + `text_agreement` flags.
- The user confirms → write to `StatementBalance`. Over time, once a deterministic parser exists for the bank, compare ix's past extractions against the deterministic output to measure ix accuracy.

## 14. Deferred from full spec (explicit)

- Kafka transport (§15)
- Config Server (§9.1 in the full spec, §10 here): use cases are in-repo for the MVP
- Azure DI / Computer Vision OCR backends
- OpenAI, Anthropic, AWS Bedrock GenAI backends
- S3 adapter
- `use_vision` + vision scaling/detail
- Word-level provenance granularity
- `reasoning_effort` parameter routing
- Prometheus exporter (`/metrics` stays JSON for the MVP)
- OTEL gRPC exporter (spans present, no exporter)
- Legacy aliases (`prompt_template_base`, `kwargs_use_case`)
- Second-opinion multi-model ensembling
- Schema `version` field
- Per-request rate limiting

Every deferred item is additive: the `OCRClient` / `GenAIClient` / transport-adapter interfaces already leave the plug points, and the pipeline core is unaware of which implementation is in use.

## 15. Implementation workflow (habit reminder)

Every unit of work follows the cross-project habit:

1. `git checkout -b feat/<name>`
2. TDD: write a failing test, write code, go green, refactor
3. Commit in small logical chunks; update `AGENTS.md` / `README.md` / `docs/` in the same commit as the code
4. `git push forgejo feat/<name>`
5. Create a PR via the Forgejo API
6. Wait for tests to pass
7. Merge
8. `git push server main` to deploy; run `scripts/e2e_smoke.py` against the live service

Never skip hooks, never force-push main, never bypass tests.