infoxtractor/src/ix/contracts/__init__.py
Dirk Riemann 02db3b05cc
feat(contracts): ResponseIX + Provenance + Job envelope (spec §3, §9.3)
Completes the data-contract layer. Highlights:

- `ResponseIX.context` is an internal mutable accumulator used by pipeline
  steps (pages, files, texts, use_case classes, segment index). It MUST NOT
  leak into the serialised response, so we mark the field with
  `Field(exclude=True)` and carry the shape in a small `_InternalContext`
  sub-model with `extra="allow"` so steps can stash arbitrary state without
  schema churn. Tested: `model_dump()` and `model_dump_json()` both drop it.

- `FieldProvenance` gains `provenance_verified: bool | None` and
  `text_agreement: bool | None` — the two MVP reliability flags written by
  the new ReliabilityStep. Both default None so rows predating the
  ReliabilityStep (empty LLM output, cloud-import replay) parse cleanly.

- `quality_metrics` stays a free-form `dict[str, Any]`. The MVP adds
  `verified_fields` and `text_agreement_fields` counters without carving
  them into the schema, so future metrics can be added without a schema
  change.

- `Job.status` and `Job.callback_status` are `Literal[...]` so Pydantic
  rejects unknown states at the edge. The invariant
  (`status='done' iff response.error is None`) stays worker-enforced:
  callers sometimes hydrate in-flight rows and we do not want validation
  to reject them.
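
A minimal sketch of the `Field(exclude=True)` pattern from the first bullet, assuming Pydantic v2. `ResponseSketch`, its `quality_metrics` default, and the `pages` field are illustrative stand-ins, not the actual `ResponseIX` definition:

```python
from typing import Any

from pydantic import BaseModel, ConfigDict, Field


class _InternalContext(BaseModel):
    # extra="allow" lets pipeline steps attach attributes that are not declared here.
    model_config = ConfigDict(extra="allow")

    pages: list[Any] = Field(default_factory=list)  # assumed field, for illustration


class ResponseSketch(BaseModel):
    # Hypothetical stand-in for ResponseIX.
    quality_metrics: dict[str, Any] = Field(default_factory=dict)
    # exclude=True keeps the accumulator out of every serialised form.
    context: _InternalContext = Field(default_factory=_InternalContext, exclude=True)


resp = ResponseSketch()
resp.context.pages.append("page-1")        # declared field
resp.context.segment_index = {"p1_l0": 0}  # undeclared, accepted via extra="allow"

dumped = resp.model_dump()
dumped_json = resp.model_dump_json()
```

Both `dumped` and `dumped_json` omit `context`, while the in-memory object still carries whatever the steps stored on it.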

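The worker-side invariant from the last bullet could be checked along these lines. This is a sketch only: the state names in `JobStatusSketch` and both helper functions are hypothetical, and the real `JobStatus` values are not shown in this file.

```python
from typing import Literal, Optional

# Hypothetical state names, for illustration only.
JobStatusSketch = Literal["pending", "running", "done", "error"]


def terminal_status(error: Optional[str]) -> JobStatusSketch:
    """Pick the status a worker would write once a job has finished."""
    return "done" if error is None else "error"


def invariant_holds(status: str, error: Optional[str]) -> bool:
    """True when status == 'done' exactly if no error is recorded."""
    return (status == "done") == (error is None)


# A finished job written by the worker satisfies the invariant either way.
ok_row = invariant_holds(terminal_status(None), None)
failed_row = invariant_holds(terminal_status("ocr failed"), "ocr failed")

# An in-flight row has no error yet but is not 'done', so a model-level
# validator enforcing the invariant would reject it on hydration.
in_flight_row = invariant_holds("running", None)
```

Under these assumptions `in_flight_row` evaluates to `False`, which illustrates why the check stays in the worker rather than in model validation.
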
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 10:50:22 +02:00

"""Pydantic v2 data contracts shared by the pipeline, adapters, and store.

Re-exports the public symbols from sibling modules so call sites can write
``from ix.contracts import RequestIX`` without chasing the submodule layout.
"""

from __future__ import annotations

from ix.contracts.job import CallbackStatus, Job, JobStatus
from ix.contracts.provenance import (
    BoundingBox,
    ExtractionSource,
    FieldProvenance,
    ProvenanceData,
    SegmentCitation,
)
from ix.contracts.request import (
    Context,
    FileRef,
    GenAIOptions,
    OCROptions,
    Options,
    ProvenanceOptions,
    RequestIX,
)
from ix.contracts.response import (
    IXResult,
    Line,
    Metadata,
    OCRDetails,
    OCRResult,
    Page,
    ResponseIX,
)

__all__ = [
    "BoundingBox",
    "CallbackStatus",
    "Context",
    "ExtractionSource",
    "FieldProvenance",
    "FileRef",
    "GenAIOptions",
    "IXResult",
    "Job",
    "JobStatus",
    "Line",
    "Metadata",
    "OCRDetails",
    "OCROptions",
    "OCRResult",
    "Options",
    "Page",
    "ProvenanceData",
    "ProvenanceOptions",
    "RequestIX",
    "ResponseIX",
    "SegmentCitation",
]