infoxtractor/src/ix/store/models.py
Dirk Riemann 1c60c30084
feat(store): Alembic scaffolding + initial ix_jobs migration (spec §4)
Lands the async-friendly Alembic env (NullPool, reads IX_POSTGRES_URL), the
hand-written 001 migration matching the spec's table layout exactly
(CHECK on status, partial index on pending rows, UNIQUE on
(client_id, request_id)), the SQLAlchemy 2.0 ORM mapping, and a lazy
engine/session factory. The factory reads the URL through ix.config when
available; Task 3.2 makes that the only path.

Smoke-tested: alembic upgrade head + downgrade base against a live
postgres:16 produce the expected table shape and tear down cleanly.
Unit tests assert the migration source contains every required column/index
so the migration can't drift from spec at import time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:37:21 +02:00
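
The drift check described in the commit message (unit tests asserting that the migration source contains every required column and index) could look roughly like the sketch below. This is not the project's actual test: `REQUIRED_NAMES` and `missing_names` are hypothetical names, the list is copied from the ORM model on the assumption that the migration spells the identifiers the same way, and `sample_source` is a stand-in string rather than the real migration text.

```python
# Hypothetical drift check. Every name below is taken from the ORM model;
# the assumption is that the migration source uses identical spellings.
REQUIRED_NAMES = (
    "ix_jobs",
    "job_id", "ix_id", "client_id", "request_id", "status",
    "request", "response", "callback_url", "callback_status",
    "attempts", "created_at", "started_at", "finished_at",
    "ix_jobs_status_check", "ix_jobs_callback_status_check",
    "ix_jobs_status_created", "ix_jobs_client_request",
)


def missing_names(source: str, required: tuple[str, ...] = REQUIRED_NAMES) -> list[str]:
    """Return the required names that never appear as a quoted string in ``source``.

    Matching on the quoted form avoids false positives such as "status"
    being satisfied by "callback_status".
    """
    return [
        name
        for name in required
        if f'"{name}"' not in source and f"'{name}'" not in source
    ]


# Stand-in for the text of a migration file (deliberately incomplete).
sample_source = '''
op.create_table(
    "ix_jobs",
    sa.Column("job_id", sa.Text(), primary_key=True),
    sa.Column("callback_status", sa.Text(), nullable=True),
)
'''

missing = missing_names(sample_source)
```

In a real test the source string would presumably be read from ``alembic/versions/001_initial_ix_jobs.py`` and the assertion would be that `missing` is empty. A plain text check like this only catches missing names, not wrong types or constraints, which is why the commit also smoke-tests against a live database.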

86 lines
3.2 KiB
Python

"""SQLAlchemy 2.0 ORM for ``ix_jobs``.

Shape matches the initial migration (``alembic/versions/001_initial_ix_jobs.py``)
which in turn matches spec §4. JSONB columns carry the RequestIX / ResponseIX
Pydantic payloads; we don't wrap them in custom TypeDecorators. The repo does
an explicit ``model_dump(mode="json")`` on write and ``model_validate`` on read
so the ORM stays a thin mapping layer and the Pydantic round-trip logic stays
colocated with the other contract code.

The status column is a plain string; the CHECK constraint in the DB enforces
the allowed values. Using a SQLAlchemy ``Enum`` type here would double-bind
the enum values on both sides and force a migration each time we add a state.
"""

from __future__ import annotations

from datetime import datetime
from typing import Any
from uuid import UUID

from sqlalchemy import CheckConstraint, DateTime, Index, Integer, Text, text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.dialects.postgresql import UUID as PgUUID
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    """Shared declarative base for the store package."""


class IxJob(Base):
    """ORM mapping for the ``ix_jobs`` table.

    One row per submitted extraction job. Lifecycle: pending → running →
    (done | error). The worker is the only writer that flips status past
    pending; the REST / pg_queue adapters only insert.
    """

    __tablename__ = "ix_jobs"
    __table_args__ = (
        CheckConstraint(
            "status IN ('pending', 'running', 'done', 'error')",
            name="ix_jobs_status_check",
        ),
        CheckConstraint(
            "callback_status IS NULL OR callback_status IN "
            "('pending', 'delivered', 'failed')",
            name="ix_jobs_callback_status_check",
        ),
        Index(
            "ix_jobs_status_created",
            "status",
            "created_at",
            postgresql_where=text("status = 'pending'"),
        ),
        Index(
            "ix_jobs_client_request",
            "client_id",
            "request_id",
            unique=True,
        ),
    )

    job_id: Mapped[UUID] = mapped_column(PgUUID(as_uuid=True), primary_key=True)
    ix_id: Mapped[str] = mapped_column(Text, nullable=False)
    client_id: Mapped[str] = mapped_column(Text, nullable=False)
    request_id: Mapped[str] = mapped_column(Text, nullable=False)
    status: Mapped[str] = mapped_column(Text, nullable=False)
    request: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False)
    response: Mapped[dict[str, Any] | None] = mapped_column(JSONB, nullable=True)
    callback_url: Mapped[str | None] = mapped_column(Text, nullable=True)
    callback_status: Mapped[str | None] = mapped_column(Text, nullable=True)
    attempts: Mapped[int] = mapped_column(
        Integer, nullable=False, server_default=text("0")
    )
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True),
        nullable=False,
        server_default=text("now()"),
    )
    started_at: Mapped[datetime | None] = mapped_column(
        DateTime(timezone=True), nullable=True
    )
    finished_at: Mapped[datetime | None] = mapped_column(
        DateTime(timezone=True), nullable=True
    )
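
The effect of the status CHECK constraint and the unique index on ``(client_id, request_id)`` can be illustrated without a Postgres instance. The following is a minimal sketch under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for Postgres, keeps only a handful of columns, stores them as TEXT instead of UUID/JSONB, and introduces a hypothetical helper `try_insert`. It is not the project's schema or repository code, only a demonstration of how constraints of this shape reject bad rows.

```python
import sqlite3

# Stand-in schema (assumption): SQLite instead of Postgres, reduced column set.
DDL = """
CREATE TABLE ix_jobs (
    job_id      TEXT PRIMARY KEY,
    client_id   TEXT NOT NULL,
    request_id  TEXT NOT NULL,
    status      TEXT NOT NULL
        CONSTRAINT ix_jobs_status_check
        CHECK (status IN ('pending', 'running', 'done', 'error')),
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Partial index: only pending rows are indexed.
CREATE INDEX ix_jobs_status_created
    ON ix_jobs (status, created_at) WHERE status = 'pending';
CREATE UNIQUE INDEX ix_jobs_client_request
    ON ix_jobs (client_id, request_id);
"""


def try_insert(conn, job_id, client_id, request_id, status):
    """Hypothetical helper: True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO ix_jobs (job_id, client_id, request_id, status) "
                "VALUES (?, ?, ?, ?)",
                (job_id, client_id, request_id, status),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

results = [
    try_insert(conn, "j1", "client-a", "req-1", "pending"),  # accepted
    try_insert(conn, "j2", "client-a", "req-1", "pending"),  # duplicate (client_id, request_id)
    try_insert(conn, "j3", "client-a", "req-2", "queued"),   # status outside the CHECK list
    try_insert(conn, "j4", "client-b", "req-1", "running"),  # accepted: different client
]
row_count = conn.execute("SELECT COUNT(*) FROM ix_jobs").fetchone()[0]
```

Because the database enforces the allowed values, adding a new state only requires changing the CHECK constraint in a migration; the ORM column stays a plain string, which is the trade-off the module docstring describes.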