infoxtractor/src/ix/use_cases/bank_statement_header.py
Dirk Riemann 5ee74f367c
chore(model): switch default IX_DEFAULT_MODEL to qwen3:14b (already on host)
The home server's Ollama doesn't have gpt-oss:20b pulled; qwen3:14b is
already there and is what mammon's chat agent uses. Switching the default
now so the first deploy passes the /healthz ollama probe without an extra
`ollama pull` step. The spec lists gpt-oss:20b as a concrete example;
qwen3:14b is equally on-prem and Ollama-structured-output-compatible.

Touched: AppConfig default, BankStatementHeader Request.default_model,
.env.example, setup_server.sh ollama-list check, AGENTS.md, deployment.md,
live tests. Unit tests that hard-coded the old model string but don't
assert the default were left alone.

Also: replaced the en-dash in e2e_smoke.py's Paperless-style text with an ASCII hyphen (ruff RUF001).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:20:23 +02:00


"""`bank_statement_header`: first (and, for MVP, only) use case.

Shape mirrors spec §7. The module defines the pair of Pydantic models
(``Request`` = prompt/model config, ``BankStatementHeader`` = extraction
schema) without registering itself — registration happens in
:mod:`ix.use_cases` so import-time side effects stay out.

All header fields are ``Optional`` except ``bank_name`` and ``currency``;
the spec lets every other field be null when the document doesn't show it.
The flat (no-nested-list) schema is chosen because Ollama's structured
output stays most reliable when the top level contains only scalars.
"""
from __future__ import annotations

from datetime import date
from decimal import Decimal
from typing import Literal

from pydantic import BaseModel, ConfigDict


class Request(BaseModel):
    """Prompt + default-model config for this use case."""

    model_config = ConfigDict(extra="forbid")

    use_case_name: str = "Bank Statement Header"
    default_model: str = "qwen3:14b"
    system_prompt: str = (
        "You extract header metadata from a single bank or credit-card statement. "
        "Return only facts that appear in the document; leave a field null if uncertain. "
        "Balances must use the document's numeric format (e.g. '1234.56' or '-123.45'); "
        "do not invent a currency symbol. Account type: 'checking' for current/Giro accounts, "
        "'credit' for credit-card statements, 'savings' otherwise. Always return the IBAN "
        "with spaces removed. Never fabricate a value to fill a required-looking field."
    )


class BankStatementHeader(BaseModel):
    """Extraction schema for the bank-statement header fields."""

    model_config = ConfigDict(extra="forbid")

    bank_name: str
    account_iban: str | None = None
    account_type: Literal["checking", "credit", "savings"] | None = None
    currency: str
    statement_date: date | None = None
    statement_period_start: date | None = None
    statement_period_end: date | None = None
    opening_balance: Decimal | None = None
    closing_balance: Decimal | None = None
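
The schema above is easiest to understand by running a model reply through it. The following is a minimal sketch, assuming Pydantic v2; it is not how `ix` itself calls the model. It repeats the `BankStatementHeader` fields so it runs on its own (with `Optional[...]` instead of `X | None`, purely so it also works on older Python versions). The `raw_reply` payload and the `is_rejected` helper are hypothetical and exist only for illustration.

```python
from datetime import date
from decimal import Decimal
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class BankStatementHeader(BaseModel):
    """Stand-alone copy of the extraction schema, for this sketch only."""

    model_config = ConfigDict(extra="forbid")

    bank_name: str
    account_iban: Optional[str] = None
    account_type: Optional[Literal["checking", "credit", "savings"]] = None
    currency: str
    statement_date: Optional[date] = None
    statement_period_start: Optional[date] = None
    statement_period_end: Optional[date] = None
    opening_balance: Optional[Decimal] = None
    closing_balance: Optional[Decimal] = None


# Hypothetical model reply; every value here is made up.
raw_reply = """
{
  "bank_name": "Example Bank",
  "account_iban": "DE00123456780000000000",
  "account_type": "checking",
  "currency": "EUR",
  "statement_period_start": "2026-03-01",
  "statement_period_end": "2026-03-31",
  "opening_balance": "1234.56",
  "closing_balance": "-123.45"
}
"""

# Strings are coerced into date / Decimal; omitted optional fields become None.
header = BankStatementHeader.model_validate_json(raw_reply)


def is_rejected(payload: str) -> bool:
    """Hypothetical helper: True if the payload fails schema validation."""
    try:
        BankStatementHeader.model_validate_json(payload)
    except ValidationError:
        return True
    return False


# extra="forbid" turns an unexpected key into a validation error.
extra_key_rejected = is_rejected(
    '{"bank_name": "Example Bank", "currency": "EUR", "account_holder": "x"}'
)

# bank_name and currency are the only required fields.
missing_currency_rejected = is_rejected('{"bank_name": "Example Bank"}')

# The JSON Schema derived from the model: flat, with two required keys.
schema = BankStatementHeader.model_json_schema()
required_fields = sorted(schema["required"])
```

Balances are kept as `Decimal` rather than `float`, so a value such as `"1234.56"` is preserved exactly. The `schema` dictionary is the kind of JSON Schema that could be passed to a structured-output backend, though how `ix` passes it to Ollama is not shown in this file.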