feat(pipeline): SetupStep (spec §6.1) #12

Merged
goldstein merged 1 commit from feat/step-setup into main 2026-04-18 09:14:19 +00:00
Owner

Chunk 2, Task 2.4.

First pipeline step: validates context, fetches files in parallel, sniffs MIMEs, loads use case, builds pages.
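The fetch-then-sniff part of this step can be illustrated with a minimal sketch. It is not the code in this PR: `sniff_mime`, `fake_fetch`, `fetch_all`, the magic-byte table, and the example URLs are all hypothetical stand-ins, and the real set of supported types is not stated here.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical magic-byte table; the MIME types the real step supports are not listed in this PR.
_MAGIC = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_mime(data: bytes) -> Optional[str]:
    """Return a MIME type based on leading bytes, or None if nothing matches."""
    for magic, mime in _MAGIC.items():
        if data.startswith(magic):
            return mime
    return None


async def fake_fetch(url: str) -> bytes:
    """Stand-in for a real downloader; returns canned bytes without network access."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    if url.endswith(".pdf"):
        return b"%PDF-1.7 stub"
    return b"\x89PNG\r\n\x1a\n stub"


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[bytes]] = fake_fetch,
) -> List[bytes]:
    # asyncio.gather runs the downloads concurrently and returns results in input order.
    return list(await asyncio.gather(*(fetch(u) for u in urls)))


def run_demo() -> List[Optional[str]]:
    urls = ["https://example.invalid/a.pdf", "https://example.invalid/b.png"]
    blobs = asyncio.run(fetch_all(urls))
    return [sniff_mime(b) for b in blobs]
```

Because `asyncio.gather` preserves input order, each result lines up with the file it was requested for, which keeps a later page-building pass simple.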

## Tests

7 new tests (148 total). `uv run pytest tests/unit -q` -> 148 passed. `uv run ruff check src tests` -> clean.

## Merge gate

Forgejo Actions trigger bug is still in effect — local test + ruff are the gate.

goldstein added 1 commit 2026-04-18 09:14:15 +00:00
feat(pipeline): SetupStep — validate + fetch + MIME + pages (spec §6.1)
All checks were successful
tests / test (push) Successful in 1m13s
tests / test (pull_request) Successful in 1m19s
97aa24f478
First pipeline step. Validates the request (IX_000_002 on empty context),
normalises every Context.files entry to a FileRef, downloads them in
parallel via asyncio.gather, byte-sniffs MIMEs (IX_000_005 for
unsupported), loads the use-case pair from REGISTRY (IX_001_001 on
miss), and builds the flat pages + page_metadata list on
response_ix.context.
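
The three error codes named above could map onto checks along these lines. This is a sketch under assumptions: `IXError`, `validate_context`, `check_mime`, `load_use_case`, the supported-MIME set, and the registry contents are hypothetical; only the codes themselves and the conditions that trigger them come from this commit message.

```python
from typing import Callable, List, Optional, Tuple


class IXError(Exception):
    """Hypothetical error type carrying an IX_* code; the real class is not shown here."""

    def __init__(self, code: str, detail: str = "") -> None:
        super().__init__(f"{code}: {detail}" if detail else code)
        self.code = code


# Assumed values for illustration only.
SUPPORTED_MIMES = {"application/pdf", "image/png"}
REGISTRY = {"demo_use_case": ("DemoRequest", "DemoResponse")}


def validate_context(files: List[str], texts: List[str]) -> None:
    # An empty context (assumed here to mean no files and no texts) is rejected.
    if not files and not texts:
        raise IXError("IX_000_002", "empty context")


def check_mime(mime: Optional[str]) -> None:
    if mime not in SUPPORTED_MIMES:
        raise IXError("IX_000_005", f"unsupported MIME: {mime}")


def load_use_case(name: str) -> Tuple[str, str]:
    try:
        return REGISTRY[name]
    except KeyError:
        raise IXError("IX_001_001", f"unknown use case: {name}") from None


def error_code(fn: Callable[..., object], *args: object) -> Optional[str]:
    """Run fn and return the IX code it raised, or None if it succeeded."""
    try:
        fn(*args)
    except IXError as exc:
        return exc.code
    return None
```

A text-only request passes `validate_context` in this sketch, which matches the text-only case listed among the unit tests.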

Fetcher / ingestor / MIME detector / tmp_dir / fetch_config are all
injected via the constructor so unit tests stay hermetic; production
wires the real ix.ingestion defaults via the app factory.
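
Constructor injection of this kind might look roughly like the sketch below. `SetupStepSketch`, its field names, its `process` signature, and the three fakes are hypothetical and do not reproduce the real `SetupStep`; tmp_dir and fetch_config are left out for brevity.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class SetupStepSketch:
    """Hypothetical step whose collaborators are all passed in by the caller."""

    fetcher: Callable[[str], Awaitable[bytes]]
    mime_detector: Callable[[bytes], str]
    ingestor: Callable[[bytes, str], List[str]]

    async def process(self, urls: List[str]) -> List[str]:
        # Download concurrently, then build one flat list of pages.
        blobs = await asyncio.gather(*(self.fetcher(u) for u in urls))
        pages: List[str] = []
        for blob in blobs:
            mime = self.mime_detector(blob)
            pages.extend(self.ingestor(blob, mime))
        return pages


# Fakes used in place of real network and parsing code, so the test is hermetic.
async def fake_fetcher(url: str) -> bytes:
    return url.encode()


def fake_mime_detector(data: bytes) -> str:
    return "text/plain"


def fake_ingestor(data: bytes, mime: str) -> List[str]:
    return [f"{mime}:{data.decode()}"]


def run_sketch() -> List[str]:
    step = SetupStepSketch(fake_fetcher, fake_mime_detector, fake_ingestor)
    return asyncio.run(step.process(["a", "b"]))
```

With every collaborator supplied by the caller, a unit test can run the whole step without touching the network or the filesystem.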

7 unit tests in tests/unit/test_setup_step.py cover validate errors,
happy path (fetcher + ingestor invoked correctly, context populated,
use_case_name echoed), FileRef headers pass through, unsupported MIME
-> IX_000_005, unknown use case -> IX_001_001, text-only request, and
the _InternalContext type assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
goldstein merged commit 632acdcd26 into main 2026-04-18 09:14:19 +00:00
Reference: goldstein/infoxtractor#12