alfred/alfred/domain/release/parser/__init__.py
francwa 075a827b0e feat(release): wire v2 EASY path for known release groups
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (the codec-GROUP shape takes
  priority, falling back to any dashed token that is not a source),
  GroupSchema lookup via the kb port, then a lockstep walk of tokens
  against schema chunks. Optional chunks are skipped on mismatch; a
  mandatory mismatch returns None so the caller falls back gracefully.
  When a trailing codec-GROUP token has already consumed the codec, the
  body walk skips the CODEC chunk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).
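
A minimal sketch of the tokenize and group-detection steps described above,
not the code in pipeline.py. The separator set, the codec and source sets,
and the helper name detect_group are assumptions made for illustration; in
the real pipeline these values would come from the kb port.

```python
# Hypothetical stand-ins for knowledge that the real pipeline gets from the kb port.
SEPARATORS = "._ "                      # assumed separator characters
KNOWN_CODECS = {"x264", "x265", "h264", "h265", "hevc"}
KNOWN_SOURCES = {"web-dl", "blu-ray"}   # dashed tokens that are not groups


def tokenize(name: str) -> list[str]:
    """Strip a leading [site.tag] prefix, then split on separators."""
    if name.startswith("["):
        end = name.find("]")
        if end != -1:
            name = name[end + 1:]
    # Plain string operations: normalise every separator to ".", then split.
    for sep in SEPARATORS[1:]:
        name = name.replace(sep, SEPARATORS[0])
    return [tok for tok in name.split(SEPARATORS[0]) if tok]


def detect_group(tokens: list[str]):
    """Scan right-to-left; return (index, codec_or_None, group) or None."""
    fallback = None
    for i in range(len(tokens) - 1, -1, -1):
        left, dash, right = tokens[i].rpartition("-")
        if not dash or not left or not right:
            continue                      # not a dashed token
        if left.lower() in KNOWN_CODECS:
            return i, left, right         # codec-GROUP shape wins outright
        if fallback is None and tokens[i].lower() not in KNOWN_SOURCES:
            fallback = (i, None, right)   # first non-source dashed token
    return fallback


tokens = tokenize("[site.tag] Movie.Title.2020.1080p.WEB-DL.x265-KONTRAST")
print(tokens)                # ['Movie', 'Title', '2020', '1080p', 'WEB-DL', 'x265-KONTRAST']
print(detect_group(tokens))  # (5, 'x265', 'KONTRAST')
```

A name with no dashed token yields None here, which corresponds to the
"no-dash → unknown" case covered by the tests.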

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.
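
For illustration, frozen value objects and the lockstep walk could look
roughly like the sketch below. The field names, the string role names, the
walk() helper and the toy matcher are all assumptions, and the sketch maps
one token to one chunk, which is simpler than a real title spanning several
tokens.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SchemaChunk:
    role: str               # hypothetical role name, e.g. "SEASON"
    optional: bool = False


@dataclass(frozen=True)
class GroupSchema:
    name: str
    chunk_order: tuple[SchemaChunk, ...]


def walk(tokens: list[str], schema: GroupSchema,
         matches: Callable[[str, str], bool]) -> Optional[list[tuple[str, str]]]:
    """Walk tokens and chunks in lockstep; None signals 'fall back'."""
    annotated = []
    pos = 0
    for chunk in schema.chunk_order:
        if pos < len(tokens) and matches(chunk.role, tokens[pos]):
            annotated.append((tokens[pos], chunk.role))
            pos += 1
        elif chunk.optional:
            continue          # optional chunk: skip it, keep the token
        else:
            return None       # mandatory chunk did not match
    # Leftover tokens mean the schema did not describe the whole name.
    return annotated if pos == len(tokens) else None


def simple_match(role: str, token: str) -> bool:
    """Toy matcher standing in for knowledge-base lookups."""
    if role == "SEASON":
        return token.upper().startswith("S") and token[1:].isdigit()
    if role == "SOURCE":
        return token.lower() in {"webrip", "web-dl"}
    if role == "RESOLUTION":
        return token.lower().endswith("p") and token[:-1].isdigit()
    return False


ELITE = GroupSchema("ELiTE", (
    SchemaChunk("SEASON"),
    SchemaChunk("SOURCE", optional=True),
    SchemaChunk("RESOLUTION"),
))

print(walk(["S02", "1080p"], ELITE, simple_match))
# [('S02', 'SEASON'), ('1080p', 'RESOLUTION')]
```

Using frozen dataclasses keeps a loaded schema immutable, so the same
instance can be shared between parses without defensive copies.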

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
  *.yaml at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.
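
A rough sketch of the case-insensitive lookup behind the port, using an
in-memory adapter instead of YAML files so it stays self-contained. The
class InMemoryReleaseKnowledge, the dict-shaped schemas, and the rule that
learned entries win over built-in ones are assumptions, not the behaviour
of YamlReleaseKnowledge.

```python
from typing import Optional, Protocol


class ReleaseKnowledge(Protocol):
    """Port: only the lookup discussed here is sketched."""

    def group_schema(self, name: str) -> Optional[dict]: ...


class InMemoryReleaseKnowledge:
    """Hypothetical adapter; schemas are plain dicts for brevity."""

    def __init__(self, builtin: dict, learned: Optional[dict] = None) -> None:
        self._schemas: dict = {}
        # Load built-in entries first, then learned ones, so that a learned
        # entry replaces a built-in entry with the same case-folded name.
        for source in (builtin, learned or {}):
            for key, schema in source.items():
                self._schemas[key.casefold()] = schema

    def group_schema(self, name: str) -> Optional[dict]:
        return self._schemas.get(name.casefold())


kb = InMemoryReleaseKnowledge(
    builtin={"KONTRAST": {"chunk_order": ["title", "year"]}},
    learned={"kontrast": {"chunk_order": ["title", "year", "resolution"]}},
)
print(kb.group_schema("Kontrast"))  # {'chunk_order': ['title', 'year', 'resolution']}
print(kb.group_schema("UNKNOWN"))   # None
```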

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; when v2 returns None it falls
  through to the legacy implementation, which is left untouched.
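
The fallback described above can be sketched as follows. The two parser
callables are injected here only to keep the example self-contained; how
services.py actually reaches the v2 and legacy code is not shown in this
commit message, and the stub parsers are hypothetical.

```python
from typing import Callable, Optional


def parse_release(name: str,
                  parse_v2: Callable[[str], Optional[dict]],
                  parse_legacy: Callable[[str], dict]) -> dict:
    """Try the v2 path first; a None result means 'not handled by v2'."""
    result = parse_v2(name)
    if result is not None:
        return result
    return parse_legacy(name)


# Hypothetical stand-ins for the two real parsers.
def fake_v2(name: str) -> Optional[dict]:
    return {"path": "v2", "name": name} if name.endswith("-KONTRAST") else None


def fake_legacy(name: str) -> dict:
    return {"path": "legacy", "name": name}


print(parse_release("Movie.2020.x265-KONTRAST", fake_v2, fake_legacy)["path"])  # v2
print(parse_release("Movie.2020.x265-NOBODY", fake_v2, fake_legacy)["path"])    # legacy
```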

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with
  optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:21:11 +02:00

32 lines
1.4 KiB
Python

"""Release parser v2: annotate-based pipeline.

This package is the future home of ``parse_release``. It restructures the
parsing logic around a **tokenize → annotate → assemble** pipeline:

1. **tokenize**: split the release name into atomic tokens.
2. **annotate**: walk tokens left-to-right, assigning each one a
   :class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
   injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.

The pipeline has three internal paths driven by the detected release group:

- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
  declared in ``knowledge/release/release_groups/<group>.yaml``.
- **SHITTY**: unknown group, best-effort matching against the global
  knowledge sets, with a 0-100 confidence score.
- **PATH OF PAIN**: score below threshold OR critical chunks missing;
  signaled to the caller, who decides whether to involve the LLM/user.

Today only the EASY path is wired in: ``parse_release`` in
``release.services`` tries the v2 pipeline first and falls back to the
legacy implementation when v2 returns ``None``. The legacy code keeps
serving every other release until the remaining paths are wired in.
"""

from __future__ import annotations

from .schema import GroupSchema, SchemaChunk
from .tokens import Token, TokenRole

__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]