alfred/alfred/domain/release/parser/schema.py
francwa 075a827b0e feat(release): wire v2 EASY path for known release groups
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup
  via the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).
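
The tokenize and group-detection steps above can be sketched roughly as follows. This is a minimal illustration, not the code in pipeline.py: the `CODECS` and `SOURCES` sets, the `SITE_TAG` pattern, the `detect_group` helper name, and the assumption that the site tag is a leading prefix are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical vocabularies; the real ones presumably come from the knowledge base.
CODECS = {"x264", "x265", "h264", "h265", "hevc"}
SOURCES = {"web-dl", "blu-ray", "hd-dvd"}  # dashed tokens that are sources, not groups

# Assumption: the site tag is a leading "[...]" prefix.
SITE_TAG = re.compile(r"^\s*\[[^\]]*\]\s*")


def tokenize(name: str, separator: str = ".") -> list[str]:
    """Strip a leading [site.tag], then split on the separator."""
    stripped = SITE_TAG.sub("", name, count=1)
    return [tok for tok in stripped.split(separator) if tok]


def detect_group(tokens: list[str]) -> str | None:
    """Scan right to left for the release group name."""
    # First pass: codec-GROUP shape, e.g. "x265-KONTRAST".
    for tok in reversed(tokens):
        head, dash, tail = tok.rpartition("-")
        if dash and tail and head.lower() in CODECS:
            return tail
    # Fallback: any dashed token that is not a known source.
    for tok in reversed(tokens):
        head, dash, tail = tok.rpartition("-")
        if dash and tail and tok.lower() not in SOURCES:
            return tail
    return None  # no dash anywhere: group unknown
```

Running the codec-GROUP pass over all tokens before the fallback pass is what gives that shape priority, even when another dashed token sits further to the right.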

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
  *.yaml at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with
  optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:21:11 +02:00


"""Group schema value objects.

A :class:`GroupSchema` describes the canonical chunk layout of releases
from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
contract: when a release ends in ``-<GROUP>`` and we know the group,
the annotator walks the schema instead of running the heuristic SHITTY
matchers.

Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
by an infrastructure adapter and surfaced via the
:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
"""

from __future__ import annotations

from dataclasses import dataclass

from .tokens import TokenRole


@dataclass(frozen=True)
class SchemaChunk:
    """One entry in a group's chunk order.

    ``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
    is True for chunks that may be absent (e.g. ``year`` on TV releases,
    ``source`` on bare ELiTE TV releases).
    """

    role: TokenRole
    optional: bool = False


@dataclass(frozen=True)
class GroupSchema:
    """Schema for a known release group.

    ``chunks`` is the left-to-right canonical order. The annotator walks
    tokens and chunks in lockstep: an optional chunk that doesn't match
    the current token is skipped (the chunk index advances, the token
    index stays), a mandatory chunk that doesn't match aborts the EASY
    path and falls back to SHITTY.
    """

    name: str
    separator: str
    chunks: tuple[SchemaChunk, ...]
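
The lockstep walk described in the ``GroupSchema`` docstring can be sketched as below. This is a simplified, self-contained illustration rather than the real annotator: the ``TokenRole`` members, the ``MATCHERS`` regexes, the single-token title, and the rule that leftover tokens abort the walk are all assumptions made for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class TokenRole(Enum):  # stand-in for .tokens.TokenRole; members are assumed
    TITLE = "title"
    YEAR = "year"
    RESOLUTION = "resolution"
    SOURCE = "source"
    CODEC = "codec"


@dataclass(frozen=True)
class SchemaChunk:
    role: TokenRole
    optional: bool = False


# Hypothetical per-role matchers; the real pipeline consults the knowledge base.
MATCHERS = {
    TokenRole.TITLE: re.compile(r"[A-Za-z]+"),
    TokenRole.YEAR: re.compile(r"(19|20)\d{2}"),
    TokenRole.RESOLUTION: re.compile(r"\d{3,4}p"),
    TokenRole.SOURCE: re.compile(r"WEBRip|BluRay", re.IGNORECASE),
    TokenRole.CODEC: re.compile(r"[xh]26[45]", re.IGNORECASE),
}


def walk(
    tokens: list[str], chunks: tuple[SchemaChunk, ...]
) -> list[tuple[str, TokenRole]] | None:
    """Annotate tokens against chunks; None means fall back to SHITTY."""
    annotated: list[tuple[str, TokenRole]] = []
    t = c = 0
    while c < len(chunks):
        chunk = chunks[c]
        if t < len(tokens) and MATCHERS[chunk.role].fullmatch(tokens[t]):
            annotated.append((tokens[t], chunk.role))
            t += 1  # token consumed
            c += 1  # chunk satisfied
        elif chunk.optional:
            c += 1  # skip the chunk, keep the current token
        else:
            return None  # mandatory mismatch aborts the EASY path
    # Assumption: tokens left over after the last chunk also abort.
    return annotated if t == len(tokens) else None


CHUNKS = (
    SchemaChunk(TokenRole.TITLE),
    SchemaChunk(TokenRole.YEAR, optional=True),
    SchemaChunk(TokenRole.RESOLUTION),
    SchemaChunk(TokenRole.SOURCE, optional=True),
    SchemaChunk(TokenRole.CODEC),
)
```

With ``CHUNKS`` as defined, ``walk(["Foundation", "1080p", "x265"], CHUNKS)`` skips both optional chunks and annotates three tokens, while a token list lacking a resolution returns ``None``.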