075a827b0e
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy SHITTY
heuristic in services.py; nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup
  via the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/*.yaml
  at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with optional
  source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
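The tokenize() and group-detection steps described in the commit message could look roughly like the sketch below. It is not the code in pipeline.py: the function names mirror the message, but the CODECS and SOURCES vocabularies, the leading-bracket handling of [site.tag], and the return shape of detect_group() are assumptions made for illustration.

```python
from __future__ import annotations

# Hypothetical vocabularies; the real ones would come from the knowledge base.
CODECS = {"x264", "x265", "h264", "h265", "hevc"}
SOURCES = {"web-dl", "blu-ray", "dvd-rip"}  # dashed tokens that are not groups


def tokenize(name: str, separator: str = ".") -> list[str]:
    """Strip a leading [site.tag] prefix, then split on the separator."""
    name = name.strip()
    if name.startswith("["):
        end = name.find("]")
        if end != -1:
            # Drop the tag and any filler characters that follow it.
            name = name[end + 1:].lstrip(" -_.")
    return [tok for tok in name.split(separator) if tok]


def detect_group(tokens: list[str]) -> str | None:
    """Scan right to left for a release group; None means unknown."""
    # First pass: prefer the codec-GROUP shape, e.g. "x265-KONTRAST".
    for tok in reversed(tokens):
        head, dash, tail = tok.rpartition("-")
        if dash and tail and head.lower() in CODECS:
            return tail
    # Fallback: any dashed token that is not a known source such as WEB-DL.
    for tok in reversed(tokens):
        head, dash, tail = tok.rpartition("-")
        if dash and head and tail and tok.lower() not in SOURCES:
            return tail
    return None
```

With these assumptions, a name carrying a site tag tokenizes cleanly and the trailing x265-KONTRAST token yields the group, while a name whose only dashed token is WEB-DL yields no group and would be left to the legacy path.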
48 lines
1.4 KiB
Python
"""Group schema value objects.

A :class:`GroupSchema` describes the canonical chunk layout of releases
from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
contract: when a release ends in ``-<GROUP>`` and we know the group,
the annotator walks the schema instead of running the heuristic SHITTY
matchers.

Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
by an infrastructure adapter and surfaced via the
:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
"""

from __future__ import annotations

from dataclasses import dataclass

from .tokens import TokenRole


@dataclass(frozen=True)
class SchemaChunk:
    """One entry in a group's chunk order.

    ``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
    is True for chunks that may be absent (e.g. ``year`` on TV releases,
    ``source`` on bare ELiTE TV releases).
    """

    role: TokenRole
    optional: bool = False


@dataclass(frozen=True)
class GroupSchema:
    """Schema for a known release group.

    ``chunks`` is the left-to-right canonical order. The annotator walks
    tokens and chunks in lockstep: an optional chunk that doesn't match
    the current token is skipped (the chunk index advances, the token
    index stays), a mandatory chunk that doesn't match aborts the EASY
    path and falls back to SHITTY.
    """

    name: str
    separator: str
    chunks: tuple[SchemaChunk, ...]
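The lockstep walk described in the GroupSchema docstring can be sketched as follows. This is a minimal illustration, not the annotator in pipeline.py: the TokenRole members, the regex MATCHERS table, the walk() helper, and the rule that leftover tokens abort the walk are all assumptions, and the sketch only covers the tokens that follow the title.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum


class TokenRole(Enum):  # stand-in; the real enum lives in .tokens
    YEAR = "year"
    SEASON = "season"
    RESOLUTION = "resolution"
    SOURCE = "source"
    CODEC = "codec"


@dataclass(frozen=True)
class SchemaChunk:
    role: TokenRole
    optional: bool = False


# Hypothetical per-role matchers; the real pipeline consults the knowledge base.
MATCHERS = {
    TokenRole.YEAR: re.compile(r"(19|20)\d{2}"),
    TokenRole.SEASON: re.compile(r"S\d{2}(E\d{2})?", re.I),
    TokenRole.RESOLUTION: re.compile(r"\d{3,4}p", re.I),
    TokenRole.SOURCE: re.compile(r"WEBRip|BluRay|WEB-DL", re.I),
    TokenRole.CODEC: re.compile(r"[xh]26[45]|HEVC", re.I),
}


def walk(tokens, chunks):
    """Walk tokens and chunks in lockstep; return None to signal fallback."""
    annotated = []
    ti = 0  # token index; only advances when a chunk matches
    for chunk in chunks:
        token = tokens[ti] if ti < len(tokens) else None
        if token is not None and MATCHERS[chunk.role].fullmatch(token):
            annotated.append((token, chunk.role))
            ti += 1
        elif chunk.optional:
            continue  # chunk index advances, token index stays
        else:
            return None  # mandatory mismatch: abort the EASY path
    # Assumption: unconsumed tokens also abort the walk.
    return annotated if ti == len(tokens) else None


# Season pack without a source token, as in the ELiTE case.
CHUNKS = (
    SchemaChunk(TokenRole.YEAR, optional=True),
    SchemaChunk(TokenRole.SEASON),
    SchemaChunk(TokenRole.RESOLUTION),
    SchemaChunk(TokenRole.SOURCE, optional=True),
    SchemaChunk(TokenRole.CODEC),
)
result = walk(["S02", "1080p", "x265"], CHUNKS)
```

Here the optional YEAR and SOURCE chunks are skipped without consuming a token, so the three tokens map to SEASON, RESOLUTION, and CODEC. Dropping the season token makes the mandatory SEASON chunk fail and walk() returns None, which is the signal for the caller to fall back to the legacy path.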