075a827b0e
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy SHITTY
heuristic in services.py; nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup via
  the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible dict
  (title joined by '.', group from the codec-GROUP token's extras).

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/*.yaml
  at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with optional
  source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
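
The lockstep walk described under annotate() can be sketched roughly as follows. This is a minimal illustration, not the code in pipeline.py: `Chunk`, `MATCHERS`, `SCHEMAS`, `split_group`, and the ELiTE chunk order are hypothetical stand-ins for SchemaChunk, the knowledge port, and the YAML files. It only covers the codec-GROUP shape and omits the fallback to other dashed tokens and the [site.tag] stripping.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for SchemaChunk: a role plus an optional flag."""
    role: str
    optional: bool = False


# Hypothetical per-role matchers; the real pipeline asks the knowledge port.
MATCHERS = {
    "YEAR": re.compile(r"(?:19|20)\d{2}"),
    "SEASON": re.compile(r"S\d{2}(?:E\d{2})?", re.IGNORECASE),
    "RESOLUTION": re.compile(r"\d{3,4}p", re.IGNORECASE),
    "SOURCE": re.compile(r"WEBRip|WEB|BluRay", re.IGNORECASE),
    "CODEC": re.compile(r"[xh]26[45]", re.IGNORECASE),
}

# Assumed chunk order for illustration; the real one lives in elite.yaml.
SCHEMAS = {
    "elite": (
        Chunk("TITLE"),
        Chunk("YEAR", optional=True),
        Chunk("SEASON", optional=True),
        Chunk("RESOLUTION"),
        Chunk("SOURCE", optional=True),
        Chunk("CODEC"),
    ),
}


def is_structural(token: str) -> bool:
    """True if the token fully matches any non-title role."""
    return any(m.fullmatch(token) for m in MATCHERS.values())


def split_group(tokens: list[str]) -> Optional[tuple[list[str], str, str]]:
    """Inspect the last token for a codec-GROUP shape such as 'x265-ELiTE'."""
    if not tokens:
        return None
    head, dash, group = tokens[-1].rpartition("-")
    if dash and group and MATCHERS["CODEC"].fullmatch(head):
        return tokens[:-1], head, group
    return None


def annotate(name: str) -> Optional[dict[str, str]]:
    """Return role -> text, or None so the caller can fall back to legacy."""
    found = split_group(name.split("."))
    if found is None:
        return None
    body, codec, group = found
    schema = SCHEMAS.get(group.lower())  # case-insensitive lookup
    if schema is None:
        return None  # unknown group

    fields = {"GROUP": group, "CODEC": codec}
    i = 0
    for chunk in schema:
        if chunk.role == "CODEC":
            continue  # already consumed by the trailing codec-GROUP token
        if chunk.role == "TITLE":
            start = i
            while i < len(body) and not is_structural(body[i]):
                i += 1
            if i == start:
                return None  # a title needs at least one token
            fields["TITLE"] = ".".join(body[start:i])
        elif i < len(body) and MATCHERS[chunk.role].fullmatch(body[i]):
            fields[chunk.role] = body[i]
            i += 1
        elif not chunk.optional:
            return None  # mandatory chunk mismatch
    # Leftover tokens mean the schema did not describe the whole name.
    return fields if i == len(body) else None
```

With these assumptions, a season pack without a source token still annotates because SOURCE is optional, while a name missing the mandatory RESOLUTION chunk, or one with an unknown group, yields None and would be handed to the legacy path.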
32 lines
1.4 KiB
Python
"""Release parser v2 — annotate-based pipeline.

This package is the future home of ``parse_release``. It restructures the
parsing logic around a **tokenize → annotate → assemble** pipeline:

1. **tokenize**: split the release name into atomic tokens.
2. **annotate**: walk tokens left-to-right, assigning each one a
   :class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
   injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.

The pipeline has three internal paths driven by the detected release group:

- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
  declared in ``knowledge/release/release_groups/<group>.yaml``.
- **SHITTY**: unknown group, best-effort matching against the global
  knowledge sets, with a 0-100 confidence score.
- **PATH OF PAIN**: score below threshold OR critical chunks missing;
  signaled to the caller, who decides whether to involve the LLM/user.

Today the package exposes scaffolding only (token VOs and a thin pipeline
stub). The legacy ``parse_release`` in ``release.services`` keeps serving
production until each piece of the v2 pipeline is wired in.
"""

from __future__ import annotations

from .schema import GroupSchema, SchemaChunk
from .tokens import Token, TokenRole

__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]