feat(release): v2 enricher pass for audio/video-meta/edition/language

The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.

Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
  body to be fully consumed. Tokens passed over between schema chunks
  remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
  a given role, skipping already-annotated tokens. Lets optional and
  mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
  and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
  LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
  kb.editions are matched first (longest-first ordering preserved from
  the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
  of a matched sequence with extra['sequence']=<canonical value> and
  trailing members with extra['sequence_member']='True' so assemble
  skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
  '.' separator splits the layout into two tokens. Strips a trailing
  '-GROUP' suffix on the second before joining.

Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
  bit_depth, hdr_format, edition. Each role-handler skips
  sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
  COLLECTION} + no season → tv_complete (mirrors legacy).

Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
  HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
  with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green.

Suite: 1011 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
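
For readers without parser/pipeline.py at hand, the sequence tagging and channel-pair joining described above can be sketched roughly as follows. This is an illustration only, not the code in this commit: the Token dataclass, the function signatures, the (members, canonical) sequence shape and the KNOWN_LAYOUTS set are all assumptions made for the example. Only the extra['sequence'] / extra['sequence_member'] convention and the role names come from the message above.

```python
from dataclasses import dataclass, field

# Hypothetical token shape; the real pipeline's token type may differ.
@dataclass
class Token:
    text: str
    role: str = "UNKNOWN"
    extra: dict[str, str] = field(default_factory=dict)

# Assumed set of channel layouts, for the sketch only.
KNOWN_LAYOUTS = {"2.0", "5.1", "7.1"}


def apply_sequences(tokens: list[Token], sequences, role: str) -> None:
    """Tag multi-token sequences in place.

    `sequences` is a list of (members, canonical) pairs, assumed to be
    ordered longest-first so longer matches win over their prefixes.
    """
    for members, canonical in sequences:
        wanted = [m.upper() for m in members]
        n = len(wanted)
        i = 0
        while i + n <= len(tokens):
            window = tokens[i:i + n]
            free = all(t.role == "UNKNOWN" for t in window)
            if free and [t.text.upper() for t in window] == wanted:
                # Head token carries the canonical value.
                window[0].role = role
                window[0].extra["sequence"] = canonical
                # Trailing members are flagged so assembly can skip them.
                for t in window[1:]:
                    t.role = role
                    t.extra["sequence_member"] = "True"
                i += n
            else:
                i += 1


def detect_channel_pairs(tokens: list[Token]) -> None:
    """Join adjacent tokens such as '5' and '1' into a '5.1' layout."""
    for first, second in zip(tokens, tokens[1:]):
        if first.role != "UNKNOWN" or second.role != "UNKNOWN":
            continue
        # Drop a trailing '-GROUP' suffix on the second token before joining.
        tail = second.text.split("-", 1)[0]
        layout = f"{first.text}.{tail}"
        if layout in KNOWN_LAYOUTS:
            first.role = "AUDIO_CHANNELS"
            first.extra["sequence"] = layout
            second.role = "AUDIO_CHANNELS"
            second.extra["sequence_member"] = "True"
```

With tokens for DTS.HD.MA.5.1 and a sequence list that puts ('DTS', 'HD', 'MA') before ('DTS', 'HD'), the three-token form claims the tokens first, so the shorter entry finds nothing left to tag. That is the reason the longest-first ordering matters.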
2026-05-20 00:26:05 +02:00
parent 075a827b0e
commit 7dc7f0c241
3 changed files with 446 additions and 172 deletions
@@ -140,3 +140,65 @@ class TestAssemble:
        assert fields["source"] is None  # ELiTE omits it
        assert fields["tech_string"] == "1080p.x265"
        assert fields["group"] == "ELiTE"
+
+
+class TestEnrichers:
+    """Non-positional roles populated alongside the structural walk.
+
+    These releases would have failed the v2 EASY path before the enricher
+    pass landed (leftover unknown tokens would force a fallback). They
+    now succeed in v2 with rich metadata.
+    """
+
+    def test_bit_depth_and_audio(self) -> None:
+        name = "Back.in.Action.2025.1080p.WEBRip.10bit.DDP.5.1.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["title"] == "Back.in.Action"
+        assert fields["bit_depth"] == "10bit"
+        assert fields["audio_codec"] == "DDP"
+        assert fields["audio_channels"] == "5.1"
+
+    def test_hdr_sequence(self) -> None:
+        # DV.HDR10 sequence + TrueHD.Atmos sequence + 7.1 channels +
+        # DIRECTORS.CUT edition all in one release.
+        name = (
+            "Some.Movie.2024.DIRECTORS.CUT.2160p.BluRay.DV.HDR10."
+            "TrueHD.Atmos.7.1.x265-KONTRAST"
+        )
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["edition"] == "DIRECTORS.CUT"
+        assert fields["hdr_format"] == "DV.HDR10"
+        assert fields["audio_codec"] == "TrueHD.Atmos"
+        assert fields["audio_channels"] == "7.1"
+
+    def test_multiple_languages(self) -> None:
+        name = "Movie.2020.FRENCH.MULTI.1080p.WEBRip.DTS.HD.MA.5.1.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["languages"] == ["FRENCH", "MULTI"]
+        assert fields["audio_codec"] == "DTS-HD.MA"
+        assert fields["audio_channels"] == "5.1"
+
+    def test_tv_with_language(self) -> None:
+        name = "Show.S01E05.FRENCH.1080p.WEBRip.x265-KONTRAST"
+        tokens, tag = tokenize(name, _KB)
+        annotated = annotate(tokens, _KB)
+        assert annotated is not None
+        fields = assemble(annotated, tag, name, _KB)
+
+        assert fields["title"] == "Show"
+        assert fields["season"] == 1
+        assert fields["episode"] == 5
+        assert fields["languages"] == ["FRENCH"]
+        assert fields["media_type"] == "tv_show"