feat(release): v2 enricher pass for audio/video-meta/edition/language

The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.
Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
the body to be fully consumed. Tokens skipped between schema chunks
remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
a given role, skipping already-annotated tokens. Lets optional and
mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
kb.editions are matched first (longest-first ordering preserved from
the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
of a matched sequence with extra['sequence']=<canonical value> and
trailing members with extra['sequence_member']='True' so assemble
skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
'.' separator splits the layout into two tokens. Strips a trailing
'-GROUP' suffix on the second before joining.
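
A minimal sketch of the three helpers described above, for illustration only. The Token dataclass, the function signatures, the accepted channel digits and the explicit longest-first sort are assumptions made for this example; the real code in parser/pipeline.py may differ (the commit relies on the YAML ordering rather than sorting).

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # Hypothetical token shape: raw text, a role tag, and free-form extras.
    text: str
    role: str = "UNKNOWN"
    extra: dict = field(default_factory=dict)


def find_chunk(tokens, start, matches_role):
    """Return the index of the next un-annotated token accepted by matches_role."""
    for idx in range(start, len(tokens)):
        if tokens[idx].role == "UNKNOWN" and matches_role(tokens[idx].text):
            return idx
    return None


def apply_sequences(tokens, sequences, role):
    """Tag multi-token runs. `sequences` maps a tuple of upper-cased texts
    to a canonical value, e.g. ("DTS", "HD", "MA") -> "DTS-HD MA"."""
    # Assumption: sort longest-first here instead of trusting input order.
    ordered = sorted(sequences.items(), key=lambda kv: len(kv[0]), reverse=True)
    i = 0
    while i < len(tokens):
        for parts, canonical in ordered:
            window = tokens[i:i + len(parts)]
            if (len(window) == len(parts)
                    and all(t.role == "UNKNOWN" for t in window)
                    and tuple(t.text.upper() for t in window) == parts):
                window[0].role = role
                window[0].extra["sequence"] = canonical
                for member in window[1:]:
                    member.role = role
                    member.extra["sequence_member"] = "True"
                i += len(parts) - 1  # jump past the members just tagged
                break
        i += 1


def detect_channel_pairs(tokens):
    """Join '5' + '1' (or '1-GROUP') into a single '5.1' channel layout."""
    i = 0
    while i < len(tokens) - 1:
        first, second = tokens[i], tokens[i + 1]
        minor = second.text.split("-", 1)[0]  # drop a trailing '-GROUP'
        if (first.role == "UNKNOWN" and second.role == "UNKNOWN"
                and first.text in {"2", "5", "7"} and minor in {"0", "1"}):
            first.role = "AUDIO_CHANNELS"
            first.extra["sequence"] = f"{first.text}.{minor}"
            # Assumes the group name was already captured by an earlier step.
            second.role = "AUDIO_CHANNELS"
            second.extra["sequence_member"] = "True"
            i += 2
            continue
        i += 1
```

In this sketch the channel pair reuses the same sequence / sequence_member markers as the multi-token sequences, so a single skip rule in assemble would cover both cases.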
Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
bit_depth, hdr_format, edition. Each role-handler skips
sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
COLLECTION} + no season → tv_complete (mirrors legacy).
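
The assemble rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: Token, role_values, infer_media_type and the fallback argument are hypothetical names, and only the edition set and the tv_complete result come from this commit.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    # Hypothetical token shape used only for this example.
    text: str
    role: str = "UNKNOWN"
    extra: dict = field(default_factory=dict)


# Editions that imply a complete series when no season is present.
COMPLETE_EDITIONS = {"COMPLETE", "INTEGRALE", "COLLECTION"}


def role_values(tokens, role):
    """Collect values for one role, reading only primary tokens."""
    values = []
    for tok in tokens:
        if tok.role != role or tok.extra.get("sequence_member") == "True":
            continue  # wrong role, or a trailing member of a sequence
        # A primary sequence token carries the canonical value in extra.
        values.append(tok.extra.get("sequence", tok.text))
    return values


def infer_media_type(edition, season, fallback):
    """Return 'tv_complete' for a complete-style edition without a season;
    otherwise keep whatever media type the caller had (assumption)."""
    if edition is not None and edition.upper() in COMPLETE_EDITIONS and season is None:
        return "tv_complete"
    return fallback
```

With this shape, a TrueHD.Atmos pair yields one audio_codec value, while two separate LANGUAGE tokens yield a two-item languages list.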
Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed,
8 skipped.
Refs: project_release_parser_v2_specs (memory)
@@ -43,6 +43,19 @@ callers).
annotation (movie, TV episode, season pack with optional source),
and field assembly.

- **Release parser v2 — enricher pass** completes the EASY pipeline.
The structural schema walk now tolerates non-positional tokens
between chunks (instead of aborting on leftover tokens), and a second
pass tags them with audio / video-meta / edition / language roles.
Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
(e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
matched before single tokens. Channel layouts like `5.1` and `7.1`
(split into two tokens by the `.` separator) are detected as
consecutive pairs. Sequence members carry an `extra["sequence_member"]`
marker so `assemble` extracts the canonical value only from the
primary token. KONTRAST releases with audio / HDR / edition / language
metadata now produce a fully populated `ParsedRelease`.

- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
each documenting an expected `ParsedRelease` plus the future `routing`
(library / torrents / seed_hardlinks) for the upcoming `organize_media`