alfred

Author	SHA1	Message	Date
francwa	b7979c0f8b	refactor(release): freeze ParsedRelease + enrich_from_probe returns new instance ParsedRelease is now @dataclass(frozen=True). The enrichment passes that used to patch fields in place now produce new instances: - enrich_from_probe(parsed, info, kb) returns a new ParsedRelease via dataclasses.replace (no allocation when no field changed). - inspect_release rebinds 'parsed' after detect_media_type (wrapped in MediaTypeToken — the strict isinstance check now also runs on replace) and after enrich_from_probe. languages becomes a tuple[str, ...] so the VO is properly immutable. Parser pipeline packs languages as a tuple in the assemble dict. Callers updated: inspect_release, testing/recognize_folders_in_downloads.py. Tests updated: 22 enrich_from_probe call sites rebound, language assertions switched to tuple literals, test_release_fixtures normalizes result['languages'] back to list for YAML-fixture comparison. Suite: 1077 passed.	2026-05-21 07:51:49 +02:00
francwa	e62dc90bd1	refactor(release): make tech_string a derived property ParsedRelease.tech_string was a stored str field re-computed in two places (assemble() at parse time, enrich_from_probe() after the probe). The second site was a reactive fix (`e79ca46`) for filename builders that saw a stale value. Turn it into an @property so it stays in sync with quality/source/codec by construction. - Drop the field from the dataclass + the key from assemble()'s dict. - Drop tech_string="" from parse_release's malformed-name fallback. - Drop the manual recomputation at the end of enrich_from_probe. - Inject the property into asdict() result in the fixtures runner (same treatment as is_season_pack). - Update tests that passed tech_string= to the constructor; rewrite the TestTechString case that mutated p.tech_string manually.	2026-05-21 07:33:53 +02:00
francwa	3737f66851	refactor(release): simplify SHITTY to dict-driven token tagging Replace the ~480-line legacy heuristic block in services.py with a small dict-driven pass in pipeline._annotate_shitty: each token is looked up against the kb buckets (resolutions / sources / codecs / distributors / year / sxxexx) with first-match-wins semantics, the leftmost contiguous UNKNOWN run becomes the title, done. SHITTY's scope is intentionally narrow — releases that look like scene names but don't have a registered group schema. Anything more exotic (parenthesized tech, bare-dashed title fragments, YT slugs, franchise boxes) is PATH OF PAIN territory and stays out of here. - annotate() no longer returns None; SHITTY is the always-on fallback - services.py shrunk from ~525 to ~85 lines (legacy extractors gone) - 4 fixtures get xfail markers documenting PoP-grade pathologies (deutschland franchise box, sleaford YT slug, super_mario bilingual, predator space-separators — the last one moved from shitty/ → pop/) - ReleaseFixture grows xfail_reason; the parametrized suite wires the pytest.mark.xfail(strict=False) automatically	2026-05-20 01:03:25 +02:00
francwa	7dc7f0c241	feat(release): v2 enricher pass for audio/video-meta/edition/language The EASY pipeline now extracts the full ParsedRelease surface from known-group releases, not just the structural backbone. Behavior is unchanged for releases that don't carry these tokens. Pipeline (parser/pipeline.py): - Structural walk (renamed _annotate_structural): no longer requires body to be fully consumed. Tokens passed over between schema chunks remain UNKNOWN so the enricher pass can claim them. - _find_chunk(): scans forward in the body for the next token matching a given role, skipping already-annotated tokens. Lets optional and mandatory chunks both tolerate intercalated enricher tokens. - _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION / LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta / kb.editions are matched first (longest-first ordering preserved from the YAML), single tokens after. - _apply_sequences(): mutates the token list, tagging the first token of a matched sequence with extra['sequence']=<canonical value> and trailing members with extra['sequence_member']='True' so assemble skips them. - _detect_channel_pairs(): handles the '5.1' / '7.1' case where the '.' separator splits the layout into two tokens. Strips a trailing '-GROUP' suffix on the second before joining. Assemble: - New fields populated: languages (list), audio_codec, audio_channels, bit_depth, hdr_format, edition. Each role-handler skips sequence_member tokens. - media_type heuristic extended: edition in {COMPLETE, INTEGRALE, COLLECTION} + no season → tv_complete (mirrors legacy). Tests: - 4 new TestEnrichers cases covering bit_depth+audio_codec+channels, HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language with DTS-HD.MA sequence, TV episode with single language. - All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed, 8 skipped. Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:26:05 +02:00
francwa	075a827b0e	feat(release): wire v2 EASY path for known release groups The annotate-based v2 pipeline now handles releases ending in -KONTRAST, -ELiTE, or -RARBG. Unknown groups still fall through to the legacy SHITTY heuristic in services.py; nothing changes for them. Pipeline (alfred/domain/release/parser/pipeline.py): - tokenize(): string-ops separator split, strips [site.tag] first. - annotate(): right-to-left group detection (priority to codec-GROUP shape, fallback to any non-source dashed token), GroupSchema lookup via the kb port, then lockstep walk of tokens against schema chunks. Optional chunks skip on mismatch, mandatory mismatches return None so the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP trailing token correctly skips the CODEC chunk in the body walk. - assemble(): folds annotated tokens into a ParsedRelease-compatible dict (title joined by '.', group from the codec-GROUP token's extras). Schema (alfred/domain/release/parser/schema.py): - GroupSchema + SchemaChunk frozen value objects. - TokenRole.GROUP added. Port + adapter: - ReleaseKnowledge.group_schema(name) lookup added (case-insensitive). - YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/*.yaml at construction time; learned overrides in data/knowledge/release/release_groups/ also picked up. Knowledge: - release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the canonical chunk_order. ELiTE marks source as optional (Foundation.S02 has no WEBRip token). Services: - parse_release tries the v2 path first; on None falls through to the legacy implementation untouched. Tests: - tests/domain/release/test_parser_v2_easy.py (10 cases) cover group detection (codec-GROUP, dashed-source skip, no-dash → unknown), schema-driven annotation (movie, TV episode, season pack with optional source, unknown group returns None), and field assembly. - Existing tests/domain/test_release_fixtures.py (30 cases) stay green: 5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures still produced by the legacy path. Verified via spy on v2.assemble. Suite: 1007 passed, 8 skipped. Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:21:11 +02:00
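
The immutability pattern described in b7979c0f8b and e62dc90bd1 (a frozen ParsedRelease, enrichment that returns a new instance via dataclasses.replace, tech_string as a derived @property) can be sketched roughly as below. This is an illustration under assumptions, not the project's code: the field subset, the probe dict shape, the '.'-joined tech_string format and the name enrich_from_probe_sketch are all hypothetical. Returning the input instance untouched is one way to honor the "no allocation when no field changed" note.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParsedRelease:
    # Hypothetical subset of fields; the real value object carries many more.
    title: str
    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None
    languages: Tuple[str, ...] = ()  # a tuple, so the VO stays immutable

    @property
    def tech_string(self) -> str:
        # Recomputed on every access, so it cannot go stale after enrichment.
        # Assumed format: the non-empty parts joined by '.'.
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)


def enrich_from_probe_sketch(parsed: ParsedRelease, probe: dict) -> ParsedRelease:
    # Hypothetical probe shape: {"quality": str, "codec": str, "languages": list}.
    # Only fills fields the parser left empty.
    changes = {}
    if parsed.quality is None and probe.get("quality"):
        changes["quality"] = probe["quality"]
    if parsed.codec is None and probe.get("codec"):
        changes["codec"] = probe["codec"]
    if not parsed.languages and probe.get("languages"):
        changes["languages"] = tuple(probe["languages"])
    if not changes:
        return parsed  # nothing changed: reuse the same instance
    return replace(parsed, **changes)  # new frozen instance


# Callers rebind, as inspect_release now does.
parsed = ParsedRelease("Foundation", source="WEBRip")
parsed = enrich_from_probe_sketch(parsed, {"quality": "2160p", "codec": "x265"})
print(parsed.tech_string)  # 2160p.WEBRip.x265
```

Assigning to a field of a frozen instance (for example `parsed.codec = "x264"`) raises dataclasses.FrozenInstanceError, which is what forces every enrichment step to go through replace.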

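The SHITTY fallback from 3737f66851 (each token looked up against the kb buckets with first-match-wins semantics, then the leftmost contiguous UNKNOWN run becomes the title), preceded by the [site.tag] stripping that tokenize() does in 075a827b0e, might look something like this. The Role enum, bucket contents, regexes and the simplified codec-GROUP peeling in split_group are hypothetical stand-ins for the YAML-backed kb and the real pipeline.

```python
import re
from enum import Enum
from typing import List, Optional, Tuple


class Role(Enum):
    UNKNOWN = "unknown"
    TITLE = "title"
    RESOLUTION = "resolution"
    SOURCE = "source"
    CODEC = "codec"
    DISTRIBUTOR = "distributor"
    YEAR = "year"
    SXXEXX = "sxxexx"


# Hypothetical stand-ins for the kb buckets (stored upper-case), checked in order.
KB_BUCKETS = [
    (Role.RESOLUTION, {"720P", "1080P", "2160P"}),
    (Role.SOURCE, {"WEB", "WEB-DL", "WEBRIP", "BLURAY"}),
    (Role.CODEC, {"X264", "X265", "H264", "HEVC"}),
    (Role.DISTRIBUTOR, {"NF", "AMZN", "DSNP"}),
]
YEAR_RE = re.compile(r"^(19|20)\d{2}$")
SXXEXX_RE = re.compile(r"^S\d{2}(E\d{2})?$", re.IGNORECASE)
SITE_TAG_RE = re.compile(r"^\[[^\]]*\]\s*")  # leading "[site.tag] "


def tokenize(name: str) -> List[str]:
    name = SITE_TAG_RE.sub("", name)
    return [t for t in re.split(r"[.\s_]+", name) if t]


def classify(token: str) -> Role:
    upper = token.upper()
    for role, values in KB_BUCKETS:  # first match wins
        if upper in values:
            return role
    if YEAR_RE.match(token):
        return Role.YEAR
    if SXXEXX_RE.match(token):
        return Role.SXXEXX
    return Role.UNKNOWN


def split_group(tokens: List[str]) -> Tuple[List[str], Optional[str]]:
    # Simplified codec-GROUP detection: peel NAME off a trailing '<known>-NAME'.
    if tokens and "-" in tokens[-1]:
        head, _, group = tokens[-1].rpartition("-")
        if classify(head) is not Role.UNKNOWN:
            return tokens[:-1] + [head], group
    return tokens, None


def parse_shitty(name: str) -> dict:
    tokens, group = split_group(tokenize(name))
    roles = [classify(t) for t in tokens]
    # The leftmost contiguous run of UNKNOWN tokens becomes the title.
    i = next((k for k, r in enumerate(roles) if r is Role.UNKNOWN), len(roles))
    while i < len(roles) and roles[i] is Role.UNKNOWN:
        roles[i] = Role.TITLE
        i += 1
    title = ".".join(t for t, r in zip(tokens, roles) if r is Role.TITLE)
    return {"title": title, "group": group, "roles": roles}
```

For "[site.tag] The.Expanse.S03E01.1080p.WEBRip.x264-GROUP" this yields the title "The.Expanse" and the group "GROUP". UNKNOWN tokens to the right of the first run (for example trailing junk after the resolution) stay UNKNOWN, which matches the commit's intent of keeping SHITTY narrow and leaving exotic layouts to PATH OF PAIN.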
5 Commits
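
The '5.1' / '7.1' rejoin from 7dc7f0c241, combined with the extra['sequence'] / extra['sequence_member'] tagging convention it describes, could be sketched as follows. Only the two extras keys come from the commit message; the layout allow-list and the names find_channel_pairs and tag_channel_pairs are hypothetical. The '-GROUP' suffix is stripped only to build the candidate layout, so the token list itself is left intact for group detection.

```python
from typing import Dict, List, Tuple

# Hypothetical allow-list; the real layouts would come from the kb.
CHANNEL_LAYOUTS = {"2.0", "5.1", "7.1"}


def find_channel_pairs(tokens: List[str]) -> List[Tuple[int, str]]:
    """Return (index of first token, layout) for each '.'-split channel layout."""
    pairs = []
    i = 0
    while i < len(tokens) - 1:
        second = tokens[i + 1].split("-", 1)[0]  # '1-GROUP' -> '1'
        layout = f"{tokens[i]}.{second}"
        if layout in CHANNEL_LAYOUTS:
            pairs.append((i, layout))
            i += 2  # both tokens consumed; pairs never overlap
        else:
            i += 1
    return pairs


def tag_channel_pairs(tokens: List[str]) -> List[Dict[str, str]]:
    # One extras dict per token, using the sequence / sequence_member keys.
    extras: List[Dict[str, str]] = [{} for _ in tokens]
    for i, layout in find_channel_pairs(tokens):
        extras[i]["sequence"] = layout
        extras[i + 1]["sequence_member"] = "True"  # assemble skips this token
    return extras


tokens = "Show.S01E01.1080p.WEB.DDP.5.1-GROUP".split(".")
print(find_channel_pairs(tokens))  # [(5, '5.1')]
```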