alfred

Author	SHA1	Message	Date
francwa	02e478a157	refactor(domain): freeze Movie and Episode, switch track collections to tuple Movie and Episode become @dataclass(frozen=True, eq=False), with audio_tracks/subtitle_tracks held as tuple[...] instead of list[...]. Identity-based equality is preserved via the existing __eq__/__hash__. __post_init__ coercion (imdb_id, title, season_number, episode_number) uses object.__setattr__ to stay compatible with frozen. The MediaWithTracks mixin contract is updated to tuple accordingly. Callers projecting enrichment results (probe output, file metadata) now rebuild via dataclasses.replace(...) — same pattern recently adopted for ParsedRelease. Season and TVShow stay mutable for now: freezing the aggregate root would cascade a full reconstruction on every add_episode, deferred.	2026-05-21 13:40:22 +02:00
francwa	3dc73a5214	feat(release): add fullwidth vertical bar ｜ (U+FF5C) to separators CJK release names sometimes use the fullwidth vertical bar as a token separator, as do occasional decorative YouTube-style uploads. Adding the codepoint to separators.yaml lets the tokenizer split on it instead of leaving the wide pipe glued onto an adjacent token. The tokenizer in alfred/domain/release/parser/pipeline.py iterates the separator list as plain strings (no regex), so a multi-byte UTF-8 separator works without any code change.	2026-05-21 08:05:56 +02:00
francwa	88f156b7a4	refactor(subtitles): rename SubtitleCandidate → SubtitleScanResult The old name conflated 'might become a placed subtitle' with 'what a scan pass produced'. The class is the output of a scan/identify pass — language/format may still be None while classification is in progress, confidence reflects classifier certainty, raw_tokens holds filename fragments under analysis. SubtitleScanResult says that directly. Pure rename + refreshed docstring; no behavior change. Touches the domain entity, the matcher/identifier/utils services, the manage_subtitles use case, the placer, the metadata store, the shared-media cross-ref comment, and 7 test modules.	2026-05-21 08:05:46 +02:00
francwa	5107cb32c0	feat(release): InspectedResult.recommended_action centralizes exclusion decision Add a derived 'recommended_action' property on InspectedResult that collapses the orchestrator's go / wait / skip decision into one value: - 'skip' → no main_video, or media_type == 'other' - 'ask_user' → media_type == 'unknown', or road == 'path_of_pain' - 'process' → confident parse with a main video on disk The ordering is part of the contract (skip > ask_user > process) — documented in the property docstring. Until now every consumer (workflows, the agent, the orchestrator sketch) had to re-derive this from the road / media_type / main_video triple, with subtle drift between sites. One place, one rule. Exposed through the analyze_release tool so the LLM can route on it. Spec YAML updated to describe the new field. Suite: 1083 passed (+6 new tests in tests/application/test_inspect.py covering the four branches and the precedence rules).	2026-05-21 07:54:17 +02:00
francwa	b7979c0f8b	refactor(release): freeze ParsedRelease + enrich_from_probe returns new instance ParsedRelease is now @dataclass(frozen=True). The enrichment passes that used to patch fields in place now produce new instances: - enrich_from_probe(parsed, info, kb) returns a new ParsedRelease via dataclasses.replace (no allocation when no field changed). - inspect_release rebinds 'parsed' after detect_media_type (wrapped in MediaTypeToken — the strict isinstance check now also runs on replace) and after enrich_from_probe. languages becomes a tuple[str, ...] so the VO is properly immutable. Parser pipeline packs languages as a tuple in the assemble dict. Callers updated: inspect_release, testing/recognize_folders_in_downloads.py. Tests updated: 22 enrich_from_probe call sites rebound, language assertions switched to tuple literals, test_release_fixtures normalizes result['languages'] back to list for YAML-fixture comparison. Suite: 1077 passed.	2026-05-21 07:51:49 +02:00
francwa	9f1ce94690	refactor(application): inject kb/prober into resolve_destination use cases Remove the module-level _KB / _PROBER singletons from alfred/application/filesystem/resolve_destination.py. The four resolve_{season,episode,movie,series}_destination use cases now take kb: ReleaseKnowledge and prober: MediaProber as required arguments, matching the shape of inspect_release. The singletons now live at the agent-tools frontier (alfred/agent/tools/filesystem.py), where the LLM-facing wrappers instantiate YamlReleaseKnowledge / FfprobeMediaProber once and thread them through. The wrappers' Python signatures are unchanged — the inspect-based JSON-schema generator in agent/registry.py still sees the same LLM-passable params. analyze_release drops the dirty 'from ... import _KB' indirection. Tests inject their own stubs by keyword (prober=_StubProber(...)) via thin convenience wrappers, replacing the prior monkeypatch.setattr(rd, '_PROBER', ...) pattern. testing/debug_release.py: instantiate YamlReleaseKnowledge() / FfprobeMediaProber() inline at the two call sites. Suite: 1077 passed.	2026-05-21 07:46:13 +02:00
francwa	5e0ed11672	refactor(release): rename ParsePath enum to TokenizationRoute ParsePath collided with pathlib.Path in mental models, and was one letter from the parse_path attribute that stores its value — confusion on confusion. Road (EASY/SHITTY/PATH_OF_PAIN) is the parser-confidence axis; TokenizationRoute (DIRECT/SANITIZED/AI) is the tokenization-method axis. They're orthogonal and the new name makes that obvious. Field name parse_path stays — it's the right name for the attribute that holds the route. String values ("direct", "sanitized", "ai") stay too, so YAML fixtures and the analyze_release tool spec are unchanged. Only the type symbol changes: - value_objects.py: class rename + docstring spelling out orthogonality with Road. - services.py: 3 call sites. - scoring.py: docstring cross-reference updated. - tests/domain/release/test_parser_v2_scoring.py: import + 3 call sites.	2026-05-21 07:39:42 +02:00
francwa	0246f85ef8	refactor(release): move codec mappings from code to YAML knowledge The three module-level dicts in enrich_from_probe (ffprobe codec name to scene token, channel count to layout) were exactly the kind of domain lookup table CLAUDE.md says belongs in YAML, not in Python. Move them to alfred/knowledge/release/probe_mappings.yaml, load through a new ReleaseKnowledge.probe_mappings port field, and add a kb parameter to enrich_from_probe so the consumer reads the maps via the same injection pattern as everything else. - New knowledge file: alfred/knowledge/release/probe_mappings.yaml - New loader: load_probe_mappings() in infrastructure/knowledge/release.py (normalizes channel-count keys back to int). - Port: ReleaseKnowledge gains probe_mappings: dict. - Adapter: YamlReleaseKnowledge populates it at __init__. - Consumer: enrich_from_probe(parsed, info, kb) reads the three sub-maps from kb.probe_mappings; unknown codecs still fall back to uppercase raw value, same behaviour as before. - Call sites updated: inspect_release passes kb through; the testing script gets its kb wiring (it was already broken since the ReleaseKnowledge refactor); all 22 enrich_from_probe call sites in tests/application/test_enrich_from_probe.py pass _KB.	2026-05-21 07:37:42 +02:00
francwa	e62dc90bd1	refactor(release): make tech_string a derived property ParsedRelease.tech_string was a stored str field re-computed in two places (assemble() at parse time, enrich_from_probe() after the probe). The second site was a reactive fix (`e79ca46`) for filename builders that saw a stale value. Turn it into an @property so it stays in sync with quality/source/codec by construction. - Drop the field from the dataclass + the key from assemble()'s dict. - Drop tech_string="" from parse_release's malformed-name fallback. - Drop the manual recomputation at the end of enrich_from_probe. - Inject the property into asdict() result in the fixtures runner (same treatment as is_season_pack). - Update tests that passed tech_string= to the constructor; rewrite the TestTechString case that mutated p.tech_string manually.	2026-05-21 07:33:53 +02:00
francwa	688c37bbec	docs(changelog): recap session 2026-05-20 tech-debt cleanup Consolidate the five domain-purity refactors of the session under [Unreleased]: RuleScopeLevel enum, FilePath VO post_init, Language strict + from_raw, ParsedRelease.normalised → clean, ParsedRelease enum strictness. Removes the duplicate min_movie_size_bytes entry (now sits under its proper Removed section).	2026-05-20 23:57:06 +02:00
francwa	757e4045ee	refactor(release): ParsedRelease.media_type & parse_path are strict enums The fields were already typed as MediaTypeToken / ParsePath, but a tolerant __post_init__ coerced raw strings into their enum form. With MediaTypeToken(str, Enum) (and ParsePath idem), the coercion served no purpose — callers that pass '.value' got back the enum anyway, and callers that pass an unknown string got a ValidationError just like they would now. Strict mode: constructor rejects non-enum values directly. The two in-tree builders (parse_release() and the parser pipeline) already produce enum values; all .value sites have been removed. Drops the unused _VALID_MEDIA_TYPES / _VALID_PARSE_PATHS lookup tables.	2026-05-20 23:52:30 +02:00
francwa	c3767aacb6	refactor(release): rename ParsedRelease.normalised → clean The field was called normalised but did not perform the normalization its name suggests (dots instead of spaces). In practice it holds raw - site_tag - apostrophes, and is used only by season_folder_name() via _strip_episode_from_normalized. Renamed to 'clean', which describes what it actually contains; docstring corrected.	2026-05-20 23:50:05 +02:00
francwa	5bcf22b408	refactor(shared): Language VO is strict; from_raw() factory for un-normalized input object.__setattr__ inside __post_init__ on a frozen dataclass is a code smell — it bypasses the immutability guarantee to mutate fields mid-construction. Split the responsibilities: * Direct constructor is strict — rejects un-normalized input (uppercase iso, whitespace in aliases, etc.) so once a Language exists in the system, its fields are guaranteed canonical. * Language.from_raw() factory handles arbitrary YAML/user input — it lowercases the iso, dedups/normalizes aliases, then constructs. Only caller that built from raw data (LanguageRegistry loading YAML) moves to from_raw(). Test fixtures already pass normalized data so they keep using the direct constructor.	2026-05-20 23:48:30 +02:00
francwa	cfa9f54d9f	refactor(shared): FilePath VO uses __post_init__ instead of custom __init__ Custom __init__ on a @dataclass(frozen=True) is a code smell — it bypasses the generated dataclass __init__ and re-implements the str/Path coercion + frozen-aware setattr by hand. Replaced with a single __post_init__ that performs the same normalization. Same public API (FilePath(str) and FilePath(Path) both work), same behavior, no callers touched.	2026-05-20 23:47:03 +02:00
francwa	f0aaf50c97	refactor(subtitles): RuleScope.level → RuleScopeLevel enum Six possible levels (global, release_group, movie, show, season, episode) were passed as free-form str, with the docstring comment as the only documentation. Introduces RuleScopeLevel(str, Enum): still serializable to YAML, but the fixed set is now enforced by the type. to_dict() explicitly emits .value to stay safe for YAML writers.	2026-05-20 23:46:22 +02:00
francwa	a09262b33f	chore(settings): remove unused min_movie_size_bytes The field and its validator had been orphaned since MovieService.validate_movie_file was removed. Extension-based exclusion (application/release/supported_media.py) plus PoP now cover the 'real movie vs sample' rule. If a size threshold is ever needed, it will go in data/knowledge/, not in settings.	2026-05-20 23:41:41 +02:00
francwa	9c7cd66d2b	Merge branch 'refactor/flatten-shared-media'	2026-05-20 23:35:52 +02:00
francwa	83dbed887b	refactor(domain): flatten shared/media package into single module Six small files (audio, video, subtitle, info, matching, tracks_mixin + __init__) collapsed into one ~250 LoC media.py module. Python treats media.py and media/__init__.py interchangeably, so the 12 import sites that read 'from alfred.domain.shared.media import ...' continue to work without changes. Reasoning: the whole bounded context fits on one screen; splitting into sub-modules added more navigation friction than it saved. Tests stay green (1077 passed).	2026-05-20 23:35:49 +02:00
francwa	0c9489e16b	Merge branch 'feat/parser-phase-d'	2026-05-20 23:30:36 +02:00
francwa	621bb96995	fix(release/parser): pre-strip apostrophes so titles like Don't parse cleanly Apostrophes are in the forbidden-chars list, which made any release with a title like "Don't" or "L'avare" short-circuit to the AI fallback (parse_path=ai, everything UNKNOWN). They are now stripped up front from the name before the well-formed check and tokenize, so the parse completes normally. The raw name is preserved on the VO; only the title field loses its apostrophe. parse_path becomes 'sanitized' when an apostrophe was stripped, to surface that the parser cleaned something up. Fixtures updated: - shitty/honey_uhd_hdr/ — went from total UNKNOWN to a clean parse (title=Honey.Dont, year=2025, quality=2160p, source=WEBRip, codec=x265, group=Amen). - path_of_pain/the_prodigy_full_chaos/ — went from total failure to partial success (title, year, source, codec extracted). Remaining gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked separately in tech debt.	2026-05-20 23:29:10 +02:00
francwa	448ef3b79c	fix(release/parser): recognize Sxx-yy season range as tv_complete `Der.Tatortreiniger.S01-06.GERMAN...` previously parsed as a movie with 'S01-06' glued to the title. The parser now matches the season-range form in _parse_season_episode (returning season=first, episode=None), and the assemble step detects the range token to promote media_type to 'tv_complete'. The first season is exposed as `season` so `is_season_pack` fires (season is not None and episode is None) — useful for routing to a series root folder. Fixture shitty/tatortreiniger_flat_multiseason/ updated: - title: Der.Tatortreiniger.S01-06 → Der.Tatortreiniger - season: null → 1 - media_type: movie → tv_complete - is_season_pack: false → true	2026-05-20 23:26:40 +02:00
francwa	b1c7f35ffb	fix(release/parser): drop pure-punctuation TITLE tokens at assembly Releases using ' - ' as a separator (Vinyl - 1x01 - FHD) tokenize to ['Vinyl', '-', '1x01', '-', 'FHD'] — the standalone '-' tokens were ending up in title_parts and leaked into the joined title ('Vinyl.-'). We can't add '-' to the separator list (it would break codec-GROUP), so we filter at assembly: a TITLE token with no alphanumeric characters carries no title content. Side win: same logic eliminates the UTF-8 wide-pipe '｜' from the khruangbin_yt_wide_pipe fixture title. Fixtures updated: - shitty/vinyl_1x01_format/expected.yaml (title: Vinyl.- → Vinyl) - path_of_pain/khruangbin_yt_wide_pipe/expected.yaml (｜ dropped)	2026-05-20 23:24:40 +02:00
francwa	5bbdc9081f	fix(release/parser): collapse chained multi-episode markers to full range S14E09E10E11 previously parsed to episode=9, episode_end=10 — E11 was silently dropped. The parser now takes episodes[-1] as episode_end so the full chain is captured (episode=9, episode_end=11). Intermediate values stay implied. Fixture shitty/archer_multi_episode/ updated from anti-regression of the bug to anti-regression of the fix.	2026-05-20 23:23:08 +02:00
francwa	5d7b214af2	Merge branch 'refactor/language-port'	2026-05-20 23:20:18 +02:00
francwa	18267d0165	refactor(language): LanguageRepository port + SubtitleKnowledgeBase wired to it Mirror the MediaProber / FilesystemScanner pattern for language lookup: - New Protocol `LanguageRepository` in alfred.domain.shared.ports covering from_iso, from_any, all, __contains__, __len__ — the surface previously coupled to the concrete LanguageRegistry. - SubtitleKnowledgeBase types its `language_registry` parameter against the Protocol; the concrete LanguageRegistry stays in infrastructure as the YAML-backed adapter and remains the default when no repository is injected. - New unit tests in tests/infrastructure/test_language_registry.py cover the adapter surface (from_iso, from_any, membership, case-insensitivity, non-string inputs). Behaviour is unchanged for existing callers. The split opens the door to in-memory fakes in future tests without loading the full ISO 639 YAML.	2026-05-20 23:18:25 +02:00
francwa	19fe8a519a	Merge branch 'feat/release-inspect-orchestrator' Inspection pipeline groundwork: - MediaProber.probe() port extension (full media inspection on the port) - inspect_release orchestrator + InspectedResult frozen VO - enrich_from_probe now refreshes tech_string - resolve_*_destination use cases consume inspect_release - detect_media_type & enrich_from_probe moved to application/release	2026-05-20 09:31:22 +02:00
francwa	a0d1846ff2	refactor(release): move detect_media_type & enrich_from_probe to application/release Both helpers are inspection-pipeline pieces, not filesystem use cases — they belong next to inspect_release, not next to move_media / resolve_destination / list_folder. The move also kills the lazy import that was hiding inside _resolve_parsed: alfred.application.filesystem.resolve_destination no longer triggers a cycle through alfred.application.filesystem __init__ when loading inspect_release. Top-level import restored. Call sites updated: inspect.py, test_detect_media_type.py, test_enrich_from_probe.py, testing/recognize_folders_in_downloads.py. Module docstrings + test-file docstrings updated to match the new location.	2026-05-20 09:29:58 +02:00
francwa	0fb59a4581	feat(filesystem): wire inspect_release into resolve_destination The four resolve_*_destination use cases now route through a private _resolve_parsed helper that picks the right entry point: - source path provided AND it exists -> inspect_release(name, path) runs the full pipeline (parse + media-type refinement + probe + enrich), so missing tech tokens (quality, codec, ...) get filled by ffprobe and the refreshed tech_string lands in the destination folder / file names. - source path missing or absent -> parse_release(name) only, same behavior as before. Back-compat: tests using fake /dl/*.mkv paths still pass unchanged. resolve_episode_destination / resolve_movie_destination reuse their existing source_file parameter as the inspection target. The two folder-move use cases (season / series) gain a new OPTIONAL source_path parameter, threaded through the agent tool wrappers and documented in the YAML specs. The lazy import inside _resolve_parsed avoids a circular import: inspect_release imports detect_media_type / enrich_from_probe from the same application.filesystem package whose __init__ re-exports resolve_destination. Three new tests in TestProbeEnrichmentWiring with a stub MediaProber prove the wiring: movie picks up probe quality, season picks it up via source_path, and a missing path correctly skips probe (back-compat guard).	2026-05-20 09:26:30 +02:00
francwa	e79ca462b8	fix(release): refresh tech_string after enrich_from_probe enrich_from_probe fills None fields on ParsedRelease (quality, source, codec, audio_*, languages) but left tech_string at its parser-time value — so the filename builders (movie_folder_name, episode_filename, …) saw stale tech tokens even after a successful probe. Re-derive tech_string the same way the parser does — quality.source.codec joined by dots, skipping None — at the end of enrich_from_probe. Token- level values still win because enrich only fills None fields. Four new tests in TestTechString cover: enrichment rebuilds it, existing source survives, no-info input leaves it untouched, fully empty parsed produces ''.	2026-05-20 09:26:09 +02:00
francwa	03aa844d7d	feat(release): inspect_release orchestrator + InspectedResult VO New application-layer entry point that composes the four inspection layers in one call: 1. parse_release(name, kb) -> (ParsedRelease, ParseReport) 2. detect_media_type(parsed, path, kb) -> patch parsed.media_type 3. find_main_video(path, kb) -> Path | None (top-level scan) 4. prober.probe(video) + enrich -> when video exists and media_type not in {unknown, other} Returns a frozen InspectedResult(parsed, report, source_path, main_video, media_info, probe_used). kb and prober are injected: no module-level singletons in inspect.py. analyze_release tool now delegates to inspect_release; its output gains two fields, confidence (0-100) and road (easy/shitty/path_of_pain), surfaced from ParseReport so the LLM can route by confidence. Spec updated to document them. 12 new tests covering happy paths, probe gating (no video, media_type 'other', probe failure), mutation contract (detect refining parsed.media_type, enrich filling None fields), resilience (nonexistent path), and frozen contract. Suite: 1058 passing.	2026-05-20 09:15:29 +02:00
francwa	c303efea48	refactor(probe): consolidate full probe() into MediaProber port Add probe(video) -> MediaInfo | None to the MediaProber Protocol and implement it on FfprobeMediaProber. The standalone alfred/infrastructure/filesystem/ffprobe.py module is removed; all callers (analyze_release / probe_media tools, testing scripts) now go through the adapter. Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py (patching subprocess.run at the adapter module level). Unblocks the upcoming inspect_release orchestrator, which needs the port, not a free function, to compose parse + main-video selection + probe in one shot.	2026-05-20 09:11:24 +02:00
francwa	5db350a1df	Merge branch 'feat/release-parser-scoring'	2026-05-20 08:47:38 +02:00
francwa	12dc796ea2	docs(changelog): freeze confidence scoring + exclusion work block	2026-05-20 08:47:29 +02:00
francwa	9ddd85929e	feat(release): pre-pipeline exclusion helpers Add the application-layer helpers that decide which files are worth parsing, sitting one notch above parse_release. - is_supported_video(path, kb): extension-only check against kb.video_extensions. Lowercased suffix lookup. Directories and broken symlinks return False. - find_main_video(folder, kb): top-level scan only (no recursion into subdirectories — releases that wrap their video in Sample/ are PATH_OF_PAIN territory). Lexicographically-first eligible file wins when several qualify (deterministic, no size-based ranking). A bare file as folder argument is supported for single-file releases. No size threshold and no filename heuristics ('sample' / 'trailer'): the parser's job is to extract structure, not to second-guess non-standard release shapes. PoP catches the rest. 17 tests under tests/application/test_supported_media.py.	2026-05-20 01:34:32 +02:00
francwa	ed7680b58f	docs(changelog): log parse-confidence scoring + ParseReport tuple	2026-05-20 01:21:47 +02:00
francwa	b4c9efd13b	feat(release): parse_release returns (ParsedRelease, ParseReport) Wire the scoring foundations into the parser entry point. parse_release now returns a tuple — the structural ParsedRelease and a diagnostic ParseReport carrying confidence (0-100), road (EASY / SHITTY / PATH_OF_PAIN), the residual UNKNOWN tokens, and the list of critical fields that couldn't be filled. EASY is decided structurally (a group schema matched), independently of the score. SHITTY vs PATH_OF_PAIN is decided by score against the 60 cutoff from scoring.yaml. Malformed names (forbidden chars) emit a zero-confidence PoP report and short-circuit to parse_path=AI as before. ParsePath stays as-is (DIRECT / SANITIZED / AI) — it records how we tokenized, not how confident we are. The two dimensions are now properly separated. Call sites propagated: - alfred/application/filesystem/resolve_destination.py (4 occurrences) - alfred/agent/tools/filesystem.py - tests/domain/test_release.py - tests/domain/test_release_fixtures.py - tests/application/test_detect_media_type.py New tests/domain/release/test_parser_v2_scoring.py (22 cases) locks ParseReport validation, compute_score arithmetic, decide_road thresholding, the collector helpers, and the end-to-end tuple contract.	2026-05-20 01:21:30 +02:00
francwa	98c688f29b	feat(release): foundations for parse-confidence scoring Add the building blocks for Phase A scoring without yet wiring them into parse_release. Nothing changes at runtime — parse_release still returns a single ParsedRelease — but the pieces needed to upgrade it in a follow-up commit are now in place. - alfred/knowledge/release/scoring.yaml: weights / penalties / thresholds. Title and media_type are heavy (30 / 20), structural fields medium (year 15, season 10), tech fields light (5 each). Unknown-token penalty 5 capped at -30. SHITTY/PoP cutoff at 60. - load_scoring() loader with safe defaults baked in: a missing or partial YAML only de-tunes, never breaks. - ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge populates it from load_scoring(). - New parser/scoring.py module with Road enum (EASY / SHITTY / PATH_OF_PAIN, distinct from ParsePath which records the tokenization route), and pure functions: compute_score, decide_road, collect_unknown_tokens, collect_missing_critical. - ParseReport frozen VO in value_objects.py — exported alongside ParsedRelease.	2026-05-20 01:21:17 +02:00
francwa	fcd80763e2	Merge branch 'refactor/release-parser-v2'	2026-05-20 01:08:20 +02:00
francwa	629387591f	docs(changelog): freeze release parser v2 work block (2026-05-20)	2026-05-20 01:08:17 +02:00
francwa	230a7ab88a	docs(changelog): log SHITTY simplification + distributor split	2026-05-20 01:03:52 +02:00
francwa	3737f66851	refactor(release): simplify SHITTY to dict-driven token tagging Replace the ~480-line legacy heuristic block in services.py with a small dict-driven pass in pipeline._annotate_shitty: each token is looked up against the kb buckets (resolutions / sources / codecs / distributors / year / sxxexx) with first-match-wins semantics, the leftmost contiguous UNKNOWN run becomes the title, done. SHITTY's scope is intentionally narrow — releases that look like scene names but don't have a registered group schema. Anything more exotic (parenthesized tech, bare-dashed title fragments, YT slugs, franchise boxes) is PATH OF PAIN territory and stays out of here. - annotate() no longer returns None; SHITTY is the always-on fallback - services.py shrunk from ~525 to ~85 lines (legacy extractors gone) - 4 fixtures get xfail markers documenting PoP-grade pathologies (deutschland franchise box, sleaford YT slug, super_mario bilingual, predator space-separators — the last one moved from shitty/ → pop/) - ReleaseFixture grows xfail_reason; the parametrized suite wires the pytest.mark.xfail(strict=False) automatically	2026-05-20 01:03:25 +02:00
francwa	fd3bd1ad8c	feat(release): distinguish streaming distributors from sources Introduce a separate dimension for streaming-platform tags (NF, AMZN, DSNP, HMAX, ATVP, …) so they stop polluting the encoding-source field. WEB-DL is the source; the platform that released it is the distributor. - new distributors.yaml knowledge file - ReleaseKnowledge port exposes distributors set - TokenRole.DISTRIBUTOR + ParsedRelease.distributor field - removed NF/AMZN/DSNP/HMAX/ATVP from sources.yaml - notre_planete fixture now records distributor: NF	2026-05-20 01:03:11 +02:00
francwa	7dc7f0c241	feat(release): v2 enricher pass for audio/video-meta/edition/language The EASY pipeline now extracts the full ParsedRelease surface from known-group releases, not just the structural backbone. Behavior is unchanged for releases that don't carry these tokens. Pipeline (parser/pipeline.py): - Structural walk (renamed _annotate_structural): no longer requires body to be fully consumed. Tokens passed over between schema chunks remain UNKNOWN so the enricher pass can claim them. - _find_chunk(): scans forward in the body for the next token matching a given role, skipping already-annotated tokens. Lets optional and mandatory chunks both tolerate intercalated enricher tokens. - _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION / LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta / kb.editions are matched first (longest-first ordering preserved from the YAML), single tokens after. - _apply_sequences(): mutates the token list, tagging the first token of a matched sequence with extra['sequence']=<canonical value> and trailing members with extra['sequence_member']='True' so assemble skips them. - _detect_channel_pairs(): handles the '5.1' / '7.1' case where the '.' separator splits the layout into two tokens. Strips a trailing '-GROUP' suffix on the second before joining. Assemble: - New fields populated: languages (list), audio_codec, audio_channels, bit_depth, hdr_format, edition. Each role-handler skips sequence_member tokens. - media_type heuristic extended: edition in {COMPLETE, INTEGRALE, COLLECTION} + no season → tv_complete (mirrors legacy). Tests: - 4 new TestEnrichers cases covering bit_depth+audio_codec+channels, HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language with DTS-HD.MA sequence, TV episode with single language. - All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed, 8 skipped. 
Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:26:05 +02:00
francwa	075a827b0e	feat(release): wire v2 EASY path for known release groups The annotate-based v2 pipeline now handles releases ending in -KONTRAST, -ELiTE, or -RARBG. Unknown groups still fall through to the legacy SHITTY heuristic in services.py — nothing changes for them. Pipeline (alfred/domain/release/parser/pipeline.py): - tokenize(): string-ops separator split, strips [site.tag] first. - annotate(): right-to-left group detection (priority to codec-GROUP shape, fallback to any non-source dashed token), GroupSchema lookup via the kb port, then lockstep walk of tokens against schema chunks. Optional chunks skip on mismatch, mandatory mismatches return None so the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP trailing token correctly skips the CODEC chunk in the body walk. - assemble(): folds annotated tokens into a ParsedRelease-compatible dict (title joined by '.', group from the codec-GROUP token's extras). Schema (alfred/domain/release/parser/schema.py): - GroupSchema + SchemaChunk frozen value objects. - TokenRole.GROUP added. Port + adapter: - ReleaseKnowledge.group_schema(name) lookup added (case-insensitive). - YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/ *.yaml at construction time; learned overrides in data/knowledge/release/release_groups/ also picked up. Knowledge: - release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the canonical chunk_order. ELiTE marks source as optional (Foundation.S02 has no WEBRip token). Services: - parse_release tries the v2 path first; on None falls through to the legacy implementation untouched. Tests: - tests/domain/release/test_parser_v2_easy.py (10 cases) cover group detection (codec-GROUP, dashed-source skip, no-dash → unknown), schema-driven annotation (movie, TV episode, season pack with optional source, unknown group returns None), and field assembly. 
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green: 5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures still produced by the legacy path. Verified via spy on v2.assemble. Suite: 1007 passed, 8 skipped. Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:21:11 +02:00
francwa	a2c917618f	feat(release): scaffold v2 parser package (annotate-based pipeline) New package alfred/domain/release/parser/ lays the foundation for the release parser refactor (specs in memory). Exposes: - Token: frozen VO carrying text + stream index + TokenRole + extra dict. with_role() returns a new instance (no mutation). - TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/ GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/ EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families. - pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix. - pipeline.tokenize(): release name -> list[Token] (all UNKNOWN), string-ops split on kb.separators (no regex, per CLAUDE.md). - pipeline.annotate(): documented stub. Walk order recorded in docstring (group right-to-left, then season/episode, year, tech, title). Legacy parse_release in release.services remains the live implementation until the annotate step lands. Scaffolding tests verify Token API, site-tag stripping (prefix/suffix), and tokenize output shape. Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:12:33 +02:00
francwa	9f10f4e0ad	Merge branch 'refactor/domain-release-knowledge' Final DDD purification of the release parser. Domain layer no longer imports anything from infrastructure, no YAML at import time, and ParsedRelease's filesystem-builders are pure (Option B). - ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter - parse_release(name, kb) explicit injection - ParsedRelease.title_sanitized field; builders accept already-safe strings - Callers (resolve_destination, detect_media_type, find_video, analyze_release) thread the kb through - 987 tests pass	2026-05-19 22:05:36 +02:00
francwa	cd814c7922	docs(changelog): log refactor/domain-release-knowledge work block	2026-05-19 22:05:29 +02:00
francwa	6802933acd	test(release): adapt suite to explicit ReleaseKnowledge injection - test_release.py / test_release_fixtures.py: module-level _KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it into parse_release. test_show_folder_name_strips_windows_chars renamed to test_show_folder_name_uses_already_safe_title to reflect the Option B contract (caller sanitizes via kb.sanitize_for_fs). - test_detect_media_type.py: same _KB pattern, all detect_media_type(parsed, path) calls now pass kb. - test_filesystem_extras.py: find_video_file(path) calls now pass kb. - test_enrich_from_probe.py: _bare() helper adds the new title_sanitized field. - test_resolve_destination.py: drop _sanitize import + TestSanitize class (helper deleted), add tmdb_title_safe arg to _resolve_series_folder calls. 987 passed, 8 skipped.	2026-05-19 22:05:26 +02:00
francwa	bf37a9d09e	refactor(release): thread ReleaseKnowledge through callers Wires the new explicit-kb signatures into every caller of the release parser and the filesystem-extension helpers. - application/filesystem/resolve_destination.py: module-level singleton _KB: ReleaseKnowledge = YamlReleaseKnowledge(); each use case now calls parse_release(release_name, _KB) and sanitizes TMDB strings via _KB.sanitize_for_fs(...) before passing them to the pure ParsedRelease builders. Local _sanitize helper + _WIN_FORBIDDEN regex dropped. - application/filesystem/detect_media_type.py: signature is now detect_media_type(parsed, source_path, kb); uses kb.metadata_extensions, kb.video_extensions, kb.non_video_extensions. - infrastructure/filesystem/find_video.py: find_video_file(path, kb) uses kb.video_extensions instead of an imported constant. - agent/tools/filesystem.py::analyze_release imports the application _KB singleton and passes it through to parse_release / detect_media_type / find_video_file.	2026-05-19 22:05:19 +02:00
francwa	4a74fff9cc	refactor(release): purify domain — parse_release(name, kb) + ParsedRelease Option B Removes the last domain → infrastructure leak in the release parser. services.py: - parse_release(name, kb) takes the knowledge as an explicit parameter. - Every helper (_tokenize, _is_well_formed, _extract_tech, _extract_languages, _extract_audio, _extract_video_meta, _extract_edition, _extract_title, _infer_media_type) takes kb. - No more module-level YAML loading. value_objects.py — Option B: - Sanitization happens once at parse time; ParsedRelease now carries a title_sanitized: str field alongside title. - Builder methods (show_folder_name, episode_filename, movie_folder_name, movie_filename) become pure: they accept already-sanitized tmdb_title_safe / tmdb_episode_title_safe arguments. Callers at the use-case boundary sanitize via kb.sanitize_for_fs(...) before passing in. - All domain-knowledge constants removed (_RESOLUTIONS, _SOURCES, _CODECS, _AUDIO, _VIDEO_META, _EDITIONS, _HDR_EXTRA, _MEDIA_TYPE_TOKENS, _LANGUAGE_TOKENS, _FORBIDDEN_CHARS, _*_EXTENSIONS, _WIN_FORBIDDEN_TABLE, _sanitize_for_fs). The module is now pure DDD.	2026-05-19 22:05:10 +02:00
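
The v2 scaffolding commits above (a2c917618f, 075a827b0e) describe a frozen Token value object, a [site.tag] stripper, and a regex-free tokenizer. A minimal sketch of how those three pieces might fit together follows. It is not the repository's code: the field names, the trimmed TokenRole members, the return shape of strip_site_tag, and the separator list passed to tokenize are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class TokenRole(str, Enum):
    # Hypothetical subset of the roles listed in the commit message.
    UNKNOWN = "unknown"
    TITLE = "title"
    GROUP = "group"


@dataclass(frozen=True)
class Token:
    # Assumed field names; the commit only says "text + stream index + role + extra dict".
    text: str
    index: int
    role: TokenRole = TokenRole.UNKNOWN
    extra: dict = field(default_factory=dict)

    def with_role(self, role, **extra):
        # Return a new Token instead of mutating the frozen instance.
        return replace(self, role=role, extra={**self.extra, **extra})


def strip_site_tag(name):
    """Return (name_without_tag, tag_or_None) for a [site.tag] prefix or suffix."""
    name = name.strip()
    if name.startswith("["):
        end = name.find("]")
        if end != -1:
            return name[end + 1:].strip(), name[1:end]
    if name.endswith("]"):
        start = name.rfind("[")
        if start != -1:
            return name[:start].strip(), name[start + 1:-1]
    return name, None


def tokenize(name, separators):
    """Split a release name on plain-string separators (no regex)."""
    body, _tag = strip_site_tag(name)
    pieces = [body]
    for sep in separators:
        # str.split works on any substring, so a fullwidth bar needs no special case.
        pieces = [part for piece in pieces for part in piece.split(sep)]
    texts = [p for p in pieces if p]  # drop empties left by adjacent separators
    return [Token(text=t, index=i) for i, t in enumerate(texts)]


# Illustrative separator list; the real one lives in separators.yaml.
tokens = tokenize("[site.tag] Foundation.S02.1080p｜x265-ELiTE", [".", " ", "｜"])
group = tokens[-1].with_role(TokenRole.GROUP, group="ELiTE")
```

Because separators are applied as ordinary substrings, the dash is left alone here, so a trailing codec-GROUP token such as x265-ELiTE stays in one piece for a later group-detection step.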


196 Commits