alfred

Author	SHA1	Message	Date
francwa	98c688f29b	feat(release): foundations for parse-confidence scoring Add the building blocks for Phase A scoring without yet wiring them into parse_release. Nothing changes at runtime — parse_release still returns a single ParsedRelease — but the pieces needed to upgrade it in a follow-up commit are now in place. - alfred/knowledge/release/scoring.yaml: weights / penalties / thresholds. Title and media_type are heavy (30 / 20), structural fields medium (year 15, season 10), tech fields light (5 each). Unknown-token penalty 5 capped at -30. SHITTY/PoP cutoff at 60. - load_scoring() loader with safe defaults baked in: a missing or partial YAML only de-tunes, never breaks. - ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge populates it from load_scoring(). - New parser/scoring.py module with Road enum (EASY / SHITTY / PATH_OF_PAIN, distinct from ParsePath which records the tokenization route), and pure functions: compute_score, decide_road, collect_unknown_tokens, collect_missing_critical. - ParseReport frozen VO in value_objects.py — exported alongside ParsedRelease.	2026-05-20 01:21:17 +02:00
francwa	fcd80763e2	Merge branch 'refactor/release-parser-v2'	2026-05-20 01:08:20 +02:00
francwa	629387591f	docs(changelog): freeze release parser v2 work block (2026-05-20)	2026-05-20 01:08:17 +02:00
francwa	230a7ab88a	docs(changelog): log SHITTY simplification + distributor split	2026-05-20 01:03:52 +02:00
francwa	3737f66851	refactor(release): simplify SHITTY to dict-driven token tagging Replace the ~480-line legacy heuristic block in services.py with a small dict-driven pass in pipeline._annotate_shitty: each token is looked up against the kb buckets (resolutions / sources / codecs / distributors / year / sxxexx) with first-match-wins semantics, the leftmost contiguous UNKNOWN run becomes the title, done. SHITTY's scope is intentionally narrow — releases that look like scene names but don't have a registered group schema. Anything more exotic (parenthesized tech, bare-dashed title fragments, YT slugs, franchise boxes) is PATH OF PAIN territory and stays out of here. - annotate() no longer returns None; SHITTY is the always-on fallback - services.py shrunk from ~525 to ~85 lines (legacy extractors gone) - 4 fixtures get xfail markers documenting PoP-grade pathologies (deutschland franchise box, sleaford YT slug, super_mario bilingual, predator space-separators — the last one moved from shitty/ → pop/) - ReleaseFixture grows xfail_reason; the parametrized suite wires the pytest.mark.xfail(strict=False) automatically	2026-05-20 01:03:25 +02:00
francwa	fd3bd1ad8c	feat(release): distinguish streaming distributors from sources Introduce a separate dimension for streaming-platform tags (NF, AMZN, DSNP, HMAX, ATVP, …) so they stop polluting the encoding-source field. WEB-DL is the source; the platform that released it is the distributor. - new distributors.yaml knowledge file - ReleaseKnowledge port exposes distributors set - TokenRole.DISTRIBUTOR + ParsedRelease.distributor field - removed NF/AMZN/DSNP/HMAX/ATVP from sources.yaml - notre_planete fixture now records distributor: NF	2026-05-20 01:03:11 +02:00
francwa	7dc7f0c241	feat(release): v2 enricher pass for audio/video-meta/edition/language The EASY pipeline now extracts the full ParsedRelease surface from known-group releases, not just the structural backbone. Behavior is unchanged for releases that don't carry these tokens. Pipeline (parser/pipeline.py): - Structural walk (renamed _annotate_structural): no longer requires body to be fully consumed. Tokens passed over between schema chunks remain UNKNOWN so the enricher pass can claim them. - _find_chunk(): scans forward in the body for the next token matching a given role, skipping already-annotated tokens. Lets optional and mandatory chunks both tolerate intercalated enricher tokens. - _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION / LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta / kb.editions are matched first (longest-first ordering preserved from the YAML), single tokens after. - _apply_sequences(): mutates the token list, tagging the first token of a matched sequence with extra['sequence']=<canonical value> and trailing members with extra['sequence_member']='True' so assemble skips them. - _detect_channel_pairs(): handles the '5.1' / '7.1' case where the '.' separator splits the layout into two tokens. Strips a trailing '-GROUP' suffix on the second before joining. Assemble: - New fields populated: languages (list), audio_codec, audio_channels, bit_depth, hdr_format, edition. Each role-handler skips sequence_member tokens. - media_type heuristic extended: edition in {COMPLETE, INTEGRALE, COLLECTION} + no season → tv_complete (mirrors legacy). Tests: - 4 new TestEnrichers cases covering bit_depth+audio_codec+channels, HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language with DTS-HD.MA sequence, TV episode with single language. - All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed, 8 skipped. Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:26:05 +02:00
francwa	075a827b0e	feat(release): wire v2 EASY path for known release groups The annotate-based v2 pipeline now handles releases ending in -KONTRAST, -ELiTE, or -RARBG. Unknown groups still fall through to the legacy SHITTY heuristic in services.py — nothing changes for them. Pipeline (alfred/domain/release/parser/pipeline.py): - tokenize(): string-ops separator split, strips [site.tag] first. - annotate(): right-to-left group detection (priority to codec-GROUP shape, fallback to any non-source dashed token), GroupSchema lookup via the kb port, then lockstep walk of tokens against schema chunks. Optional chunks skip on mismatch, mandatory mismatches return None so the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP trailing token correctly skips the CODEC chunk in the body walk. - assemble(): folds annotated tokens into a ParsedRelease-compatible dict (title joined by '.', group from the codec-GROUP token's extras). Schema (alfred/domain/release/parser/schema.py): - GroupSchema + SchemaChunk frozen value objects. - TokenRole.GROUP added. Port + adapter: - ReleaseKnowledge.group_schema(name) lookup added (case-insensitive). - YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/*.yaml at construction time; learned overrides in data/knowledge/release/release_groups/ also picked up. Knowledge: - release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the canonical chunk_order. ELiTE marks source as optional (Foundation.S02 has no WEBRip token). Services: - parse_release tries the v2 path first; on None falls through to the legacy implementation untouched. Tests: - tests/domain/release/test_parser_v2_easy.py (10 cases) cover group detection (codec-GROUP, dashed-source skip, no-dash → unknown), schema-driven annotation (movie, TV episode, season pack with optional source, unknown group returns None), and field assembly. - Existing tests/domain/test_release_fixtures.py (30 cases) stay green: 5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures still produced by the legacy path. Verified via spy on v2.assemble. Suite: 1007 passed, 8 skipped. Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:21:11 +02:00
francwa	a2c917618f	feat(release): scaffold v2 parser package (annotate-based pipeline) New package alfred/domain/release/parser/ lays the foundation for the release parser refactor (specs in memory). Exposes: - Token: frozen VO carrying text + stream index + TokenRole + extra dict. with_role() returns a new instance (no mutation). - TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families. - pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix. - pipeline.tokenize(): release name -> list[Token] (all UNKNOWN), string-ops split on kb.separators (no regex, per CLAUDE.md). - pipeline.annotate(): documented stub. Walk order recorded in docstring (group right-to-left, then season/episode, year, tech, title). Legacy parse_release in release.services remains the live implementation until the annotate step lands. Scaffolding tests verify Token API, site-tag stripping (prefix/suffix), and tokenize output shape. Refs: project_release_parser_v2_specs (memory)	2026-05-20 00:12:33 +02:00
francwa	9f10f4e0ad	Merge branch 'refactor/domain-release-knowledge' Final DDD purification of the release parser. Domain layer no longer imports anything from infrastructure, no YAML at import time, and ParsedRelease's filesystem-builders are pure (Option B). - ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter - parse_release(name, kb) explicit injection - ParsedRelease.title_sanitized field; builders accept already-safe strings - Callers (resolve_destination, detect_media_type, find_video, analyze_release) thread the kb through - 987 tests pass	2026-05-19 22:05:36 +02:00
francwa	cd814c7922	docs(changelog): log refactor/domain-release-knowledge work block	2026-05-19 22:05:29 +02:00
francwa	6802933acd	test(release): adapt suite to explicit ReleaseKnowledge injection - test_release.py / test_release_fixtures.py: module-level _KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it into parse_release. test_show_folder_name_strips_windows_chars renamed to test_show_folder_name_uses_already_safe_title to reflect the Option B contract (caller sanitizes via kb.sanitize_for_fs). - test_detect_media_type.py: same _KB pattern, all detect_media_type(parsed, path) calls now pass kb. - test_filesystem_extras.py: find_video_file(path) calls now pass kb. - test_enrich_from_probe.py: _bare() helper adds the new title_sanitized field. - test_resolve_destination.py: drop _sanitize import + TestSanitize class (helper deleted), add tmdb_title_safe arg to _resolve_series_folder calls. 987 passed, 8 skipped.	2026-05-19 22:05:26 +02:00
francwa	bf37a9d09e	refactor(release): thread ReleaseKnowledge through callers Wires the new explicit-kb signatures into every caller of the release parser and the filesystem-extension helpers. - application/filesystem/resolve_destination.py: module-level singleton _KB: ReleaseKnowledge = YamlReleaseKnowledge(); each use case now calls parse_release(release_name, _KB) and sanitizes TMDB strings via _KB.sanitize_for_fs(...) before passing them to the pure ParsedRelease builders. Local _sanitize helper + _WIN_FORBIDDEN regex dropped. - application/filesystem/detect_media_type.py: signature is now detect_media_type(parsed, source_path, kb); uses kb.metadata_extensions, kb.video_extensions, kb.non_video_extensions. - infrastructure/filesystem/find_video.py: find_video_file(path, kb) uses kb.video_extensions instead of an imported constant. - agent/tools/filesystem.py::analyze_release imports the application _KB singleton and passes it through to parse_release / detect_media_type / find_video_file.	2026-05-19 22:05:19 +02:00
francwa	4a74fff9cc	refactor(release): purify domain — parse_release(name, kb) + ParsedRelease Option B Removes the last domain → infrastructure leak in the release parser. services.py: - parse_release(name, kb) takes the knowledge as an explicit parameter. - Every helper (_tokenize, _is_well_formed, _extract_tech, _extract_languages, _extract_audio, _extract_video_meta, _extract_edition, _extract_title, _infer_media_type) takes kb. - No more module-level YAML loading. value_objects.py — Option B: - Sanitization happens once at parse time; ParsedRelease now carries a title_sanitized: str field alongside title. - Builder methods (show_folder_name, episode_filename, movie_folder_name, movie_filename) become pure: they accept already-sanitized tmdb_title_safe / tmdb_episode_title_safe arguments. Callers at the use-case boundary sanitize via kb.sanitize_for_fs(...) before passing in. - All domain-knowledge constants removed (_RESOLUTIONS, _SOURCES, _CODECS, _AUDIO, _VIDEO_META, _EDITIONS, _HDR_EXTRA, _MEDIA_TYPE_TOKENS, _LANGUAGE_TOKENS, _FORBIDDEN_CHARS, _*_EXTENSIONS, _WIN_FORBIDDEN_TABLE, _sanitize_for_fs). The module is now pure DDD.	2026-05-19 22:05:10 +02:00
francwa	c3a3cb50c9	refactor(release): introduce ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter Adds the port/adapter pair that lets the release domain consume parsing knowledge without importing infrastructure or loading YAML at import time. - alfred/domain/release/ports/knowledge.py declares the read-only query surface: token sets (resolutions, sources, codecs, language_tokens, forbidden_chars, hdr_extra), structured dicts (audio, video_meta, editions, media_type_tokens), separators list, file-extension sets, and sanitize_for_fs(text). - alfred/infrastructure/knowledge/release_kb.py loads every YAML once at construction and exposes them as attributes, with an immutable str.maketrans table backing sanitize_for_fs. No domain code is wired to the port yet — that lands in the next commit.	2026-05-19 22:05:01 +02:00
francwa	14941d47c0	Merge branch 'refactor/domain-io-extraction' Extract all I/O (subprocess, filesystem, YAML loading) from the domain layer via ports/adapters. domain/subtitles/ now has zero imports from infrastructure/. The remaining domain → infra leak (release knowledge loaded at import time) is documented in tech-debt for a dedicated branch.	2026-05-19 15:16:59 +02:00
francwa	df798f55cc	refactor(subtitles): introduce SubtitleKnowledge Protocol port Domain services (SubtitleIdentifier, PatternDetector) used to import the concrete SubtitleKnowledgeBase class directly from infrastructure for their type hint. With this commit they depend on a structural Protocol in alfred/domain/subtitles/ports/knowledge.py declaring just the 7 read-only query methods the domain actually consumes. The concrete YAML-backed SubtitleKnowledgeBase in infrastructure remains the sole adapter — no rename, no shim. With this change alfred/domain/subtitles/ has zero imports from alfred/infrastructure/. Also extend the changelog entry covering the full domain-io-extraction branch.	2026-05-19 15:15:43 +02:00
francwa	535935cc73	docs(changelog): summarize refactor/domain-io-extraction work block	2026-05-19 15:11:17 +02:00
francwa	6e252d1e81	refactor(subtitles): inject default rules into SubtitleRuleSet.resolve() aggregates.py used to call SubtitleKnowledgeBase().default_rules() via a DEFAULT_RULES() helper, which silently pulled the infrastructure layer (YAML loader) into the domain on every resolve. Make the dependency explicit: resolve() now takes the default rules as a parameter, and the caller (the ManageSubtitles use case) loads them from the KB once and passes them in. Domain stays I/O-free. - Drop DEFAULT_RULES helper and the SubtitleKnowledgeBase import from alfred/domain/subtitles/aggregates.py - SubtitleRuleSet.resolve(default_rules: SubtitleMatchingRules) - manage_subtitles use case passes kb.default_rules() at the call site - Tests use a local SubtitleMatchingRules stand-in instead of relying on KB defaults	2026-05-19 15:10:06 +02:00
francwa	903e9e7117	refactor(subtitles): move SubtitlePlacer to application layer The placer performs filesystem I/O (os.link) — it belongs in the application layer, not the domain. Domain services should be pure. - Move alfred/domain/subtitles/services/placer.py to alfred/application/subtitles/placer.py - Move tests/domain/test_subtitle_placer.py to tests/application/test_subtitle_placer.py - Update all callers (manage_subtitles use case, metadata store, tests) - Drop placer re-exports from domain.subtitles.services.__init__	2026-05-19 15:07:39 +02:00
francwa	9556bf9e08	refactor(domain): strip live filesystem I/O from VOs and entities DDD-pure cleanup — entities and value objects no longer query the world at read time. FilePath: drop .exists() / .is_file() / .is_dir(). The VO is now a pure address; ask the injected FilesystemScanner for live state. Movie: drop .has_file() / .is_downloaded(). Invariant: when the application sets file_path, it has already confirmed the file exists; downstream readers trust the snapshot. Episode: same — drop .has_file() / .is_downloaded(). SubtitlePlacer: drop the pre-check .exists() calls. The placer now attempts os.link() and reports FileNotFoundError / FileExistsError as skip reasons. Removes a TOCTOU race as a bonus. Tests adjusted: the FilePath VO method tests are gone (the methods are gone), test_has_file_false_when_no_path replaced by a plain assertion on file_path is None. Placer tests are unchanged — the skip-reason strings ('not found', 'already exists') match the new try/except paths. The 'snapshot value objects' pattern (ProbedMediaInfo, TmdbMovieInfo) that this cleanup enables is documented in refactor_domain_io.md, to be applied when a future use case actually needs richer metadata — not now, no speculative VOs.	2026-05-19 14:58:59 +02:00
francwa	e6ee700825	refactor(subtitles): inject MediaProber/FilesystemScanner ports into domain services Domain services no longer call subprocess or pathlib directly. Introduces two Protocol ports in domain/shared/ports/: MediaProber.list_subtitle_streams(video) -> list[SubtitleStreamInfo] FilesystemScanner.scan_dir / stat / read_text -> list[FileEntry] | ... Concrete adapters live in infrastructure/: FfprobeMediaProber (wraps subprocess + ffprobe + JSON) PathlibFilesystemScanner (wraps pathlib + os reads) SubtitleIdentifier and PatternDetector now take (kb, prober, scanner) at construction time. Their internals work over FileEntry snapshots and SubtitleStreamInfo records — no more ad-hoc Path.is_file/iterdir/stat or embedded subprocess.run loops. _count_entries now takes raw SRT text (returned by scanner.read_text) so SRT-only entry counting stays out of the FS layer. manage_subtitles use case instantiates the two adapters once and injects them into both services. Tests pass real adapters and patch `alfred.infrastructure.probe.ffprobe_prober.subprocess.run` for the ffprobe-failure cases. _classify_single tests build FileEntry via a small helper. Domain is now free of subprocess / direct filesystem reads in the subtitle pipeline. The only remaining I/O hooks are FilePath VO convenience methods (exists/is_file/is_dir) which stay as a deliberate affordance on the value object.	2026-05-19 14:52:24 +02:00
francwa	ced72547f7	refactor(knowledge): extract YAML loaders from domain to infrastructure The domain layer no longer reads YAML files. All knowledge loaders move from `alfred/domain//knowledge/` to `alfred/infrastructure/knowledge/`: domain/release/knowledge.py → infrastructure/knowledge/release.py domain/shared/knowledge/language_registry.py → infrastructure/knowledge/language_registry.py domain/subtitles/knowledge/{loader,base}.py → infrastructure/knowledge/subtitles/{loader,base}.py Callers in domain/release/{services,value_objects}.py, domain/subtitles/{aggregates,services/}.py, and application/filesystem/manage_subtitles.py updated to absolute imports. Re-exports of KnowledgeLoader/SubtitleKnowledgeBase from domain/subtitles/__init__.py dropped (no shim per project convention). Tests follow the moved targets.	2026-05-19 14:35:18 +02:00
francwa	f338b08706	refactor(release): type media_type/parse_path as true enums ParsedRelease.media_type is now MediaTypeToken (not str) and parse_path is ParsePath (not str). __post_init__ keeps a tolerant constructor that coerces raw strings via the enum, so callers passing 'movie'/'direct' still work transparently. Since both enums inherit from str, existing string comparisons and JSON serialization remain unchanged.	2026-05-19 14:21:27 +02:00
francwa	da484d7474	refactor(release): typed enums + __post_init__ validation on ParsedRelease ParsedRelease accepted any string for media_type/parse_path and had no validation on numeric ranges (season=-5 was silently accepted). Tighten both ends: - New str-backed Enums MediaTypeToken and ParsePath. Inherit from str so every existing comparison ('== "movie"'), JSON serialization, and TMDB DTO interop keeps working unchanged. - ParsedRelease.__post_init__ now validates: raw/group non-empty, year in 1888-2100, season 0-100, episode 0-9999, episode_end >= episode, media_type/parse_path against the enum allowlist. - services.py uses the enum .value members everywhere instead of bare string literals — kills the typo risk.	2026-05-19 14:17:56 +02:00
francwa	481eeb5afd	refactor(domain): identity-based equality + dedup track helpers Two related DDD fixes for Movie and Episode entities: - Identity equality: @dataclass(eq=False) with custom __eq__/__hash__. Movie is identified by imdb_id, Episode by (season, episode) within the TVShow aggregate. Auto-generated field-by-field equality was incorrectly making two Movie instances with the same imdb_id but different audio_tracks compare unequal — breaks dedup/caching. - MediaWithTracks mixin: the 5 audio/subtitle helpers (has_audio_in / audio_languages / has_subtitles_in / has_forced_subs / subtitle_languages) were duplicated verbatim between Movie and Episode. Extracted to shared/media/tracks_mixin.py; both entities now inherit. Bonus: dropped the object.__setattr__ coercion dance in Movie.__post_init__ — the class isn't frozen so plain assignment is the right call.	2026-05-19 14:17:47 +02:00
francwa	7cd24f3a31	refactor(domain): freeze media track value objects AudioTrack, VideoTrack, SubtitleTrack and MediaInfo are snapshots of a single ffprobe run — model them as proper immutable value objects. - @dataclass(frozen=True) on all four - MediaInfo track collections become tuple[...] instead of list[...] - ffprobe adapter rewritten to build tuples up-front instead of appending/setattr'ing on a constructed instance	2026-05-19 14:17:27 +02:00
francwa	eb8995cfc3	refactor(subtitles): drop dead scanner module SubtitleScanner was an earlier iteration superseded by SubtitleIdentifier and never imported in production code (only by its own tests). Removing both keeps the bounded context clean and shrinks the surface.	2026-05-19 14:17:15 +02:00
francwa	f6eef59fca	refactor: tech debt mini-pass (items 5, 6, 7, 20) Low-risk cleanup items, no functional change to the parser. The philosophy remains: keep the parser simple, the AI handles edge cases. - Extract duplicated 'fs-safe title → dot-folder-name' regex into to_dot_folder_name() in domain/shared/value_objects.py. Used by both MovieTitle.normalized() and TVShow.get_folder_name() (item #5). - ParsedRelease.languages now uses field(default_factory=list) instead of a manual __post_init__ assigning [] via object.__setattr__ (#6). - tv_shows/entities.py module docstring: prepend ASCII ownership tree for quicker visual scan of the aggregate hierarchy (#7). - file_extensions.yaml: split subtitle sidecars (.srt/.sub/.idx/.ass/.ssa) into a dedicated 'subtitle:' category instead of lumping them under 'metadata:'. _METADATA_EXTENSIONS at the value_objects.py level remains the union of both — detect_media_type behavior unchanged. New loader load_subtitle_extensions() exposes the distinct subtitle set for future callers in the subtitles domain (#20). Suite: 1020 passed, 8 skipped.	2026-05-18 16:24:28 +02:00
francwa	273510dff8	test(fixtures): seed PATH OF PAIN bucket with 10 worst-case fixtures 10 pathological release names mined from the real downloads folder. Each fixture locks in the current parse_release output (including its silent losses and false positives) so future parser improvements are intentional, not silent drift. Cases: - Khruangbin yt-dlp slug (UTF-8 wide pipe '｜', YT ID as group) - Deutschland 83-86-89 franchise box (group=S03 misdetection) - Chérie Le BéBé (accented chars preserved, VFF language) - Jimmy Carr 8-word stand-up special title - [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix) - Prodiges S12E01 with episode title + air-date silently lost - The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio = full AI-path degeneration (everything UNKNOWN) - Sleaford Mods yt-dlp slug (YT ID glued to year) - Super Mario Bros [FR-EN] (bilingual tag mistaken for group) - Gilmore Girls Complete S01-S07 (the well-behaved exception: COMPLETE token correctly drives tv_complete + REPACK + 10bit) Also adds shitty + path_of_pain to the per-bucket sanity assertion. Suite: 1020 passed, 8 skipped.	2026-05-18 15:57:56 +02:00
francwa	c1831e3f46	test(fixtures): drop derry_duplicate_naming (was a copy-paste artifact) The release name mixed two distinct releases — not a real-world case worth anti-regression. SHITTY bucket now holds 14 fixtures (down from 15).	2026-05-18 15:51:11 +02:00
francwa	aa182458b8	test(fixtures): seed SHITTY release bucket with 15 anti-regression cases Add 15 expected.yaml fixtures under tests/fixtures/releases/shitty/ covering the awkward but real-world release names from the downloads folder. Each fixture locks in the current parse_release behavior so future parser changes are intentional, not silent drift. Cases captured: - Angel INTEGRALE 3-level hierarchy (tv_complete media_type) - Buffy custom French title with dots preserved - Archer S14E09E10E11 multi-episode (E11 lost — tech debt) - Notre Planète lowercase s01e01 - Vinyl ' - 1x01 - FHD' (stray dash artifact — tech debt) - Deutschland.83 (year-suffix as part of title) - Tatortreiniger S01-06 range (falls to movie — tech debt) - Derry Girls duplicated title - Jurassic Park bare folder (media_type=unknown) - La Nuit au Musée bilingual MULTI - Chérie j'ai agrandi (ASCII-stripped apostrophe, parses fine) - Honey Don't (unescaped apostrophe — full AI-path degeneration) - Hook MULTi.SUBS movie with Subs/ folder - Predator Badlands space separators (group=UNKNOWN — tech debt) - Westworld S04 Subs.Only (no video file) Each fixture also captures the future 3-flow routing (library / torrents / seed_hardlinks) ahead of the organize_media refactor. Suite: 1011 passed, 8 skipped.	2026-05-18 15:48:41 +02:00
francwa	774f71c8cc	chore(gitignore): track CHANGELOG.md explicitly The blanket *.md ignore was hiding CHANGELOG.md, forcing 'git add -f' on every update. Allow-list it so the file lives under normal git tracking. CLAUDE.md stays local (user keeps it personal until a dedicated repo).	2026-05-18 15:39:04 +02:00
francwa	7bc50fd5b8	test: add real-world release fixtures (EASY bucket) Captures 5 canonical releases from /mnt/testipool/downloads as parametrized fixtures under tests/fixtures/releases/easy/. Each fixture declares the release name, expected ParsedRelease fields, original tree, and the future routing (library / torrents / seed_hardlinks) for the upcoming organize_media refactor. Today only the 'parsed' section is asserted; tree is materialized into a tmp_path to catch typos. Routing is captured ahead of the planner work — it becomes verifiable once organize_media lands. Cases: back_in_action (movie), slow_horses_single_ep (TV single), foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie + KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir). Also tracks CHANGELOG.md under [Unreleased] / Added.	2026-05-18 15:36:19 +02:00
francwa	f17abdbaec	chore: cleanup — remove shims, fix ruff warnings, ignore noisy rules - Removed backward-compat shims _sanitise_for_fs / _strip_episode_from_normalised in domain/release/value_objects.py (zero callers). - Fixed ruff warnings across the codebase: * PLW1510: explicit check=False on subprocess.run calls * PLC0415: promoted lazy imports to module top where no cycle exists (manage_subtitles, placer, qbittorrent/client, file_manager) * E402: fixed module-level import ordering in language_registry.py and subtitles/knowledge/loader.py * F841 / B007: removed unused locals (identifier.py) * C416: replaced unnecessary set comprehension with set() in release/knowledge.py - Ruff config: ignore PLR0911/PLR0912 globally (noisy on mappers and orchestrator use-cases) and PLW0603 (intentional for the memory singleton). - Updated tech debt memory: P1 done, ShowStatus actually complete (was a stale note).	2026-05-18 00:02:45 +02:00
francwa	1d50b63af2	Merge branch 'dev/sprint-cleanup' Multi-week sprint: ISO 639-2/B language unification, release parser unification + data-driven tokenizer, removal of fossil services (movies/tv_shows/subtitles), subtitle services split into a package, MediaInfo split, test suite expansion (990 passing). See CHANGELOG.md [Unreleased] for the user-facing summary.	2026-05-17 23:42:05 +02:00
francwa	891ba502a2	chore: apply pre-commit auto-fixes (trim trailing whitespace, EOF)	2026-05-17 23:41:54 +02:00
francwa	e07c9ec77b	chore: sprint cleanup — language unification, parser unification, fossils removal Several weeks of work accumulated without being committed. Grouped here for clarity; see CHANGELOG.md [Unreleased] for the user-facing summary. Highlights ---------- P1 #2 — ISO 639-2/B canonical migration - New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/). - iso_languages.yaml as single source of truth for language codes. - SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml only declares subtitle-specific tokens (vostfr, vf, vff, …). - SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as {iso639_2b}.srt (legacy fr.srt still read via alias). - Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS / SUBTITLE_EXTENSIONS hardcoded dicts. - Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias). - Added settings.min_movie_size_bytes (was a module constant). P1 #3 — Release parser unification + data-driven tokenizer - parse_release() is now the single source of truth for release-name parsing. - alfred/knowledge/release/separators.yaml declares the token separators used by the tokenizer (., space, [, ], (, ), _). New conventions can be added without code changes. - Tokenizer now splits on any configured separator instead of name.split('.'). Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via the direct path without sanitization fallback. - Site-tag extraction always runs first; well-formedness only rejects truly forbidden chars. - _parse_season_episode() extended with NxNN / NxNNxNN alt forms. - Removed dead helpers: _sanitize, _normalize. Domain cleanup - Deleted fossil services with zero production callers: alfred/domain/movies/services.py alfred/domain/tv_shows/services.py alfred/domain/subtitles/services.py (replaced by subtitles/services/ package) alfred/domain/subtitles/repositories.py - Split monolithic subtitle services into a package (identifier, matcher, placer, pattern_detector, utils) + dedicated knowledge/ package. - MediaInfo split into dedicated package (alfred/domain/shared/media/: audio, video, subtitle, info, matching). Persistence cleanup - Removed dead JSON repositories (movie/subtitle/tvshow_repository.py). Tests - Major expansion of the test suite organized to mirror the source tree. - Removed obsolete _edge_cases test files superseded by structured tests. - Suite: 990 passed, 8 skipped. Misc - .gitignore: exclude env_backup/ and .bak. - Adjustments across agent/llm, app.py, application/filesystem, and infrastructure/filesystem to align with the new domain layout.	2026-05-17 23:38:00 +02:00
francwa	ba6f016d49	feat: generic MetadataStore + read_release_metadata + query_library - Extract MetadataStore from SubtitleMetadataStore (alfred/infrastructure/metadata/). Generic load/save + typed update helpers (update_parse, update_probe, update_tmdb) for the per-release .alfred/metadata.yaml. - SubtitleMetadataStore becomes a thin facade — owns subtitle_history shape, delegates I/O to MetadataStore. - Agent._execute_tool_call auto-persists successful analyze_release / probe_media / find_media_imdb_id results to the release's .alfred file. find_media_imdb_id follows release_focus when it has no path argument. - New tools: · read_release_metadata(release_path) — cacheable, key=release_path. Returns the .alfred content or has_metadata=false. · query_library(name) — substring scan across configured library roots. - Both new tools added to CORE_TOOLS (always visible).	2026-05-15 11:02:25 +02:00
francwa	3c7c6695f2	feat(memory): Phase 1 — STM ToolResultsCache + ReleaseFocus + cache flag in YAML specs Adds two STM components and a transparent cache hook in the agent loop so read-only tools don't re-do work the agent already did in this session. New STM components: - ToolResultsCache — {tool_name: {key: result}}, session-scoped. to_dict() exposes only the key inventory (not payloads) to keep the prompt cheap. - ReleaseFocus — current_release_path + working_set list, updated automatically when a path-keyed inspector runs. YAML spec layer: - New optional 'cache: { key: <param_name> }' block in ToolSpec. - Validated at load time: cache.key must be a declared parameter. - Surfaced on Tool dataclass as cache_key: str | None. Agent._execute_tool_call: - Pre-exec cache lookup; hit short-circuits and adds _from_cache=true. - Post-exec: stores successful results, updates release_focus for path-keyed tools, refreshes episodic.last_search_results when find_torrent's hit served the response (so get_torrent_by_index keeps pointing at the right list). Cacheable tools (5): analyze_release, probe_media, list_folder, find_media_imdb_id, find_torrent.	2026-05-15 10:44:14 +02:00
francwa	2db3198ef2	feat(agent): migrate all remaining tools to YAML specs (21/21 covered) Adds YAML specs for the 14 tools that were still description-from-docstring: filesystem: - set_path_for_folder, list_folder, analyze_release, probe_media, move_media, manage_subtitles, create_seed_links, learn api: - find_media_imdb_id, find_torrent, get_torrent_by_index, add_torrent_to_qbittorrent, add_torrent_by_index language: - set_language Each spec follows the established shape (summary / description / when_to_use / when_not_to_use / next_steps / parameters with why_needed + example / returns) and the Python function docstring is slimmed to a one-line pointer. Registry now reports: 21 tools, 21 with YAML spec, 0 doc-only.	2026-05-14 21:18:43 +02:00
francwa	23a9dd7990	refactor(memory): rename workflow.target -> params, type -> name The Workflow STM component stored an active workflow as {type, target, stage, started_at}. Now that start_workflow takes a workflow_name and a params dict, those keys match what they actually hold: type -> name (the YAML workflow name, e.g. media.organize_media) target -> params (the dict passed to start_workflow) ShortTermMemory.start_workflow parameters renamed accordingly. All consumers (prompt builder workflow scope + STM context, start/end workflow tools) updated.	2026-05-14 21:11:23 +02:00
francwa	74a52ba6a3	feat(agent): workflow-scoped tool catalog + start/end_workflow meta-tools Introduce a scope-aware agent so the LLM never sees the full 21-tool catalog at once. The system prompt now describes either: - idle mode: core set (5 tools: set_language, set_path_for_folder, list_folder, start_workflow, end_workflow) + a list of available workflows with their goals; - active mode: the core set plus the tools declared by the active workflow's YAML, with the step plan inlined into the prompt. Pieces: - alfred/agent/tools/workflow.py: start_workflow / end_workflow tools (with YAML specs under tools/specs/) that drive memory.stm.workflow. - alfred/agent/prompt.py: CORE_TOOLS constant, visible_tool_names(), filtered build_tools_spec() / _format_tools_description(), and a new _format_workflow_scope() section in the system prompt. - alfred/agent/agent.py: WorkflowLoader wired into Agent, defensive out-of-scope check in _execute_tool_call. - alfred/agent/registry.py: registers the two new meta-tools (21 total, 7 with YAML spec). - workflows/media.organize_media.yaml: tools/steps list refreshed to match the current resolver split (analyze_release, probe_media, resolve_*_destination, move_to_destination).	2026-05-14 21:07:36 +02:00
francwa	97adfbda45	refactor(workflows): adopt media.* naming convention Rename workflow files and their 'name' field with a 'media.' domain prefix to anticipate future multi-domain expansion (mail., calendar., ...). - organize_media -> media.organize_media - manage_subtitles -> media.manage_subtitles WorkflowLoader picks them up unchanged (uses data['name']).	2026-05-14 20:55:35 +02:00
francwa	239fce9e4e	chore(agent): remove dead parameters.py The ParameterSchema / REQUIRED_PARAMETERS / get_missing_required_parameters machinery in alfred/agent/parameters.py was used in early prototypes for the prompt-required-params check but has been unwired from production for several refactors. The new YAML tool-spec layer (alfred/agent/tools/specs/) covers the same need (rich, LLM-facing parameter descriptions) without the parallel registration plumbing. Tests in tests/test_config_edge_cases.py still reference the deleted module — left untouched per the project policy of treating test sync as a dedicated end-of-week task.	2026-05-14 18:06:34 +02:00
francwa	99c95af64e	feat(agent): YAML tool specs as the LLM-facing semantic layer Introduce a first-class semantic layer for tool descriptions, separated from Python signatures (which stay the source of truth for types and required-ness). New - alfred/agent/tools/spec.py — ToolSpec / ParameterSpec / ReturnsSpec dataclasses with strict YAML validation (ToolSpecError on malformed or inconsistent specs). compile_description() builds the rich text passed to the LLM as Tool.description, with sections for summary, description, when_to_use, when_not_to_use, next_steps, and returns. compile_parameter_description() injects the 'why_needed' field next to each parameter so the LLM sees the intent of each argument. - alfred/agent/tools/spec_loader.py — discovers tools/specs/.yaml, enforces filename ↔ spec.name match, rejects duplicates. - alfred/agent/tools/specs/ — one YAML per tool: resolve_season_destination.yaml * resolve_episode_destination.yaml * resolve_movie_destination.yaml * resolve_series_destination.yaml * move_to_destination.yaml Refactor - alfred/agent/registry.py * _create_tool_from_function now takes an optional ToolSpec. When provided, the long description + per-parameter descriptions come from the spec; types and required-ness still come from the Python signature. * Cross-validates spec.parameters against the function signature — crashes on missing or extra entries. * make_tools() loads all specs at startup and hands the right one to each tool. Tools without a spec fall back to the old docstring-only behaviour, so the 14 not-yet-migrated tools keep working unchanged. * Adds 'array' and 'object' to the Python→JSON type mapping and handles Optional[X] / X | None annotations. - alfred/agent/tools/filesystem.py * Drops the '_tool' suffix on the 4 resolve_* wrappers (option 1: alias the use-case imports as _resolve_). Tool names exposed to the LLM now match the underlying use case verbatim. 
Wrapper docstrings shrink to a one-liner pointing to the YAML spec — no more duplicated when_to_use/Args/Returns in Python. Verified - make_tools() loads 19 tools (5 with YAML spec, 14 doc-only). - Compiled descriptions render cleanly with all sections.	2026-05-14 18:06:27 +02:00
francwa	b5025bb5f8	refactor(resolve_destination): factor shared series-folder resolution + DTO base - New _Clarification sentinel and _resolve_series_folder() helper — the three TV use cases now share one matching/clarification path instead of triplicating the same if/elif/else block. - New _ResolvedDestinationBase carrying status/question/options/error/ message plus a _base_dict() helper; the four concrete DTOs only declare their own ok-state fields and a slim to_dict(). - No behaviour change: same outputs for ok/needs_clarification/error cases (verified by import + DTO smoke tests).	2026-05-14 16:09:33 +02:00
francwa	e45465d52d	feat: split resolve_destination, persona-driven prompts, qBittorrent relocation Destination resolution - Replace the single ResolveDestinationUseCase with four dedicated functions, one per release type: resolve_season_destination (pack season, folder move) resolve_episode_destination (single episode, file move) resolve_movie_destination (movie, file move) resolve_series_destination (multi-season pack, folder move) - Each returns a dedicated DTO carrying only the fields relevant to that release type — no more polymorphic ResolvedDestination with half the fields unused depending on the case. - Looser series folder matching: exact computed-name match is reused silently; any deviation (different group, multiple candidates) now prompts the user with all options including the computed name. Agent tools - Four new tools wrapping the use cases above; old resolve_destination removed from the registry. - New move_to_destination tool: create_folder + move, chained — used after a resolve_* call to perform the actual relocation. - Low-level filesystem_operations module (create_folder, move via mv) for instant same-FS renames (ZFS). Prompt & persona - New PromptBuilder (alfred/agent/prompt.py) replacing prompts.py: identity + personality block, situational expressions, memory schema, episodic/STM/config context, tool catalogue. - Per-user expression system: knowledge/users/common.yaml + {username}.yaml are merged at runtime; one phrase per situation (greeting/success/error/...) is sampled into the system prompt. qBittorrent integration - Credentials now come from settings (qbittorrent_url/username/password) instead of hardcoded defaults. - New client methods: find_by_name, set_location, recheck — the trio needed to update a torrent's save path and re-verify after a move. - Host→container path translation settings (qbittorrent_host_path / qbittorrent_container_path) for docker-mounted setups. 
Subtitles - Identifier: strip parenthesized qualifiers (simplified, brazil…) at tokenization; new _tokenize_suffix used for the episode_subfolder pattern so episode-stem tokens no longer pollute language detection. - Placer: extract _build_dest_name so it can be reused by the new dry_run path in ManageSubtitlesUseCase. - Knowledge: add yue, ell, ind, msa, rus, vie, heb, tam, tel, tha, hin, ukr; add 'fre' to fra; add 'simplified'/'traditional' to zho. Misc - LTM workspace: add 'trash' folder slot. - Default LLM provider switched to deepseek. - testing/debug_release.py: CLI to parse a release, hit TMDB, and dry-run the destination resolution end-to-end.	2026-05-14 05:01:59 +02:00
francwa	1723b9fa53	feat: release parser, media type detection, ffprobe integration Replace the old domain/media release parser with a full rewrite under domain/release/: - ParsedRelease with media_type ("movie" | "tv_show" | "tv_complete" | "documentary" | "concert" | "other" | "unknown"), site_tag, parse_path, languages, audio_codec, audio_channels, bit_depth, hdr_format, edition - Well-formedness check + sanitize pipeline (_is_well_formed, _sanitize, _strip_site_tag) before token-level parsing - Multi-token sequence matching for audio (DTS-HD.MA, TrueHD.Atmos…), HDR (DV.HDR10…) and editions (DIRECTORS.CUT…) - Knowledge YAML: file_extensions, release_format, languages, audio, video, editions, sites/c411 New infrastructure: - ffprobe.py — single-pass probe returning MediaInfo (video, audio tracks, subtitle tracks) - find_video.py — locate first video file in a release folder New application helpers: - detect_media_type — filesystem-based type refinement - enrich_from_probe — fill missing ParsedRelease fields from MediaInfo New agent tools: - analyze_release — parse + detect type + ffprobe in one call - probe_media — standalone ffprobe for a specific file New domain value object: - MediaInfo + AudioTrack + SubtitleTrack (domain/shared/media_info.py) Testing CLIs: - recognize_folders_in_downloads.py — full pipeline with colored output - probe_video.py — display MediaInfo for a video file	2026-05-12 16:14:20 +02:00
francwa	249c5de76a	feat: major architectural refactor - Refactor memory system (episodic/STM/LTM with components) - Implement complete subtitle domain (scanner, matcher, placer) - Add YAML workflow infrastructure - Externalize knowledge base (patterns, release groups) - Add comprehensive testing suite - Create manual testing CLIs	2026-05-11 21:55:06 +02:00
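
Illustration for e07c9ec77b (data-driven tokenizer). That commit replaces name.split('.') with a split on any separator declared in separators.yaml. The sketch below shows the idea only and is not the project's code: the function name tokenize, the SEPARATORS constant, and the choice to drop empty tokens are assumptions made for this example; only the separator list itself is taken from the commit message.

```python
import re

# Separator set as listed in the commit message for separators.yaml.
# Hard-coded here for illustration; the real project loads it from YAML.
SEPARATORS = (".", " ", "[", "]", "(", ")", "_")


def tokenize(name: str, separators: tuple[str, ...] = SEPARATORS) -> list[str]:
    """Split a release name on any configured separator, dropping empty tokens."""
    # Build one character class from the configured separators, escaping each
    # so that characters such as '.' and ']' are matched literally.
    pattern = "[" + "".join(re.escape(sep) for sep in separators) + "]+"
    return [token for token in re.split(pattern, name) if token]


if __name__ == "__main__":
    print(tokenize("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]"))
    # ['The', 'Father', '2020', '1080p', 'WEBRip', '5', '1', 'YTS', 'MX']
```

Note that a plain split like this also breaks '5.1' and 'YTS.MX' into two tokens each; how the actual parser reassembles such multi-token values is not shown in this log.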

160 Commits