# Changelog

All notable changes to Alfred are documented here.

The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead of release numbers. Granularity targets behavioral or API-visible changes; refer to `git log` for commit-level detail.

Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** / **Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect callers).

---

## [Unreleased]

### Fixed

- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full range.** The parser previously captured `episode=9, episode_end=10` and dropped E11+. It now returns `episode=first, episode_end=last`, with intermediate values implied. Fixture `shitty/archer_multi_episode/` updated from anti-regression-of-bug to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen` previously parsed with `parse_path="ai"` and everything UNKNOWN because `'` is in the forbidden-chars list. Apostrophes are now pre-stripped before the well-formed check, so the parse completes normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only the title text loses its apostrophe. `parse_path` becomes `sanitized` to surface the cleanup. Side win: PoP fixture `the_prodigy_full_chaos/` also moves from total failure to a partially-correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously parsed as `media_type=movie` with `S01-06` glued onto the title. The parser now recognizes the range, sets `season=first`, `media_type=tv_complete`, and removes the marker from the title. `is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe `|`, …) carry no title content and are now filtered out. Side effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the YouTube wide-pipe no longer leaks into the title.

### Added

- **`InspectedResult.recommended_action` property** — derived hint that collapses the orchestrator's go / wait / skip decision into a single value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the exclusion logic that was previously dispersed across road / media_type / main_video checks at each call site. Ordering is part of the contract: ``skip`` (no main video, or media_type == ``"other"``) wins over ``ask_user`` (media_type == ``"unknown"`` or road == ``"path_of_pain"``) which wins over ``process``. Surfaced through the ``analyze_release`` tool so the LLM can route on it directly. 6 new tests in ``tests/application/test_inspect.py`` cover the four branches and the precedence rules.
- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__` — the surface previously coupled to the concrete `LanguageRegistry`. Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code depends on the Protocol, infrastructure provides the YAML-backed adapter. Tests in `tests/infrastructure/test_language_registry.py`.

### Changed

- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name conflated "this might become a placed subtitle" with "this is what a scan pass produced". The class is the output of a scan/identify pass — language/format may still be `None`, confidence reflects how sure the classifier is, and `raw_tokens` holds the filename fragments under analysis. `SubtitleScanResult` says that directly.
  Pure rename with a refreshed docstring in `alfred/domain/subtitles/entities.py`; no behavior change. Touches the domain entity + `__init__` export, the matcher / identifier / utils services, the manage_subtitles use case, the placer, the metadata store, the shared-media cross-ref comment, and the seven test modules that imported the type.
- **`ParsedRelease` is now frozen; enrichment passes return new instances.** The VO was mutable so `detect_media_type` and `enrich_from_probe` could patch fields in place — a code smell in a value object whose identity *is* its content. `ParsedRelease` is now `@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]` instead of a `list[str]`. `enrich_from_probe` returns a new `ParsedRelease` via `dataclasses.replace` (only allocates when at least one field actually changed). `inspect_release` rebinds `parsed` after both `detect_media_type` (wrapped in `MediaTypeToken` to satisfy the strict isinstance check that now also runs on replace) and `enrich_from_probe`. Parser pipeline now packs `languages` as a tuple in the assemble dict. Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`, and the enrichment tests (22 call sites + language assertions switched to tuple literals).
- **`resolve_destination` use cases take `kb` / `prober` as required params; module-level singletons gone.** The four `resolve_{season,episode,movie,series}_destination` use cases now accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required arguments, matching the shape of `inspect_release`. The module-level `_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()` singletons that previously lived in `alfred/application/filesystem/resolve_destination.py` are removed — the application layer no longer reaches into infrastructure. The singletons now live at the agent-tools frontier (`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers instantiate them once and thread them through.
  `analyze_release` no longer needs the dirty `from ... import _KB` indirection. Tests inject their own stubs by keyword (`prober=_StubProber(...)`) instead of monkeypatching a module attribute.
- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name collided with `pathlib.Path` in code-reading mental models, and was one letter from `parse_path` (the field that holds the value) — making it harder than it needed to be to spot the type vs the attribute. ``TokenizationRoute`` says what it actually captures (DIRECT / SANITIZED / AI = how the name reached the tokenizer), and the class docstring now spells out the orthogonality with ``Road`` (EASY / SHITTY / PATH_OF_PAIN, which captures parser confidence on ``ParseReport``). The ``parse_path`` field name stays unchanged — string values too — so YAML fixtures, the ``analyze_release`` tool spec, and any external consumer are untouched.
- **`enrich_from_probe` codec mappings moved to YAML.** The three hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`, `_CHANNEL_MAP`) translating ffprobe output to scene tokens (`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in `alfred/knowledge/release/probe_mappings.yaml` and are loaded into `ReleaseKnowledge.probe_mappings` (new port field, populated by `YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb` parameter and reads the maps from there. Aligns with the CLAUDE.md rule that lookup tables of domain knowledge belong in YAML, not in Python — and opens the door to a future "learn new codec" pass. Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`, and all 22 sites in `tests/application/test_enrich_from_probe.py`.
- **`ParsedRelease.tech_string` is now a derived `@property`** (`alfred/domain/release/value_objects.py`). It computes `quality.source.codec` joined by dots on every access, so it stays in sync with the underlying fields by construction.
  The stored field is gone from the dataclass, the dict returned by `assemble()` no longer carries the key, `parse_release`'s malformed-name fallback drops the `tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives it after filling `quality`/`source`/`codec`. Closes the parser/enrichment double-source-of-truth that `e79ca46` had to fix reactively. The fixtures runner now injects `tech_string` alongside `is_season_pack` since `asdict()` skips properties.
- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of valid levels (global, release_group, movie, show, season, episode) was documented only in a docstring comment and validated nowhere. `RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML serialization, `.value` access) while making the closed set explicit to type-checkers and IDEs. `to_dict()` emits `.value` strings so YAML output is unchanged.
- **`FilePath` VO uses `__post_init__` instead of a hand-rolled `__init__`.** Same public API (accepts `str | Path`), same behavior, but the dataclass-generated `__init__` is no longer bypassed. One less smell in the shared VOs.
- **`Language` VO is strict by default; `Language.from_raw()` factory for normalization.** The previous `__post_init__` mutated `iso` and `aliases` via `object.__setattr__` on a frozen dataclass — a code smell hiding behind the dataclass facade. Split: the direct constructor now rejects un-normalized input (uppercase iso, whitespace in aliases, etc.), and `Language.from_raw()` handles arbitrary YAML/user input. Only one caller (LanguageRegistry loading the ISO YAML) needed migration.
- **`ParsedRelease.normalised` renamed to `clean`.** The field name promised "dots instead of spaces" but in practice held `raw - site_tag - apostrophes` — only used by `season_folder_name()`. Renamed and docstring corrected.
- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The fields were already typed as `MediaTypeToken` / `ParsePath`, but a tolerant `__post_init__` coerced raw strings. With both classes being `(str, Enum)`, the coercion served no purpose. Strict constructor; `.value` no longer passed at call sites; dropped the unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.

### Removed

- **`settings.min_movie_size_bytes`** — orphan Pydantic field + validator. Its only consumer (`MovieService.validate_movie_file`) had been removed during an earlier refactor. The "real movie vs sample" rule now lives in extension-based exclusion (`application/release/supported_media.py`) and PoP. If a size threshold is ever needed, it'll go in a knowledge YAML, not in `settings`.

### Internal

- **Flattened `alfred.domain.shared.media/` package into a single `media.py` module.** The 6-file package (audio, video, subtitle, info, matching, tracks_mixin + `__init__`) collapsed into one ~250 LoC module. All 12 import sites continue to resolve unchanged (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`) since Python treats `media.py` and `media/__init__.py` interchangeably for import paths. Easier to scan when the whole bounded-context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the `LanguageRepository` port** instead of the concrete `LanguageRegistry` class. The default constructor still instantiates the concrete adapter when no repository is injected — behaviour is unchanged for existing callers. Opens the door to in-memory fakes in future tests without loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from `alfred.application.filesystem` to `alfred.application.release`**. They are inspection-pipeline helpers — their natural home is next to `inspect_release`, not next to the filesystem use cases.
  The move also eliminates a circular-import workaround in `resolve_destination.py`: `inspect_release` can now be imported at module top instead of lazily inside `_resolve_parsed`. Public surface is unchanged for callers that imported the helpers from their full module paths (the only call sites — `inspect.py`, two tests, one testing script — were updated in this commit).

### Added

- **`resolve_*_destination` use cases now consume `inspect_release`**. `resolve_episode_destination` and `resolve_movie_destination` reuse their existing `source_file` parameter as the inspection target; `resolve_season_destination` and `resolve_series_destination` gain a new **optional** `source_path` parameter (also threaded through the tool wrappers and YAML specs). When the path exists, ffprobe data fills tokens missing from the release name (e.g. quality) and refreshes `tech_string`, so the destination folder / file names end up more accurate. When the path is omitted or does not exist on disk (back-compat callers), the use cases fall back to parse-only — same behavior as before.

### Fixed

- **`enrich_from_probe` now refreshes `tech_string`** after filling `quality` / `source` / `codec`. Previously the field stayed at its parser-time value, so filename builders saw stale tech tokens even after a successful probe. New `TestTechString` class in `tests/application/test_enrich_from_probe.py` locks the behavior.

### Added

- **`inspect_release` orchestrator + `InspectedResult` VO** (`alfred/application/release/inspect.py`). Single composition of the four inspection layers: `parse_release` → `detect_media_type` (patches `parsed.media_type`) → `find_main_video` (top-level scan) → `prober.probe` + `enrich_from_probe` when a video exists and the refined media type isn't in `{"unknown", "other"}`. Returns a frozen `InspectedResult(parsed, report, source_path, main_video, media_info, probe_used)` that downstream callers consume directly instead of rebuilding the same chain.
  `kb` and `prober` are injected — no module-level singletons. Never raises.

### Changed

- **`analyze_release` tool now delegates to `inspect_release`** — same output shape, plus two new fields: `confidence` (0–100) and `road` (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents both fields so the LLM can route releases by confidence.
- **`MediaProber` port now covers full media probing**: added `probe(video) -> MediaInfo | None` alongside the existing `list_subtitle_streams`. `FfprobeMediaProber` (in `alfred/infrastructure/probe/`) implements both methods and is now the single adapter shelling out to `ffprobe`. The standalone `alfred/infrastructure/filesystem/ffprobe.py` module was removed — all callers (tools, testing scripts) instantiate `FfprobeMediaProber` instead. Unblocks the upcoming `inspect_release` orchestrator, which depends on the port.

### Removed

- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the `FfprobeMediaProber` adapter).

---

## [2026-05-20] — Release parser confidence scoring + exclusion

### Added

- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`): `is_supported_video(path, kb)` (extension-only check against `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level scan, lexicographically-first eligible file, returns `None` when no video qualifies; accepts a bare file as folder for single-file releases). No size threshold, no filename heuristics — PATH_OF_PAIN handles the exotic cases. Foundation for the future `inspect_release` orchestrator.
- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`, `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns `(ParsedRelease, ParseReport)`.
  The new `ParseReport` frozen VO carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` / `"path_of_pain"`), the residual UNKNOWN tokens, and the missing critical fields. EASY is decided structurally (a group schema matched); SHITTY vs PATH_OF_PAIN is decided by score against a YAML-configurable cutoff (default 60). Weights and penalties also live in `scoring.yaml` — title 30, media_type 20, year 15, season 10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at -30. `Road` is a new enum, distinct from `ParsePath` (which records the tokenization route, not the confidence tier). `ReleaseKnowledge` port gains a `scoring: dict` field.

### Changed

- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease, ParseReport]` instead of returning a bare `ParsedRelease`. Call sites updated in `application/filesystem/resolve_destination.py` and `agent/tools/filesystem.py`. Tests updated accordingly.

---

## [2026-05-20] — Release parser v2 (EASY + SHITTY)

### Added

- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`): new annotate-based pipeline (tokenize → annotate → assemble) drives releases from known groups. Exposes `Token` (frozen VO with `index` + `role` + `extra`), `TokenRole` enum (structural/technical/meta families), and `GroupSchema` / `SchemaChunk` value objects.
  - `pipeline.tokenize`: string-ops separator split (no regex), strips a `[site.tag]` prefix/suffix first.
  - `pipeline.annotate`: detects the trailing group right-to-left (priority to `codec-GROUP` shape, fallback to any non-source dashed token), looks up its `GroupSchema`, then walks tokens and schema chunks in lockstep — optional chunks that don't match are skipped, mandatory mismatches abort EASY and return `None` so the caller can fall back to SHITTY.
  - `pipeline.assemble`: folds annotated tokens into a `ParsedRelease`-compatible dict.
  - `parse_release` (in `release.services`) tries the v2 EASY path first and falls through to the legacy SHITTY heuristic on `None`. Legacy SHITTY/PATH OF PAIN behavior is unchanged.
  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,rarbg}.yaml` declare the canonical chunk order per group, loaded via new `ReleaseKnowledge.group_schema(name)` port method.
  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py` cover token VOs, site-tag stripping, group detection, schema-driven annotation (movie, TV episode, season pack with optional source), and field assembly.
- **Release parser v2 — enricher pass** completes the EASY pipeline. The structural schema walk now tolerates non-positional tokens between chunks (instead of aborting on leftover tokens), and a second pass tags them with audio / video-meta / edition / language roles. Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml` (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are matched before single tokens. Channel layouts like `5.1` and `7.1` (split into two tokens by the `.` separator) are detected as consecutive pairs. Sequence members carry an `extra["sequence_member"]` marker so `assemble` extracts the canonical value only from the primary token. KONTRAST releases with audio / HDR / edition / language metadata now produce a fully populated `ParsedRelease`.
- **Streaming distributor as a separate dimension** from encoding source. New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX, ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors` port field, a `TokenRole.DISTRIBUTOR` annotation, and a `ParsedRelease.distributor` field. `WEB-DL` stays the source; the platform that produced the release is now recorded distinctly. The five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed from `sources.yaml`.
- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`, each documenting an expected `ParsedRelease` plus the future `routing` (library / torrents / seed_hardlinks) for the upcoming `organize_media` refactor. EASY bucket seeded with 5 cases (movie, single-episode, season pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15 anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel), French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi), multi-episode chain `S14E09E10E11` (Archer, captures E11 loss), lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83), season-range `S01-06` (Tatortreiniger, captures movie misclassification), bare folder name (Jurassic Park, media_type=unknown), apostrophe-in-name (Honey Don't, captures full AI-path degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands, captures group=UNKNOWN), subs-only release (Westworld S04). PATH OF PAIN bucket seeded with 10 worst-case fixtures covering: UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set with double season range and parens-wrapped tech (Deutschland 83-86-89, captures `group=S03` misdetection), accented chars in title (Chérie BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag prefix + XviD (OxTorrent), episode title + air-date silently lost (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i + multi-word audio codec (The Prodigy, full AI-path degeneration), yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]` tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range + REPACK + HEVC (Gilmore Girls, the well-behaved exception). Parametrized over `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`.
  Releases like `Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New conventions can be added without code changes. The canonical `.` is always present even if missing from YAML.

### Changed

- **Release parser v2 — SHITTY simplified to dict-driven tagging**. The legacy ~480-line heuristic block in `release/services.py` is gone; `pipeline._annotate_shitty` does a single pass that looks each token up in the kb buckets (resolutions / sources / codecs / distributors / year / `SxxExx`) with first-match-wins semantics, and the leftmost contiguous UNKNOWN run becomes the title. `annotate()` no longer returns `None` — SHITTY is the always-on fallback when no group schema matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures (`deutschland_franchise_box`, `sleaford_yt_slug`, `super_mario_bilingual`, `predator_space_separators` — the last one moved from `shitty/` → `path_of_pain/`) are now marked `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies that SHITTY intentionally won't handle. `ReleaseFixture` grows an `xfail_reason` field; the parametrized suite wires the xfail mark automatically.
- **`parse_release` tokenizer is now data-driven**: it splits on any character listed in `separators.yaml` (regex character class) instead of `name.split(".")`. This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`), space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and underscore-separated names parse correctly via the direct path — no more fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first (so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`), then well-formedness is checked only against truly forbidden chars (anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of 639-1 and 639-2/T):
  - `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was `["fr", "en"]`). Old LTM files are not auto-migrated — delete `data/memory/ltm.json` to regenerate with the new defaults.
  - Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`, `fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly (recognized as French via alias) but new files are written canonically.
  - `Language` value object docstring corrected: it has always stored 639-2/B (matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via `settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`** rather than duplicating tokens. `subtitles.yaml` now only declares subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new `language_tokens` section.

### Removed

- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`** deleted entirely. They held fossil parsers (`parse_episode_filename`, `extract_movie_metadata`, …) with zero production callers — superseded by `parse_release` as the single source of truth for release-name parsing. Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`) removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py` — the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS` hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in `alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by `language_tokens:` (subtitle-specific only) since `iso_languages.yaml` is the canonical source.

### Fixed

- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and `hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while `iso_languages.yaml` exposes French as `"fre"` — preferred languages defaults now match the canonical form.

### Internal

- **Domain I/O extraction** (`refactor/domain-io-extraction`): the domain layer no longer performs subprocess calls, filesystem scans, or YAML loading. Achieved in a series of focused commits:
  - **Knowledge YAML loaders moved to infrastructure**: `alfred/domain/release/knowledge.py`, `alfred/domain/shared/knowledge/language_registry.py`, and `alfred/domain/subtitles/knowledge/{base,loader}.py` relocated to `alfred/infrastructure/knowledge/`. Re-exports were dropped — callers import directly from the new location.
  - **`MediaProber` and `FilesystemScanner` Protocol ports** introduced at `alfred/domain/shared/ports/` with frozen-dataclass DTOs (`SubtitleStreamInfo`, `FileEntry`). `SubtitleIdentifier` and `PatternDetector` are now constructor-injected with concrete adapters (`FfprobeMediaProber` wrapping `subprocess.run(ffprobe)` and `PathlibFilesystemScanner` wrapping `pathlib`). No more direct `subprocess`/`pathlib` usage from the subtitle domain services.
  - **Live filesystem methods removed from VOs and entities**: `FilePath.exists()` / `.is_file()` / `.is_dir()` deleted — `FilePath` is now a pure address VO.
    `Movie.has_file()` and `Episode.is_downloaded()` dropped. Callers either rely on a prior detection step or use try/except over pre-checks (eliminates TOCTOU races).
  - **`SubtitlePlacer` moved to the application layer** at `alfred/application/subtitles/placer.py` — it performs `os.link` I/O, which doesn't belong in the domain. Pre-checks replaced with try/except for `FileNotFoundError`/`FileExistsError`.
  - **`SubtitleRuleSet.resolve()` no longer reaches into the knowledge base**: the implicit `DEFAULT_RULES()` helper is gone, replaced by an explicit `default_rules: SubtitleMatchingRules` parameter. The `ManageSubtitles` use case loads defaults from the KB once and passes them in.
  - **`SubtitleKnowledge` Protocol port** at `alfred/domain/subtitles/ports/knowledge.py` declares the read-only query surface domain services consume (7 methods: `known_extensions`, `format_for_extension`, `language_for_token`, `is_known_lang_token`, `type_for_token`, `is_known_type_token`, `patterns`). `SubtitleIdentifier` and `PatternDetector` depend on this Protocol instead of the concrete `SubtitleKnowledgeBase` from infrastructure — `domain/subtitles/` now has zero imports from `infrastructure/`. The remaining domain → infra leak (`domain/release/` loading separator YAML at import-time) is documented in tech-debt and scheduled for its own branch.
- **`to_dot_folder_name(title)` helper** in `alfred/domain/shared/value_objects.py` — extracts the `re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`, `.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping them under `metadata:`.
  The `_METADATA_EXTENSIONS` set used by `detect_media_type` remains the union of both (same behavior — subtitles are still ignored when deciding the media type of a folder), but a new `load_subtitle_extensions()` loader is now available for the subtitles domain. Semantic clarity, no functional change.
- **`tv_shows/entities.py` module docstring** now shows the aggregate ownership as an ASCII tree before the rule text — quicker visual scan of the DDD structure.
- Removed backward-compat shims `_sanitise_for_fs` / `_strip_episode_from_normalised` from `domain/release/value_objects.py` (zero callers).
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass explicit `check=False` (PLW1510); lazy imports promoted to module top where there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`, `qbittorrent/client.py`, `file_manager.py`); fixed module-level import ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`; removed unused locals (F841 / B007); replaced unnecessary set comprehension with `set()` in `release/knowledge.py` (C416).
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches) globally — noisy on parser mappers and orchestrator use-cases where early-return validation is essential complexity. Ignore `PLW0603` for the documented memory singleton (`infrastructure/persistence/context.py`).
- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`): the last domain → infrastructure leak (`domain/release/value_objects.py` loading YAML at import-time) is gone. Achieved via:
  - **`ReleaseKnowledge` Protocol port** at `alfred/domain/release/ports/knowledge.py` declares the read-only query surface release parsing needs (token sets for resolutions, sources, codecs, languages, hdr extras; structured dicts for audio, video_meta, editions, media_type_tokens; separators list; file-extension sets used by application/infra callers; `sanitize_for_fs(text)` method).
  - **`YamlReleaseKnowledge` adapter** at `alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant once at construction. Builds an immutable `str.maketrans` translation table for filesystem sanitization.
  - **`parse_release(name, kb)`** takes the knowledge as an explicit parameter — no more module-level YAML loading inside the domain. Every internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`, `_extract_audio`, `_extract_video_meta`, `_extract_edition`, `_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
  - **`ParsedRelease` Option B**: sanitization happens once at parse time and is stored on a new `title_sanitized: str` field. Builder methods (`show_folder_name`, `season_folder_name`, `episode_filename`, `movie_folder_name`, `movie_filename`) are now pure — they accept already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe` arguments. Callers at the use-case boundary sanitize TMDB strings via `kb.sanitize_for_fs(...)` before passing them in.
  - **All domain-knowledge constants removed from `value_objects.py`**: `_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`, `_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`, `_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`, `_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`, and the `_sanitize_for_fs` helper. The domain module is now pure.
  - **Application-layer KB singleton**: `resolve_destination.py` instantiates a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and threads it through every `parse_release(...)` call. The local `_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of `_KB.sanitize_for_fs(...)`.
  - **`detect_media_type(parsed, source_path, kb)` and `find_video_file(path, kb)`** now take the knowledge explicitly instead of importing `_*_EXTENSIONS` constants from the domain.
    `agent/tools/filesystem.py::analyze_release` imports the application KB singleton and passes it through.

---

## [2026-05-17] - TVShow & Movie aggregate refactor

Multi-phase overhaul of the TV show domain into a real DDD aggregate, with matching parity work on `Movie`, a language knowledge system, and the `shared/media` restructure that supports both.

### Added

- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml`, 42 languages including `und` for undetermined).
  - `Language` value object (frozen dataclass) with `iso`, `english_name`, `native_name`, `aliases`, and a `matches(raw)` cross-format helper.
  - `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging builtin + learned YAML. Not a singleton: the application layer instantiates it.
  - ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English name, native name, and common spellings.
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a `resolution` property using width-priority bucket detection (handles cinema/scope crops like 1920×960 → 1080p).
- **`shared/media/matching.py`**: `track_lang_matches` helper shared by `Episode` and `Movie`.
  Implements the **"C+" contract** for language helpers:
  - `Language` query → cross-format match via `Language.matches()`
  - `str` query → case-insensitive direct comparison (no normalization)
- **TVShow aggregate composition**:
  - `TVShow.seasons: dict[SeasonNumber, Season]`
  - `Season.episodes: dict[EpisodeNumber, Episode]`
  - `Season.expected_episodes` / `Season.aired_episodes` (split so collection state can compare "owned vs aired today" without confusing in-flight seasons with future ones)
- **Aggregate methods on `TVShow`**:
  - `add_episode(ep)`: sole sanctioned mutation entry point (creates the season if missing)
  - `add_season(season)`: replaces a season wholesale
  - `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
  - `is_complete_series()`: true iff `ENDED + COMPLETE`
  - `missing_episodes()`: flat list of all aired-but-not-owned `(season, episode)` pairs
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`, `has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by `Episode.audio_tracks` / `Episode.subtitle_tracks`.
- **Movie aggregate parity**: `Movie` now carries `audio_tracks` / `subtitle_tracks` and exposes the same helpers as `Episode` (same C+ contract).
- **`CHANGELOG.md`** (this file).

### Changed

- **`shared/media_info.py` split into `shared/media/{audio,video,subtitle,info,matching}.py`.** `MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat accessors (`width`, `height`, `video_codec`, `resolution`) remain as properties that read the first video track.
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to `MediaInfo` (file-level: they come from the ffprobe `format` block, not a stream). Files without a video stream now correctly expose duration.
- **`ShowStatus.from_string`** extended to map TMDB strings (`Returning Series`, `In Production`, `Pilot`, `Planned`, `Canceled`, `Cancelled`).
  Comparison is whitespace-trimmed and case-insensitive.
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They are owned by `TVShow` and reached only through it.
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed from the dict) instead of stored ints.
- **`TVShowService.parse_episode_from_filename`** rewritten with plain string operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
- **`TVShowService.find_next_episode`** now drives off `show.missing_episodes()` instead of the hardcoded "max 50 episodes per season" heuristic.
- **`TVShowService` constructor** no longer takes `season_repository` / `episode_repository`: the aggregate is persisted as a single unit via `TVShowRepository` only.
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to `SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack` ffprobe-view dataclass (different bounded contexts, kept separate intentionally).
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from `knowledge/release/file_extensions.yaml` via `load_video_extensions()` (single source of truth).
- **`CLAUDE.md`** updated with three new policy sections:
  - "Tests": small updates OK during normal work, no mass-update sprees
  - "Backwards-compatibility shims": prefer clean migration over shims
  - "Regex": not forbidden, use judgment when string ops would be fragile

### Removed

- **Legacy `Season N Episode N` filename form** in `TVShowService.parse_episode_from_filename`. It never appears in the release names Alfred handles, and supporting it forced a regex.
- **`SeasonRepository` and `EpisodeRepository`**: only the aggregate root has a repository (DDD rule: one repo per aggregate).
- **`shared/media_info.py`** compatibility shim removed; callers updated.
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` removed; callers updated to `SubtitleCandidate`.

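
For readers curious what regex-free episode parsing can look like, here is a minimal sketch of the two supported forms (`S01E05` / `s1e5` and `1x05` / `01x5`). It is an illustration only, not the code in `TVShowService`: the names `parse_episode` and `_read_number`, the token-boundary rule, and the two-digit season cap (used to avoid misreading `1920x1080`) are all assumptions made for this example.

```python
from __future__ import annotations

DIGITS = "0123456789"


def _read_number(text: str, start: int) -> tuple[str, int]:
    """Return the run of ASCII digits starting at ``start`` and the index after it."""
    end = start
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return text[start:end], end


def parse_episode(filename: str) -> tuple[int, int] | None:
    """Hypothetical helper: find the first SxxEyy or NxNN marker in a filename."""
    lowered = filename.lower()
    for i, ch in enumerate(lowered):
        # Assumption: a marker must start at a token boundary,
        # so the "x264" in a codec tag is never taken for "NxNN".
        if i > 0 and lowered[i - 1].isalnum():
            continue
        if ch == "s":
            season, pos = _read_number(lowered, i + 1)
            separator = "e"
        elif ch in DIGITS:
            season, pos = _read_number(lowered, i)
            separator = "x"
        else:
            continue
        # Assumption: seasons have at most two digits (rejects "1920x1080").
        if not season or len(season) > 2:
            continue
        if pos >= len(lowered) or lowered[pos] != separator:
            continue
        episode, _ = _read_number(lowered, pos + 1)
        if episode:
            return int(season), int(episode)
    return None


print(parse_episode("Show.Name.S01E05.1080p.mkv"))
print(parse_episode("Vinyl - 1x01 - FHD.mkv"))
print(parse_episode("Show Season 1 Episode 5.mkv"))
```

Under these assumptions the first two calls yield a `(season, episode)` tuple, while the legacy `Season N Episode N` form falls through to `None`, which matches the removal noted above.
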
### Fixed

- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead of crashing through `primary_video.duration_seconds` (see the duration/bitrate move under **Changed**).
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer passes the removed `show_imdb_id` / `episode_count` kwargs when constructing a `Season` for folder-name generation.

### Internal

- Test suite rewritten where the aggregate redesign broke fixtures: `tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py` (rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py` (helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures), `tests/domain/test_tv_shows_service.py` (find_next_episode driven by real aggregate state).
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`, `identifier.py` updated to import `SubtitleCandidate`.
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.
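
As a reading aid for the `MediaInfo` entries in this block, the sketch below shows why file-level duration avoids the audio-only crash: flat accessors go through an optional first video track, while `duration_seconds` no longer depends on one. This is a simplified stand-in, not the real classes. The field names `video_tracks` / `audio_tracks`, the `AudioTrack` shape, and the constructor defaults are assumptions; only `primary_video`, `width`, `duration_seconds`, and `bitrate_kbps` are named in the entries above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class VideoTrack:
    # Stream-level data only; duration and bitrate live on MediaInfo.
    width: int | None = None
    height: int | None = None


@dataclass(frozen=True)
class AudioTrack:
    # Assumed minimal shape, for illustration only.
    codec: str | None = None


@dataclass(frozen=True)
class MediaInfo:
    video_tracks: list[VideoTrack] = field(default_factory=list)  # assumed name
    audio_tracks: list[AudioTrack] = field(default_factory=list)  # assumed name
    duration_seconds: float | None = None  # file-level value
    bitrate_kbps: int | None = None        # file-level value

    @property
    def primary_video(self) -> VideoTrack | None:
        """First video track, or None when the file has no video stream."""
        return self.video_tracks[0] if self.video_tracks else None

    @property
    def width(self) -> int | None:
        """Flat accessor that tolerates a missing video track."""
        video = self.primary_video
        return video.width if video is not None else None


audio_only = MediaInfo(audio_tracks=[AudioTrack(codec="flac")], duration_seconds=245.3)
print(audio_only.primary_video, audio_only.width, audio_only.duration_seconds)
```

In this sketch an audio-only file reports `None` for `primary_video` and `width` without raising, and `duration_seconds` is `None` only when no file-level value was supplied.
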