# Changelog

All notable changes to Alfred are documented here.

The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead
of release numbers. Granularity targets behavioral or API-visible changes; refer
to `git log` for commit-level detail.

Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** /
**Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect
callers).

---

## [Unreleased]

### Removed

- **`settings.min_movie_size_bytes` removed.** The Pydantic field and its
  validator no longer had any consumer (the old
  `MovieService.validate_movie_file` was deleted in an earlier
  refactor). The "is this a real movie or a sample" rule is now
  carried by extension-based exclusion
  (`application/release/supported_media.py`) and by PoP. If a size
  threshold is ever needed, it will live in a knowledge YAML, not in
  `settings`.

### Fixed

- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
  range.** The parser previously captured `episode=9, episode_end=10`
  and dropped E11+. It now returns `episode=first, episode_end=last`,
  with intermediate values implied. Fixture
  `shitty/archer_multi_episode/` updated from anti-regression-of-bug
  to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI
  fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
  previously parsed with `parse_path="ai"` and everything UNKNOWN
  because `'` is in the forbidden-chars list. Apostrophes are now
  pre-stripped before the well-formed check, so the parse completes
  normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
  the title text loses its apostrophe. `parse_path` becomes
  `sanitized` to surface the cleanup. Side win: PoP fixture
  `the_prodigy_full_chaos/` also moves from total failure to a
  partially-correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as
  `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
  parsed as `media_type=movie` with `S01-06` glued onto the title.
  The parser now recognizes the range, sets `season=first`,
  `media_type=tv_complete`, and removes the marker from the title.
  `is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
  with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
  produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
  `｜`, …) carry no title content and are now filtered out. Side
  effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
  YouTube wide-pipe no longer leaks into the title.
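
The multi-episode fix above can be illustrated with a small sketch. This is not the parser's actual code: `collapse_episode_chain` is a hypothetical helper, written with plain string operations, that shows the intended result for a token such as `S14E09E10E11`.

```python
from __future__ import annotations


def collapse_episode_chain(token: str) -> tuple[int, int, int | None] | None:
    """Return (season, episode, episode_end) for tokens like S14E09E10E11.

    Hypothetical helper for illustration only. Returns None when the
    token is not an SxxEyy[Ezz...] marker.
    """
    upper = token.upper()
    if not upper.startswith("S") or "E" not in upper:
        return None
    # "14E09E10E11" -> season part "14", episode parts ["09", "10", "11"]
    season_part, *episode_parts = upper[1:].split("E")
    if not season_part.isdecimal():
        return None
    if not all(part.isdecimal() for part in episode_parts):
        return None
    episodes = [int(part) for part in episode_parts]
    first, last = episodes[0], episodes[-1]
    # A single episode has no range end; a chain keeps only first and last,
    # the intermediate episodes being implied.
    return int(season_part), first, (last if len(episodes) > 1 else None)
```

With this sketch, `collapse_episode_chain("S14E09E10E11")` yields `(14, 9, 11)` rather than stopping at E10.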

### Added

- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
  Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
  — the surface previously coupled to the concrete `LanguageRegistry`.
  Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
  depends on the Protocol, infrastructure provides the YAML-backed
  adapter. Tests in `tests/infrastructure/test_language_registry.py`.
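
A minimal sketch of what such a structural port could look like. The method names come from the entry above; the signatures, the simplified `Language` stand-in, and the `InMemoryLanguageRepository` fake are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class Language:
    """Simplified stand-in for Alfred's Language value object."""

    iso: str  # ISO 639-2/B code, e.g. "fre"
    english_name: str
    aliases: tuple[str, ...] = ()


@runtime_checkable
class LanguageRepository(Protocol):
    """Method names follow the changelog; signatures are assumed."""

    def from_iso(self, code: str) -> Language | None: ...
    def from_any(self, raw: str) -> Language | None: ...
    def all(self) -> list[Language]: ...
    def __contains__(self, raw: object) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguageRepository:
    """Hypothetical test fake: matches the Protocol without inheriting it."""

    def __init__(self, languages: list[Language]) -> None:
        self._by_iso = {lang.iso: lang for lang in languages}

    def from_iso(self, code: str) -> Language | None:
        return self._by_iso.get(code.lower())

    def from_any(self, raw: str) -> Language | None:
        needle = raw.strip().lower()
        for lang in self._by_iso.values():
            names = {lang.iso, lang.english_name.lower(), *lang.aliases}
            if needle in names:
                return lang
        return None

    def all(self) -> list[Language]:
        return list(self._by_iso.values())

    def __contains__(self, raw: object) -> bool:
        return isinstance(raw, str) and self.from_any(raw) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


repo: LanguageRepository = InMemoryLanguageRepository(
    [Language("fre", "French", ("fr", "fra"))]
)
```

Because the Protocol is structural, domain code typed against `LanguageRepository` accepts either the YAML-backed adapter or a fake like this one.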

### Internal

- **Flattened `alfred.domain.shared.media/` package into a single
  `media.py` module.** The 6-file package (audio, video, subtitle,
  info, matching, tracks_mixin + `__init__`) collapsed into one ~250
  LoC module. All 12 import sites continue to resolve unchanged
  (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
  since Python treats `media.py` and `media/__init__.py`
  interchangeably for import paths. Easier to scan when the whole
  bounded-context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the
  `LanguageRepository` port** instead of the concrete `LanguageRegistry`
  class. The default constructor still instantiates the concrete adapter
  when no repository is injected — behaviour is unchanged for existing
  callers. Opens the door to in-memory fakes in future tests without
  loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from
  `alfred.application.filesystem` to `alfred.application.release`**.
  They are inspection-pipeline helpers — their natural home is next to
  `inspect_release`, not next to the filesystem use cases. The move
  also eliminates a circular-import workaround in
  `resolve_destination.py`: `inspect_release` can now be imported at
  module top instead of lazily inside `_resolve_parsed`. Public
  surface is unchanged for callers that imported the helpers from
  their full module paths (the only call sites — `inspect.py`, two
  tests, one testing script — were updated in this commit).

### Added

- **`resolve_*_destination` use cases now consume `inspect_release`**.
  `resolve_episode_destination` and `resolve_movie_destination` reuse
  their existing `source_file` parameter as the inspection target;
  `resolve_season_destination` and `resolve_series_destination` gain
  a new **optional** `source_path` parameter (also threaded through
  the tool wrappers and YAML specs). When the path exists, ffprobe
  data fills tokens missing from the release name (e.g. quality) and
  refreshes `tech_string`, so the destination folder / file names
  end up more accurate. When the path is omitted or does not exist
  (back-compat callers), the use cases fall back to parse-only, the
  same behavior as before.

### Fixed

- **`enrich_from_probe` now refreshes `tech_string`** after filling
  `quality` / `source` / `codec`. Previously the field stayed at its
  parser-time value, so filename builders saw stale tech tokens even
  after a successful probe. New `TestTechString` class in
  `tests/application/test_enrich_from_probe.py` locks the behavior.

### Added

- **`inspect_release` orchestrator + `InspectedResult` VO**
  (`alfred/application/release/inspect.py`). Single composition of the
  four inspection layers: `parse_release` → `detect_media_type` (patches
  `parsed.media_type`) → `find_main_video` (top-level scan) →
  `prober.probe` + `enrich_from_probe` when a video exists and the
  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
  `InspectedResult(parsed, report, source_path, main_video, media_info,
  probe_used)` that downstream callers consume directly instead of
  rebuilding the same chain. `kb` and `prober` are injected — no
  module-level singletons. Never raises.
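
The composition order described above can be sketched as follows. To stay self-contained, the four layers are injected as plain callables rather than as `kb` and `prober` objects, and the way `probe_used` is derived is an assumption; only the ordering and the `{"unknown", "other"}` guard come from the entry.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Callable


@dataclass(frozen=True)
class InspectedResult:
    parsed: Any
    report: Any
    source_path: Path
    main_video: Path | None
    media_info: Any | None
    probe_used: bool


def inspect_release_sketch(
    name: str,
    source_path: Path,
    *,
    parse: Callable[[str], tuple[Any, Any]],
    detect_media_type: Callable[[Any, Path], str],
    find_main_video: Callable[[Path], Path | None],
    probe: Callable[[Path], Any | None],
    enrich: Callable[[Any, Any], None],
) -> InspectedResult:
    parsed, report = parse(name)
    # The refined media type patches the parser's first guess.
    parsed.media_type = detect_media_type(parsed, source_path)
    main_video = find_main_video(source_path)
    media_info = None
    # Probe only when a video exists and the media type is usable.
    if main_video is not None and parsed.media_type not in {"unknown", "other"}:
        media_info = probe(main_video)
        if media_info is not None:
            enrich(parsed, media_info)
    return InspectedResult(
        parsed, report, source_path, main_video, media_info,
        probe_used=media_info is not None,  # assumed meaning of the flag
    )


# Usage with stubbed layers (all values are made up for the demo).
demo = inspect_release_sketch(
    "Show.S01E01.WEB.x264-GRP",
    Path("/downloads/Show.S01E01"),
    parse=lambda name: (SimpleNamespace(media_type="unknown", quality=None), {}),
    detect_media_type=lambda parsed, path: "tv_show",
    find_main_video=lambda path: path / "episode.mkv",
    probe=lambda video: {"height": 1080},
    enrich=lambda parsed, info: setattr(parsed, "quality", f"{info['height']}p"),
)
```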

### Changed

- **`analyze_release` tool now delegates to `inspect_release`** — same
  output shape, plus two new fields: `confidence` (0–100) and `road`
  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
  both fields so the LLM can route releases by confidence.

- **`MediaProber` port now covers full media probing**: added
  `probe(video) -> MediaInfo | None` alongside the existing
  `list_subtitle_streams`. `FfprobeMediaProber` (in
  `alfred/infrastructure/probe/`) implements both methods and is now
  the single adapter shelling out to `ffprobe`. The standalone
  `alfred/infrastructure/filesystem/ffprobe.py` module was removed —
  all callers (tools, testing scripts) instantiate
  `FfprobeMediaProber` instead. Unblocks the upcoming
  `inspect_release` orchestrator, which depends on the port.

### Removed

- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
  `FfprobeMediaProber` adapter).

---

## [2026-05-20] — Release parser confidence scoring + exclusion

### Added

- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
  `is_supported_video(path, kb)` (extension-only check against
  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
  scan, lexicographically-first eligible file, returns `None` when no
  video qualifies; accepts a bare file as folder for single-file
  releases). No size threshold, no filename heuristics —
  PATH_OF_PAIN handles the exotic cases. Foundation for the future
  `inspect_release` orchestrator.

- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
  critical fields. EASY is decided structurally (a group schema
  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
  YAML-configurable cutoff (default 60). Weights and penalties also
  live in `scoring.yaml` — title 30, media_type 20, year 15, season
  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
  -30. `Road` is a new enum, distinct from `ParsePath` (which records
  the tokenization route, not the confidence tier). `ReleaseKnowledge`
  port gains a `scoring: dict` field.
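
As a worked example of the scoring arithmetic, here is a minimal sketch using the default weights, penalty, and cutoff quoted above. The list of tech fields, the clamp to 0-100, and the `>=` comparison at the cutoff are assumptions; the real values live in `scoring.yaml`.

```python
WEIGHTS = {"title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5}
TECH_FIELDS = ("quality", "source", "codec")  # assumption: which fields count as tech
TECH_WEIGHT = 5
UNKNOWN_PENALTY = 5
UNKNOWN_PENALTY_CAP = 30
SHITTY_CUTOFF = 60


def confidence(fields: dict, unknown_tokens: list[str]) -> int:
    """Sum the weights of populated fields, then subtract the capped penalty."""
    total = sum(w for name, w in WEIGHTS.items() if fields.get(name) is not None)
    total += sum(TECH_WEIGHT for name in TECH_FIELDS if fields.get(name) is not None)
    total -= min(UNKNOWN_PENALTY * len(unknown_tokens), UNKNOWN_PENALTY_CAP)
    return max(0, min(100, total))


def road(schema_matched: bool, score: int) -> str:
    """EASY is structural; the score only separates the other two roads."""
    if schema_matched:
        return "easy"
    return "shitty" if score >= SHITTY_CUTOFF else "path_of_pain"


movie = {
    "title": "Inception", "media_type": "movie", "year": 2010,
    "quality": "1080p", "source": "BluRay", "codec": "x264",
}
# 30 + 20 + 15 + 3 * 5 = 80, minus 5 for one UNKNOWN token = 75
movie_score = confidence(movie, ["EXTRA"])
```

Under these assumptions, a release with only a title and eight UNKNOWN tokens scores 30 - 30 = 0 and lands on `path_of_pain`.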

### Changed

- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
  ParseReport]` instead of returning a bare `ParsedRelease`. Call
  sites updated in `application/filesystem/resolve_destination.py` and
  `agent/tools/filesystem.py`. Tests updated accordingly.

---

## [2026-05-20] — Release parser v2 (EASY + SHITTY)

### Added

- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
  new annotate-based pipeline (tokenize → annotate → assemble) drives
  releases from known groups. Exposes `Token` (frozen VO with `index` +
  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
  and `GroupSchema` / `SchemaChunk` value objects.
  - `pipeline.tokenize`: string-ops separator split (no regex), strips
    a `[site.tag]` prefix/suffix first.
  - `pipeline.annotate`: detects the trailing group right-to-left
    (priority to `codec-GROUP` shape, fallback to any non-source dashed
    token), looks up its `GroupSchema`, then walks tokens and schema
    chunks in lockstep — optional chunks that don't match are skipped,
    mandatory mismatches abort EASY and return `None` so the caller can
    fall back to SHITTY.
  - `pipeline.assemble`: folds annotated tokens into a
    `ParsedRelease`-compatible dict.
  - `parse_release` (in `release.services`) tries the v2 EASY path first
    and falls through to the legacy SHITTY heuristic on `None`. Legacy
    SHITTY/PATH OF PAIN behavior is unchanged.
  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
    rarbg}.yaml` declare the canonical chunk order per group, loaded via
    new `ReleaseKnowledge.group_schema(name)` port method.
  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
    cover token VOs, site-tag stripping, group detection, schema-driven
    annotation (movie, TV episode, season pack with optional source),
    and field assembly.

- **Release parser v2 — enricher pass** completes the EASY pipeline.
  The structural schema walk now tolerates non-positional tokens
  between chunks (instead of aborting on leftover tokens), and a second
  pass tags them with audio / video-meta / edition / language roles.
  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
  matched before single tokens. Channel layouts like `5.1` and `7.1`
  (split into two tokens by the `.` separator) are detected as
  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
  marker so `assemble` extracts the canonical value only from the
  primary token. KONTRAST releases with audio / HDR / edition / language
  metadata now produce a fully populated `ParsedRelease`.

- **Streaming distributor as a separate dimension** from encoding source.
  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
  platform that produced the release is now recorded distinctly. The
  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
  from `sources.yaml`.

- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
  each documenting an expected `ParsedRelease` plus the future `routing`
  (library / torrents / seed_hardlinks) for the upcoming `organize_media`
  refactor. EASY bucket seeded with 5 cases (movie, single-episode, season
  pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15
  anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel),
  French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi),
  multi-episode chain `S14E09E10E11` (Archer, captures E11 loss),
  lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators
  (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83),
  season-range `S01-06` (Tatortreiniger, captures movie misclassification),
  bare folder name (Jurassic Park,
  media_type=unknown), apostrophe-in-name (Honey Don't, captures full AI-path
  degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands,
  captures group=UNKNOWN), subs-only release (Westworld S04).
  PATH OF PAIN bucket seeded with 10 worst-case fixtures covering:
  UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set
  with double season range and parens-wrapped tech (Deutschland 83-86-89,
  captures `group=S03` misdetection), accented chars in title (Chérie
  BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag
  prefix + XviD (OxTorrent), episode title + air-date silently lost
  (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i +
  multi-word audio codec (The Prodigy, full AI-path degeneration),
  yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]`
  tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range +
  REPACK + HEVC (Gilmore Girls, the well-behaved exception).
  The fixtures are parametrized in `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`. Releases like
  `Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep
  alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators
  used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New
  conventions can be added without code changes. The canonical `.` is always
  present even if missing from YAML.
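
A minimal sketch of a separator-driven split using string operations only. The helper names are hypothetical and site-tag stripping is left out; the separator list mirrors the one quoted for `separators.yaml`.

```python
DEFAULT_SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]


def effective_separators(configured: list[str]) -> list[str]:
    """Guarantee the canonical "." even when the YAML list omits it."""
    return configured if "." in configured else [".", *configured]


def tokenize(name: str, separators: list[str] = DEFAULT_SEPARATORS) -> list[str]:
    """Map every separator to "." and split, dropping empty tokens."""
    table = str.maketrans({sep: "." for sep in effective_separators(separators)})
    return [token for token in name.translate(table).split(".") if token]


yts_tokens = tokenize("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]")
```

Note that `5.1` comes out as two tokens (`"5"`, `"1"`), which is why the enricher pass detects channel layouts as consecutive pairs.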

### Changed

- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
  The legacy ~480-line heuristic block in `release/services.py` is gone;
  `pipeline._annotate_shitty` does a single pass that looks each token
  up in the kb buckets (resolutions / sources / codecs / distributors /
  year / `SxxExx`) with first-match-wins semantics, and the leftmost
  contiguous UNKNOWN run becomes the title. `annotate()` no longer
  returns `None` — SHITTY is the always-on fallback when no group schema
  matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
  (`deutschland_franchise_box`, `sleaford_yt_slug`,
  `super_mario_bilingual`, `predator_space_separators` — the last one
  moved from `shitty/` → `path_of_pain/`) are now marked
  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
  `xfail_reason` field; the parametrized suite wires the xfail mark
  automatically.

- **`parse_release` tokenizer is now data-driven**: it splits on any character
  listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
  This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
  space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and
  underscore-separated names parse correctly via the direct path — no more
  fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first
  (so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`),
  then well-formedness is checked only against truly forbidden chars
  (anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of
  639-1 and 639-2/T):
  - `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was
    `["fr", "en"]`). Old LTM files are not auto-migrated — delete
    `data/memory/ltm.json` to regenerate with the new defaults.
  - Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`,
    `fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly
    (recognized as French via alias) but new files are written canonically.
  - `Language` value object docstring corrected: it has always stored 639-2/B
    (matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via
  `settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor
  accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`**
  rather than duplicating tokens. `subtitles.yaml` now only declares
  subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new
  `language_tokens` section.
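
The dict-driven SHITTY pass described at the top of this list can be sketched as below. The bucket contents, the year range, and the function names are illustrative assumptions, and `SxxExx` handling is omitted for brevity; only the first-match-wins lookup and the "leftmost contiguous UNKNOWN run becomes the title" rule come from the entry.

```python
# Illustrative buckets; the real ones are loaded from the knowledge YAML files.
KB_BUCKETS = {
    "resolution": {"2160p", "1080p", "720p"},
    "source": {"bluray", "webrip", "web-dl", "hdtv"},
    "codec": {"x264", "x265", "hevc"},
    "distributor": {"nf", "amzn", "dsnp"},
}


def tag(token: str) -> str:
    lowered = token.lower()
    # First match wins: buckets are checked in declaration order.
    for role, values in KB_BUCKETS.items():
        if lowered in values:
            return role
    # Assumed year heuristic: four digits in a plausible range.
    if len(token) == 4 and token.isdecimal() and 1900 <= int(token) <= 2099:
        return "year"
    return "unknown"


def annotate_shitty(tokens: list[str]) -> list[tuple[str, str]]:
    roles = [tag(token) for token in tokens]
    # The leftmost contiguous run of UNKNOWN tokens becomes the title.
    start = next((i for i, role in enumerate(roles) if role == "unknown"), None)
    if start is not None:
        end = start
        while end < len(roles) and roles[end] == "unknown":
            roles[end] = "title"
            end += 1
    return list(zip(tokens, roles))


annotated = annotate_shitty(["Inception", "2010", "1080p", "BluRay", "x264", "GROUP"])
```

Here `GROUP` stays `unknown` because it sits outside the leftmost run; in the scoring described earlier, such residual tokens are what the UNKNOWN penalty counts.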

### Removed

- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`**
  deleted entirely. They held fossil parsers (`parse_episode_filename`,
  `extract_movie_metadata`, …) with zero production callers — superseded by
  `parse_release` as the single source of truth for release-name parsing.
  Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`)
  removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py` —
  the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS`
  hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now
  lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in
  `alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by
  `language_tokens:` (subtitle-specific only) since iso_languages.yaml is the
  canonical source.

### Fixed

- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the
  ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and
  `hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while
  `iso_languages.yaml` exposes French as `"fre"` — preferred languages
  defaults now match the canonical form.

### Internal

- **Domain I/O extraction** (`refactor/domain-io-extraction`): the domain
  layer no longer performs subprocess calls, filesystem scans, or YAML
  loading. Achieved in a series of focused commits:
  - **Knowledge YAML loaders moved to infrastructure**:
    `alfred/domain/release/knowledge.py`,
    `alfred/domain/shared/knowledge/language_registry.py`, and
    `alfred/domain/subtitles/knowledge/{base,loader}.py` relocated to
    `alfred/infrastructure/knowledge/`. Re-exports were dropped — callers
    import directly from the new location.
  - **`MediaProber` and `FilesystemScanner` Protocol ports** introduced at
    `alfred/domain/shared/ports/` with frozen-dataclass DTOs
    (`SubtitleStreamInfo`, `FileEntry`). `SubtitleIdentifier` and
    `PatternDetector` are now constructor-injected with concrete adapters
    (`FfprobeMediaProber` wrapping `subprocess.run(ffprobe)` and
    `PathlibFilesystemScanner` wrapping `pathlib`). No more direct
    `subprocess`/`pathlib` usage from the subtitle domain services.
  - **Live filesystem methods removed from VOs and entities**:
    `FilePath.exists()` / `.is_file()` / `.is_dir()` deleted —
    `FilePath` is now a pure address VO. `Movie.has_file()` and
    `Episode.is_downloaded()` dropped. Callers either rely on a prior
    detection step or use try/except over pre-checks (eliminates
    TOCTOU races).
  - **`SubtitlePlacer` moved to the application layer** at
    `alfred/application/subtitles/placer.py` — it performs `os.link`
    I/O, which doesn't belong in the domain. Pre-checks replaced with
    try/except for `FileNotFoundError`/`FileExistsError`.
  - **`SubtitleRuleSet.resolve()` no longer reaches into the knowledge
    base**: the implicit `DEFAULT_RULES()` helper is gone, replaced by
    an explicit `default_rules: SubtitleMatchingRules` parameter. The
    `ManageSubtitles` use case loads defaults from the KB once and
    passes them in.
  - **`SubtitleKnowledge` Protocol port** at
    `alfred/domain/subtitles/ports/knowledge.py` declares the read-only
    query surface domain services consume (7 methods:
    `known_extensions`, `format_for_extension`, `language_for_token`,
    `is_known_lang_token`, `type_for_token`, `is_known_type_token`,
    `patterns`). `SubtitleIdentifier` and `PatternDetector` depend on
    this Protocol instead of the concrete `SubtitleKnowledgeBase` from
    infrastructure — `domain/subtitles/` now has zero imports from
    `infrastructure/`. The remaining domain → infra leak
    (`domain/release/` loading separator YAML at import-time) is
    documented in tech-debt and scheduled for its own branch.
- **`to_dot_folder_name(title)` helper** in
  `alfred/domain/shared/value_objects.py` — extracts the
  `re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was
  duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of
  a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`,
  `.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping
  them under `metadata:`. The `_METADATA_EXTENSIONS` set used by
  `detect_media_type` remains the union of both (same behavior — subtitles
  are still ignored when deciding the media type of a folder), but a new
  `load_subtitle_extensions()` loader is now available for the subtitles
  domain. Semantic clarity, no functional change.
- **`tv_shows/entities.py` module docstring** now shows the aggregate
  ownership as an ASCII tree before the rule text — quicker visual scan
  of the DDD structure.
- Removed backward-compat shims `_sanitise_for_fs` /
  `_strip_episode_from_normalised` from `domain/release/value_objects.py`
  (zero callers).
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass
  explicit `check=False` (PLW1510); lazy imports promoted to module top where
  there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`,
  `qbittorrent/client.py`, `file_manager.py`); fixed module-level import
  ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`;
  removed unused locals (F841 / B007); replaced unnecessary set comprehension
  with `set()` in `release/knowledge.py` (C416).
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches)
  globally — noisy on parser mappers and orchestrator use-cases where early-return
  validation is essential complexity. Ignore `PLW0603` for the documented memory
  singleton (`infrastructure/persistence/context.py`).
- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`):
  the last domain → infrastructure leak (`domain/release/value_objects.py`
  loading YAML at import-time) is gone. Achieved via:
  - **`ReleaseKnowledge` Protocol port** at
    `alfred/domain/release/ports/knowledge.py` declares the read-only query
    surface release parsing needs (token sets for resolutions, sources, codecs,
    languages, hdr extras; structured dicts for audio, video_meta, editions,
    media_type_tokens; separators list; file-extension sets used by
    application/infra callers; `sanitize_for_fs(text)` method).
  - **`YamlReleaseKnowledge` adapter** at
    `alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant
    once at construction. Builds an immutable `str.maketrans` translation
    table for filesystem sanitization.
  - **`parse_release(name, kb)`** takes the knowledge as an explicit
    parameter — no more module-level YAML loading inside the domain. Every
    internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`,
    `_extract_audio`, `_extract_video_meta`, `_extract_edition`,
    `_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
  - **`ParsedRelease` Option B**: sanitization happens once at parse time
    and is stored on a new `title_sanitized: str` field. Builder methods
    (`show_folder_name`, `season_folder_name`, `episode_filename`,
    `movie_folder_name`, `movie_filename`) are now pure — they accept
    already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe`
    arguments. Callers at the use-case boundary sanitize TMDB strings
    via `kb.sanitize_for_fs(...)` before passing them in.
  - **All domain-knowledge constants removed from `value_objects.py`**:
    `_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`,
    `_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`,
    `_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`,
    `_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`,
    and the `_sanitize_for_fs` helper. The domain module is now pure.
  - **Application-layer KB singleton**: `resolve_destination.py` instantiates
    a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and
    threads it through every `parse_release(...)` call. The local
    `_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of
    `_KB.sanitize_for_fs(...)`.
  - **`detect_media_type(parsed, source_path, kb)` and
    `find_video_file(path, kb)`** now take the knowledge explicitly
    instead of importing `_*_EXTENSIONS` constants from the domain.
    `agent/tools/filesystem.py::analyze_release` imports the application
    KB singleton and passes it through.
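
Two of the helpers mentioned in this block lend themselves to a short sketch. `to_dot_folder_name` reuses the exact pattern quoted above. The `sanitize_for_fs` function shows the `str.maketrans` idea only: the forbidden-character set and the choice to delete (rather than replace) characters are assumptions, not the adapter's actual table.

```python
import re


def to_dot_folder_name(title: str) -> str:
    """Drop everything but word chars, whitespace, dots and dashes, then dot-join."""
    return re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")


# Assumed set of characters that Windows filesystems reject.
_WIN_FORBIDDEN = '<>:"/\\|?*'
# Built once; the third argument lists the characters to delete.
_FS_TABLE = str.maketrans("", "", _WIN_FORBIDDEN)


def sanitize_for_fs(text: str) -> str:
    """Remove forbidden characters in a single translate() pass."""
    return text.translate(_FS_TABLE)


folder = to_dot_folder_name("Spider-Man: No Way Home")
```

Building the table once at construction keeps each `sanitize_for_fs` call to a single pass over the string.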

---

## [2026-05-17] — TVShow & Movie aggregate refactor

Multi-phase overhaul of the TV show domain into a real DDD aggregate, with
matching parity work on `Movie`, a language knowledge system, and the
`shared/media` restructure that supports both.

### Added

- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml`, 42
  languages including `und` for undetermined).
  - `Language` value object (frozen dataclass) with `iso`, `english_name`,
    `native_name`, `aliases`, and a `matches(raw)` cross-format helper.
  - `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging
    builtin + learned YAML. Not a singleton — the application layer
    instantiates it.
  - ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English
    name, native name, and common spellings.
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a
  `resolution` property using width-priority bucket detection (handles
  cinema/scope crops like 1920×960 → 1080p).
- **`shared/media/matching.py`** — `track_lang_matches` helper shared by
  `Episode` and `Movie`. Implements the **"C+" contract** for language helpers:
  - `Language` query → cross-format match via `Language.matches()`
  - `str` query → case-insensitive direct comparison (no normalization)
- **TVShow aggregate composition**:
  - `TVShow.seasons: dict[SeasonNumber, Season]`
  - `Season.episodes: dict[EpisodeNumber, Episode]`
  - `Season.expected_episodes` / `Season.aired_episodes` (split so collection
    state can compare "owned vs aired today" without confusing in-flight
    seasons with future ones)
- **Aggregate methods on `TVShow`**:
  - `add_episode(ep)` — sole sanctioned mutation entry point (creates the
    season if missing)
  - `add_season(season)` — replaces a season wholesale
  - `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
  - `is_complete_series()` — true iff `ENDED + COMPLETE`
  - `missing_episodes()` — flat list of all aired-but-not-owned
    `(season, episode)` pairs
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`,
  `has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by
  `Episode.audio_tracks` / `Episode.subtitle_tracks`.
- **Movie aggregate parity** — `Movie` now carries `audio_tracks` /
  `subtitle_tracks` and exposes the same helpers as `Episode` (same C+
  contract).
- **`CHANGELOG.md`** (this file).
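
The aggregate methods listed above can be sketched in a simplified form. Plain `int` keys stand in for `SeasonNumber` / `EpisodeNumber`, `Episode` is reduced to two fields, and the exact rules behind `collection_status()` are assumptions based on the "owned vs aired" wording.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CollectionStatus(Enum):
    EMPTY = "empty"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass(frozen=True)
class Episode:
    season: int
    number: int


@dataclass
class Season:
    number: int
    aired_episodes: int = 0  # episodes aired as of today
    episodes: dict[int, Episode] = field(default_factory=dict)


@dataclass
class TVShow:
    title: str
    seasons: dict[int, Season] = field(default_factory=dict)

    def add_season(self, season: Season) -> None:
        # Replaces any existing season wholesale.
        self.seasons[season.number] = season

    def add_episode(self, ep: Episode) -> None:
        # Creates the season if it is missing.
        season = self.seasons.setdefault(ep.season, Season(ep.season))
        season.episodes[ep.number] = ep

    def missing_episodes(self) -> list[tuple[int, int]]:
        # Flat list of aired-but-not-owned (season, episode) pairs.
        return [
            (number, n)
            for number, season in sorted(self.seasons.items())
            for n in range(1, season.aired_episodes + 1)
            if n not in season.episodes
        ]

    def collection_status(self) -> CollectionStatus:
        owned = sum(len(s.episodes) for s in self.seasons.values())
        if owned == 0:
            return CollectionStatus.EMPTY
        if self.missing_episodes():
            return CollectionStatus.PARTIAL
        return CollectionStatus.COMPLETE
```

Routing every mutation through the root is what lets `missing_episodes()` and `collection_status()` be computed from the dicts instead of stored counters.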

### Changed

- **`shared/media_info.py` exploded into `shared/media/{audio,video,subtitle,info,matching}.py`.**
  `MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat
  accessors (`width`, `height`, `video_codec`, `resolution`) remain as
  properties that read the first video track.
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to
  `MediaInfo` (file-level — they come from the ffprobe `format` block, not a
  stream). Files without a video stream now correctly expose duration.
- **`ShowStatus.from_string`** extended to map TMDB strings (`Returning
  Series`, `In Production`, `Pilot`, `Planned`, `Canceled`, `Cancelled`).
  Comparison is whitespace-trimmed and case-insensitive.
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They
  are owned by `TVShow` and reached only through it.
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed
  from the dict) instead of stored ints.
- **`TVShowService.parse_episode_from_filename`** rewritten in string
  operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
- **`TVShowService.find_next_episode`** now drives off
  `show.missing_episodes()` instead of the hardcoded "max 50 episodes per
  season" heuristic.
- **`TVShowService` constructor** no longer takes `season_repository` /
  `episode_repository` — the aggregate persists in one block via
  `TVShowRepository` only.
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to
  `SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack`
  ffprobe-view dataclass (different bounded contexts, kept separate
  intentionally).
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from
  `knowledge/release/file_extensions.yaml` via `load_video_extensions()`
  (single source of truth).
- **`CLAUDE.md`** updated with three new policy sections:
  - "Tests" — small updates OK during normal work, no mass-update sprees
  - "Backwards-compatibility shims" — prefer clean migration over shims
  - "Regex" — not forbidden, use judgment when string ops would be fragile
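
The string-operations rewrite of `TVShowService.parse_episode_from_filename` mentioned in this list could look roughly like the sketch below. It is a hypothetical stand-in covering the `S01E05` / `s1e5` and `1x05` / `01x5` forms, not the service's actual code; the separator set is an assumption.

```python
from __future__ import annotations


def parse_season_episode(filename: str) -> tuple[int, int] | None:
    """Find the first SxxEyy or NxNN token without using a regex."""
    normalized = filename.lower()
    for sep in " _-[]()":
        normalized = normalized.replace(sep, ".")
    for token in normalized.split("."):
        if token.startswith("s") and "e" in token:
            # "s01e05" -> season "01", episode "05"
            season, _, episode = token[1:].partition("e")
        elif "x" in token:
            # "1x05" -> season "1", episode "05"
            season, _, episode = token.partition("x")
        else:
            continue
        # Tokens such as "series" or "x264" fail this check and are skipped.
        if season.isdecimal() and episode.isdecimal():
            return int(season), int(episode)
    return None
```

For instance, `parse_season_episode("Show.1x05.720p.HDTV.x264-GRP")` returns `(1, 5)`, and a movie name with no marker returns `None`.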

### Removed

- **Legacy `Season N Episode N` filename form** in
  `TVShowService.parse_episode_from_filename`. It never appears in the release
  names Alfred handles, and supporting it forced a regex.
- **`SeasonRepository` and `EpisodeRepository`** — only the aggregate root has
  a repository (DDD rule: one repo per aggregate).
- **`shared/media_info.py`** compatibility shim — callers updated.
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` — callers
  updated to `SubtitleCandidate`.

### Fixed

- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead
  of crashing through `primary_video.duration_seconds` (see the duration/bitrate
  move under **Changed**).
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer
  passes the removed `show_imdb_id` / `episode_count` kwargs when constructing
  a `Season` for folder-name generation.

### Internal

- Test suite rewritten where the aggregate redesign broke fixtures:
  `tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py`
  (rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py`
  (helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures),
  `tests/domain/test_tv_shows_service.py` (find_next_episode driven by real
  aggregate state).
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`,
  `identifier.py` updated to import `SubtitleCandidate`.
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.