# Changelog
All notable changes to Alfred are documented here.
The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead
of release numbers. Granularity targets behavioral or API-visible changes; refer
to `git log` for commit-level detail.
Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** /
**Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect
callers).
---
## [Unreleased]
### Removed
- **`settings.min_movie_size_bytes` removed.** The Pydantic field and its
validator no longer had any consumer (the former
`MovieService.validate_movie_file` was deleted in an earlier
refactor). The "real movie vs. sample" rule is now carried by
extension-based exclusion
(`application/release/supported_media.py`) and the PoP. If a size
threshold is ever needed, it will live in a knowledge YAML,
not in `settings`.
### Fixed
- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
range.** The parser previously captured `episode=9, episode_end=10`
and dropped E11+. It now returns `episode=first, episode_end=last`,
with intermediate values implied. Fixture
`shitty/archer_multi_episode/` updated from anti-regression-of-bug
to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI
fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
previously parsed with `parse_path="ai"` and everything UNKNOWN
because `'` is in the forbidden-chars list. Apostrophes are now
pre-stripped before the well-formed check, so the parse completes
normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
the title text loses its apostrophe. `parse_path` becomes
`sanitized` to surface the cleanup. Side win: PoP fixture
`the_prodigy_full_chaos/` also moves from total failure to a
partially-correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as
`tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
parsed as `media_type=movie` with `S01-06` glued onto the title.
The parser now recognizes the range, sets `season=first`,
`media_type=tv_complete`, and removes the marker from the title.
`is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
`｜`, …) carry no title content and are now filtered out. Side
effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
YouTube wide-pipe no longer leaks into the title.
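
The multi-episode collapse described in the first entry above can be sketched with plain string operations. This is a minimal illustration, not Alfred's parser: `collapse_episode_chain` is a hypothetical name, and returning `None` for `episode_end` on single-episode tokens is an assumption.

```python
from __future__ import annotations


def collapse_episode_chain(token: str) -> tuple[int, int, int | None] | None:
    """Parse 'S14E09E10E11' into (season, first_episode, last_episode).

    Hypothetical helper: intermediate episodes are implied by the range,
    so only the first and last numbers of the chain are kept.
    """
    upper = token.upper()
    if not upper.startswith("S") or "E" not in upper:
        return None
    # "14E09E10E11" -> ["14", "09", "10", "11"]
    season_part, *episode_parts = upper[1:].split("E")
    if not season_part.isdigit() or not episode_parts:
        return None
    if not all(part.isdigit() for part in episode_parts):
        return None
    episodes = [int(part) for part in episode_parts]
    last = episodes[-1] if len(episodes) > 1 else None
    return int(season_part), episodes[0], last


print(collapse_episode_chain("S14E09E10E11"))  # (14, 9, 11)
print(collapse_episode_chain("S01E05"))        # (1, 5, None)
```
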
### Added
- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
— the surface previously coupled to the concrete `LanguageRegistry`.
Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
depends on the Protocol, infrastructure provides the YAML-backed
adapter. Tests in `tests/infrastructure/test_language_registry.py`.
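
As a rough illustration of the port, the sketch below declares a structural Protocol with the five members named above and satisfies it with an in-memory fake. Only the method names come from this changelog; the signatures, the trimmed-down `Language` dataclass, and `InMemoryLanguageRepository` are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Language:
    """Trimmed-down stand-in for the real value object."""
    iso: str                          # ISO 639-2/B code, e.g. "fre"
    english_name: str
    aliases: tuple[str, ...] = ()


class LanguageRepository(Protocol):
    # Method names follow the changelog; the signatures are assumed.
    def from_iso(self, iso: str) -> Language | None: ...
    def from_any(self, raw: str) -> Language | None: ...
    def all(self) -> Iterable[Language]: ...
    def __contains__(self, raw: object) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguageRepository:
    """Hypothetical fake: no YAML loading, just two dict lookups."""

    def __init__(self, languages: Iterable[Language]) -> None:
        self._by_iso = {lang.iso: lang for lang in languages}
        self._by_alias = {
            alias.lower(): lang
            for lang in self._by_iso.values()
            for alias in (lang.iso, *lang.aliases)
        }

    def from_iso(self, iso: str) -> Language | None:
        return self._by_iso.get(iso.lower())

    def from_any(self, raw: str) -> Language | None:
        return self._by_alias.get(raw.strip().lower())

    def all(self) -> Iterable[Language]:
        return list(self._by_iso.values())

    def __contains__(self, raw: object) -> bool:
        return isinstance(raw, str) and self.from_any(raw) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


# The fake satisfies the Protocol structurally; no inheritance is needed.
repo: LanguageRepository = InMemoryLanguageRepository([
    Language("fre", "French", ("fr", "fra")),
    Language("eng", "English", ("en",)),
])
```

Domain code typed against `LanguageRepository` would accept either this fake or the YAML-backed adapter.
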
### Internal
- **Flattened `alfred.domain.shared.media/` package into a single
`media.py` module.** The 6-file package (audio, video, subtitle,
info, matching, tracks_mixin + `__init__`) collapsed into one ~250
LoC module. All 12 import sites continue to resolve unchanged
(`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
since Python treats `media.py` and `media/__init__.py`
interchangeably for import paths. Easier to scan when the whole
bounded-context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the
`LanguageRepository` port** instead of the concrete `LanguageRegistry`
class. The default constructor still instantiates the concrete adapter
when no repository is injected — behaviour is unchanged for existing
callers. Opens the door to in-memory fakes in future tests without
loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from
`alfred.application.filesystem` to `alfred.application.release`**.
They are inspection-pipeline helpers — their natural home is next to
`inspect_release`, not next to the filesystem use cases. The move
also eliminates a circular-import workaround in
`resolve_destination.py`: `inspect_release` can now be imported at
module top instead of lazily inside `_resolve_parsed`. Public
surface is unchanged for callers that imported the helpers from
their full module paths (the only call sites — `inspect.py`, two
tests, one testing script — were updated in this commit).
### Added
- **`resolve_*_destination` use cases now consume `inspect_release`**.
`resolve_episode_destination` and `resolve_movie_destination` reuse
their existing `source_file` parameter as the inspection target;
`resolve_season_destination` and `resolve_series_destination` gain
a new **optional** `source_path` parameter (also threaded through
the tool wrappers and YAML specs). When the path exists, ffprobe
data fills tokens missing from the release name (e.g. quality) and
refreshes `tech_string`, so the destination folder / file names
end up more accurate. When the path is omitted or does not exist (back-compat
callers), the use cases fall back to parse-only — same behavior as
before.
### Fixed
- **`enrich_from_probe` now refreshes `tech_string`** after filling
`quality` / `source` / `codec`. Previously the field stayed at its
parser-time value, so filename builders saw stale tech tokens even
after a successful probe. New `TestTechString` class in
`tests/application/test_enrich_from_probe.py` locks the behavior.
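
The fix amounts to rebuilding the derived field after the probe has filled the gaps. A minimal sketch, assuming `tech_string` is a dot-joined `quality.source.codec` string and that probe data only fills fields the parser left empty; `ParsedSketch`, `build_tech_string`, and `enrich_from_probe_sketch` are illustrative names, not the project's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ParsedSketch:
    quality: str | None = None
    source: str | None = None
    codec: str | None = None
    tech_string: str = ""


def build_tech_string(parsed: ParsedSketch) -> str:
    # Assumed format: known tech fields joined with dots.
    parts = (parsed.quality, parsed.source, parsed.codec)
    return ".".join(part for part in parts if part)


def enrich_from_probe_sketch(parsed: ParsedSketch, probed: dict[str, str]) -> None:
    # Fill only what the release name did not provide.
    for name in ("quality", "source", "codec"):
        if getattr(parsed, name) is None and probed.get(name):
            setattr(parsed, name, probed[name])
    # The step that was missing: refresh the derived string afterwards.
    parsed.tech_string = build_tech_string(parsed)


parsed = ParsedSketch(source="WEB-DL", tech_string="WEB-DL")
enrich_from_probe_sketch(parsed, {"quality": "1080p", "codec": "x265"})
print(parsed.tech_string)  # 1080p.WEB-DL.x265
```
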
### Added
- **`inspect_release` orchestrator + `InspectedResult` VO**
(`alfred/application/release/inspect.py`). Single composition of the
four inspection layers: `parse_release` → `detect_media_type` (patches
`parsed.media_type`) → `find_main_video` (top-level scan) →
`prober.probe` + `enrich_from_probe` when a video exists and the
refined media type isn't in `{"unknown", "other"}`. Returns a frozen
`InspectedResult(parsed, report, source_path, main_video, media_info,
probe_used)` that downstream callers consume directly instead of
rebuilding the same chain. `kb` and `prober` are injected — no
module-level singletons. Never raises.
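
The composition can be pictured roughly as below. This is a sketch under stated assumptions, not the real orchestrator: the collaborators are passed in as plain callables, `parsed` and `report` are simplified to dicts, `probe_used` is assumed to mean "the probe returned data", and only the probe step is guarded against exceptions here.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


@dataclass(frozen=True)
class InspectedResultSketch:
    parsed: dict[str, Any]
    report: dict[str, Any]
    source_path: Path
    main_video: Path | None
    media_info: dict[str, Any] | None
    probe_used: bool


def inspect_release_sketch(
    name: str,
    source_path: Path,
    *,
    parse: Callable[[str], tuple[dict[str, Any], dict[str, Any]]],
    detect_media_type: Callable[[dict[str, Any], Path], str],
    find_main_video: Callable[[Path], Path | None],
    probe: Callable[[Path], dict[str, Any] | None],
    enrich: Callable[[dict[str, Any], dict[str, Any]], None],
) -> InspectedResultSketch:
    parsed, report = parse(name)                                      # layer 1
    parsed["media_type"] = detect_media_type(parsed, source_path)    # layer 2
    main_video = find_main_video(source_path)                        # layer 3
    media_info = None
    if main_video is not None and parsed["media_type"] not in {"unknown", "other"}:
        try:                                                          # layer 4
            media_info = probe(main_video)
        except Exception:
            media_info = None  # a failed probe degrades to parse-only
        if media_info is not None:
            enrich(parsed, media_info)
    return InspectedResultSketch(
        parsed, report, source_path, main_video, media_info,
        probe_used=media_info is not None,
    )


# Usage with stub collaborators (no filesystem or ffprobe access).
result = inspect_release_sketch(
    "Movie.2020-GRP",
    Path("/downloads/Movie.2020-GRP"),
    parse=lambda name: ({"title": "Movie", "media_type": "unknown", "quality": None},
                        {"confidence": 55}),
    detect_media_type=lambda parsed, path: "movie",
    find_main_video=lambda path: path / "movie.mkv",
    probe=lambda video: {"quality": "1080p"},
    enrich=lambda parsed, info: parsed.update(
        {key: value for key, value in info.items() if parsed.get(key) is None}
    ),
)
```
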
### Changed
- **`analyze_release` tool now delegates to `inspect_release`** — same
output shape, plus two new fields: `confidence` (0–100) and `road`
(`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
`ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
both fields so the LLM can route releases by confidence.
- **`MediaProber` port now covers full media probing**: added
`probe(video) -> MediaInfo | None` alongside the existing
`list_subtitle_streams`. `FfprobeMediaProber` (in
`alfred/infrastructure/probe/`) implements both methods and is now
the single adapter shelling out to `ffprobe`. The standalone
`alfred/infrastructure/filesystem/ffprobe.py` module was removed —
all callers (tools, testing scripts) instantiate
`FfprobeMediaProber` instead. Unblocks the upcoming
`inspect_release` orchestrator, which depends on the port.
### Removed
- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
`FfprobeMediaProber` adapter).
---
## [2026-05-20] — Release parser confidence scoring + exclusion
### Added
- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
`is_supported_video(path, kb)` (extension-only check against
`kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
scan, lexicographically-first eligible file, returns `None` when no
video qualifies; accepts a bare file as folder for single-file
releases). No size threshold, no filename heuristics —
PATH_OF_PAIN handles the exotic cases. Foundation for the future
`inspect_release` orchestrator.
- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
`alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
`(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
`"path_of_pain"`), the residual UNKNOWN tokens, and the missing
critical fields. EASY is decided structurally (a group schema
matched); SHITTY vs PATH_OF_PAIN is decided by score against a
YAML-configurable cutoff (default 60). Weights and penalties also
live in `scoring.yaml` — title 30, media_type 20, year 15, season
10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
-30. `Road` is a new enum, distinct from `ParsePath` (which records
the tokenization route, not the confidence tier). `ReleaseKnowledge`
port gains a `scoring: dict` field.
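
To make the arithmetic concrete, here is a small sketch of the scoring rule using the default weights quoted above. Several details are assumptions: which fields count as "tech" (here `quality`, `source`, `codec`, `group`), the comparison `score >= cutoff`, and the function names. The real values live in `scoring.yaml`.

```python
from __future__ import annotations

from enum import Enum


class Road(str, Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"


WEIGHTS = {
    "title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5,
    # "tech 5 each": the exact list of tech fields is an assumption.
    "quality": 5, "source": 5, "codec": 5, "group": 5,
}
UNKNOWN_PENALTY = 5   # per residual UNKNOWN token
MAX_PENALTY = 30      # cap on the total penalty
CUTOFF = 60           # default SHITTY vs PATH_OF_PAIN threshold


def confidence(present_fields: set[str], unknown_tokens: int) -> int:
    gained = sum(weight for name, weight in WEIGHTS.items() if name in present_fields)
    penalty = min(unknown_tokens * UNKNOWN_PENALTY, MAX_PENALTY)
    return max(0, min(100, gained - penalty))


def road(schema_matched: bool, score: int) -> Road:
    if schema_matched:
        return Road.EASY  # decided structurally, not by score
    return Road.SHITTY if score >= CUTOFF else Road.PATH_OF_PAIN


movie = {"title", "media_type", "year", "quality", "source", "codec"}
print(confidence(movie, unknown_tokens=2))          # 30+20+15+5+5+5 - 10 = 70
print(road(False, confidence(movie, 10)).value)     # 80 - 30 = 50 -> path_of_pain
```
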
### Changed
- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
ParseReport]` instead of returning a bare `ParsedRelease`. Call
sites updated in `application/filesystem/resolve_destination.py` and
`agent/tools/filesystem.py`. Tests updated accordingly.
---
## [2026-05-20] — Release parser v2 (EASY + SHITTY)
### Added
- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
new annotate-based pipeline (tokenize → annotate → assemble) drives
releases from known groups. Exposes `Token` (frozen VO with `index` +
`role` + `extra`), `TokenRole` enum (structural/technical/meta families),
and `GroupSchema` / `SchemaChunk` value objects.
- `pipeline.tokenize`: string-ops separator split (no regex), strips
a `[site.tag]` prefix/suffix first.
- `pipeline.annotate`: detects the trailing group right-to-left
(priority to `codec-GROUP` shape, fallback to any non-source dashed
token), looks up its `GroupSchema`, then walks tokens and schema
chunks in lockstep — optional chunks that don't match are skipped,
mandatory mismatches abort EASY and return `None` so the caller can
fall back to SHITTY.
- `pipeline.assemble`: folds annotated tokens into a
`ParsedRelease`-compatible dict.
- `parse_release` (in `release.services`) tries the v2 EASY path first
and falls through to the legacy SHITTY heuristic on `None`. Legacy
SHITTY/PATH OF PAIN behavior is unchanged.
- Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
rarbg}.yaml` declare the canonical chunk order per group, loaded via
new `ReleaseKnowledge.group_schema(name)` port method.
- Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
cover token VOs, site-tag stripping, group detection, schema-driven
annotation (movie, TV episode, season pack with optional source),
and field assembly.
- **Release parser v2 — enricher pass** completes the EASY pipeline.
The structural schema walk now tolerates non-positional tokens
between chunks (instead of aborting on leftover tokens), and a second
pass tags them with audio / video-meta / edition / language roles.
Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
(e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
matched before single tokens. Channel layouts like `5.1` and `7.1`
(split into two tokens by the `.` separator) are detected as
consecutive pairs. Sequence members carry an `extra["sequence_member"]`
marker so `assemble` extracts the canonical value only from the
primary token. KONTRAST releases with audio / HDR / edition / language
metadata now produce a fully populated `ParsedRelease`.
- **Streaming distributor as a separate dimension** from encoding source.
New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
port field, a `TokenRole.DISTRIBUTOR` annotation, and a
`ParsedRelease.distributor` field. `WEB-DL` stays the source; the
platform that produced the release is now recorded distinctly. The
five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
from `sources.yaml`.
- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
each documenting an expected `ParsedRelease` plus the future `routing`
(library / torrents / seed_hardlinks) for the upcoming `organize_media`
refactor. EASY bucket seeded with 5 cases (movie, single-episode, season
pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15
anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel),
French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi),
multi-episode chain `S14E09E10E11` (Archer, captures E11 loss),
lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators
(Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83),
season-range `S01-06` (Tatortreiniger, captures movie misclassification),
bare folder name (Jurassic Park,
media_type=unknown), apostrophe-in-name (Honey Don't, captures full AI-path
degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands,
captures group=UNKNOWN), subs-only release (Westworld S04).
PATH OF PAIN bucket seeded with 10 worst-case fixtures covering:
UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set
with double season range and parens-wrapped tech (Deutschland 83-86-89,
captures `group=S03` misdetection), accented chars in title (Chérie
BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag
prefix + XviD (OxTorrent), episode title + air-date silently lost
(Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i +
multi-word audio codec (The Prodigy, full AI-path degeneration),
yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]`
tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range +
REPACK + HEVC (Gilmore Girls, the well-behaved exception).
Parametrized in `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`. Releases like
`Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep
alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators
used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New
conventions can be added without code changes. The canonical `.` is always
present even if missing from YAML.
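
A separator-driven tokenizer along these lines can be sketched without regex. The function below is illustrative only (`tokenize_sketch` is a hypothetical name and the separators are passed in directly rather than loaded from YAML); it maps every separator to a sentinel character, splits once, and always includes the canonical `.`.

```python
from __future__ import annotations

from typing import Iterable

DEFAULT_SEPARATORS = (".", " ", "[", "]", "(", ")", "_")


def tokenize_sketch(name: str, separators: Iterable[str] = DEFAULT_SEPARATORS) -> list[str]:
    # The canonical "." is always a separator, even if the config omits it.
    active = set(separators) | {"."}
    # Map each separator to a NUL sentinel, then split on it in one pass.
    table = str.maketrans({sep: "\0" for sep in active})
    return [token for token in name.translate(table).split("\0") if token]


print(tokenize_sketch("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]"))
# ['The', 'Father', '2020', '1080p', 'WEBRip', '5', '1', 'YTS', 'MX']
```

Note how `5.1` splits into two tokens, which is why the enricher pass described above detects channel layouts as consecutive pairs.
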
### Changed
- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
The legacy ~480-line heuristic block in `release/services.py` is gone;
`pipeline._annotate_shitty` does a single pass that looks each token
up in the kb buckets (resolutions / sources / codecs / distributors /
year / `SxxExx`) with first-match-wins semantics, and the leftmost
contiguous UNKNOWN run becomes the title. `annotate()` no longer
returns `None` — SHITTY is the always-on fallback when no group schema
matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
(`deutschland_franchise_box`, `sleaford_yt_slug`,
`super_mario_bilingual`, `predator_space_separators` — the last one
moved from `shitty/` → `path_of_pain/`) are now marked
`pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
that SHITTY intentionally won't handle. `ReleaseFixture` grows an
`xfail_reason` field; the parametrized suite wires the xfail mark
automatically.
- **`parse_release` tokenizer is now data-driven**: it splits on any character
listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and
underscore-separated names parse correctly via the direct path — no more
fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first
(so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`),
then well-formedness is checked only against truly forbidden chars
(anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of
639-1 and 639-2/T):
- `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was
`["fr", "en"]`). Old LTM files are not auto-migrated — delete
`data/memory/ltm.json` to regenerate with the new defaults.
- Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`,
`fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly
(recognized as French via alias) but new files are written canonically.
- `Language` value object docstring corrected: it has always stored 639-2/B
(matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via
`settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor
accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`**
rather than duplicating tokens. `subtitles.yaml` now only declares
subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new
`language_tokens` section.
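
The dict-driven SHITTY pass described at the top of this list can be illustrated as follows. This is a simplified sketch: the bucket contents, role names, and helper functions are assumptions, and group detection is left out. It shows the two ideas named in the entry: first-match-wins lookup per token, and the leftmost contiguous UNKNOWN run becoming the title.

```python
from __future__ import annotations

# Tiny stand-in for the knowledge-base buckets; the real sets come from YAML.
KB = {
    "resolution": {"720p", "1080p", "2160p"},
    "source": {"bluray", "webrip", "web-dl", "hdtv"},
    "codec": {"x264", "x265", "hevc"},
    "distributor": {"nf", "amzn", "dsnp"},
}


def _is_year(token: str) -> bool:
    return len(token) == 4 and token.isdigit() and 1900 <= int(token) <= 2099


def _is_season_episode(token: str) -> bool:
    upper = token.upper()
    season, sep, episode = upper[1:].partition("E")
    return upper.startswith("S") and sep == "E" and season.isdigit() and episode.isdigit()


def annotate_shitty(tokens: list[str]) -> list[tuple[str, str]]:
    annotated = []
    for token in tokens:
        role = "unknown"
        for bucket, values in KB.items():
            if token.lower() in values:
                role = bucket        # first matching bucket wins
                break
        else:
            if _is_year(token):
                role = "year"
            elif _is_season_episode(token):
                role = "season_episode"
        annotated.append((token, role))
    return annotated


def title_of(annotated: list[tuple[str, str]]) -> str:
    title: list[str] = []
    for token, role in annotated:
        if role == "unknown":
            title.append(token)
        elif title:
            break                    # the leftmost UNKNOWN run has ended
    return ".".join(title)


tagged = annotate_shitty(["The", "Show", "S01E02", "720p", "HDTV", "extra"])
print(title_of(tagged))  # The.Show
```
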
### Removed
- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`**
deleted entirely. They held fossil parsers (`parse_episode_filename`,
`extract_movie_metadata`, …) with zero production callers — superseded by
`parse_release` as the single source of truth for release-name parsing.
Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`)
removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py` —
the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS`
hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now
lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in
`alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by
`language_tokens:` (subtitle-specific only) since iso_languages.yaml is the
canonical source.
### Fixed
- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the
ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and
`hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while
`iso_languages.yaml` exposes French as `"fre"` — preferred languages
defaults now match the canonical form.
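
The `hi` collision is easy to see in a toy classifier. The sketch below is not the project's scanner: the alias table is a hand-written subset, and `describe_subtitle` is a hypothetical helper. It resolves language tokens to ISO 639-2/B codes and flags SDH only from the three tokens listed above.

```python
from __future__ import annotations

SDH_TOKENS = frozenset({"sdh", "cc", "hearing"})

# Hand-written subset of aliases mapping to ISO 639-2/B codes.
LANGUAGE_ALIASES = {
    "fr": "fre", "fre": "fre",
    "en": "eng", "eng": "eng",
    "hi": "hin", "hin": "hin",   # "hi" is Hindi, not "hearing impaired"
}


def describe_subtitle(filename: str) -> tuple[str | None, bool]:
    tokens = filename.lower().split(".")[:-1]  # drop the extension
    language = next(
        (LANGUAGE_ALIASES[token] for token in tokens if token in LANGUAGE_ALIASES),
        None,
    )
    is_sdh = any(token in SDH_TOKENS for token in tokens)
    return language, is_sdh


print(describe_subtitle("Movie.hi.srt"))        # ('hin', False)
print(describe_subtitle("Movie.fre.sdh.srt"))   # ('fre', True)
```
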
### Internal
- **Domain I/O extraction** (`refactor/domain-io-extraction`): the domain
layer no longer performs subprocess calls, filesystem scans, or YAML
loading. Achieved in a series of focused commits:
- **Knowledge YAML loaders moved to infrastructure**:
`alfred/domain/release/knowledge.py`,
`alfred/domain/shared/knowledge/language_registry.py`, and
`alfred/domain/subtitles/knowledge/{base,loader}.py` relocated to
`alfred/infrastructure/knowledge/`. Re-exports were dropped — callers
import directly from the new location.
- **`MediaProber` and `FilesystemScanner` Protocol ports** introduced at
`alfred/domain/shared/ports/` with frozen-dataclass DTOs
(`SubtitleStreamInfo`, `FileEntry`). `SubtitleIdentifier` and
`PatternDetector` are now constructor-injected with concrete adapters
(`FfprobeMediaProber` wrapping `subprocess.run(ffprobe)` and
`PathlibFilesystemScanner` wrapping `pathlib`). No more direct
`subprocess`/`pathlib` usage from the subtitle domain services.
- **Live filesystem methods removed from VOs and entities**:
`FilePath.exists()` / `.is_file()` / `.is_dir()` deleted —
`FilePath` is now a pure address VO. `Movie.has_file()` and
`Episode.is_downloaded()` dropped. Callers either rely on a prior
detection step or use try/except over pre-checks (eliminates
TOCTOU races).
- **`SubtitlePlacer` moved to the application layer** at
`alfred/application/subtitles/placer.py` — it performs `os.link`
I/O, which doesn't belong in the domain. Pre-checks replaced with
try/except for `FileNotFoundError`/`FileExistsError`.
- **`SubtitleRuleSet.resolve()` no longer reaches into the knowledge
base**: the implicit `DEFAULT_RULES()` helper is gone, replaced by
an explicit `default_rules: SubtitleMatchingRules` parameter. The
`ManageSubtitles` use case loads defaults from the KB once and
passes them in.
- **`SubtitleKnowledge` Protocol port** at
`alfred/domain/subtitles/ports/knowledge.py` declares the read-only
query surface domain services consume (7 methods:
`known_extensions`, `format_for_extension`, `language_for_token`,
`is_known_lang_token`, `type_for_token`, `is_known_type_token`,
`patterns`). `SubtitleIdentifier` and `PatternDetector` depend on
this Protocol instead of the concrete `SubtitleKnowledgeBase` from
infrastructure — `domain/subtitles/` now has zero imports from
`infrastructure/`. The remaining domain → infra leak
(`domain/release/` loading separator YAML at import-time) is
documented in tech-debt and scheduled for its own branch.
- **`to_dot_folder_name(title)` helper** in
`alfred/domain/shared/value_objects.py` — extracts the
`re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was
duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of
a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`,
`.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping
them under `metadata:`. The `_METADATA_EXTENSIONS` set used by
`detect_media_type` remains the union of both (same behavior — subtitles
are still ignored when deciding the media type of a folder), but a new
`load_subtitle_extensions()` loader is now available for the subtitles
domain. Semantic clarity, no functional change.
- **`tv_shows/entities.py` module docstring** now shows the aggregate
ownership as an ASCII tree before the rule text — quicker visual scan
of the DDD structure.
- Removed backward-compat shims `_sanitise_for_fs` /
`_strip_episode_from_normalised` from `domain/release/value_objects.py`
(zero callers).
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass
explicit `check=False` (PLW1510); lazy imports promoted to module top where
there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`,
`qbittorrent/client.py`, `file_manager.py`); fixed module-level import
ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`;
removed unused locals (F841 / B007); replaced unnecessary set comprehension
with `set()` in `release/knowledge.py` (C416).
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches)
globally — noisy on parser mappers and orchestrator use-cases where early-return
validation is essential complexity. Ignore `PLW0603` for the documented memory
singleton (`infrastructure/persistence/context.py`).
- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`):
the last domain → infrastructure leak (`domain/release/value_objects.py`
loading YAML at import-time) is gone. Achieved via:
- **`ReleaseKnowledge` Protocol port** at
`alfred/domain/release/ports/knowledge.py` declares the read-only query
surface release parsing needs (token sets for resolutions, sources, codecs,
languages, hdr extras; structured dicts for audio, video_meta, editions,
media_type_tokens; separators list; file-extension sets used by
application/infra callers; `sanitize_for_fs(text)` method).
- **`YamlReleaseKnowledge` adapter** at
`alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant
once at construction. Builds an immutable `str.maketrans` translation
table for filesystem sanitization.
- **`parse_release(name, kb)`** takes the knowledge as an explicit
parameter — no more module-level YAML loading inside the domain. Every
internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`,
`_extract_audio`, `_extract_video_meta`, `_extract_edition`,
`_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
- **`ParsedRelease` Option B**: sanitization happens once at parse time
and is stored on a new `title_sanitized: str` field. Builder methods
(`show_folder_name`, `season_folder_name`, `episode_filename`,
`movie_folder_name`, `movie_filename`) are now pure — they accept
already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe`
arguments. Callers at the use-case boundary sanitize TMDB strings
via `kb.sanitize_for_fs(...)` before passing them in.
- **All domain-knowledge constants removed from `value_objects.py`**:
`_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`,
`_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`,
`_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`,
`_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`,
and the `_sanitize_for_fs` helper. The domain module is now pure.
- **Application-layer KB singleton**: `resolve_destination.py` instantiates
a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and
threads it through every `parse_release(...)` call. The local
`_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of
`_KB.sanitize_for_fs(...)`.
- **`detect_media_type(parsed, source_path, kb)` and
`find_video_file(path, kb)`** now take the knowledge explicitly
instead of importing `_*_EXTENSIONS` constants from the domain.
`agent/tools/filesystem.py::analyze_release` imports the application
KB singleton and passes it through.
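
For reference, the `to_dot_folder_name(title)` helper mentioned earlier in this list wraps the one-liner quoted in its entry. A sketch of what such a function looks like, using exactly that pattern (the docstring and the split into two statements are additions for readability):

```python
import re


def to_dot_folder_name(title: str) -> str:
    """Drop characters other than word chars, whitespace, dots and dashes,
    then turn spaces into dots."""
    cleaned = re.sub(r"[^\w\s\.\-]", "", title)
    return cleaned.replace(" ", ".")


print(to_dot_folder_name("Honey Don't"))              # Honey.Dont
print(to_dot_folder_name("Spider-Man: No Way Home"))  # Spider-Man.No.Way.Home
```
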
---
## [2026-05-17] — TVShow & Movie aggregate refactor
Multi-phase overhaul of the TV show domain into a real DDD aggregate, with
matching parity work on `Movie`, a language knowledge system, and the
`shared/media` restructure that supports both.
### Added
- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml`, 42
languages including `und` for undetermined).
- `Language` value object (frozen dataclass) with `iso`, `english_name`,
`native_name`, `aliases`, and a `matches(raw)` cross-format helper.
- `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging
builtin + learned YAML. Not a singleton — the application layer
instantiates it.
- ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English
name, native name, and common spellings.
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a
`resolution` property using width-priority bucket detection (handles
cinema/scope crops like 1920×960 → 1080p).
- **`shared/media/matching.py`** — `track_lang_matches` helper shared by
`Episode` and `Movie`. Implements the **"C+" contract** for language helpers:
- `Language` query → cross-format match via `Language.matches()`
- `str` query → case-insensitive direct comparison (no normalization)
- **TVShow aggregate composition**:
- `TVShow.seasons: dict[SeasonNumber, Season]`
- `Season.episodes: dict[EpisodeNumber, Episode]`
- `Season.expected_episodes` / `Season.aired_episodes` (split so collection
state can compare "owned vs aired today" without confusing in-flight
seasons with future ones)
- **Aggregate methods on `TVShow`**:
- `add_episode(ep)` — sole sanctioned mutation entry point (creates the
season if missing)
- `add_season(season)` — replaces a season wholesale
- `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
- `is_complete_series()` — true iff `ENDED + COMPLETE`
- `missing_episodes()` — flat list of all aired-but-not-owned
`(season, episode)` pairs
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`,
`has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by
`Episode.audio_tracks` / `Episode.subtitle_tracks`.
- **Movie aggregate parity** — `Movie` now carries `audio_tracks` /
`subtitle_tracks` and exposes the same helpers as `Episode` (same C+
contract).
- **`CHANGELOG.md`** (this file).
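
The aggregate shape listed above can be illustrated with a stripped-down model. This is a sketch only: episodes are reduced to titles, `add_episode` takes plain numbers instead of an `Episode`, and every `*Sketch` name is hypothetical. It shows the season being created on first use and `missing_episodes()` returning aired-but-not-owned pairs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SeasonSketch:
    number: int
    aired_episodes: int = 0
    episodes: dict[int, str] = field(default_factory=dict)  # episode number -> title


@dataclass
class TVShowSketch:
    seasons: dict[int, SeasonSketch] = field(default_factory=dict)

    def add_episode(self, season_number: int, episode_number: int, title: str) -> None:
        # Creates the season if it does not exist yet.
        season = self.seasons.setdefault(season_number, SeasonSketch(season_number))
        season.episodes[episode_number] = title

    @property
    def episode_count(self) -> int:
        # Computed from the dicts rather than stored.
        return sum(len(season.episodes) for season in self.seasons.values())

    def missing_episodes(self) -> list[tuple[int, int]]:
        return [
            (season.number, number)
            for season in sorted(self.seasons.values(), key=lambda s: s.number)
            for number in range(1, season.aired_episodes + 1)
            if number not in season.episodes
        ]


show = TVShowSketch()
show.seasons[1] = SeasonSketch(1, aired_episodes=3)
show.add_episode(1, 1, "Pilot")
show.add_episode(1, 3, "Third")
print(show.missing_episodes())  # [(1, 2)]
```
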
### Changed
- **`shared/media_info.py` exploded into `shared/media/{audio,video,subtitle,info,matching}.py`.**
`MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat
accessors (`width`, `height`, `video_codec`, `resolution`) remain as
properties that read the first video track.
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to
`MediaInfo` (file-level — they come from the ffprobe `format` block, not a
stream). Files without a video stream now correctly expose duration.
- **`ShowStatus.from_string`** extended to map TMDB strings (`Returning
Series`, `In Production`, `Pilot`, `Planned`, `Canceled`, `Cancelled`).
Comparison is whitespace-trimmed and case-insensitive.
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They
are owned by `TVShow` and reached only through it.
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed
from the dict) instead of stored ints.
- **`TVShowService.parse_episode_from_filename`** rewritten in string
operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
- **`TVShowService.find_next_episode`** now drives off
`show.missing_episodes()` instead of the hardcoded "max 50 episodes per
season" heuristic.
- **`TVShowService` constructor** no longer takes `season_repository` /
`episode_repository` — the aggregate persists in one block via
`TVShowRepository` only.
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to
`SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack`
ffprobe-view dataclass (different bounded contexts, kept separate
intentionally).
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from
`knowledge/release/file_extensions.yaml` via `load_video_extensions()`
(single source of truth).
- **`CLAUDE.md`** updated with three new policy sections:
- "Tests" — small updates OK during normal work, no mass-update sprees
- "Backwards-compatibility shims" — prefer clean migration over shims
- "Regex" — not forbidden, use judgment when string ops would be fragile
### Removed
- **Legacy `Season N Episode N` filename form** in
`TVShowService.parse_episode_from_filename`. It never appears in the release
names Alfred handles, and supporting it forced a regex.
- **`SeasonRepository` and `EpisodeRepository`** — only the aggregate root has
a repository (DDD rule: one repo per aggregate).
- **`shared/media_info.py`** compatibility shim — callers updated.
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` — callers
updated to `SubtitleCandidate`.
### Fixed
- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead
of crashing through `primary_video.duration_seconds` (see the duration/bitrate
move under **Changed**).
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer
passes the removed `show_imdb_id` / `episode_count` kwargs when constructing
a `Season` for folder-name generation.
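
The `MediaInfo` changes in this block (file-level duration, flat accessors reading the first video track, width-priority resolution) fit together roughly as in the sketch below. The width thresholds and the height fallback are assumptions, as are the `*Sketch` class names; only the 1920×960 → 1080p example comes from this changelog.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# (minimum width, label), widest first. Thresholds are assumed for illustration.
_WIDTH_BUCKETS = ((3840, "2160p"), (1920, "1080p"), (1280, "720p"))


@dataclass(frozen=True)
class VideoTrackSketch:
    width: int | None = None
    height: int | None = None

    @property
    def resolution(self) -> str | None:
        # Width takes priority so scope crops (1920x960) still map to 1080p.
        if self.width:
            for min_width, label in _WIDTH_BUCKETS:
                if self.width >= min_width:
                    return label
        if self.height:
            return f"{self.height}p"  # assumed fallback for small frames
        return None


@dataclass(frozen=True)
class MediaInfoSketch:
    video_tracks: list[VideoTrackSketch] = field(default_factory=list)
    duration_seconds: float | None = None  # file-level, not tied to a track

    @property
    def resolution(self) -> str | None:
        # Flat accessor: reads the first video track, if any.
        return self.video_tracks[0].resolution if self.video_tracks else None


film = MediaInfoSketch([VideoTrackSketch(1920, 960)], duration_seconds=5400.0)
audio_only = MediaInfoSketch(duration_seconds=180.0)
print(film.resolution)        # 1080p
print(audio_only.resolution)  # None
```

Because duration sits on the file-level object, reading it never has to go through a video track that may not exist.
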
### Internal
- Test suite rewritten where the aggregate redesign broke fixtures:
`tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py`
(rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py`
(helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures),
`tests/domain/test_tv_shows_service.py` (find_next_episode driven by real
aggregate state).
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`,
`identifier.py` updated to import `SubtitleCandidate`.
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.