# Changelog

All notable changes to Alfred are documented here.

The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead
of release numbers. Granularity targets behavioral or API-visible changes; refer
to `git log` for commit-level detail.

Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** /
**Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect
callers).

---

## [Unreleased]

### Fixed

- **`enrich_from_probe` now refreshes `tech_string`** after filling
  `quality` / `source` / `codec`. Previously the field stayed at its
  parser-time value, so filename builders saw stale tech tokens even
  after a successful probe. New `TestTechString` class in
  `tests/application/test_enrich_from_probe.py` locks the behavior.
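
The `tech_string` refresh above can be illustrated with a minimal sketch. It assumes `tech_string` is `quality`, `source` and `codec` joined by dots, with `None` fields skipped. `ParsedStub` and `rebuild_tech_string` are hypothetical stand-ins, not Alfred's actual `ParsedRelease` or helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParsedStub:
    """Hypothetical stand-in for ParsedRelease (only the tech fields)."""

    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None
    tech_string: str = ""


def rebuild_tech_string(parsed: ParsedStub) -> str:
    """Join quality, source and codec with dots, skipping None fields."""
    parts = [parsed.quality, parsed.source, parsed.codec]
    return ".".join(part for part in parts if part is not None)


# Simulate a probe that filled `codec` after parsing: the stale string is rebuilt.
parsed = ParsedStub(quality="1080p", source="WEB-DL", tech_string="1080p.WEB-DL")
parsed.codec = "x265"
parsed.tech_string = rebuild_tech_string(parsed)
```

A fully empty stub yields an empty string, which matches the "fully empty parsed" case a test suite would want to pin down.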

### Added

- **`inspect_release` orchestrator + `InspectedResult` VO**
  (`alfred/application/release/inspect.py`). Single composition of the
  four inspection layers: `parse_release` → `detect_media_type` (patches
  `parsed.media_type`) → `find_main_video` (top-level scan) →
  `prober.probe` + `enrich_from_probe` when a video exists and the
  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
  `InspectedResult(parsed, report, source_path, main_video, media_info,
  probe_used)` that downstream callers consume directly instead of
  rebuilding the same chain. `kb` and `prober` are injected — no
  module-level singletons. Never raises.
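
The four-layer composition above can be sketched as follows. Every name here (`inspect_sketch`, `InspectedSketch`, the injected callables, the dict-shaped `parsed`) is a hypothetical simplification, and treating `probe_used` as "the probe returned something" is an assumption rather than Alfred's documented semantics.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional

SKIP_PROBE_TYPES = {"unknown", "other"}


@dataclass(frozen=True)
class InspectedSketch:
    """Hypothetical, simplified counterpart of InspectedResult."""

    parsed: dict
    report: Any
    source_path: Path
    main_video: Optional[Path]
    media_info: Optional[dict]
    probe_used: bool


def inspect_sketch(name, source_path, *, parse, detect, find_video, probe, enrich):
    """Chain parse -> detect -> find video -> probe + enrich, with injected steps."""
    parsed, report = parse(name)
    parsed["media_type"] = detect(parsed, source_path)  # patch the refined type
    main_video = find_video(source_path)
    media_info = None
    # Probe only when a video exists and the media type is worth probing.
    if main_video is not None and parsed["media_type"] not in SKIP_PROBE_TYPES:
        media_info = probe(main_video)
        if media_info is not None:
            enrich(parsed, media_info)
    return InspectedSketch(
        parsed, report, source_path, main_video, media_info,
        probe_used=media_info is not None,
    )


def fill_missing(parsed, info):
    """Toy enricher: only fills fields that are still None."""
    parsed.update({k: v for k, v in info.items() if parsed.get(k) is None})


result = inspect_sketch(
    "Show.S01E01.1080p",
    Path("/downloads/Show.S01E01.1080p"),
    parse=lambda name: ({"title": "Show", "codec": None}, {"confidence": 80}),
    detect=lambda parsed, path: "tv_show",
    find_video=lambda path: path / "episode.mkv",
    probe=lambda video: {"codec": "hevc"},
    enrich=fill_missing,
)
```

Passing the steps in as arguments mirrors the "injected, no module-level singletons" point: the chain can be exercised in tests without touching `ffprobe` or the filesystem.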

### Changed

- **`analyze_release` tool now delegates to `inspect_release`** — same
  output shape, plus two new fields: `confidence` (0–100) and `road`
  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
  both fields so the LLM can route releases by confidence.

- **`MediaProber` port now covers full media probing**: added
  `probe(video) -> MediaInfo | None` alongside the existing
  `list_subtitle_streams`. `FfprobeMediaProber` (in
  `alfred/infrastructure/probe/`) implements both methods and is now
  the single adapter shelling out to `ffprobe`. The standalone
  `alfred/infrastructure/filesystem/ffprobe.py` module was removed —
  all callers (tools, testing scripts) instantiate
  `FfprobeMediaProber` instead. Unblocks the `inspect_release`
  orchestrator (see **Added**), which depends on the port.

### Removed

- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
  `FfprobeMediaProber` adapter).

---

## [2026-05-20] — Release parser confidence scoring + exclusion

### Added

- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
  `is_supported_video(path, kb)` (extension-only check against
  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
  scan, lexicographically-first eligible file, returns `None` when no
  video qualifies; accepts a bare file as folder for single-file
  releases). No size threshold, no filename heuristics —
  PATH_OF_PAIN handles the exotic cases. Foundation for the future
  `inspect_release` orchestrator.

- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
  critical fields. EASY is decided structurally (a group schema
  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
  YAML-configurable cutoff (default 60). Weights and penalties also
  live in `scoring.yaml` — title 30, media_type 20, year 15, season
  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
  -30. `Road` is a new enum, distinct from `ParsePath` (which records
  the tokenization route, not the confidence tier). `ReleaseKnowledge`
  port gains a `scoring: dict` field.
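
As a worked illustration of the scoring rules above, here is a minimal sketch using the listed weights, the 5-point UNKNOWN penalty capped at 30, and the default cutoff of 60. Which fields count as "tech", the clamping to 0–100, and the `>=` comparison at the cutoff are assumptions. The function names are hypothetical and not the contents of `scoring.py`.

```python
FIELD_WEIGHTS = {"title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5}
TECH_FIELDS = ("quality", "source", "codec")  # assumption: the "tech" fields
TECH_WEIGHT = 5          # "tech 5 each"
UNKNOWN_PENALTY = 5      # per residual UNKNOWN token
MAX_PENALTY = 30         # penalty cap
DEFAULT_CUTOFF = 60      # SHITTY vs PATH_OF_PAIN threshold


def confidence(present: set, unknown_tokens: int) -> int:
    """Sum weights of the fields that were found, then subtract the capped penalty."""
    earned = sum(w for name, w in FIELD_WEIGHTS.items() if name in present)
    earned += TECH_WEIGHT * sum(1 for name in TECH_FIELDS if name in present)
    penalty = min(UNKNOWN_PENALTY * unknown_tokens, MAX_PENALTY)
    return max(0, min(100, earned - penalty))


def road(schema_matched: bool, score: int, cutoff: int = DEFAULT_CUTOFF) -> str:
    """EASY is structural; otherwise the score decides between the other two."""
    if schema_matched:
        return "easy"
    return "shitty" if score >= cutoff else "path_of_pain"


# title 30 + media_type 20 + year 15 + quality 5 = 70, minus 2 * 5 = 60
score = confidence({"title", "media_type", "year", "quality"}, unknown_tokens=2)
tier = road(schema_matched=False, score=score)
```

With these assumptions the example release scores 60 and lands on the SHITTY road, while a title-only name with many UNKNOWN tokens bottoms out at 0.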

### Changed

- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
  ParseReport]` instead of returning a bare `ParsedRelease`. Call
  sites updated in `application/filesystem/resolve_destination.py` and
  `agent/tools/filesystem.py`. Tests updated accordingly.

---

## [2026-05-20] — Release parser v2 (EASY + SHITTY)

### Added

- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
  new annotate-based pipeline (tokenize → annotate → assemble) drives
  releases from known groups. Exposes `Token` (frozen VO with `index` +
  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
  and `GroupSchema` / `SchemaChunk` value objects.
  - `pipeline.tokenize`: string-ops separator split (no regex), strips
    a `[site.tag]` prefix/suffix first.
  - `pipeline.annotate`: detects the trailing group right-to-left
    (priority to `codec-GROUP` shape, fallback to any non-source dashed
    token), looks up its `GroupSchema`, then walks tokens and schema
    chunks in lockstep — optional chunks that don't match are skipped,
    mandatory mismatches abort EASY and return `None` so the caller can
    fall back to SHITTY.
  - `pipeline.assemble`: folds annotated tokens into a
    `ParsedRelease`-compatible dict.
  - `parse_release` (in `release.services`) tries the v2 EASY path first
    and falls through to the legacy SHITTY heuristic on `None`. Legacy
    SHITTY/PATH OF PAIN behavior is unchanged.
  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
    rarbg}.yaml` declare the canonical chunk order per group, loaded via
    new `ReleaseKnowledge.group_schema(name)` port method.
  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
    cover token VOs, site-tag stripping, group detection, schema-driven
    annotation (movie, TV episode, season pack with optional source),
    and field assembly.

- **Release parser v2 — enricher pass** completes the EASY pipeline.
  The structural schema walk now tolerates non-positional tokens
  between chunks (instead of aborting on leftover tokens), and a second
  pass tags them with audio / video-meta / edition / language roles.
  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
  matched before single tokens. Channel layouts like `5.1` and `7.1`
  (split into two tokens by the `.` separator) are detected as
  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
  marker so `assemble` extracts the canonical value only from the
  primary token. KONTRAST releases with audio / HDR / edition / language
  metadata now produce a fully populated `ParsedRelease`.

- **Streaming distributor as a separate dimension** from encoding source.
  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
  platform that produced the release is now recorded distinctly. The
  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
  from `sources.yaml`.

- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
  each documenting an expected `ParsedRelease` plus the future `routing`
  (library / torrents / seed_hardlinks) for the upcoming `organize_media`
  refactor. EASY bucket seeded with 5 cases (movie, single-episode, season
  pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15
  anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel),
  French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi),
  multi-episode chain `S14E09E10E11` (Archer, captures E11 loss),
  lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators
  (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83),
  season-range `S01-06` (Tatortreiniger, captures movie misclassification),
  bare folder name (Jurassic Park, media_type=unknown), apostrophe-in-name
  (Honey Don't, captures full AI-path degeneration), SUBS-tag movie (Hook),
  space separators (Predator Badlands, captures group=UNKNOWN), subs-only
  release (Westworld S04).
  PATH OF PAIN bucket seeded with 10 worst-case fixtures covering:
  UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set
  with double season range and parens-wrapped tech (Deutschland 83-86-89,
  captures `group=S03` misdetection), accented chars in title (Chérie
  BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag
  prefix + XviD (OxTorrent), episode title + air-date silently lost
  (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i +
  multi-word audio codec (The Prodigy, full AI-path degeneration),
  yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]`
  tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range +
  REPACK + HEVC (Gilmore Girls, the well-behaved exception).
  All fixtures are exercised by the parametrized suite in
  `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`. Releases like
  `Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep
  alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators
  used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New
  conventions can be added without code changes. The canonical `.` is always
  present even if missing from YAML.
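
A minimal sketch of the separator-driven, regex-free split described for `pipeline.tokenize` and `separators.yaml`. The function name and the hard-coded separator list are illustrative assumptions, and site-tag stripping is left out.

```python
# Separators as listed for separators.yaml; in Alfred they come from the KB.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]


def tokenize_sketch(name: str, separators=SEPARATORS) -> list:
    """Split a release name on any separator character, dropping empty tokens."""
    seps = set(separators) | {"."}  # the canonical "." is always present
    tokens, current = [], []
    for char in name:
        if char in seps:
            if current:  # close the token in progress
                tokens.append("".join(current))
                current = []
        else:
            current.append(char)
    if current:
        tokens.append("".join(current))
    return tokens


tokens = tokenize_sketch("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]")
```

Note that `5.1` comes out as two tokens (`5` and `1`), which is why the enricher pass detects channel layouts as consecutive pairs.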

### Changed

- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
  The legacy ~480-line heuristic block in `release/services.py` is gone;
  `pipeline._annotate_shitty` does a single pass that looks each token
  up in the kb buckets (resolutions / sources / codecs / distributors /
  year / `SxxExx`) with first-match-wins semantics, and the leftmost
  contiguous UNKNOWN run becomes the title. `annotate()` no longer
  returns `None` — SHITTY is the always-on fallback when no group schema
  matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
  (`deutschland_franchise_box`, `sleaford_yt_slug`,
  `super_mario_bilingual`, `predator_space_separators` — the last one
  moved from `shitty/` → `path_of_pain/`) are now marked
  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
  `xfail_reason` field; the parametrized suite wires the xfail mark
  automatically.

- **`parse_release` tokenizer is now data-driven**: it splits on any character
  listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
  This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
  space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and
  underscore-separated names parse correctly via the direct path — no more
  fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first
  (so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`),
  then well-formedness is checked only against truly forbidden chars
  (anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of
  639-1 and 639-2/T):
  - `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was
    `["fr", "en"]`). Old LTM files are not auto-migrated — delete
    `data/memory/ltm.json` to regenerate with the new defaults.
  - Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`,
    `fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly
    (recognized as French via alias) but new files are written canonically.
  - `Language` value object docstring corrected: it has always stored 639-2/B
    (matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via
  `settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor
  accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`**
  rather than duplicating tokens. `subtitles.yaml` now only declares
  subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new
  `language_tokens` section.
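
The dict-driven SHITTY tagging listed in this block (single pass, first-match-wins bucket lookup, leftmost contiguous UNKNOWN run becomes the title) can be sketched as below. The bucket contents, role names and function names are illustrative assumptions, and `SxxExx` and distributor handling are omitted.

```python
# Hypothetical kb buckets; the real ones are loaded from YAML knowledge files.
BUCKETS = {
    "resolution": {"720p", "1080p", "2160p"},
    "source": {"bluray", "webrip", "web-dl", "hdtv"},
    "codec": {"x264", "x265", "hevc"},
}


def tag_token(token: str) -> str:
    """Return the first bucket that knows the token, else 'year' or 'unknown'."""
    low = token.lower()
    for role, bucket in BUCKETS.items():  # dict order gives first-match-wins
        if low in bucket:
            return role
    if len(token) == 4 and token.isdecimal() and 1900 <= int(token) <= 2099:
        return "year"
    return "unknown"


def annotate_shitty_sketch(tokens: list) -> tuple:
    """Tag every token, then promote the leftmost unknown run to the title."""
    roles = [tag_token(token) for token in tokens]
    index = 0
    while index < len(roles) and roles[index] != "unknown":
        index += 1  # skip anything before the first unknown token
    title_parts = []
    while index < len(roles) and roles[index] == "unknown":
        roles[index] = "title"
        title_parts.append(tokens[index])
        index += 1
    return roles, " ".join(title_parts)


roles, title = annotate_shitty_sketch(["The", "Father", "2020", "1080p", "WEBRip", "extra"])
```

Tokens that stay `unknown` after the title run (here `extra`) are the kind of residue that the confidence penalty is meant to count.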

### Removed

- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`**
  deleted entirely. They held fossil parsers (`parse_episode_filename`,
  `extract_movie_metadata`, …) with zero production callers — superseded by
  `parse_release` as the single source of truth for release-name parsing.
  Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`)
  removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py` —
  the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS`
  hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now
  lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in
  `alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by
  `language_tokens:` (subtitle-specific only) since `iso_languages.yaml` is the
  canonical source.

### Fixed

- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the
  ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and
  `hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while
  `iso_languages.yaml` exposes French as `"fre"` — preferred languages
  defaults now match the canonical form.

### Internal

- **Domain I/O extraction** (`refactor/domain-io-extraction`): the domain
  layer no longer performs subprocess calls, filesystem scans, or YAML
  loading. Achieved in a series of focused commits:
  - **Knowledge YAML loaders moved to infrastructure**:
    `alfred/domain/release/knowledge.py`,
    `alfred/domain/shared/knowledge/language_registry.py`, and
    `alfred/domain/subtitles/knowledge/{base,loader}.py` relocated to
    `alfred/infrastructure/knowledge/`. Re-exports were dropped — callers
    import directly from the new location.
  - **`MediaProber` and `FilesystemScanner` Protocol ports** introduced at
    `alfred/domain/shared/ports/` with frozen-dataclass DTOs
    (`SubtitleStreamInfo`, `FileEntry`). `SubtitleIdentifier` and
    `PatternDetector` are now constructor-injected with concrete adapters
    (`FfprobeMediaProber` wrapping `subprocess.run(ffprobe)` and
    `PathlibFilesystemScanner` wrapping `pathlib`). No more direct
    `subprocess`/`pathlib` usage from the subtitle domain services.
  - **Live filesystem methods removed from VOs and entities**:
    `FilePath.exists()` / `.is_file()` / `.is_dir()` deleted —
    `FilePath` is now a pure address VO. `Movie.has_file()` and
    `Episode.is_downloaded()` dropped. Callers either rely on a prior
    detection step or use try/except over pre-checks (eliminates
    TOCTOU races).
  - **`SubtitlePlacer` moved to the application layer** at
    `alfred/application/subtitles/placer.py` — it performs `os.link`
    I/O, which doesn't belong in the domain. Pre-checks replaced with
    try/except for `FileNotFoundError`/`FileExistsError`.
  - **`SubtitleRuleSet.resolve()` no longer reaches into the knowledge
    base**: the implicit `DEFAULT_RULES()` helper is gone, replaced by
    an explicit `default_rules: SubtitleMatchingRules` parameter. The
    `ManageSubtitles` use case loads defaults from the KB once and
    passes them in.
  - **`SubtitleKnowledge` Protocol port** at
    `alfred/domain/subtitles/ports/knowledge.py` declares the read-only
    query surface domain services consume (7 methods:
    `known_extensions`, `format_for_extension`, `language_for_token`,
    `is_known_lang_token`, `type_for_token`, `is_known_type_token`,
    `patterns`). `SubtitleIdentifier` and `PatternDetector` depend on
    this Protocol instead of the concrete `SubtitleKnowledgeBase` from
    infrastructure — `domain/subtitles/` now has zero imports from
    `infrastructure/`. The remaining domain → infra leak
    (`domain/release/` loading separator YAML at import-time) is
    documented in tech-debt and scheduled for its own branch.
- **`to_dot_folder_name(title)` helper** in
  `alfred/domain/shared/value_objects.py` — extracts the
  `re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was
  duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of
  a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`,
  `.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping
  them under `metadata:`. The `_METADATA_EXTENSIONS` set used by
  `detect_media_type` remains the union of both (same behavior — subtitles
  are still ignored when deciding the media type of a folder), but a new
  `load_subtitle_extensions()` loader is now available for the subtitles
  domain. Semantic clarity, no functional change.
- **`tv_shows/entities.py` module docstring** now shows the aggregate
  ownership as an ASCII tree before the rule text — quicker visual scan
  of the DDD structure.
- Removed backward-compat shims `_sanitise_for_fs` /
  `_strip_episode_from_normalised` from `domain/release/value_objects.py`
  (zero callers).
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass
  explicit `check=False` (PLW1510); lazy imports promoted to module top where
  there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`,
  `qbittorrent/client.py`, `file_manager.py`); fixed module-level import
  ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`;
  removed unused locals (F841 / B007); replaced unnecessary set comprehension
  with `set()` in `release/knowledge.py` (C416).
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches)
  globally — noisy on parser mappers and orchestrator use-cases where early-return
  validation is essential complexity. Ignore `PLW0603` for the documented memory
  singleton (`infrastructure/persistence/context.py`).
- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`):
  the last domain → infrastructure leak (`domain/release/value_objects.py`
  loading YAML at import-time) is gone. Achieved via:
  - **`ReleaseKnowledge` Protocol port** at
    `alfred/domain/release/ports/knowledge.py` declares the read-only query
    surface release parsing needs (token sets for resolutions, sources, codecs,
    languages, hdr extras; structured dicts for audio, video_meta, editions,
    media_type_tokens; separators list; file-extension sets used by
    application/infra callers; `sanitize_for_fs(text)` method).
  - **`YamlReleaseKnowledge` adapter** at
    `alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant
    once at construction. Builds an immutable `str.maketrans` translation
    table for filesystem sanitization.
  - **`parse_release(name, kb)`** takes the knowledge as an explicit
    parameter — no more module-level YAML loading inside the domain. Every
    internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`,
    `_extract_audio`, `_extract_video_meta`, `_extract_edition`,
    `_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
  - **`ParsedRelease` Option B**: sanitization happens once at parse time
    and is stored on a new `title_sanitized: str` field. Builder methods
    (`show_folder_name`, `season_folder_name`, `episode_filename`,
    `movie_folder_name`, `movie_filename`) are now pure — they accept
    already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe`
    arguments. Callers at the use-case boundary sanitize TMDB strings
    via `kb.sanitize_for_fs(...)` before passing them in.
  - **All domain-knowledge constants removed from `value_objects.py`**:
    `_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`,
    `_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`,
    `_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`,
    `_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`,
    and the `_sanitize_for_fs` helper. The domain module is now pure.
  - **Application-layer KB singleton**: `resolve_destination.py` instantiates
    a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and
    threads it through every `parse_release(...)` call. The local
    `_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of
    `_KB.sanitize_for_fs(...)`.
  - **`detect_media_type(parsed, source_path, kb)` and
    `find_video_file(path, kb)`** now take the knowledge explicitly
    instead of importing `_*_EXTENSIONS` constants from the domain.
    `agent/tools/filesystem.py::analyze_release` imports the application
    KB singleton and passes it through.
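
The port-and-adapter split above can be illustrated with a reduced sketch: a Protocol exposing only `sanitize_for_fs`, and an in-memory adapter that builds its `str.maketrans` table once at construction. The class names, the forbidden-character set (the usual Windows-reserved characters) and the choice to delete rather than replace them are assumptions, not the contents of `release_kb.py`.

```python
from typing import Protocol


class ReleaseKnowledgeSketch(Protocol):
    """Hypothetical, reduced version of the port: one query method only."""

    def sanitize_for_fs(self, text: str) -> str: ...


class InMemoryKnowledge:
    """Stand-in adapter; the real one loads its constants from YAML."""

    def __init__(self, forbidden: str = '<>:"/\\|?*') -> None:
        # Built once: maps every forbidden character to "delete".
        self._table = str.maketrans("", "", forbidden)

    def sanitize_for_fs(self, text: str) -> str:
        return text.translate(self._table)


def movie_folder_name_sketch(title: str, year: int, kb: ReleaseKnowledgeSketch) -> str:
    """Domain-side code sees only the port and receives the KB explicitly."""
    return f"{kb.sanitize_for_fs(title)} ({year})"


kb = InMemoryKnowledge()
folder = movie_folder_name_sketch("Who? What: Why*", 2020, kb)
```

Because the function depends on the Protocol rather than a concrete loader, a test can pass any object with a matching `sanitize_for_fs` method and never touch YAML.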

---

## [2026-05-17] — TVShow & Movie aggregate refactor

Multi-phase overhaul of the TV show domain into a real DDD aggregate, with
matching parity work on `Movie`, a language knowledge system, and the
`shared/media` restructure that supports both.

### Added

- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml` + 42
  languages including `und` for undetermined).
  - `Language` value object (frozen dataclass) with `iso`, `english_name`,
    `native_name`, `aliases`, and a `matches(raw)` cross-format helper.
  - `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging
    builtin + learned YAML. Not a singleton — the application layer
    instantiates it.
  - ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English
    name, native name, and common spellings.
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a
  `resolution` property using width-priority bucket detection (handles
  cinema/scope crops like 1920×960 → 1080p).
- **`shared/media/matching.py`** — `track_lang_matches` helper shared by
  `Episode` and `Movie`. Implements the **"C+" contract** for language helpers:
  - `Language` query → cross-format match via `Language.matches()`
  - `str` query → case-insensitive direct comparison (no normalization)
- **TVShow aggregate composition**:
  - `TVShow.seasons: dict[SeasonNumber, Season]`
  - `Season.episodes: dict[EpisodeNumber, Episode]`
  - `Season.expected_episodes` / `Season.aired_episodes` (split so collection
    state can compare "owned vs aired today" without confusing in-flight
    seasons with future ones)
- **Aggregate methods on `TVShow`**:
  - `add_episode(ep)` — sole sanctioned mutation entry point (creates the
    season if missing)
  - `add_season(season)` — replaces a season wholesale
  - `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
  - `is_complete_series()` — true iff `ENDED + COMPLETE`
  - `missing_episodes()` — flat list of all aired-but-not-owned
    `(season, episode)` pairs
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`,
  `has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by
  `Episode.audio_tracks` / `Episode.subtitle_tracks`.
- **Movie aggregate parity** — `Movie` now carries `audio_tracks` /
  `subtitle_tracks` and exposes the same helpers as `Episode` (same C+
  contract).
- **`CHANGELOG.md`** (this file).
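
A simplified sketch of the aggregate composition and methods listed above. It assumes plain `int` keys instead of `SeasonNumber` / `EpisodeNumber`, an `aired_episodes` count with episodes numbered from 1, and an `add_episode` that takes numbers rather than an `Episode` entity. The classes are illustrative, not Alfred's entities.

```python
from dataclasses import dataclass, field
from enum import Enum


class CollectionStatus(Enum):
    EMPTY = "empty"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass
class Season:
    number: int
    aired_episodes: int = 0  # assumption: count of episodes aired so far
    episodes: dict = field(default_factory=dict)  # episode number -> episode


@dataclass
class TVShow:
    seasons: dict = field(default_factory=dict)  # season number -> Season

    def add_episode(self, season_number: int, episode_number: int, episode=None) -> None:
        """Single mutation entry point; creates the season if it is missing."""
        season = self.seasons.setdefault(season_number, Season(season_number))
        season.episodes[episode_number] = episode

    def missing_episodes(self) -> list:
        """Flat list of aired-but-not-owned (season, episode) pairs."""
        return [
            (season.number, number)
            for season in sorted(self.seasons.values(), key=lambda s: s.number)
            for number in range(1, season.aired_episodes + 1)
            if number not in season.episodes
        ]

    def collection_status(self) -> CollectionStatus:
        owned = sum(len(season.episodes) for season in self.seasons.values())
        if owned == 0:
            return CollectionStatus.EMPTY
        if self.missing_episodes():
            return CollectionStatus.PARTIAL
        return CollectionStatus.COMPLETE


show = TVShow()
show.seasons[1] = Season(number=1, aired_episodes=3)
show.add_episode(1, 1)
missing = show.missing_episodes()
status = show.collection_status()
```

Keeping `missing_episodes()` on the root is what lets a service pick the next episode from real aggregate state instead of a per-season guess.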

### Changed

- **`shared/media_info.py` exploded into `shared/media/{audio,video,subtitle,info,matching}.py`.**
  `MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat
  accessors (`width`, `height`, `video_codec`, `resolution`) remain as
  properties that read the first video track.
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to
  `MediaInfo` (file-level — they come from the ffprobe `format` block, not a
  stream). Files without a video stream now correctly expose duration.
- **`ShowStatus.from_string`** extended to map TMDB strings (`Returning
  Series`, `In Production`, `Pilot`, `Planned`, `Canceled`, `Cancelled`).
  Comparison is whitespace-trimmed and case-insensitive.
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They
  are owned by `TVShow` and reached only through it.
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed
  from the dict) instead of stored ints.
- **`TVShowService.parse_episode_from_filename`** rewritten in string
  operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
- **`TVShowService.find_next_episode`** now drives off
  `show.missing_episodes()` instead of the hardcoded "max 50 episodes per
  season" heuristic.
- **`TVShowService` constructor** no longer takes `season_repository` /
  `episode_repository` — the aggregate persists in one block via
  `TVShowRepository` only.
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to
  `SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack`
  ffprobe-view dataclass (different bounded contexts, kept separate
  intentionally).
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from
  `knowledge/release/file_extensions.yaml` via `load_video_extensions()`
  (single source of truth).
- **`CLAUDE.md`** updated with three new policy sections:
  - "Tests" — small updates OK during normal work, no mass-update sprees
  - "Backwards-compatibility shims" — prefer clean migration over shims
  - "Regex" — not forbidden, use judgment when string ops would be fragile
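
The regex-free episode parsing mentioned in this block (`S01E05` / `s1e5` and `1x05` / `01x5`) could look roughly like the sketch below. The function names and the token-splitting strategy are assumptions for illustration, not the actual `TVShowService` code, and multi-episode forms such as `2x07x08` are not handled.

```python
from typing import Optional, Tuple


def parse_episode_marker(token: str) -> Optional[Tuple[int, int]]:
    """Return (season, episode) for one token, or None if it is not a marker."""
    low = token.lower()
    if low.startswith("s") and "e" in low:
        season, _, episode = low[1:].partition("e")  # "s01e05" -> "01", "05"
    elif "x" in low:
        season, _, episode = low.partition("x")      # "1x05" -> "1", "05"
    else:
        return None
    # Both halves must be plain decimal numbers, which rules out "x264" etc.
    if season.isdecimal() and episode.isdecimal():
        return int(season), int(episode)
    return None


def parse_episode_from_filename_sketch(filename: str) -> Optional[Tuple[int, int]]:
    """Scan dot/space/underscore-separated tokens for the first marker."""
    normalized = filename.replace("_", ".").replace(" ", ".")
    for token in normalized.split("."):
        marker = parse_episode_marker(token)
        if marker is not None:
            return marker
    return None


found = parse_episode_from_filename_sketch("Show.1x05.720p.HDTV.x264-GRP.mkv")
```

Checking that both halves are decimal is what keeps codec tokens like `x264` and ordinary words starting with `s` from being mistaken for markers.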

### Removed

- **Legacy `Season N Episode N` filename form** in
  `TVShowService.parse_episode_from_filename`. It never appears in the release
  names Alfred handles, and supporting it forced a regex.
- **`SeasonRepository` and `EpisodeRepository`** — only the aggregate root has
  a repository (DDD rule: one repo per aggregate).
- **`shared/media_info.py`** compatibility shim — callers updated.
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` — callers
  updated to `SubtitleCandidate`.

### Fixed

- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead
  of crashing through `primary_video.duration_seconds` (see the duration/bitrate
  move under **Changed**).
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer
  passes the removed `show_imdb_id` / `episode_count` kwargs when constructing
  a `Season` for folder-name generation.

### Internal

- Test suite rewritten where the aggregate redesign broke fixtures:
  `tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py`
  (rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py`
  (helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures),
  `tests/domain/test_tv_shows_service.py` (find_next_episode driven by real
  aggregate state).
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`,
  `identifier.py` updated to import `SubtitleCandidate`.
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.