# Changelog

All notable changes to Alfred are documented here.

The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead
of release numbers. Granularity targets behavioral or API-visible changes; refer
to `git log` for commit-level detail.

Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** /
**Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect
callers).

---
## [Unreleased]

### Changed

- **`.alfred` v2 — Phase 3: `TVShow` / `Movie` aggregates become
  TMDB-only.** Third phase of `specs/dot_alfred_v2.md` on branch
  `refactor/dot-alfred-v2`. Filesystem-side concerns (file paths,
  tracks, quality, mode, `added_at`) move to the `releases/` domain
  added in Phase 1; the TMDB aggregates now carry only identity +
  TMDB catalog facts.
  - **`TVShow`** — `tmdb_id: TmdbId` is now the **required primary
    key**; `imdb_id: ImdbId | None` is the optional secondary anchor.
    Added `status: str = "unknown"` (raw TMDB string; the default matches
    the v2 library-index auto-heal placeholder). `episode_count` now
    sums the TMDB-cached `Season.episode_count` values (was: the number
    of materialized `Episode` objects).
  - **`Season`** — added `episode_count: int = 0` (TMDB-cached,
    authoritative). **Removed**: `audio_tracks`, `subtitle_tracks`,
    and the `mode` property (release mode now lives only on
    `SeasonRelease.mode` — single source of truth).
  - **`Episode`** — slimmed to identity + title. **Removed**:
    `file_path`, `file_size`, `audio_tracks`, `subtitle_tracks`. The
    `MediaWithTracks` mixin is no longer in `Episode`'s MRO; on-disk
    facts live on the matching `EpisodeRelease` keyed by
    `(season_number, episode_number)`.
  - **`Movie`** — `tmdb_id: TmdbId` required, `imdb_id` optional.
    **Removed**: `file_path`, `file_size`, `quality`, `added_at`,
    `audio_tracks`, `subtitle_tracks`. `get_filename()` now returns
    `"Title.Year"` (quality lives on `MovieRelease` and is appended
    by a release-aware caller — Phase 4 wires this through
    `MediaOrganizer`).
  - **`TVShowBuilder` / `SeasonBuilder`** — the `TVShowBuilder`
    constructor requires `tmdb_id: TmdbId`; `imdb_id` and `status` are
    optional. `SeasonBuilder.set_episode_count(int)` replaces the old
    `set_audio_tracks` / `set_subtitle_tracks` (tracks are no longer
    persisted on `Season`).
- **`MovieRelease` carries `added_at: datetime`** (required).
  Bumped `dot_alfred/v2` `SCHEMA_VERSION` from `1` → `2` to add
  `added_at: datetime` to `MovieReleaseSidecar`. Round-trip via
  Pydantic `mode="json"` (datetime ↔ ISO 8601 string). No migration
  code shipped — no v2.1 sidecars exist in the wild yet.
- **No-coercion `TmdbId` contract.** `TVShow(tmdb_id=1396)` now raises
  — callers pass `TmdbId(1396)`. Same for `imdb_id: ImdbId | None`
  on `TVShow`/`Movie`. Honest type contract, no ergonomic shim.
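The no-coercion contract can be pictured with a minimal sketch. It assumes `TmdbId` is a frozen dataclass with a single `value: int` field and that `TVShow` checks the type in `__post_init__`. The `value` and `title` field names and the choice of `TypeError`/`ValueError` are illustrative assumptions, not Alfred's actual code, which also covers `imdb_id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmdbId:
    """Positive TMDB identifier (sketch of the VO, not the real class)."""

    value: int

    def __post_init__(self) -> None:
        # bool subclasses int, so reject it before the generic int check.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(f"TmdbId expects an int, got {type(self.value).__name__}")
        if self.value <= 0:
            raise ValueError(f"TmdbId must be positive, got {self.value}")


@dataclass(frozen=True)
class TVShow:
    """Hypothetical slimmed aggregate: identity + TMDB catalog facts only."""

    tmdb_id: TmdbId
    title: str
    status: str = "unknown"

    def __post_init__(self) -> None:
        # No coercion: a raw int is a caller bug, not something to patch up.
        if not isinstance(self.tmdb_id, TmdbId):
            raise TypeError(
                f"tmdb_id must be a TmdbId, got {type(self.tmdb_id).__name__}"
            )


show = TVShow(tmdb_id=TmdbId(1396), title="Breaking Bad")  # accepted
# TVShow(tmdb_id=1396, title="Breaking Bad") would raise TypeError
```

The `bool` check has to come before the `int` check. In Python, `bool` is a subclass of `int`, so without it `TmdbId(True)` would slip through as `1`.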
### Removed

- `Season.mode` property (derive from `SeasonRelease.mode` instead).
- `Episode.file_path` / `file_size` / `audio_tracks` /
  `subtitle_tracks`.
- `Movie.file_path` / `file_size` / `quality` / `added_at` /
  `audio_tracks` / `subtitle_tracks`.
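With the file fields gone from `Episode`, code that needs on-disk facts has to go through the release side, keyed by `(season_number, episode_number)`. The following is a minimal sketch of that lookup built on simplified stand-ins. The `EpisodeRelease` fields, the `covers` method and the `release_for` helper are hypothetical. A multi-episode file is modeled as a plain first/last pair instead of the real `EpisodeRange` VO.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Episode:
    """TMDB-side episode after Phase 3: identity + title only (sketch)."""

    season_number: int
    episode_number: int
    title: str


@dataclass(frozen=True)
class EpisodeRelease:
    """Hypothetical filesystem-side record; the real one also carries tracks."""

    season_number: int
    first_episode: int
    last_episode: int  # equals first_episode for a single-episode file
    file_path: Path
    file_size: int

    def covers(self, season: int, episode: int) -> bool:
        return (
            self.season_number == season
            and self.first_episode <= episode <= self.last_episode
        )


def release_for(
    episode: Episode, releases: list[EpisodeRelease]
) -> EpisodeRelease | None:
    """Return the release whose file contains this episode, or None if absent."""
    return next(
        (r for r in releases if r.covers(episode.season_number, episode.episode_number)),
        None,
    )


# One multi-episode file holding E02-E04 answers lookups for all three episodes.
pack_file = EpisodeRelease(1, 2, 4, Path("Show.S01E02E03E04.mkv"), 2_000_000_000)
found = release_for(Episode(1, 3, "Third"), [pack_file])
```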
### Internal

- v1 dot_alfred package (`bridge.py`, `repository.py`,
  `serializer.py`, `sidecar.py`), the abstract `TVShowRepository` /
  `MovieRepository` ports typed against the pre-Phase-3 aggregates,
  and `alfred/application/library/rescan.py` are **intentionally
  left in tree as a known-red island**. Their tests
  (`tests/infrastructure/persistence/dot_alfred/test_repository.py`,
  `test_serializer.py`, `tests/application/library/test_rescan.py`)
  are module-level skipped with a Phase 4 reference. Phase 4 rewrites
  `rescan_show` / introduces `rescan_movie` on top of the v2
  release repositories + library index, then deletes the v1 stack +
  the abstract ports + the quarantined tests in one swing.
- Test suite: 1216 passed, 11 skipped (8 pre-existing + 3 Phase-3
  quarantines), 4 xfailed. v2 round-trip tests now reference
  `SCHEMA_VERSION` instead of hard-coded `1` for future-proofing.
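Below is a sketch of a round-trip test that references `SCHEMA_VERSION`. It uses a plain dataclass as a dependency-free stand-in for the Pydantic `MovieReleaseSidecar`; the real DTO dumps with `mode="json"`. The field set, the `to_json_dict`/`from_json_dict` helpers and the `ValueError` on a version mismatch are hypothetical. Only the `1` → `2` bump and the datetime ↔ ISO 8601 round-trip come from the entries in this block.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# Stand-in for dot_alfred/v2's SCHEMA_VERSION (bumped from 1 to 2 in Phase 3).
SCHEMA_VERSION = 2


@dataclass(frozen=True)
class MovieReleaseSidecar:
    """Dependency-free stand-in for the Pydantic DTO; fields are illustrative."""

    tmdb_id: int
    added_at: datetime
    schema_version: int = SCHEMA_VERSION

    def to_json_dict(self) -> dict:
        # datetime -> ISO 8601 string, like a JSON-mode dump would produce.
        return {
            "schema_version": self.schema_version,
            "tmdb_id": self.tmdb_id,
            "added_at": self.added_at.isoformat(),
        }

    @classmethod
    def from_json_dict(cls, data: dict) -> MovieReleaseSidecar:
        if data["schema_version"] != SCHEMA_VERSION:
            raise ValueError(f"unsupported schema_version: {data['schema_version']}")
        return cls(
            tmdb_id=data["tmdb_id"],
            added_at=datetime.fromisoformat(data["added_at"]),
            schema_version=data["schema_version"],
        )


def test_movie_release_round_trip() -> None:
    original = MovieReleaseSidecar(
        tmdb_id=603, added_at=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    )
    dumped = original.to_json_dict()
    # Compare against the constant, not a literal, so the next bump needs no test edit.
    assert dumped["schema_version"] == SCHEMA_VERSION
    assert isinstance(dumped["added_at"], str)
    assert MovieReleaseSidecar.from_json_dict(dumped) == original
```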
### Added

- **`.alfred` v2 — Phase 2: new persistence package + TMDB client
  extensions.** Second phase of `specs/dot_alfred_v2.md` on branch
  `refactor/dot-alfred-v2`. The new
  `alfred/infrastructure/persistence/dot_alfred/v2/` package ships
  the full v2 sidecar stack while leaving v1 (and the existing
  `TVShow` aggregate) untouched — Phase 3 is the cutover.
  - **Pydantic DTOs** — `SeriesReleaseSidecar` /
    `MovieReleaseSidecar` (per-item), `TVShowLibraryIndexSidecar` /
    `MovieLibraryIndexSidecar` (library-root index). All built on a
    common `_Strict` base (`extra="forbid"`, `frozen=True`) with a
    `@model_validator` enforcing `schema_version == 1`.
  - **Track entries** — `AudioTrackEntry` / `SubtitleEntry` (sidecar
    cache shape, slimmed from the domain track types). `SubtitleEntry`
    carries `is_forced` + `is_sdh` as explicit booleans (v1's
    `type: "sdh"` overload is gone).
  - **Serializer** — `read_yaml` / `atomic_write_yaml` helpers
    centralize YAML I/O and atomic writes (`.tmp` + `os.replace`).
    `SidecarSchemaError` wraps both YAML parse errors and Pydantic
    validation errors for uniform catch-and-skip semantics.
  - **Bridge** — lossless `domain ↔ sidecar` conversion for
    `SeriesRelease` / `MovieRelease` (round-trippable, including
    multi-episode ranges and `is_sdh` subtitles); one-way projection
    for library-index entries (`show_index_entry_from`,
    `movie_index_entry_from`) that flattens multi-episode files into
    per-TMDB-slot maps in `seasons[*].episodes`.
  - **Repositories** —
    `DotAlfredSeriesReleaseRepository` /
    `DotAlfredMovieReleaseRepository` walk `library_root/*/` with
    log+skip on corruption; **`DotAlfredTVShowLibraryIndex`** /
    **`DotAlfredMovieLibraryIndex`** auto-heal silently on missing or
    corrupt index files by rebuilding from the per-item sidecars
    (healed entries keep TMDB-cached fields as placeholders until the
    next sync repopulates them). Writes are atomic and never auto-heal
    (read paths handle that).
  - **TMDB client extensions** — `TmdbSeasonInfo` / `TmdbShowInfo`
    DTOs + `TMDBClient.get_tv_show_info(tmdb_id)` aggregating
    `/tv/{id}` + `/tv/{id}/external_ids`. The parsing logic is a pure
    function (`parse_tv_show_info`) testable without HTTP, with an
    injectable reference date for deterministic `aired` flag tests.
- **`is_sdh` flag on `SubtitleTrack`.** Added to
  `alfred/domain/shared/media.py::SubtitleTrack` to mirror ffprobe's
  `hearing_impaired` disposition. Wired through the ffprobe layer
  (`ffprobe_prober.py`) and the v2 sidecar bridge so SDH information
  round-trips end-to-end. Defaults to `False` — backwards-compatible
  for every existing caller.
- **37 v2 integration tests** on `tmp_path` covering round-trips
  (domain ↔ sidecar ↔ YAML ↔ domain), atomic writes (no `.tmp`
  leftovers), per-item log+skip on corruption / schema mismatch,
  movie anchor-mismatch warning, full upsert / find / delete on both
  library indexes, and the auto-heal path on missing / corrupt /
  schema-mismatched index files. **16 TMDB DTO tests** for the new
  `parse_tv_show_info` pure function.

- **`.alfred` v2 — Phase 1: new `releases/` domain.** First step of
  `specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`. The
  new `alfred/domain/releases/` package introduces a filesystem-only
  bounded context separated from TMDB identity (the existing
  `tv_shows` / `movies` domains). It hosts:
  - **`EpisodeRange` VO** — covers single-episode files
    (`EpisodeRange(E02, E02)`) and multi-episode files
    (`EpisodeRange(E02, E04)` for `SxxE02E03E04.mkv`), with
    `count()` / `numbers()` / `is_single()` helpers.
  - **`ReleaseMode` enum** — `PACK` (N video files directly in the
    season folder) vs `EPISODIC` (N sub-folders, one episode each);
    classified by the walker, never re-derived.
  - **Aggregates** — `TrackProfile`, `EpisodeRelease`,
    `SeasonRelease` (with `episode_count()` summing each file's
    range), `SeriesRelease`, `MovieRelease`. All frozen
    dataclasses; mutation via `SeasonReleaseBuilder` /
    `SeriesReleaseBuilder` (mirroring the v1 `TVShowBuilder` pattern,
    including the `from_existing()` round-trip).
  - **Abstract ports** — `SeriesReleaseRepository`,
    `MovieReleaseRepository` (concrete `DotAlfred*` arrive in
    Phase 2).
- **`TmdbId` VO** added to `alfred/domain/shared/value_objects.py`
  (positive int, rejects bool/str/float — symmetry with `ImdbId`).
- 73 unit tests covering VO validation, entity invariants, builder
  sort + overlap detection, and `from_existing()` round-trips. v1
  code paths untouched at this stage; new domain coexists.

- **`rescan_show` orchestrator
  (`alfred/application/library/rescan.py`).** Step 4 of the
  `specs/dot_alfred.md` plan. Walks an Alfred-managed show folder,
  runs the existing `inspect_release` pipeline on every video file it
  finds, and assembles a frozen `TVShow` aggregate persisted via the
  injected `TVShowRepository`. Reuses the release parser + ffprobe
  path verbatim — no duplicated parse/probe logic at the library
  layer. PACK vs EPISODIC inferred per season folder from the
  on-disk file count + parser output: a single video whose name
  carries no `Exx` token becomes a PACK season (tracks lifted to the
  season-level `audio_tracks` / `subtitle_tracks`), anything else
  becomes EPISODIC (one `Episode` per file). Episode paths are
  stored relative to the show root for portability. Files that fail
  to parse a season/episode number, or seasons with mixed numbers,
  are logged and skipped — the orchestrator never raises. Embedded
  subtitle tracks are captured from `ffprobe`; adjacent `.srt`
  files, multi-episode entries (`S01E01E02`), and TMDB-driven PACK
  detection are tracked as tech debt for a dedicated subtitles /
  ShowTracker session. 7 integration tests on `tmp_path` with the
  Foundation layout (S01 EPISODIC + S02 PACK) cover the round-trip
  through the real `.alfred` repository.
- **Show tree walker (`alfred/application/library/walker.py`).**
  Step 4a foundation. `walk_show(show_root, scanner, kb)` returns a
  `ShowTree(show_root, season_folders=tuple[SeasonFolder, ...])` —
  pure structural snapshot, no parsing, no probing. Season folders
  are detected by a `\bS\d{1,2}\b` token anywhere in the directory
  name (release-style naming, no Plex `Season 01` / `Specials`
  conventions). Video files are filtered against
  `kb.video_extensions`; no recursion into sub-sub-folders. 11 unit
  tests on `tmp_path` cover detection (case-insensitive, in-word
  rejection), filtering (subs, NFO, sample files), and edge cases
  (empty / missing show root).
- **Season-level audio/subtitle tracks
  (`alfred/domain/tv_shows/entities.py`,
  `alfred/domain/tv_shows/builders.py`).** `Season` now inherits
  from `MediaWithTracks` and carries `audio_tracks` /
  `subtitle_tracks` tuples (empty by default). Populated only in
  PACK mode (the single release covering the whole season); empty in
  EPISODIC mode where tracks live per-episode. `SeasonBuilder`
  gains `set_audio_tracks()` / `set_subtitle_tracks()` and forwards
  them through `from_existing()`. The bridge writes / reads them in
  the PACK branch via shared `_synth_audio_tracks` /
  `_synth_subtitle_tracks` helpers used for episodes too.

- **`DotAlfredTVShowRepository` — filesystem-backed implementation of
  the `TVShowRepository` port
  (`alfred/infrastructure/persistence/dot_alfred/repository.py`).**
  Step 3 of the `specs/dot_alfred.md` plan. Reads and writes one
  `.alfred` YAML file per show under a configurable `library_root`.
  `save(show)` writes atomically (`.alfred.tmp` + `os.replace`) into a
  folder that **must already exist** — the repository never invents a
  folder name (the upstream `MediaOrganizer` is in charge of placing
  files; the repo writes the sidecar next to them). `find_by_imdb_id` /
  `find_all` walk `library_root/*/`, loading each readable sidecar;
  folders without a sidecar return `None` / are skipped (no implicit
  cold scan — that is the job of the upcoming `rescan_show` tool).
  Corrupted YAML and schema violations are logged and skipped, never
  raised, so a single bad folder does not break the rest of the
  library. The repo keeps a tiny in-memory `imdb_id → folder_name`
  index populated on every successful read/save, so subsequent saves
  find the right destination without re-walking — useful when the show
  folder name diverges from `show.get_folder_name()` (custom 1080p / 4K
  variants). 20 integration tests on `tmp_path` cover the round-trip,
  cold folder / unknown id returns, multi-show `find_all`, corrupted /
  wrong-schema skipping, atomic write (no `.alfred.tmp` left behind),
  overwrite, and folder-name fallbacks.
- **Sidecar ↔ TVShow bridge
  (`alfred/infrastructure/persistence/dot_alfred/bridge.py`).**
  `to_sidecar(show, folder_paths=...)` summarizes the rich domain
  `AudioTrack` / `SubtitleTrack` to the sidecar's compact form (unique
  audio languages in track order; subtitle entries derived from
  `is_forced` and assumed `source="embedded"`).
  `from_sidecar(sidecar, title=...)` reconstructs the domain `TVShow`
  with synthesized tracks — one `AudioTrack` per language, one
  `SubtitleTrack` per entry, with ffprobe-only fields (`codec`,
  `channels`, `channel_layout`) left as `None`. The bridge is
  intentionally lossy on probe minutiae the sidecar does not store;
  this is the documented trade-off from the factual-only spec.

- **`.alfred` sidecar serializer
  (`alfred/infrastructure/persistence/dot_alfred/`).** Implements step 2
  of the `specs/dot_alfred.md` plan. Pure-dict in/out
  (`serialize(sidecar) -> dict`, `deserialize(data) -> ShowSidecar`) —
  YAML I/O lives in the repository layer (step 3) and is kept out for
  trivial testability. Ships the DTOs that mirror the YAML schema
  field-for-field (`ShowSidecar`, `SeasonSidecar`, `EpisodeSidecar`,
  `SubtitleEntry`). The sidecar acts as a **scan cache**: it stores
  only what is genuinely costly to recompute — folder/file paths
  (skipping the FS walk) and probed track metadata (skipping ffprobe).
  Release identifiers (group, source, quality, codec) live in folder
  and file names and are derived on demand by the parser — they are
  deliberately absent from the schema and rejected on deserialize. The
  serializer is **strict on schema**: unknown keys at any level raise
  `SidecarSchemaError`, missing required fields raise a clear error, and
  `bool` cannot sneak in as a season/episode number. Optional fields
  (`tmdb_id`, empty `audio`/`subtitles`/`episodes`) are omitted from
  the output rather than emitted as `null` / `[]`. Tests cover
  round-trip equivalence (DTO → dict → DTO and DTO → YAML text → DTO),
  the Foundation S01 PACK case (real-world fixture with mixed subtitle
  types — superset captured at season scope), and a Breaking Bad S05
  EPISODIC case. An on-disk `tmp_path` fixture recreates the Foundation
  folder structure with placeholder files, ready to be reused by the
  upcoming repository walk tests in step 3.

- **`TVShowBuilder` / `SeasonBuilder` — sole construction surface for the
  TVShow aggregate** (`alfred/domain/tv_shows/builders.py`). The aggregate
  is now fully frozen; building goes through a mutable scratchpad that
  emits an immutable `TVShow` via `build()`. Both builders offer a
  `from_existing()` classmethod to seed from a current frozen aggregate
  and apply modifications. Episodes are emitted sorted by number within a
  season, seasons sorted by number within the show.
- **`SeasonMode` enum** (`PACK` / `EPISODIC`) in
  `alfred/domain/tv_shows/value_objects.py`. Computed at read time from
  the season's structural shape (`Season.mode` property): a season with
  no explicit episodes is `PACK` (a single release covering the whole
  season), a season with episodes is `EPISODIC` (currently airing, one
  release per episode). Never stored — the YAML sidecar encodes the
  mode via the presence/absence of the `episodes:` block.
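The `EpisodeRange` helpers described in the Phase 1 entry can be sketched as a small frozen dataclass. This version uses plain `int` episode numbers, although the entry's `E02` notation may imply a dedicated type. The ordering check is an assumption, and so is the `season_episode_count` helper, which mimics `SeasonRelease.episode_count()`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeRange:
    """Inclusive span of episodes held by one video file (sketch).

    EpisodeRange(2, 2) models a single-episode file (E02);
    EpisodeRange(2, 4) models SxxE02E03E04.mkv.
    """

    first: int
    last: int

    def __post_init__(self) -> None:
        # Assumed invariant: the range is never reversed.
        if self.last < self.first:
            raise ValueError(f"reversed episode range: {self.first}..{self.last}")

    def count(self) -> int:
        return self.last - self.first + 1

    def numbers(self) -> tuple[int, ...]:
        return tuple(range(self.first, self.last + 1))

    def is_single(self) -> bool:
        return self.first == self.last


def season_episode_count(ranges: list[EpisodeRange]) -> int:
    """Hypothetical helper mirroring SeasonRelease.episode_count(): sum each file's range."""
    return sum(r.count() for r in ranges)
```

Summing `count()` per file is what lets a season made of one single-episode file and one three-episode file report four episodes, even though it holds only two video files.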
### Changed

- **TVShow aggregate is now frozen all the way down.** `TVShow`,
  `Season` and `Episode` are all `@dataclass(frozen=True)`. Children
  are stored as ordered tuples (`tuple[Season, ...]`,
  `tuple[Episode, ...]`) sorted by their respective numbers, replacing
  the previous mutable dicts. Lookup helpers `TVShow.get_season(n)` and
  `Season.get_episode(n)` traverse the tuple lazily via `next()`. The
  former `add_episode` / `add_season` mutation methods are gone — all
  construction goes through `TVShowBuilder` / `SeasonBuilder`.
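The following reduced sketch shows the frozen shape: children are held in tuples and looked up lazily with `next()`. The builders, which are the real construction path, are left out. The field names are simplified stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    number: int
    title: str


@dataclass(frozen=True)
class Season:
    number: int
    episodes: tuple[Episode, ...] = ()  # kept sorted by number

    def get_episode(self, n: int) -> Episode | None:
        # Lazy scan of the tuple; next() stops at the first match.
        return next((e for e in self.episodes if e.number == n), None)


@dataclass(frozen=True)
class TVShow:
    title: str
    seasons: tuple[Season, ...] = ()  # kept sorted by number

    def get_season(self, n: int) -> Season | None:
        return next((s for s in self.seasons if s.number == n), None)


show = TVShow("Example", (Season(1, (Episode(1, "Pilot"), Episode(2, "Second"))),))
```

Assigning to a field of a frozen dataclass raises `dataclasses.FrozenInstanceError`, a subclass of `AttributeError`. That is what leaves the builders as the only way to produce a modified aggregate.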
### Removed

- **ShowTracker-territory fields stripped from the TVShow aggregate.**
  The aggregate now models only what the `.alfred` sidecar stores
  (filesystem-observable facts + immutable identity). Dropped from the
  domain:
  - `TVShow.status` (`ShowStatus`) and the `ShowStatus` enum entirely,
    along with its TMDB string mapping (`from_string`).
  - `TVShow.expected_seasons`, `Season.expected_episodes`,
    `Season.aired_episodes`, `Season.name`.
  - `TVShow.collection_status()`, `is_complete_series()`,
    `missing_episodes()`, `is_ongoing()`, `is_ended()` and the
    `CollectionStatus` enum.
  - `Season.is_complete()`, `is_fully_aired()`, `missing_episodes()`
    and the `aired ≤ expected` validation.
  - `TVShow.add_episode()` / `TVShow.add_season()` /
    `Season.add_episode()` — replaced by the builder API.

  These concerns will reappear in a dedicated `ShowTracker` layer (to
  be designed) that combines the `.alfred` sidecar with live TMDB data
  to answer questions like "is this show complete?" or "are new
  episodes out?". Keeping volatile/derived state out of the aggregate
  matches the factual-only philosophy locked in `specs/dot_alfred.md`.
### Internal

- **Test suite rewritten for the new aggregate shape.**
  `tests/domain/test_tv_shows.py` now covers frozen invariants, builder
  ordering, last-write-wins on duplicates, `from_existing` round-trip,
  and `SeasonMode` derivation. `tests/infrastructure/test_filesystem_extras.py`
  helper simplified (no more `ShowStatus.ENDED` / `expected_seasons` on
  test shows). 1078 tests still green.

- **Design doc for `.alfred/` sidecar persistence
  (`specs/dot_alfred.md`).** First entry in the new `specs/` directory.
  Specifies a per-show `.alfred/` directory holding a `show.yaml` and
  one `season_NN.yaml` per season, used by the upcoming concrete
  `TVShowRepository` to cache parse/probe results and avoid full
  rescans on every library read. Covers schema, naming conventions,
  cache invalidation strategy (size + mtime), self-healing on
  drift, atomicity (`os.replace`), edge cases (legacy folders,
  corrupted sidecars, manual file removal), and a phased
  implementation plan. No code yet — spec only.
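The spec's size + mtime invalidation strategy boils down to comparing a stored stat snapshot with the current one. Here is a minimal sketch using only the standard library. `CachedFileFacts`, `snapshot` and `is_fresh` are hypothetical names. Storing a nanosecond mtime per file is an assumption about what the sidecar keeps.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CachedFileFacts:
    """What a sidecar might remember per video file (hypothetical shape)."""

    relative_path: str  # relative to the show root, for portability
    size: int
    mtime_ns: int


def snapshot(show_root: Path, relative_path: str) -> CachedFileFacts:
    """Record the current size and mtime, e.g. right after probing the file."""
    st = os.stat(show_root / relative_path)
    return CachedFileFacts(relative_path, st.st_size, st.st_mtime_ns)


def is_fresh(show_root: Path, cached: CachedFileFacts) -> bool:
    """True if size and mtime still match, so cached probe data can be reused."""
    try:
        st = os.stat(show_root / cached.relative_path)
    except FileNotFoundError:
        # File removed by hand: the cached entry is stale and needs healing.
        return False
    return st.st_size == cached.size and st.st_mtime_ns == cached.mtime_ns
```

A mismatch on either value only signals drift. Re-probing the file and rewriting its entry would then be the self-healing step the spec describes.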
### Internal

- **`specs/` is now tracked.** The repo-level `.gitignore` had a
  blanket `*.md` rule with only `CHANGELOG.md` allow-listed. Added
  explicit exceptions for `/README.md` (root only — avoids
  unintentionally exposing fixture READMEs) and `specs/**/*.md` so the
  new design-doc directory ships with the project. Also added an
  explicit `/.claude/` ignore line for the private dev-docs sub-repo
  that sits inside the working tree but is versioned separately.
### Fixed

- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
  range.** The parser previously captured `episode=9, episode_end=10`
  and dropped E11+. It now returns `episode=first, episode_end=last`,
  with intermediate values implied. Fixture
  `shitty/archer_multi_episode/` updated from anti-regression-of-bug
  to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI
  fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
  previously parsed with `parse_path="ai"` and every field UNKNOWN
  because `'` is in the forbidden-chars list. Apostrophes are now
  pre-stripped before the well-formed check, so the parse completes
  normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
  the title text loses its apostrophe. `parse_path` becomes
  `sanitized` to surface the cleanup. Side win: PoP fixture
  `the_prodigy_full_chaos/` also moves from total failure to a
  partially-correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as
  `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
  parsed as `media_type=movie` with `S01-06` glued onto the title.
  The parser now recognizes the range, sets `season=first`,
  `media_type=tv_complete`, and removes the marker from the title.
  `is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
  with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
  produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
  `｜`, …) carry no title content and are now filtered out. Side
  effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
  YouTube wide-pipe no longer leaks into the title.
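The multi-episode fix can be illustrated with a standalone regex sketch that collapses a chain to its first and last episode. This is not Alfred's parser, which works on tokens in `alfred/domain/release/parser/pipeline.py`. `parse_episode_chain` and its return shape are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical standalone patterns; Alfred's parser is token-based instead.
_CHAIN = re.compile(r"S(\d{1,2})((?:E\d{1,3})+)", re.IGNORECASE)
_EPISODE = re.compile(r"E(\d{1,3})", re.IGNORECASE)


def parse_episode_chain(name: str) -> tuple[int, int, int] | None:
    """Return (season, episode, episode_end) for an SxxEyy[Ezz...] marker.

    The whole chain collapses to its first and last episode; intermediate
    episodes are implied. A single-episode marker yields episode == episode_end.
    """
    match = _CHAIN.search(name)
    if match is None:
        return None
    episodes = [int(e) for e in _EPISODE.findall(match.group(2))]
    return int(match.group(1)), episodes[0], episodes[-1]
```

Taking `episodes[-1]` rather than the second capture is the whole point: the earlier bug stopped at the second `Exx` and dropped everything after it.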
### Added

- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
  token separator.** Added to `alfred/knowledge/release/separators.yaml`
  so CJK release names (and the occasional decorative YouTube-style use)
  tokenize cleanly instead of leaving the wide pipe glued onto an
  adjacent token. The tokenizer in
  `alfred/domain/release/parser/pipeline.py` already iterates the
  separator list as plain strings (no regex), so a multi-byte UTF-8
  separator works without any code change.

- **`InspectedResult.recommended_action` property** — derived hint that
  collapses the orchestrator's go / wait / skip decision into a single
  value (`"process"` / `"ask_user"` / `"skip"`). Centralizes the
  exclusion logic that was previously dispersed across road /
  media_type / main_video checks at each call site. Ordering is part of
  the contract: `skip` (no main video, or media_type == `"other"`)
  wins over `ask_user` (media_type == `"unknown"` or road ==
  `"path_of_pain"`) which wins over `process`. Surfaced through the
  `analyze_release` tool so the LLM can route on it directly.
  6 new tests in `tests/application/test_inspect.py` cover the four
  branches and the precedence rules.
- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
  Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
  — the surface previously coupled to the concrete `LanguageRegistry`.
  Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
  depends on the Protocol, infrastructure provides the YAML-backed
  adapter. Tests in `tests/infrastructure/test_language_registry.py`.
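The precedence rules of `recommended_action` map directly onto early returns. The reduced sketch below uses a stand-in `InspectedResult` with only three fields. `has_main_video` is a hypothetical boolean in place of the real main-video check. `media_type` and `road` are plain strings here rather than the real enum-typed values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

Action = Literal["process", "ask_user", "skip"]


@dataclass(frozen=True)
class InspectedResult:
    """Reduced stand-in: only the fields the decision reads (names assumed)."""

    has_main_video: bool
    media_type: str  # e.g. "movie", "tv_complete", "other", "unknown"
    road: str  # e.g. "easy", "shitty", "path_of_pain"

    @property
    def recommended_action(self) -> Action:
        # Precedence is part of the contract: skip > ask_user > process.
        if not self.has_main_video or self.media_type == "other":
            return "skip"
        if self.media_type == "unknown" or self.road == "path_of_pain":
            return "ask_user"
        return "process"
```

Because `skip` is checked first, a release with no main video is skipped even when it would also have triggered `ask_user`.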
### Changed

- **`Movie` and `Episode` are now frozen dataclasses.** Both entities
  hold their track collections as `tuple[AudioTrack, ...]` and
  `tuple[SubtitleTrack, ...]` instead of mutable lists, and are
  `@dataclass(frozen=True, eq=False)` (identity-based equality
  preserved via `__eq__`/`__hash__`). `__post_init__` coercion uses
  `object.__setattr__` for the `imdb_id` / `title` /
  `season_number` / `episode_number` normalizations. To project
  enrichment results (probe output, file metadata) callers now rebuild
  via `dataclasses.replace(...)`. Pattern aligned with the recent
  `ParsedRelease` freeze. `MediaWithTracks` mixin contract updated to
  `tuple` accordingly. `Season` and `TVShow` remain mutable for now:
  freezing the aggregate root would force a full reconstruction on
  every `add_episode`, so that step is deferred.
- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name
  conflated "this might become a placed subtitle" with "this is what a
  scan pass produced". The class is the output of a scan/identify pass
  — language/format may still be `None`, confidence reflects how sure
  the classifier is, and `raw_tokens` holds the filename fragments
  under analysis. `SubtitleScanResult` says that directly. Pure rename
  with a refreshed docstring in `alfred/domain/subtitles/entities.py`;
  no behavior change. Touches the domain entity + `__init__` export,
  the matcher / identifier / utils services, the manage_subtitles use
  case, the placer, the metadata store, the shared-media cross-ref
  comment, and the seven test modules that imported the type.

- **`ParsedRelease` is now frozen; enrichment passes return new
  instances.** The VO was mutable so `detect_media_type` and
  `enrich_from_probe` could patch fields in place — a code smell in a
  value object whose identity *is* its content. `ParsedRelease` is now
  `@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]`
  instead of a `list[str]`. `enrich_from_probe` returns a new
  `ParsedRelease` via `dataclasses.replace` (only allocates when at
  least one field actually changed). `inspect_release` rebinds
  `parsed` after both `detect_media_type` (wrapped in `MediaTypeToken`
  to satisfy the strict isinstance check that now also runs on
  replace) and `enrich_from_probe`. Parser pipeline now packs
  `languages` as a tuple in the assemble dict. Callers updated:
  `inspect_release`, `testing/recognize_folders_in_downloads.py`, and
  the enrichment tests (22 call sites + language assertions switched
  to tuple literals).
- **`resolve_destination` use cases take `kb` / `prober` as required
  params; module-level singletons gone.** The four
  `resolve_{season,episode,movie,series}_destination` use cases now
  accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
  arguments, matching the shape of `inspect_release`. The module-level
  `_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
  singletons that previously lived in
  `alfred/application/filesystem/resolve_destination.py` are removed —
  the application layer no longer reaches into infrastructure. The
  singletons now live at the agent-tools frontier
  (`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
  instantiate them once and thread them through. `analyze_release` no
  longer needs the dirty `from ... import _KB` indirection. Tests
  inject their own stubs by keyword (`prober=_StubProber(...)`) instead
  of monkeypatching a module attribute.
- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
|
||
collided with `pathlib.Path` in code-reading mental models, and was
|
||
one letter from `parse_path` (the field that holds the value) — making
|
||
it harder than it needed to be to spot the type vs the attribute.
|
||
``TokenizationRoute`` says what it actually captures (DIRECT /
|
||
SANITIZED / AI = how the name reached the tokenizer), and the class
|
||
docstring now spells out the orthogonality with ``Road`` (EASY /
|
||
SHITTY / PATH_OF_PAIN, which captures parser confidence on
|
||
``ParseReport``). The ``parse_path`` field name stays unchanged —
|
||
string values too — so YAML fixtures, the ``analyze_release`` tool
|
||
spec, and any external consumer are untouched.
- **`enrich_from_probe` codec mappings moved to YAML.** The three
  hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
  `_CHANNEL_MAP`) translating ffprobe output to scene tokens
  (`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
  `alfred/knowledge/release/probe_mappings.yaml` and are loaded into
  `ReleaseKnowledge.probe_mappings` (new port field, populated by
  `YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
  parameter and reads the maps from there. Aligns with the CLAUDE.md
  rule that lookup tables of domain knowledge belong in YAML, not in
  Python — and opens the door to a future "learn new codec" pass.
  Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
  and all 22 sites in `tests/application/test_enrich_from_probe.py`.
- **`ParsedRelease.tech_string` is now a derived `@property`**
  (`alfred/domain/release/value_objects.py`). It joins `quality`,
  `source`, and `codec` with dots on every access, so it stays in
  sync with the underlying fields by construction. The stored field is
  gone from the dataclass, the dict returned by `assemble()` no longer
  carries the key, `parse_release`'s malformed-name fallback drops the
  `tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
  it after filling `quality`/`source`/`codec`. Closes the
  parser/enrichment double-source-of-truth that `e79ca46` had to fix
  reactively. The fixtures runner now injects `tech_string` alongside
  `is_season_pack` since `asdict()` skips properties.
- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
  valid levels (global, release_group, movie, show, season, episode)
  was documented only in a docstring comment and validated nowhere.
  `RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
  serialization, `.value` access) while making the closed set explicit
  to type-checkers and IDEs. `to_dict()` emits `.value` strings so
  YAML output is unchanged.
- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
  `__init__`.** Same public API (accepts `str | Path`), same behavior,
  but the dataclass-generated `__init__` is no longer bypassed. One
  less smell in the shared VOs.
- **`Language` VO is strict by default; `Language.from_raw()` factory
  for normalization.** The previous `__post_init__` mutated `iso` and
  `aliases` via `object.__setattr__` on a frozen dataclass — a code
  smell hiding behind the dataclass facade. Split: the direct
  constructor now rejects un-normalized input (uppercase iso,
  whitespace in aliases, etc.), and `Language.from_raw()` handles
  arbitrary YAML/user input. Only one caller (LanguageRegistry loading
  the ISO YAML) needed migration.
- **`ParsedRelease.normalised` renamed to `clean`.** The field name
  promised "dots instead of spaces" but in practice held
  `raw - site_tag - apostrophes` — only used by `season_folder_name()`.
  Renamed and docstring corrected.
- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
  fields were already typed as `MediaTypeToken` / `ParsePath`, but a
  tolerant `__post_init__` coerced raw strings. With both classes
  being `(str, Enum)`, the coercion served no purpose. The constructor
  is now strict, call sites no longer pass `.value`, and the unused
  `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables are gone.

### Removed

- **`settings.min_movie_size_bytes`** — orphan Pydantic field +
  validator. Its only consumer (`MovieService.validate_movie_file`)
  had been removed during an earlier refactor. The "real movie vs
  sample" rule now lives in extension-based exclusion
  (`application/release/supported_media.py`) and PoP. If a size
  threshold is ever needed, it'll go in a knowledge YAML, not in
  `settings`.

### Internal

- **Flattened `alfred.domain.shared.media/` package into a single
  `media.py` module.** The 6-file package (audio, video, subtitle,
  info, matching, tracks_mixin + `__init__`) collapsed into one ~250
  LoC module. All 12 import sites continue to resolve unchanged
  (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
  since Python treats `media.py` and `media/__init__.py`
  interchangeably for import paths. Easier to scan when the whole
  bounded-context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the
  `LanguageRepository` port** instead of the concrete `LanguageRegistry`
  class. The default constructor still instantiates the concrete adapter
  when no repository is injected — behaviour is unchanged for existing
  callers. Opens the door to in-memory fakes in future tests without
  loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from
  `alfred.application.filesystem` to `alfred.application.release`**.
  They are inspection-pipeline helpers — their natural home is next to
  `inspect_release`, not next to the filesystem use cases. The move
  also eliminates a circular-import workaround in
  `resolve_destination.py`: `inspect_release` can now be imported at
  module top instead of lazily inside `_resolve_parsed`. Public
  surface is unchanged for callers that imported the helpers from
  their full module paths (the only call sites — `inspect.py`, two
  tests, one testing script — were updated in this commit).

### Added

- **`resolve_*_destination` use cases now consume `inspect_release`**.
  `resolve_episode_destination` and `resolve_movie_destination` reuse
  their existing `source_file` parameter as the inspection target;
  `resolve_season_destination` and `resolve_series_destination` gain
  a new **optional** `source_path` parameter (also threaded through
  the tool wrappers and YAML specs). When the path exists, ffprobe
  data fills tokens missing from the release name (e.g. quality) and
  refreshes `tech_string`, so the destination folder / file names
  end up more accurate. When the path does not exist or is not passed
  (back-compat callers), the use cases fall back to parse-only — same
  behavior as before.

### Fixed

- **`enrich_from_probe` now refreshes `tech_string`** after filling
  `quality` / `source` / `codec`. Previously the field stayed at its
  parser-time value, so filename builders saw stale tech tokens even
  after a successful probe. New `TestTechString` class in
  `tests/application/test_enrich_from_probe.py` locks the behavior.

### Added

- **`inspect_release` orchestrator + `InspectedResult` VO**
  (`alfred/application/release/inspect.py`). Single composition of the
  four inspection layers: `parse_release` → `detect_media_type` (patches
  `parsed.media_type`) → `find_main_video` (top-level scan) →
  `prober.probe` + `enrich_from_probe` when a video exists and the
  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
  `InspectedResult(parsed, report, source_path, main_video, media_info,
  probe_used)` that downstream callers consume directly instead of
  rebuilding the same chain. `kb` and `prober` are injected — no
  module-level singletons. Never raises.

### Changed

- **`analyze_release` tool now delegates to `inspect_release`** — same
  output shape, plus two new fields: `confidence` (0–100) and `road`
  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
  both fields so the LLM can route releases by confidence.

- **`MediaProber` port now covers full media probing**: added
  `probe(video) -> MediaInfo | None` alongside the existing
  `list_subtitle_streams`. `FfprobeMediaProber` (in
  `alfred/infrastructure/probe/`) implements both methods and is now
  the single adapter shelling out to `ffprobe`. The standalone
  `alfred/infrastructure/filesystem/ffprobe.py` module was removed —
  all callers (tools, testing scripts) instantiate
  `FfprobeMediaProber` instead. Unblocks the upcoming
  `inspect_release` orchestrator, which depends on the port.

### Removed

- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
  `FfprobeMediaProber` adapter).

---

## [2026-05-20] — Release parser confidence scoring + exclusion

### Added

- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
  `is_supported_video(path, kb)` (extension-only check against
  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
  scan, lexicographically-first eligible file, returns `None` when no
  video qualifies; accepts a bare file as folder for single-file
  releases). No size threshold, no filename heuristics —
  PATH_OF_PAIN handles the exotic cases. Foundation for the future
  `inspect_release` orchestrator.

- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
  critical fields. EASY is decided structurally (a group schema
  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
  YAML-configurable cutoff (default 60). Weights and penalties also
  live in `scoring.yaml` — title 30, media_type 20, year 15, season
  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at -30.
  `Road` is a new enum, distinct from `ParsePath` (which records
  the tokenization route, not the confidence tier). `ReleaseKnowledge`
  port gains a `scoring: dict` field.
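The scoring arithmetic described in the confidence-scoring entry, as a hedged sketch. The weights, penalty, cap, and cutoff come from the entry; which fields count as "tech", treating `"unknown"` as absent, the 0 to 100 clamp, and the inclusive `>=` cutoff are assumptions.

```python
# Values from the changelog entry; TECH_FIELDS and the clamp are assumptions.
WEIGHTS = {"title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5}
TECH_FIELDS = ("quality", "source", "codec")
TECH_WEIGHT = 5
UNKNOWN_PENALTY = 5
PENALTY_CAP = 30
CUTOFF = 60


def _present(value: object) -> bool:
    return value not in (None, "", "unknown")


def confidence(fields: dict[str, object], unknown_tokens: int) -> int:
    score = sum(weight for name, weight in WEIGHTS.items() if _present(fields.get(name)))
    score += TECH_WEIGHT * sum(1 for name in TECH_FIELDS if _present(fields.get(name)))
    score -= min(UNKNOWN_PENALTY * unknown_tokens, PENALTY_CAP)
    return max(0, min(100, score))


def road(schema_matched: bool, score: int) -> str:
    # EASY is structural; only the SHITTY / PATH_OF_PAIN split uses the score.
    if schema_matched:
        return "easy"
    return "shitty" if score >= CUTOFF else "path_of_pain"


movie = {"title": "Inception", "media_type": "movie", "year": 2010,
         "quality": "1080p", "source": "BluRay", "codec": "x264"}
score = confidence(movie, unknown_tokens=2)
print(score, road(False, score))  # 70 shitty
```

Here 30 + 20 + 15 + 3 × 5 = 80, minus 2 × 5 = 10 for the UNKNOWN tokens, gives 70, which clears the default cutoff of 60.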

### Changed

- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
  ParseReport]` instead of returning a bare `ParsedRelease`. Call
  sites updated in `application/filesystem/resolve_destination.py` and
  `agent/tools/filesystem.py`. Tests updated accordingly.

---

## [2026-05-20] — Release parser v2 (EASY + SHITTY)

### Added

- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
  new annotate-based pipeline (tokenize → annotate → assemble) drives
  releases from known groups. Exposes `Token` (frozen VO with `index` +
  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
  and `GroupSchema` / `SchemaChunk` value objects.
  - `pipeline.tokenize`: string-ops separator split (no regex), strips
    a `[site.tag]` prefix/suffix first.
  - `pipeline.annotate`: detects the trailing group right-to-left
    (priority to `codec-GROUP` shape, fallback to any non-source dashed
    token), looks up its `GroupSchema`, then walks tokens and schema
    chunks in lockstep — optional chunks that don't match are skipped,
    mandatory mismatches abort EASY and return `None` so the caller can
    fall back to SHITTY.
  - `pipeline.assemble`: folds annotated tokens into a
    `ParsedRelease`-compatible dict.
  - `parse_release` (in `release.services`) tries the v2 EASY path first
    and falls through to the legacy SHITTY heuristic on `None`. Legacy
    SHITTY/PATH OF PAIN behavior is unchanged.
  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,rarbg}.yaml`
    declare the canonical chunk order per group, loaded via
    new `ReleaseKnowledge.group_schema(name)` port method.
  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
    cover token VOs, site-tag stripping, group detection, schema-driven
    annotation (movie, TV episode, season pack with optional source),
    and field assembly.

- **Release parser v2 — enricher pass** completes the EASY pipeline.
  The structural schema walk now tolerates non-positional tokens
  between chunks (instead of aborting on leftover tokens), and a second
  pass tags them with audio / video-meta / edition / language roles.
  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
  matched before single tokens. Channel layouts like `5.1` and `7.1`
  (split into two tokens by the `.` separator) are detected as
  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
  marker so `assemble` extracts the canonical value only from the
  primary token. KONTRAST releases with audio / HDR / edition / language
  metadata now produce a fully populated `ParsedRelease`.

- **Streaming distributor as a separate dimension** from encoding source.
  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
  platform that produced the release is now recorded distinctly. The
  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
  from `sources.yaml`.

- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
  each documenting an expected `ParsedRelease` plus the future `routing`
  (library / torrents / seed_hardlinks) for the upcoming `organize_media`
  refactor. EASY bucket seeded with 5 cases (movie, single-episode, season
  pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15
  anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel),
  French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi),
  multi-episode chain `S14E09E10E11` (Archer, captures E11 loss),
  lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators
  (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83),
  season-range `S01-06` (Tatortreiniger, captures movie misclassification),
  bare folder name (Jurassic Park, media_type=unknown), apostrophe-in-name
  (Honey Don't, captures full AI-path degeneration), SUBS-tag movie (Hook),
  space separators (Predator Badlands, captures group=UNKNOWN), subs-only
  release (Westworld S04).
  PATH OF PAIN bucket seeded with 10 worst-case fixtures covering:
  UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set
  with double season range and parens-wrapped tech (Deutschland 83-86-89,
  captures `group=S03` misdetection), accented chars in title (Chérie
  BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag
  prefix + XviD (OxTorrent), episode title + air-date silently lost
  (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i +
  multi-word audio codec (The Prodigy, full AI-path degeneration),
  yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]`
  tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range +
  REPACK + HEVC (Gilmore Girls, the well-behaved exception).
  Parametrized over `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`. Releases like
  `Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep
  alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators
  used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New
  conventions can be added without code changes. The canonical `.` is always
  present even if missing from YAML.

### Changed

- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
  The legacy ~480-line heuristic block in `release/services.py` is gone;
  `pipeline._annotate_shitty` does a single pass that looks each token
  up in the kb buckets (resolutions / sources / codecs / distributors /
  year / `SxxExx`) with first-match-wins semantics, and the leftmost
  contiguous UNKNOWN run becomes the title. `annotate()` no longer
  returns `None` — SHITTY is the always-on fallback when no group schema
  matches. `services.py` shrank from ~525 to ~85 lines. Four fixtures
  (`deutschland_franchise_box`, `sleaford_yt_slug`,
  `super_mario_bilingual`, `predator_space_separators` — the last one
  moved from `shitty/` → `path_of_pain/`) are now marked
  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
  `xfail_reason` field; the parametrized suite wires the xfail mark
  automatically.

- **`parse_release` tokenizer is now data-driven**: it splits on any character
  listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
  This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
  space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and
  underscore-separated names parse correctly via the direct path — no more
  fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first
  (so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`),
  then well-formedness is checked only against truly forbidden chars
  (anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of
  639-1 and 639-2/T):
  - `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was
    `["fr", "en"]`). Old LTM files are not auto-migrated — delete
    `data/memory/ltm.json` to regenerate with the new defaults.
  - Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`,
    `fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly
    (recognized as French via alias) but new files are written canonically.
  - `Language` value object docstring corrected: it has always stored 639-2/B
    (matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via
  `settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor
  accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`**
  rather than duplicating tokens. `subtitles.yaml` now only declares
  subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new
  `language_tokens` section.
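A sketch of the dict-driven SHITTY pass: each token is looked up in the buckets in order (first match wins), then the leftmost contiguous UNKNOWN run becomes the title. Bucket contents, the role names, and the 1900 to 2099 year window are assumptions.

```python
# Illustrative kb buckets, checked in this order (first match wins).
BUCKETS = (
    ("resolution", {"720p", "1080p", "2160p"}),
    ("source", {"BluRay", "WEB-DL", "WEBRip", "HDTV"}),
    ("codec", {"x264", "x265", "HEVC"}),
    ("distributor", {"NF", "AMZN", "DSNP"}),
)


def is_year(token: str) -> bool:
    # Assumed plausibility window for release years.
    return len(token) == 4 and token.isdigit() and 1900 <= int(token) <= 2099


def is_sxxexx(token: str) -> bool:
    upper = token.upper()
    if not upper.startswith("S"):
        return False
    season, sep, episode = upper[1:].partition("E")
    return bool(sep) and season.isdigit() and episode.isdigit()


def tag_tokens(tokens: list[str]) -> list[str]:
    roles = []
    for token in tokens:
        role = "UNKNOWN"
        for name, values in BUCKETS:
            if token in values:
                role = name
                break
        else:  # no bucket matched: try the structural checks
            if is_year(token):
                role = "year"
            elif is_sxxexx(token):
                role = "season_episode"
        roles.append(role)
    return roles


def leftmost_title(tokens: list[str], roles: list[str]) -> str:
    # The leftmost contiguous run of UNKNOWN tokens becomes the title.
    start = next((i for i, role in enumerate(roles) if role == "UNKNOWN"), None)
    if start is None:
        return ""
    end = start
    while end < len(roles) and roles[end] == "UNKNOWN":
        end += 1
    return " ".join(tokens[start:end])


tokens = ["The", "Office", "S02E03", "720p", "HDTV", "x264"]
roles = tag_tokens(tokens)
print(roles)  # ['UNKNOWN', 'UNKNOWN', 'season_episode', 'resolution', 'source', 'codec']
print(leftmost_title(tokens, roles))  # The Office
```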

### Removed

- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`**
  deleted entirely. They held fossil parsers (`parse_episode_filename`,
  `extract_movie_metadata`, …) with zero production callers — superseded by
  `parse_release` as the single source of truth for release-name parsing.
  Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`)
  removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py` —
  the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS`
  hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now
  lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in
  `alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by
  `language_tokens:` (subtitle-specific only) since iso_languages.yaml is the
  canonical source.

### Fixed

- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the
  ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and
  `hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while
  `iso_languages.yaml` exposes French as `"fre"` — preferred languages
  defaults now match the canonical form.

### Internal

- **Domain I/O extraction** (`refactor/domain-io-extraction`): the domain
  layer no longer performs subprocess calls, filesystem scans, or YAML
  loading. Achieved in a series of focused commits:
  - **Knowledge YAML loaders moved to infrastructure**:
    `alfred/domain/release/knowledge.py`,
    `alfred/domain/shared/knowledge/language_registry.py`, and
    `alfred/domain/subtitles/knowledge/{base,loader}.py` relocated to
    `alfred/infrastructure/knowledge/`. Re-exports were dropped — callers
    import directly from the new location.
  - **`MediaProber` and `FilesystemScanner` Protocol ports** introduced at
    `alfred/domain/shared/ports/` with frozen-dataclass DTOs
    (`SubtitleStreamInfo`, `FileEntry`). `SubtitleIdentifier` and
    `PatternDetector` are now constructor-injected with concrete adapters
    (`FfprobeMediaProber` wrapping `subprocess.run(ffprobe)` and
    `PathlibFilesystemScanner` wrapping `pathlib`). No more direct
    `subprocess`/`pathlib` usage from the subtitle domain services.
  - **Live filesystem methods removed from VOs and entities**:
    `FilePath.exists()` / `.is_file()` / `.is_dir()` deleted —
    `FilePath` is now a pure address VO. `Movie.has_file()` and
    `Episode.is_downloaded()` dropped. Callers either rely on a prior
    detection step or use try/except over pre-checks (eliminates
    TOCTOU races).
  - **`SubtitlePlacer` moved to the application layer** at
    `alfred/application/subtitles/placer.py` — it performs `os.link`
    I/O, which doesn't belong in the domain. Pre-checks replaced with
    try/except for `FileNotFoundError`/`FileExistsError`.
  - **`SubtitleRuleSet.resolve()` no longer reaches into the knowledge
    base**: the implicit `DEFAULT_RULES()` helper is gone, replaced by
    an explicit `default_rules: SubtitleMatchingRules` parameter. The
    `ManageSubtitles` use case loads defaults from the KB once and
    passes them in.
  - **`SubtitleKnowledge` Protocol port** at
    `alfred/domain/subtitles/ports/knowledge.py` declares the read-only
    query surface domain services consume (7 methods:
    `known_extensions`, `format_for_extension`, `language_for_token`,
    `is_known_lang_token`, `type_for_token`, `is_known_type_token`,
    `patterns`). `SubtitleIdentifier` and `PatternDetector` depend on
    this Protocol instead of the concrete `SubtitleKnowledgeBase` from
    infrastructure — `domain/subtitles/` now has zero imports from
    `infrastructure/`. The remaining domain → infra leak
    (`domain/release/` loading separator YAML at import-time) is
    documented in tech-debt and scheduled for its own branch.
- **`to_dot_folder_name(title)` helper** in
  `alfred/domain/shared/value_objects.py` — extracts the
  `re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was
  duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of
  a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`,
  `.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping
  them under `metadata:`. The `_METADATA_EXTENSIONS` set used by
  `detect_media_type` remains the union of both (same behavior — subtitles
  are still ignored when deciding the media type of a folder), but a new
  `load_subtitle_extensions()` loader is now available for the subtitles
  domain. Semantic clarity, no functional change.
- **`tv_shows/entities.py` module docstring** now shows the aggregate
  ownership as an ASCII tree before the rule text — quicker visual scan
  of the DDD structure.
- Removed backward-compat shims `_sanitise_for_fs` /
  `_strip_episode_from_normalised` from `domain/release/value_objects.py`
  (zero callers).
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass
  explicit `check=False` (PLW1510); lazy imports promoted to module top where
  there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`,
  `qbittorrent/client.py`, `file_manager.py`); fixed module-level import
  ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`;
  removed unused locals (F841 / B007); replaced unnecessary set comprehension
  with `set()` in `release/knowledge.py` (C416).
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches)
  globally — noisy on parser mappers and orchestrator use-cases where early-return
  validation is essential complexity. Ignore `PLW0603` for the documented memory
  singleton (`infrastructure/persistence/context.py`).
- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`):
  the last domain → infrastructure leak (`domain/release/value_objects.py`
  loading YAML at import-time) is gone. Achieved via:
  - **`ReleaseKnowledge` Protocol port** at
    `alfred/domain/release/ports/knowledge.py` declares the read-only query
    surface release parsing needs (token sets for resolutions, sources, codecs,
    languages, hdr extras; structured dicts for audio, video_meta, editions,
    media_type_tokens; separators list; file-extension sets used by
    application/infra callers; `sanitize_for_fs(text)` method).
  - **`YamlReleaseKnowledge` adapter** at
    `alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant
    once at construction. Builds an immutable `str.maketrans` translation
    table for filesystem sanitization.
  - **`parse_release(name, kb)`** takes the knowledge as an explicit
    parameter — no more module-level YAML loading inside the domain. Every
    internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`,
    `_extract_audio`, `_extract_video_meta`, `_extract_edition`,
    `_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
  - **`ParsedRelease` Option B**: sanitization happens once at parse time
    and is stored on a new `title_sanitized: str` field. Builder methods
    (`show_folder_name`, `season_folder_name`, `episode_filename`,
    `movie_folder_name`, `movie_filename`) are now pure — they accept
    already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe`
    arguments. Callers at the use-case boundary sanitize TMDB strings
    via `kb.sanitize_for_fs(...)` before passing them in.
  - **All domain-knowledge constants removed from `value_objects.py`**:
    `_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`,
    `_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`,
    `_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`,
    `_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`,
    and the `_sanitize_for_fs` helper. The domain module is now pure.
  - **Application-layer KB singleton**: `resolve_destination.py` instantiates
    a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and
    threads it through every `parse_release(...)` call. The local
    `_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of
    `_KB.sanitize_for_fs(...)`.
  - **`detect_media_type(parsed, source_path, kb)` and
    `find_video_file(path, kb)`** now take the knowledge explicitly
    instead of importing `_*_EXTENSIONS` constants from the domain.
    `agent/tools/filesystem.py::analyze_release` imports the application
    KB singleton and passes it through.
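A sketch of the try/except placement style, using only `os.link` from the standard library. `place_subtitle` and its return strings are hypothetical; the point is that the filesystem's own error replaces an `exists()` pre-check, which closes the TOCTOU window.

```python
import os
import tempfile
from pathlib import Path


def place_subtitle(source: Path, destination: Path) -> str:
    """Hard-link a subtitle next to its video (hypothetical helper).

    No exists() pre-check: the filesystem answers atomically, so there is no
    window between "check" and "use" for another process to slip into.
    """
    try:
        os.link(source, destination)
    except FileNotFoundError:
        return "source_missing"  # also raised if the destination folder is missing
    except FileExistsError:
        return "already_placed"
    return "linked"


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "fre.srt"
    src.write_text("1\n00:00:01,000 --> 00:00:02,000\nBonjour\n")
    dst = Path(tmp) / "Movie.fre.srt"
    first = place_subtitle(src, dst)
    second = place_subtitle(src, dst)
    missing = place_subtitle(Path(tmp) / "nope.srt", Path(tmp) / "out.srt")

print(first, second, missing)  # linked already_placed source_missing
```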
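A sketch of the build-once translation table behind `sanitize_for_fs`. Which characters are stripped (the Windows-forbidden set here) and whether they are deleted or replaced are assumptions; `types.MappingProxyType` is one way to make the table read-only, and `ReleaseKnowledgeSketch` is a hypothetical name.

```python
from types import MappingProxyType

# Assumption: these characters are deleted; the real adapter builds its table
# from YAML and may map characters differently.
WIN_FORBIDDEN = '<>:"/\\|?*'


class ReleaseKnowledgeSketch:
    def __init__(self) -> None:
        # Built once at construction; the proxy rejects item assignment.
        self._fs_table = MappingProxyType(str.maketrans("", "", WIN_FORBIDDEN))

    def sanitize_for_fs(self, text: str) -> str:
        # str.translate accepts any object that supports __getitem__.
        return text.translate(self._fs_table)


kb = ReleaseKnowledgeSketch()
print(kb.sanitize_for_fs("Mission: Impossible? Part 1/2"))  # Mission Impossible Part 12
```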

---

## [2026-05-17] — TVShow & Movie aggregate refactor

Multi-phase overhaul of the TV show domain into a real DDD aggregate, with
matching parity work on `Movie`, a language knowledge system, and the
`shared/media` restructure that supports both.

### Added

- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml` + 42
  languages including `und` for undetermined).
  - `Language` value object (frozen dataclass) with `iso`, `english_name`,
    `native_name`, `aliases`, and a `matches(raw)` cross-format helper.
  - `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging
    builtin + learned YAML. Not a singleton — the application layer
    instantiates it.
  - ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English
    name, native name, and common spellings.
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a
  `resolution` property using width-priority bucket detection (handles
  cinema/scope crops like 1920×960 → 1080p).
- **`shared/media/matching.py`** — `track_lang_matches` helper shared by
  `Episode` and `Movie`. Implements the **"C+" contract** for language helpers:
  - `Language` query → cross-format match via `Language.matches()`
  - `str` query → case-insensitive direct comparison (no normalization)
- **TVShow aggregate composition**:
  - `TVShow.seasons: dict[SeasonNumber, Season]`
  - `Season.episodes: dict[EpisodeNumber, Episode]`
  - `Season.expected_episodes` / `Season.aired_episodes` (split so collection
    state can compare "owned vs aired today" without confusing in-flight
    seasons with future ones)
- **Aggregate methods on `TVShow`**:
  - `add_episode(ep)` — sole sanctioned mutation entry point (creates the
    season if missing)
  - `add_season(season)` — replaces a season wholesale
  - `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
  - `is_complete_series()` — true iff `ENDED + COMPLETE`
  - `missing_episodes()` — flat list of all aired-but-not-owned
    `(season, episode)` pairs
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`,
  `has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by
  `Episode.audio_tracks` / `Episode.subtitle_tracks`.
- **Movie aggregate parity** — `Movie` now carries `audio_tracks` /
  `subtitle_tracks` and exposes the same helpers as `Episode` (same C+
  contract).
- **`CHANGELOG.md`** (this file).

### Changed

- **`shared/media_info.py` exploded into `shared/media/{audio,video,subtitle,info,matching}.py`.**
  `MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat
  accessors (`width`, `height`, `video_codec`, `resolution`) remain as
  properties that read the first video track.
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to
  `MediaInfo` (file-level — they come from the ffprobe `format` block, not a
  stream). Files without a video stream now correctly expose duration.
- **`ShowStatus.from_string`** extended to map TMDB strings (`Returning Series`,
  `In Production`, `Pilot`, `Planned`, `Canceled`, `Cancelled`).
  Comparison is whitespace-trimmed and case-insensitive.
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They
  are owned by `TVShow` and reached only through it.
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed
  from the dict) instead of stored ints.
- **`TVShowService.parse_episode_from_filename`** rewritten using string
  operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
- **`TVShowService.find_next_episode`** now drives off
  `show.missing_episodes()` instead of the hardcoded "max 50 episodes per
  season" heuristic.
- **`TVShowService` constructor** no longer takes `season_repository` /
  `episode_repository` — the aggregate persists in one block via
  `TVShowRepository` only.
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to
  `SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack`
  ffprobe-view dataclass (different bounded contexts, kept separate
  intentionally).
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from
  `knowledge/release/file_extensions.yaml` via `load_video_extensions()`
  (single source of truth).
- **`CLAUDE.md`** updated with three new policy sections:
  - "Tests" — small updates OK during normal work, no mass-update sprees
  - "Backwards-compatibility shims" — prefer clean migration over shims
  - "Regex" — not forbidden, use judgment when string ops would be fragile

### Removed

- **Legacy `Season N Episode N` filename form** in
  `TVShowService.parse_episode_from_filename`. It never appears in the release
  names Alfred handles, and supporting it forced a regex.
- **`SeasonRepository` and `EpisodeRepository`** — only the aggregate root has
  a repository (DDD rule: one repo per aggregate).
- **`shared/media_info.py`** compatibility shim — callers updated.
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` — callers
  updated to `SubtitleCandidate`.

### Fixed

- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead
  of crashing through `primary_video.duration_seconds` (see the duration/bitrate
  move under **Changed**).
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer
  passes the removed `show_imdb_id` / `episode_count` kwargs when constructing
  a `Season` for folder-name generation.

### Internal

- Test suite rewritten where the aggregate redesign broke fixtures:
  `tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py`
  (rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py`
  (helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures),
  `tests/domain/test_tv_shows_service.py` (find_next_episode driven by real
  aggregate state).
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`,
  `identifier.py` updated to import `SubtitleCandidate`.
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.