9f1ce94690
Remove the module-level _KB / _PROBER singletons from
alfred/application/filesystem/resolve_destination.py. The four
resolve_{season,episode,movie,series}_destination use cases now take
kb: ReleaseKnowledge and prober: MediaProber as required arguments,
matching the shape of inspect_release.
The singletons now live at the agent-tools frontier
(alfred/agent/tools/filesystem.py), where the LLM-facing wrappers
instantiate YamlReleaseKnowledge / FfprobeMediaProber once and thread
them through. The wrappers' Python signatures are unchanged — the
inspect-based JSON-schema generator in agent/registry.py still sees the
same LLM-passable params.
analyze_release drops the dirty 'from ... import _KB' indirection.
Tests inject their own stubs by keyword (prober=_StubProber(...)) via
thin convenience wrappers, replacing the prior
monkeypatch.setattr(rd, '_PROBER', ...) pattern.
testing/debug_release.py: instantiate YamlReleaseKnowledge() /
FfprobeMediaProber() inline at the two call sites.
Suite: 1077 passed.
665 lines
37 KiB
Markdown
# Changelog

All notable changes to Alfred are documented here.

The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead
of release numbers. Granularity targets behavioral or API-visible changes; refer
to `git log` for commit-level detail.

Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** /
**Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect
callers).

---

## [Unreleased]

### Fixed

- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
  range.** The parser previously captured `episode=9, episode_end=10`
  and dropped E11+. It now returns `episode=first, episode_end=last`,
  with intermediate values implied. Fixture
  `shitty/archer_multi_episode/` updated from anti-regression-of-bug
  to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI
  fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
  previously parsed with `parse_path="ai"` and everything UNKNOWN
  because `'` is in the forbidden-chars list. Apostrophes are now
  pre-stripped before the well-formed check, so the parse completes
  normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
  the title text loses its apostrophe. `parse_path` becomes
  `sanitized` to surface the cleanup. Side win: PoP fixture
  `the_prodigy_full_chaos/` also moves from total failure to a
  partially-correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as
  `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
  parsed as `media_type=movie` with `S01-06` glued onto the title.
  The parser now recognizes the range, sets `season=first`,
  `media_type=tv_complete`, and removes the marker from the title.
  `is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
  with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
  produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
  `|`, …) carry no title content and are now filtered out. Side
  effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
  YouTube wide-pipe no longer leaks into the title.

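
The multi-episode fix above can be illustrated with a minimal sketch. The regex and the function name `parse_episode_chain` are assumptions made for illustration, not the parser's actual code; only the `episode=first, episode_end=last` outcome comes from the entry itself.

```python
from __future__ import annotations

import re

# Hypothetical pattern: one season marker followed by one or more E-numbers.
_CHAIN = re.compile(r"^S(\d{1,2})((?:E\d{1,3})+)$", re.IGNORECASE)


def parse_episode_chain(token: str) -> dict[str, int | None] | None:
    """Collapse an SxxEyyEzz... token to its first and last episode."""
    match = _CHAIN.match(token)
    if match is None:
        return None
    episodes = [int(n) for n in re.findall(r"\d+", match.group(2))]
    return {
        "season": int(match.group(1)),
        "episode": episodes[0],
        # A single-episode token has no range end.
        "episode_end": episodes[-1] if len(episodes) > 1 else None,
    }


result = parse_episode_chain("S14E09E10E11")
# result == {"season": 14, "episode": 9, "episode_end": 11}
```

Intermediate episodes (E10 here) are not stored; they are implied by the range.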
### Added

- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
  Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
  — the surface previously coupled to the concrete `LanguageRegistry`.
  Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
  depends on the Protocol, infrastructure provides the YAML-backed
  adapter. Tests in `tests/infrastructure/test_language_registry.py`.

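
As a rough sketch of what such a structural port can look like: the five method names come from the entry above, while the signatures, the simplified `Language` stand-in, and the `InMemoryLanguages` fake are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Language:
    """Simplified stand-in for the real Language VO (assumption)."""
    iso: str
    aliases: tuple[str, ...] = ()


class LanguageRepository(Protocol):
    """Structural port: any object with these methods satisfies it."""

    def from_iso(self, iso: str) -> Language | None: ...
    def from_any(self, token: str) -> Language | None: ...
    def all(self) -> Iterable[Language]: ...
    def __contains__(self, token: str) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguages:
    """Hypothetical test fake; it does not inherit from the Protocol."""

    def __init__(self, languages: Iterable[Language]) -> None:
        self._by_iso = {lang.iso: lang for lang in languages}

    def from_iso(self, iso: str) -> Language | None:
        return self._by_iso.get(iso)

    def from_any(self, token: str) -> Language | None:
        token = token.lower()
        for lang in self._by_iso.values():
            if token == lang.iso or token in lang.aliases:
                return lang
        return None

    def all(self) -> Iterable[Language]:
        return list(self._by_iso.values())

    def __contains__(self, token: str) -> bool:
        return self.from_any(token) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


repo: LanguageRepository = InMemoryLanguages([Language("fre", ("fr", "french"))])
```

Because the Protocol is structural, a fake like this can stand in for the YAML-backed adapter in tests without loading any file.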
### Changed

- **`resolve_destination` use cases take `kb` / `prober` as required
  params; module-level singletons gone.** The four
  `resolve_{season,episode,movie,series}_destination` use cases now
  accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
  arguments, matching the shape of `inspect_release`. The module-level
  `_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
  singletons that previously lived in
  `alfred/application/filesystem/resolve_destination.py` are removed —
  the application layer no longer reaches into infrastructure. The
  singletons now live at the agent-tools frontier
  (`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
  instantiate them once and thread them through. `analyze_release` no
  longer needs the dirty `from ... import _KB` indirection. Tests
  inject their own stubs by keyword (`prober=_StubProber(...)`) instead
  of monkeypatching a module attribute.
- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
  collided with `pathlib.Path` in code-reading mental models, and was
  one letter from `parse_path` (the field that holds the value) — making
  it harder than it needed to be to spot the type vs the attribute.
  `TokenizationRoute` says what it actually captures (DIRECT /
  SANITIZED / AI = how the name reached the tokenizer), and the class
  docstring now spells out the orthogonality with `Road` (EASY /
  SHITTY / PATH_OF_PAIN, which captures parser confidence on
  `ParseReport`). The `parse_path` field name stays unchanged —
  string values too — so YAML fixtures, the `analyze_release` tool
  spec, and any external consumer are untouched.
- **`enrich_from_probe` codec mappings moved to YAML.** The three
  hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
  `_CHANNEL_MAP`) translating ffprobe output to scene tokens
  (`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
  `alfred/knowledge/release/probe_mappings.yaml` and are loaded into
  `ReleaseKnowledge.probe_mappings` (new port field, populated by
  `YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
  parameter and reads the maps from there. Aligns with the CLAUDE.md
  rule that lookup tables of domain knowledge belong in YAML, not in
  Python — and opens the door to a future "learn new codec" pass.
  Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
  and all 22 sites in `tests/application/test_enrich_from_probe.py`.
- **`ParsedRelease.tech_string` is now a derived `@property`**
  (`alfred/domain/release/value_objects.py`). It computes
  `quality.source.codec` joined by dots on every access, so it stays in
  sync with the underlying fields by construction. The stored field is
  gone from the dataclass, the dict returned by `assemble()` no longer
  carries the key, `parse_release`'s malformed-name fallback drops the
  `tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
  it after filling `quality`/`source`/`codec`. Closes the
  parser/enrichment double-source-of-truth that `e79ca46` had to fix
  reactively. The fixtures runner now injects `tech_string` alongside
  `is_season_pack` since `asdict()` skips properties.
- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
  valid levels (global, release_group, movie, show, season, episode)
  was documented only in a docstring comment and validated nowhere.
  `RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
  serialization, `.value` access) while making the closed set explicit
  to type-checkers and IDEs. `to_dict()` emits `.value` strings so
  YAML output is unchanged.
- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
  `__init__`.** Same public API (accepts `str | Path`), same behavior,
  but the dataclass-generated `__init__` is no longer bypassed. One
  less smell in the shared VOs.
- **`Language` VO is strict by default; `Language.from_raw()` factory
  for normalization.** The previous `__post_init__` mutated `iso` and
  `aliases` via `object.__setattr__` on a frozen dataclass — a code
  smell hiding behind the dataclass facade. Split: the direct
  constructor now rejects un-normalized input (uppercase iso,
  whitespace in aliases, etc.), and `Language.from_raw()` handles
  arbitrary YAML/user input. Only one caller (LanguageRegistry loading
  the ISO YAML) needed migration.
- **`ParsedRelease.normalised` renamed to `clean`.** The field name
  promised "dots instead of spaces" but in practice held
  `raw - site_tag - apostrophes` — only used by `season_folder_name()`.
  Renamed and docstring corrected.
- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
  fields were already typed as `MediaTypeToken` / `ParsePath`, but a
  tolerant `__post_init__` coerced raw strings. With both classes
  being `(str, Enum)`, the coercion served no purpose. Strict
  constructor; `.value` no longer passed at call sites; dropped the
  unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.

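
The derived `tech_string` property described in this section can be sketched as follows. `ParsedReleaseSketch` is a reduced, hypothetical stand-in for `ParsedRelease`; in particular, skipping empty parts when joining is an assumption, not something the entry states.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class ParsedReleaseSketch:
    """Simplified stand-in for ParsedRelease; only the tech fields are shown."""
    quality: str | None = None
    source: str | None = None
    codec: str | None = None

    @property
    def tech_string(self) -> str:
        # Recomputed on every access, so it cannot drift from the fields.
        # Assumption: missing parts are skipped rather than rendered as empty.
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)


parsed = ParsedReleaseSketch(source="WEBRip", codec="x265")
before = parsed.tech_string   # "WEBRip.x265" while quality is unknown
parsed.quality = "2160p"      # e.g. filled in later from probe data
after = parsed.tech_string    # "2160p.WEBRip.x265" with no manual refresh

# asdict() only walks dataclass fields, so a property must be added by hand,
# which is why a fixtures runner would need to inject it explicitly.
as_dict = {**asdict(parsed), "tech_string": parsed.tech_string}
```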
### Removed

- **`settings.min_movie_size_bytes`** — orphan Pydantic field +
  validator. Its only consumer (`MovieService.validate_movie_file`)
  had been removed during an earlier refactor. The "real movie vs
  sample" rule now lives in extension-based exclusion
  (`application/release/supported_media.py`) and PoP. If a size
  threshold is ever needed, it'll go in a knowledge YAML, not in
  `settings`.

### Internal

- **Flattened `alfred.domain.shared.media/` package into a single
  `media.py` module.** The 6-file package (audio, video, subtitle,
  info, matching, tracks_mixin + `__init__`) collapsed into one ~250
  LoC module. All 12 import sites continue to resolve unchanged
  (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
  since Python treats `media.py` and `media/__init__.py`
  interchangeably for import paths. Easier to scan when the whole
  bounded-context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the
  `LanguageRepository` port** instead of the concrete `LanguageRegistry`
  class. The default constructor still instantiates the concrete adapter
  when no repository is injected — behaviour is unchanged for existing
  callers. Opens the door to in-memory fakes in future tests without
  loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from
  `alfred.application.filesystem` to `alfred.application.release`**.
  They are inspection-pipeline helpers — their natural home is next to
  `inspect_release`, not next to the filesystem use cases. The move
  also eliminates a circular-import workaround in
  `resolve_destination.py`: `inspect_release` can now be imported at
  module top instead of lazily inside `_resolve_parsed`. Public
  surface is unchanged for callers that imported the helpers from
  their full module paths (the only call sites — `inspect.py`, two
  tests, one testing script — were updated in this commit).

### Added

- **`resolve_*_destination` use cases now consume `inspect_release`**.
  `resolve_episode_destination` and `resolve_movie_destination` reuse
  their existing `source_file` parameter as the inspection target;
  `resolve_season_destination` and `resolve_series_destination` gain
  a new **optional** `source_path` parameter (also threaded through
  the tool wrappers and YAML specs). When the path exists, ffprobe
  data fills tokens missing from the release name (e.g. quality) and
  refreshes `tech_string`, so the destination folder / file names
  end up more accurate. When the path is omitted (back-compat callers)
  or does not exist on disk, the use cases fall back to parse-only —
  same behavior as before.

### Fixed

- **`enrich_from_probe` now refreshes `tech_string`** after filling
  `quality` / `source` / `codec`. Previously the field stayed at its
  parser-time value, so filename builders saw stale tech tokens even
  after a successful probe. New `TestTechString` class in
  `tests/application/test_enrich_from_probe.py` locks the behavior.

### Added

- **`inspect_release` orchestrator + `InspectedResult` VO**
  (`alfred/application/release/inspect.py`). Single composition of the
  four inspection layers: `parse_release` → `detect_media_type` (patches
  `parsed.media_type`) → `find_main_video` (top-level scan) →
  `prober.probe` + `enrich_from_probe` when a video exists and the
  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
  `InspectedResult(parsed, report, source_path, main_video, media_info,
  probe_used)` that downstream callers consume directly instead of
  rebuilding the same chain. `kb` and `prober` are injected — no
  module-level singletons. Never raises.

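
The composition order above can be sketched in a heavily reduced form. Everything here is hypothetical: `inspect_sketch`, `InspectedSketch`, and the callable parameters stand in for the real layers, parsing and enrichment are left out, and catching probe errors is only one possible reading of "never raises".

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass(frozen=True)
class InspectedSketch:
    """Reduced stand-in for InspectedResult (fields simplified)."""
    media_type: str
    main_video: Optional[Path]
    media_info: Optional[dict]
    probe_used: bool


def inspect_sketch(
    name: str,
    source_path: Path,
    *,
    detect: Callable[[str, Path], str],
    find_video: Callable[[Path], Optional[Path]],
    probe: Callable[[Path], Optional[dict]],
) -> InspectedSketch:
    """Compose the layers in order; skip the probe when it cannot help."""
    media_type = detect(name, source_path)
    main_video = find_video(source_path)
    media_info = None
    if main_video is not None and media_type not in {"unknown", "other"}:
        try:
            media_info = probe(main_video)
        except Exception:
            # Assumption: a failed probe degrades to a parse-only result.
            media_info = None
    return InspectedSketch(media_type, main_video, media_info, media_info is not None)


result = inspect_sketch(
    "Some.Movie.2020.1080p.WEBRip.x265-GRP",
    Path("/downloads/Some.Movie"),
    detect=lambda name, path: "movie",
    find_video=lambda path: path / "movie.mkv",
    probe=lambda video: {"video_codec": "hevc"},
)
```

Passing the collaborators in as arguments is what lets tests swap in stubs without patching module attributes.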
### Changed

- **`analyze_release` tool now delegates to `inspect_release`** — same
  output shape, plus two new fields: `confidence` (0–100) and `road`
  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
  both fields so the LLM can route releases by confidence.

- **`MediaProber` port now covers full media probing**: added
  `probe(video) -> MediaInfo | None` alongside the existing
  `list_subtitle_streams`. `FfprobeMediaProber` (in
  `alfred/infrastructure/probe/`) implements both methods and is now
  the single adapter shelling out to `ffprobe`. The standalone
  `alfred/infrastructure/filesystem/ffprobe.py` module was removed —
  all callers (tools, testing scripts) instantiate
  `FfprobeMediaProber` instead. Unblocks the upcoming
  `inspect_release` orchestrator, which depends on the port.

### Removed

- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
  `FfprobeMediaProber` adapter).

---

## [2026-05-20] — Release parser confidence scoring + exclusion

### Added

- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
  `is_supported_video(path, kb)` (extension-only check against
  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
  scan, lexicographically-first eligible file, returns `None` when no
  video qualifies; accepts a bare file as folder for single-file
  releases). No size threshold, no filename heuristics —
  PATH_OF_PAIN handles the exotic cases. Foundation for the future
  `inspect_release` orchestrator.

- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
  critical fields. EASY is decided structurally (a group schema
  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
  YAML-configurable cutoff (default 60). Weights and penalties also
  live in `scoring.yaml` — title 30, media_type 20, year 15, season
  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
  -30. `Road` is a new enum, distinct from `ParsePath` (which records
  the tokenization route, not the confidence tier). `ReleaseKnowledge`
  port gains a `scoring: dict` field.

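
A worked sketch of the scoring arithmetic, using the default weights quoted above. Several details are assumptions rather than documented behavior: that "tech" means `quality`, `source`, and `codec`; that the score is clamped to 0–100; and that a score equal to the cutoff counts as SHITTY.

```python
from __future__ import annotations

# Default weights as listed in the entry; the three "tech" keys are assumed.
WEIGHTS = {
    "title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5,
    "quality": 5, "source": 5, "codec": 5,
}
UNKNOWN_PENALTY = 5   # per residual UNKNOWN token
MAX_PENALTY = 30      # total penalty is capped
CUTOFF = 60           # default SHITTY vs PATH_OF_PAIN threshold


def score(fields: dict[str, object], unknown_tokens: int) -> int:
    """Sum the weights of populated fields, then subtract the capped penalty."""
    earned = sum(w for name, w in WEIGHTS.items() if fields.get(name) is not None)
    penalty = min(unknown_tokens * UNKNOWN_PENALTY, MAX_PENALTY)
    return max(0, min(100, earned - penalty))  # clamping is an assumption


def road(confidence: int, schema_matched: bool) -> str:
    """EASY is structural; the score only separates the other two tiers."""
    if schema_matched:
        return "easy"
    return "shitty" if confidence >= CUTOFF else "path_of_pain"


movie = {"title": "X", "media_type": "movie", "year": 2020,
         "quality": "1080p", "source": "WEBRip", "codec": "x265"}
movie_score = score(movie, unknown_tokens=1)   # 30 + 20 + 15 + 15 - 5 = 75
chaos_score = score({"title": "X"}, unknown_tokens=8)  # 30 - 30 (capped) = 0
```

With these inputs, `road(movie_score, False)` gives `"shitty"` and `road(chaos_score, False)` gives `"path_of_pain"`.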
### Changed

- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
  ParseReport]` instead of returning a bare `ParsedRelease`. Call
  sites updated in `application/filesystem/resolve_destination.py` and
  `agent/tools/filesystem.py`. Tests updated accordingly.

---

## [2026-05-20] — Release parser v2 (EASY + SHITTY)

### Added

- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
  new annotate-based pipeline (tokenize → annotate → assemble) drives
  releases from known groups. Exposes `Token` (frozen VO with `index` +
  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
  and `GroupSchema` / `SchemaChunk` value objects.
  - `pipeline.tokenize`: string-ops separator split (no regex), strips
    a `[site.tag]` prefix/suffix first.
  - `pipeline.annotate`: detects the trailing group right-to-left
    (priority to `codec-GROUP` shape, fallback to any non-source dashed
    token), looks up its `GroupSchema`, then walks tokens and schema
    chunks in lockstep — optional chunks that don't match are skipped,
    mandatory mismatches abort EASY and return `None` so the caller can
    fall back to SHITTY.
  - `pipeline.assemble`: folds annotated tokens into a
    `ParsedRelease`-compatible dict.
  - `parse_release` (in `release.services`) tries the v2 EASY path first
    and falls through to the legacy SHITTY heuristic on `None`. Legacy
    SHITTY/PATH OF PAIN behavior is unchanged.
  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
    rarbg}.yaml` declare the canonical chunk order per group, loaded via
    new `ReleaseKnowledge.group_schema(name)` port method.
  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
    cover token VOs, site-tag stripping, group detection, schema-driven
    annotation (movie, TV episode, season pack with optional source),
    and field assembly.

- **Release parser v2 — enricher pass** completes the EASY pipeline.
  The structural schema walk now tolerates non-positional tokens
  between chunks (instead of aborting on leftover tokens), and a second
  pass tags them with audio / video-meta / edition / language roles.
  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
  matched before single tokens. Channel layouts like `5.1` and `7.1`
  (split into two tokens by the `.` separator) are detected as
  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
  marker so `assemble` extracts the canonical value only from the
  primary token. KONTRAST releases with audio / HDR / edition / language
  metadata now produce a fully populated `ParsedRelease`.

- **Streaming distributor as a separate dimension** from encoding source.
  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
  platform that produced the release is now recorded distinctly. The
  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
  from `sources.yaml`.

- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
  each documenting an expected `ParsedRelease` plus the future `routing`
  (library / torrents / seed_hardlinks) for the upcoming `organize_media`
  refactor. EASY bucket seeded with 5 cases (movie, single-episode, season
  pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15
  anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel),
  French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi),
  multi-episode chain `S14E09E10E11` (Archer, captures E11 loss),
  lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators
  (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83),
  season-range `S01-06` (Tatortreiniger, captures movie misclassification),
  bare folder name (Jurassic Park, media_type=unknown),
  apostrophe-in-name (Honey Don't, captures full AI-path
  degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands,
  captures group=UNKNOWN), subs-only release (Westworld S04).
  PATH OF PAIN bucket seeded with 10 worst-case fixtures covering:
  UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set
  with double season range and parens-wrapped tech (Deutschland 83-86-89,
  captures `group=S03` misdetection), accented chars in title (Chérie
  BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag
  prefix + XviD (OxTorrent), episode title + air-date silently lost
  (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i +
  multi-word audio codec (The Prodigy, full AI-path degeneration),
  yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]`
  tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range +
  REPACK + HEVC (Gilmore Girls, the well-behaved exception).
  Parametrized over `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`. Releases like
  `Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep
  alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators
  used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New
  conventions can be added without code changes. The canonical `.` is always
  present even if missing from YAML.

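
A minimal sketch of a separator-driven tokenizer in the spirit of the `pipeline.tokenize` and `separators.yaml` entries above. The function names are hypothetical, the separator list is passed in directly instead of being read from YAML, and the `[site.tag]` stripping step is left out.

```python
from __future__ import annotations

CANONICAL = "."


def load_separators(configured: list[str]) -> list[str]:
    """Return the configured separators, guaranteeing the canonical dot."""
    seps = list(dict.fromkeys(configured))  # de-duplicate, keep order
    if CANONICAL not in seps:
        seps.append(CANONICAL)
    return seps


def tokenize(name: str, separators: list[str]) -> list[str]:
    """Split on any separator using plain string operations (no regex)."""
    for sep in separators:
        name = name.replace(sep, CANONICAL)
    # Consecutive separators yield empty strings, which are dropped.
    return [tok for tok in name.split(CANONICAL) if tok]


seps = load_separators([" ", "[", "]", "(", ")", "_"])
tokens = tokenize("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]", seps)
# tokens == ["The", "Father", "2020", "1080p", "WEBRip", "5", "1", "YTS", "MX"]
```

Note that `5.1` comes out as two tokens, which is why the enricher pass has to detect channel layouts as consecutive pairs.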
### Changed

- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
  The legacy ~480-line heuristic block in `release/services.py` is gone;
  `pipeline._annotate_shitty` does a single pass that looks each token
  up in the kb buckets (resolutions / sources / codecs / distributors /
  year / `SxxExx`) with first-match-wins semantics, and the leftmost
  contiguous UNKNOWN run becomes the title. `annotate()` no longer
  returns `None` — SHITTY is the always-on fallback when no group schema
  matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
  (`deutschland_franchise_box`, `sleaford_yt_slug`,
  `super_mario_bilingual`, `predator_space_separators` — the last one
  moved from `shitty/` → `path_of_pain/`) are now marked
  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
  `xfail_reason` field; the parametrized suite wires the xfail mark
  automatically.

- **`parse_release` tokenizer is now data-driven**: it splits on any character
  listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
  This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
  space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and
  underscore-separated names parse correctly via the direct path — no more
  fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first
  (so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`),
  then well-formedness is checked only against truly forbidden chars
  (anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of
  639-1 and 639-2/T):
  - `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was
    `["fr", "en"]`). Old LTM files are not auto-migrated — delete
    `data/memory/ltm.json` to regenerate with the new defaults.
  - Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`,
    `fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly
    (recognized as French via alias) but new files are written canonically.
  - `Language` value object docstring corrected: it has always stored 639-2/B
    (matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via
  `settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor
  accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`**
  rather than duplicating tokens. `subtitles.yaml` now only declares
  subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new
  `language_tokens` section.

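
The dict-driven SHITTY pass described in this section (bucket lookup with first-match-wins, then the leftmost contiguous UNKNOWN run becomes the title) might look roughly like this. The bucket contents, role names, and regexes are illustrative assumptions; the real buckets come from the knowledge base.

```python
from __future__ import annotations

import re

# Hypothetical, tiny knowledge buckets, checked in this order.
BUCKETS = [
    ("resolution", {"720p", "1080p", "2160p"}),
    ("source", {"bluray", "webrip", "web-dl", "hdtv"}),
    ("codec", {"x264", "x265", "hevc"}),
    ("distributor", {"nf", "amzn", "dsnp"}),
]
_YEAR = re.compile(r"^(19|20)\d{2}$")
_SXXEXX = re.compile(r"^S\d{1,2}(E\d{1,3})*$", re.IGNORECASE)


def tag(token: str) -> str:
    """Return the first matching role, or 'unknown'."""
    lowered = token.lower()
    for role, values in BUCKETS:
        if lowered in values:
            return role
    if _YEAR.match(token):
        return "year"
    if _SXXEXX.match(token):
        return "season_episode"
    return "unknown"


def annotate_shitty(tokens: list[str]) -> list[tuple[str, str]]:
    """Tag every token, then promote the leftmost UNKNOWN run to the title."""
    roles = [tag(t) for t in tokens]
    start = next((i for i, r in enumerate(roles) if r == "unknown"), None)
    if start is not None:
        end = start
        while end < len(roles) and roles[end] == "unknown":
            roles[end] = "title"
            end += 1
    return list(zip(tokens, roles))


annotated = annotate_shitty(["The", "Father", "2020", "1080p", "WEBRip"])
```

Here `The` and `Father` form the title run; any UNKNOWN token appearing later in the name would stay UNKNOWN and count against the confidence score.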
### Removed

- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`**
  deleted entirely. They held fossil parsers (`parse_episode_filename`,
  `extract_movie_metadata`, …) with zero production callers — superseded by
  `parse_release` as the single source of truth for release-name parsing.
  Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`)
  removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py` —
  the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS`
  hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now
  lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in
  `alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by
  `language_tokens:` (subtitle-specific only) since iso_languages.yaml is the
  canonical source.

### Fixed

- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the
  ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and
  `hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while
  `iso_languages.yaml` exposes French as `"fre"` — preferred languages
  defaults now match the canonical form.

### Internal

- **Domain I/O extraction** (`refactor/domain-io-extraction`): the domain
  layer no longer performs subprocess calls, filesystem scans, or YAML
  loading. Achieved in a series of focused commits:
  - **Knowledge YAML loaders moved to infrastructure**:
    `alfred/domain/release/knowledge.py`,
    `alfred/domain/shared/knowledge/language_registry.py`, and
    `alfred/domain/subtitles/knowledge/{base,loader}.py` relocated to
    `alfred/infrastructure/knowledge/`. Re-exports were dropped — callers
    import directly from the new location.
  - **`MediaProber` and `FilesystemScanner` Protocol ports** introduced at
    `alfred/domain/shared/ports/` with frozen-dataclass DTOs
    (`SubtitleStreamInfo`, `FileEntry`). `SubtitleIdentifier` and
    `PatternDetector` are now constructor-injected with concrete adapters
    (`FfprobeMediaProber` wrapping `subprocess.run(ffprobe)` and
    `PathlibFilesystemScanner` wrapping `pathlib`). No more direct
    `subprocess`/`pathlib` usage from the subtitle domain services.
  - **Live filesystem methods removed from VOs and entities**:
    `FilePath.exists()` / `.is_file()` / `.is_dir()` deleted —
    `FilePath` is now a pure address VO. `Movie.has_file()` and
    `Episode.is_downloaded()` dropped. Callers either rely on a prior
    detection step or use try/except over pre-checks (eliminates
    TOCTOU races).
  - **`SubtitlePlacer` moved to the application layer** at
    `alfred/application/subtitles/placer.py` — it performs `os.link`
    I/O, which doesn't belong in the domain. Pre-checks replaced with
    try/except for `FileNotFoundError`/`FileExistsError`.
  - **`SubtitleRuleSet.resolve()` no longer reaches into the knowledge
    base**: the implicit `DEFAULT_RULES()` helper is gone, replaced by
    an explicit `default_rules: SubtitleMatchingRules` parameter. The
    `ManageSubtitles` use case loads defaults from the KB once and
    passes them in.
  - **`SubtitleKnowledge` Protocol port** at
    `alfred/domain/subtitles/ports/knowledge.py` declares the read-only
    query surface domain services consume (7 methods:
    `known_extensions`, `format_for_extension`, `language_for_token`,
    `is_known_lang_token`, `type_for_token`, `is_known_type_token`,
    `patterns`). `SubtitleIdentifier` and `PatternDetector` depend on
    this Protocol instead of the concrete `SubtitleKnowledgeBase` from
    infrastructure — `domain/subtitles/` now has zero imports from
    `infrastructure/`. The remaining domain → infra leak
    (`domain/release/` loading separator YAML at import-time) is
    documented in tech-debt and scheduled for its own branch.
- **`to_dot_folder_name(title)` helper** in
  `alfred/domain/shared/value_objects.py` — extracts the
  `re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was
  duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of
  a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`,
  `.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping
  them under `metadata:`. The `_METADATA_EXTENSIONS` set used by
  `detect_media_type` remains the union of both (same behavior — subtitles
  are still ignored when deciding the media type of a folder), but a new
  `load_subtitle_extensions()` loader is now available for the subtitles
  domain. Semantic clarity, no functional change.
- **`tv_shows/entities.py` module docstring** now shows the aggregate
  ownership as an ASCII tree before the rule text — quicker visual scan
  of the DDD structure.
- Removed backward-compat shims `_sanitise_for_fs` /
  `_strip_episode_from_normalised` from `domain/release/value_objects.py`
  (zero callers).
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass
  explicit `check=False` (PLW1510); lazy imports promoted to module top where
  there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`,
  `qbittorrent/client.py`, `file_manager.py`); fixed module-level import
  ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`;
  removed unused locals (F841 / B007); replaced unnecessary set comprehension
  with `set()` in `release/knowledge.py` (C416).
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches)
  globally — noisy on parser mappers and orchestrator use-cases where early-return
  validation is essential complexity. Ignore `PLW0603` for the documented memory
  singleton (`infrastructure/persistence/context.py`).
|
||
- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`):
|
||
the last domain → infrastructure leak (`domain/release/value_objects.py`
|
||
loading YAML at import-time) is gone. Achieved via:
|
||
- **`ReleaseKnowledge` Protocol port** at
|
||
`alfred/domain/release/ports/knowledge.py` declares the read-only query
|
||
surface release parsing needs (token sets for resolutions, sources, codecs,
|
||
languages, hdr extras; structured dicts for audio, video_meta, editions,
|
||
media_type_tokens; separators list; file-extension sets used by
|
||
application/infra callers; `sanitize_for_fs(text)` method).
|
||
- **`YamlReleaseKnowledge` adapter** at
|
||
`alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant
|
||
once at construction. Builds an immutable `str.maketrans` translation
|
||
table for filesystem sanitization.
|
||
  - **`parse_release(name, kb)`** takes the knowledge as an explicit
    parameter — no more module-level YAML loading inside the domain. Every
    internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`,
    `_extract_audio`, `_extract_video_meta`, `_extract_edition`,
    `_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
  - **`ParsedRelease` Option B**: sanitization happens once at parse time
    and is stored on a new `title_sanitized: str` field. Builder methods
    (`show_folder_name`, `season_folder_name`, `episode_filename`,
    `movie_folder_name`, `movie_filename`) are now pure — they accept
    already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe`
    arguments. Callers at the use-case boundary sanitize TMDB strings
    via `kb.sanitize_for_fs(...)` before passing them in.
  - **All domain-knowledge constants removed from `value_objects.py`**:
    `_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`,
    `_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`,
    `_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`,
    `_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`,
    and the `_sanitize_for_fs` helper. The domain module is now pure.
  - **Application-layer KB singleton**: `resolve_destination.py` instantiates
    a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and
    threads it through every `parse_release(...)` call. The local
    `_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of
    `_KB.sanitize_for_fs(...)`.
  - **`detect_media_type(parsed, source_path, kb)` and
    `find_video_file(path, kb)`** now take the knowledge explicitly
    instead of importing `_*_EXTENSIONS` constants from the domain.
    `agent/tools/filesystem.py::analyze_release` imports the application
    KB singleton and passes it through.

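
The port-and-adapter shape described in the release-knowledge entries above can be sketched roughly as follows. This is a minimal illustration, not Alfred's actual code: `InMemoryReleaseKnowledge`, `find_resolution`, the `resolutions` attribute name, and the forbidden-character set are assumptions made for the example. Only the `ReleaseKnowledge` name, the `sanitize_for_fs(text)` method, the `str.maketrans` table, and the idea of passing `kb` explicitly come from the entries.

```python
from __future__ import annotations

from typing import Protocol


class ReleaseKnowledge(Protocol):
    """Read-only query surface (only two members shown in this sketch)."""

    resolutions: frozenset[str]  # attribute name is an assumption

    def sanitize_for_fs(self, text: str) -> str: ...


class InMemoryReleaseKnowledge:
    """Hypothetical stand-in for the YAML-backed adapter."""

    def __init__(self, resolutions: set[str], forbidden_chars: str) -> None:
        self.resolutions = frozenset(resolutions)
        # Build the deletion table once, at construction time.
        self._fs_table = str.maketrans("", "", forbidden_chars)

    def sanitize_for_fs(self, text: str) -> str:
        return text.translate(self._fs_table)


def find_resolution(name: str, kb: ReleaseKnowledge) -> str | None:
    """Toy parsing step: knowledge arrives as an argument, not a module global."""
    for token in name.replace(".", " ").split():
        if token.lower() in kb.resolutions:
            return token.lower()
    return None


# The forbidden-character set below is illustrative only.
kb = InMemoryReleaseKnowledge({"720p", "1080p", "2160p"}, '<>:"/\\|?*')
print(find_resolution("Archer.S14E09.1080p.WEB.x264", kb))
print(kb.sanitize_for_fs("Mission: Impossible"))
```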
---

## [2026-05-17] — TVShow & Movie aggregate refactor

Multi-phase overhaul of the TV show domain into a real DDD aggregate, with
matching parity work on `Movie`, a language knowledge system, and the
`shared/media` restructure that supports both.

### Added

- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml`, covering
42 languages including `und` for undetermined).
  - `Language` value object (frozen dataclass) with `iso`, `english_name`,
    `native_name`, `aliases`, and a `matches(raw)` cross-format helper.
  - `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging
    builtin + learned YAML. Not a singleton — the application layer
    instantiates it.
  - ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English
    name, native name, and common spellings.
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a
`resolution` property using width-priority bucket detection (handles
cinema/scope crops like 1920×960 → 1080p).
- **`shared/media/matching.py`** — `track_lang_matches` helper shared by
`Episode` and `Movie`. Implements the **"C+" contract** for language helpers:
  - `Language` query → cross-format match via `Language.matches()`
  - `str` query → case-insensitive direct comparison (no normalization)
- **TVShow aggregate composition**:
  - `TVShow.seasons: dict[SeasonNumber, Season]`
  - `Season.episodes: dict[EpisodeNumber, Episode]`
  - `Season.expected_episodes` / `Season.aired_episodes` (split so collection
    state can compare "owned vs aired today" without confusing in-flight
    seasons with future ones)
- **Aggregate methods on `TVShow`**:
  - `add_episode(ep)` — sole sanctioned mutation entry point (creates the
    season if missing)
  - `add_season(season)` — replaces a season wholesale
  - `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
  - `is_complete_series()` — true iff `ENDED + COMPLETE`
  - `missing_episodes()` — flat list of all aired-but-not-owned
    `(season, episode)` pairs
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`,
`has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by
`Episode.audio_tracks` / `Episode.subtitle_tracks`.
- **Movie aggregate parity** — `Movie` now carries `audio_tracks` /
`subtitle_tracks` and exposes the same helpers as `Episode` (same C+
contract).
- **`CHANGELOG.md`** (this file).

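
As a rough illustration of how the aggregate methods listed above could fit together, here is a simplified sketch. It is not the real domain model: plain `int` keys stand in for `SeasonNumber` / `EpisodeNumber`, `aired_episodes` is assumed to be a count with episodes numbered from 1, and the `Episode` fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CollectionStatus(Enum):
    EMPTY = "empty"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass(frozen=True)
class Episode:
    season: int  # hypothetical fields, for illustration only
    number: int


@dataclass
class Season:
    number: int
    aired_episodes: int = 0  # assumed: count of episodes aired so far
    episodes: dict[int, Episode] = field(default_factory=dict)


@dataclass
class TVShow:
    seasons: dict[int, Season] = field(default_factory=dict)

    def add_season(self, season: Season) -> None:
        # Replaces any existing season with the same number wholesale.
        self.seasons[season.number] = season

    def add_episode(self, ep: Episode) -> None:
        # Creates the owning season on demand.
        season = self.seasons.setdefault(ep.season, Season(ep.season))
        season.episodes[ep.number] = ep

    def missing_episodes(self) -> list[tuple[int, int]]:
        # Flat list of aired-but-not-owned (season, episode) pairs.
        return [
            (s.number, n)
            for s in sorted(self.seasons.values(), key=lambda s: s.number)
            for n in range(1, s.aired_episodes + 1)
            if n not in s.episodes
        ]

    def collection_status(self) -> CollectionStatus:
        owned = sum(len(s.episodes) for s in self.seasons.values())
        if owned == 0:
            return CollectionStatus.EMPTY
        if self.missing_episodes():
            return CollectionStatus.PARTIAL
        return CollectionStatus.COMPLETE


show = TVShow()
show.add_season(Season(number=1, aired_episodes=3))
show.add_episode(Episode(season=1, number=1))
show.add_episode(Episode(season=1, number=3))
print(show.missing_episodes(), show.collection_status())
```

Comparing against `aired_episodes` rather than an expected total is what lets an in-flight season count as complete once everything aired so far is owned.
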
### Changed

- **`shared/media_info.py` split into `shared/media/{audio,video,subtitle,info,matching}.py`.**
`MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat
accessors (`width`, `height`, `video_codec`, `resolution`) remain as
properties that read the first video track.
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to
`MediaInfo` (file-level — they come from the ffprobe `format` block, not a
stream). Files without a video stream now correctly expose duration.
- **`ShowStatus.from_string`** extended to map TMDB strings
(`Returning Series`, `In Production`, `Pilot`, `Planned`, `Canceled`,
`Cancelled`). Comparison is whitespace-trimmed and case-insensitive.
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They
are owned by `TVShow` and reached only through it.
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed
from the dict) instead of stored ints.
- **`TVShowService.parse_episode_from_filename`** rewritten with plain string
operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
- **`TVShowService.find_next_episode`** now drives off
`show.missing_episodes()` instead of the hardcoded "max 50 episodes per
season" heuristic.
- **`TVShowService` constructor** no longer takes `season_repository` /
`episode_repository`; the aggregate is persisted as a single unit through
`TVShowRepository` alone.
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to
`SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack`
ffprobe-view dataclass (different bounded contexts, kept separate
intentionally).
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from
`knowledge/release/file_extensions.yaml` via `load_video_extensions()`
(single source of truth).
- **`CLAUDE.md`** updated with three new policy sections:
  - "Tests" — small updates OK during normal work, no mass-update sprees
  - "Backwards-compatibility shims" — prefer clean migration over shims
  - "Regex" — not forbidden, use judgment when string ops would be fragile

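
The regex-free episode parsing mentioned above for `parse_episode_from_filename` might look something like the sketch below. The function and helper names are hypothetical and this is not the service's actual implementation. It only shows that the `S01E05` / `s1e5` and `1x05` / `01x5` forms can be handled with a linear scan. A naive scan like this would also accept strings such as `1920x1080`, which real code would need to guard against.

```python
from __future__ import annotations

DIGITS = "0123456789"


def _digits_at(text: str, start: int) -> str:
    """Return the run of ASCII digits beginning at index start."""
    end = start
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return text[start:end]


def _digits_before(text: str, end: int) -> str:
    """Return the run of ASCII digits ending just before index end."""
    start = end
    while start > 0 and text[start - 1] in DIGITS:
        start -= 1
    return text[start:end]


def parse_episode(filename: str) -> tuple[int, int] | None:
    """Return (season, episode) for SxxEyy or NxMM forms, else None."""
    name = filename.lower()
    for i, ch in enumerate(name):
        if ch == "s":
            # SxxEyy form: "s", digits, "e", digits.
            season = _digits_at(name, i + 1)
            e_pos = i + 1 + len(season)
            if season and name[e_pos:e_pos + 1] == "e":
                episode = _digits_at(name, e_pos + 1)
                if episode:
                    return int(season), int(episode)
        elif ch == "x":
            # NxMM form: digits on both sides of "x".
            season = _digits_before(name, i)
            episode = _digits_at(name, i + 1)
            if season and episode:
                return int(season), int(episode)
    return None


for sample in ("Archer.S01E05.1080p.mkv", "show.s1e5.mkv", "Show.01x5.mkv"):
    print(sample, parse_episode(sample))
```
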
### Removed

- **Legacy `Season N Episode N` filename form** in
`TVShowService.parse_episode_from_filename`. It never appears in the release
names Alfred handles, and supporting it forced a regex.
- **`SeasonRepository` and `EpisodeRepository`** — only the aggregate root has
a repository (DDD rule: one repo per aggregate).
- **`shared/media_info.py`** compatibility shim — callers updated.
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` — callers
updated to `SubtitleCandidate`.

### Fixed

- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead
of crashing through `primary_video.duration_seconds` (see the duration/bitrate
move under **Changed**).
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer
passes the removed `show_imdb_id` / `episode_count` kwargs when constructing
a `Season` for folder-name generation.

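
To illustrate why file-level duration avoids the audio-only crash noted above, here is a small sketch. The dataclass fields and the `media_info_from_ffprobe` helper are assumptions for illustration, not Alfred's real classes. The input shape follows ffprobe's JSON output with `-show_format -show_streams`, where `format.duration` and `format.bit_rate` are strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class VideoTrack:
    width: int
    height: int
    codec: str


@dataclass(frozen=True)
class MediaInfo:
    video_tracks: list[VideoTrack] = field(default_factory=list)
    duration_seconds: float | None = None  # file-level, from "format"
    bitrate_kbps: int | None = None  # file-level, from "format"

    @property
    def width(self) -> int | None:
        # Flat accessor: reads the first video track, if there is one.
        return self.video_tracks[0].width if self.video_tracks else None


def media_info_from_ffprobe(data: dict) -> MediaInfo:
    """Hypothetical mapper from ffprobe JSON to MediaInfo."""
    fmt = data.get("format", {})
    videos = [
        VideoTrack(s["width"], s["height"], s["codec_name"])
        for s in data.get("streams", [])
        if s.get("codec_type") == "video"
    ]
    duration = fmt.get("duration")
    bit_rate = fmt.get("bit_rate")  # bits per second, as a string
    return MediaInfo(
        video_tracks=videos,
        duration_seconds=float(duration) if duration is not None else None,
        bitrate_kbps=int(bit_rate) // 1000 if bit_rate is not None else None,
    )


audio_only = {
    "format": {"duration": "183.5", "bit_rate": "128000"},
    "streams": [{"codec_type": "audio", "codec_name": "mp3"}],
}
info = media_info_from_ffprobe(audio_only)
print(info.duration_seconds, info.bitrate_kbps, info.width)
```

Because the duration no longer goes through a video track, a file with no video stream yields a value (or `None` when the `format` block lacks one) instead of raising.
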
### Internal

- Test suite rewritten where the aggregate redesign broke fixtures:
`tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py`
(rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py`
(helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures),
`tests/domain/test_tv_shows_service.py` (find_next_episode driven by real
aggregate state).
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`,
`identifier.py` updated to import `SubtitleCandidate`.
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.