chore: sprint cleanup — language unification, parser unification, fossils removal

Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects truly
  forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
This commit is contained in:
2026-05-17 23:38:00 +02:00
parent ba6f016d49
commit e07c9ec77b
99 changed files with 8833 additions and 6533 deletions
+341 -107
View File
@@ -1,10 +1,40 @@
"""Tests for TV Show domain — entities and value objects."""
"""Tests for the TV Show domain — entities, value objects, aggregate behavior.
Rewritten for the post-refactor aggregate:
* ``TVShow`` is the root, owning ``seasons: dict[SeasonNumber, Season]``.
* ``Season`` owns ``episodes: dict[EpisodeNumber, Episode]`` and tracks
``expected_episodes`` + ``aired_episodes``.
* ``Episode`` carries ``audio_tracks`` + ``subtitle_tracks`` and exposes
language helpers following contract C+ (``str`` direct compare, ``Language``
cross-format).
* No back-references on Season/Episode — they are reached through the root.
* Sole sanctioned mutation entry point: ``TVShow.add_episode(ep)``.
Coverage:
* ``TestShowStatus`` — including the extended TMDB string mapping.
* ``TestSeasonNumber`` / ``TestEpisodeNumber`` — value-object validation.
* ``TestEpisode`` — basic shape, file presence, audio/subtitle helpers.
* ``TestSeason`` — episode insertion, completeness vs aired, missing list.
* ``TestTVShow`` — aggregate invariants, ``add_episode``, ``collection_status``,
``missing_episodes``, ``is_complete_series``.
"""
from __future__ import annotations
import pytest
from alfred.domain.shared.exceptions import ValidationError
from alfred.domain.shared.media import AudioTrack, SubtitleTrack
from alfred.domain.shared.value_objects import ImdbId, Language
from alfred.domain.tv_shows.entities import Episode, Season, TVShow
from alfred.domain.tv_shows.value_objects import EpisodeNumber, SeasonNumber, ShowStatus
from alfred.domain.tv_shows.value_objects import (
CollectionStatus,
EpisodeNumber,
SeasonNumber,
ShowStatus,
)
# ---------------------------------------------------------------------------
# ShowStatus
@@ -20,11 +50,25 @@ class TestShowStatus:
def test_from_string_case_insensitive(self):
assert ShowStatus.from_string("ONGOING") == ShowStatus.ONGOING
assert ShowStatus.from_string("Ended") == ShowStatus.ENDED
assert ShowStatus.from_string(" Ended ") == ShowStatus.ENDED
def test_from_string_unknown(self):
assert ShowStatus.from_string("cancelled") == ShowStatus.UNKNOWN
@pytest.mark.parametrize(
"raw,expected",
[
("Returning Series", ShowStatus.ONGOING),
("In Production", ShowStatus.ONGOING),
("Pilot", ShowStatus.ONGOING),
("Planned", ShowStatus.ONGOING),
("Canceled", ShowStatus.ENDED),
("Cancelled", ShowStatus.ENDED),
],
)
def test_from_string_tmdb_mappings(self, raw, expected):
assert ShowStatus.from_string(raw) == expected
def test_from_string_empty_or_unknown(self):
assert ShowStatus.from_string("") == ShowStatus.UNKNOWN
assert ShowStatus.from_string("borked") == ShowStatus.UNKNOWN
# ---------------------------------------------------------------------------
@@ -34,12 +78,10 @@ class TestShowStatus:
class TestSeasonNumber:
def test_valid_season(self):
s = SeasonNumber(1)
assert s.value == 1
assert SeasonNumber(1).value == 1
def test_season_zero_is_specials(self):
s = SeasonNumber(0)
assert s.is_special()
assert SeasonNumber(0).is_special()
def test_normal_season_not_special(self):
assert not SeasonNumber(3).is_special()
@@ -69,8 +111,7 @@ class TestSeasonNumber:
class TestEpisodeNumber:
def test_valid_episode(self):
e = EpisodeNumber(1)
assert e.value == 1
assert EpisodeNumber(1).value == 1
def test_zero_raises(self):
with pytest.raises(ValidationError):
@@ -91,64 +132,107 @@ class TestEpisodeNumber:
# ---------------------------------------------------------------------------
# TVShow entity
# Episode entity
# ---------------------------------------------------------------------------
class TestTVShow:
def _make(
self, imdb_id="tt0903747", title="Breaking Bad", seasons=5, status="ended"
):
return TVShow(
imdb_id=imdb_id, title=title, seasons_count=seasons, status=status
class TestEpisode:
def _ep(self, *, season=1, episode=1, title="Pilot", **kwargs) -> Episode:
return Episode(
season_number=season,
episode_number=episode,
title=title,
**kwargs,
)
def test_basic_creation(self):
show = self._make()
assert show.title == "Breaking Bad"
assert show.seasons_count == 5
def test_basic_creation_coerces_numbers(self):
e = self._ep()
assert e.title == "Pilot"
assert isinstance(e.season_number, SeasonNumber)
assert isinstance(e.episode_number, EpisodeNumber)
def test_coerces_string_imdb_id(self):
show = self._make()
from alfred.domain.shared.value_objects import ImdbId
def test_get_filename_format(self):
e = self._ep(season=1, episode=5, title="Gray Matter")
filename = e.get_filename()
assert filename.startswith("S01E05")
assert "Gray.Matter" in filename
assert isinstance(show.imdb_id, ImdbId)
def test_has_file_false_when_no_path(self):
e = self._ep()
assert not e.has_file()
assert not e.is_downloaded()
def test_coerces_string_status(self):
show = self._make(status="ongoing")
assert show.status == ShowStatus.ONGOING
def test_str_format(self):
e = self._ep(season=2, episode=3, title="Bit by a Dead Bee")
s = str(e)
assert "S02E03" in s
assert "Bit by a Dead Bee" in s
def test_is_ongoing(self):
show = self._make(status="ongoing")
assert show.is_ongoing()
assert not show.is_ended()
# ── Audio helpers ──────────────────────────────────────────────────
def test_is_ended(self):
show = self._make(status="ended")
assert show.is_ended()
assert not show.is_ongoing()
def test_has_audio_in_with_str(self):
e = self._ep(
audio_tracks=[
AudioTrack(0, "eac3", 6, "5.1", "eng"),
AudioTrack(1, "ac3", 6, "5.1", "fre"),
]
)
assert e.has_audio_in("eng") is True
assert e.has_audio_in("ENG") is True # case-insensitive
assert e.has_audio_in("ger") is False
def test_negative_seasons_raises(self):
with pytest.raises(ValueError):
TVShow(imdb_id="tt0903747", title="X", seasons_count=-1, status="ended")
def test_has_audio_in_with_language(self):
lang = Language(iso="fre", english_name="French", native_name="Français",
aliases=("fr", "fra", "french"))
e = self._ep(
audio_tracks=[AudioTrack(0, "ac3", 6, "5.1", "fr")]
)
# str query "fre" wouldn't match "fr" directly — but Language does cross-format
assert e.has_audio_in(lang) is True
assert e.has_audio_in("fre") is False # direct compare misses
def test_invalid_imdb_id_type_raises(self):
with pytest.raises(ValueError):
TVShow(imdb_id=12345, title="X", seasons_count=1, status="ended") # type: ignore
def test_audio_languages_dedup_in_order(self):
e = self._ep(
audio_tracks=[
AudioTrack(0, "ac3", 6, "5.1", "eng"),
AudioTrack(1, "ac3", 6, "5.1", "fre"),
AudioTrack(2, "aac", 2, "stereo", "eng"), # dupe
AudioTrack(3, "aac", 2, "stereo", None), # skipped
]
)
assert e.audio_languages() == ["eng", "fre"]
def test_get_folder_name_replaces_spaces(self):
show = self._make(title="Breaking Bad")
assert show.get_folder_name() == "Breaking.Bad"
# ── Subtitle helpers ───────────────────────────────────────────────
def test_get_folder_name_strips_special_chars(self):
show = self._make(title="It's Always Sunny")
name = show.get_folder_name()
assert "'" not in name
def test_has_subtitles_in(self):
e = self._ep(
subtitle_tracks=[SubtitleTrack(0, "subrip", "fre")]
)
assert e.has_subtitles_in("fre") is True
assert e.has_subtitles_in("eng") is False
def test_str_repr(self):
show = self._make()
assert "Breaking Bad" in str(show)
assert "tt0903747" in repr(show)
def test_has_forced_subs(self):
e = self._ep(
subtitle_tracks=[
SubtitleTrack(0, "subrip", "eng", is_forced=False),
SubtitleTrack(1, "subrip", "eng", is_forced=True),
]
)
assert e.has_forced_subs() is True
def test_has_forced_subs_false_when_none(self):
e = self._ep(subtitle_tracks=[SubtitleTrack(0, "subrip", "eng")])
assert e.has_forced_subs() is False
def test_subtitle_languages_dedup_in_order(self):
e = self._ep(
subtitle_tracks=[
SubtitleTrack(0, "subrip", "eng"),
SubtitleTrack(1, "subrip", "fre"),
SubtitleTrack(2, "subrip", "eng"),
]
)
assert e.subtitle_languages() == ["eng", "fre"]
# ---------------------------------------------------------------------------
@@ -157,76 +241,226 @@ class TestTVShow:
class TestSeason:
def test_basic_creation(self):
s = Season(show_imdb_id="tt0903747", season_number=1, episode_count=7)
assert s.episode_count == 7
def _ep(self, episode: int) -> Episode:
return Episode(season_number=1, episode_number=episode, title=f"Ep {episode}")
def test_basic_creation_coerces_season_number(self):
s = Season(season_number=1)
assert isinstance(s.season_number, SeasonNumber)
assert s.episode_count == 0
assert s.episodes == {}
def test_get_folder_name_normal(self):
s = Season(show_imdb_id="tt0903747", season_number=2, episode_count=13)
assert s.get_folder_name() == "Season 02"
assert Season(season_number=2).get_folder_name() == "Season 02"
def test_get_folder_name_specials(self):
s = Season(show_imdb_id="tt0903747", season_number=0, episode_count=3)
s = Season(season_number=0)
assert s.get_folder_name() == "Specials"
assert s.is_special()
def test_negative_episode_count_raises(self):
def test_negative_aired_raises(self):
with pytest.raises(ValueError):
Season(show_imdb_id="tt0903747", season_number=1, episode_count=-1)
Season(season_number=1, aired_episodes=-1)
def test_str(self):
s = Season(
show_imdb_id="tt0903747",
season_number=1,
episode_count=7,
name="Pilot Season",
)
def test_aired_cannot_exceed_expected(self):
with pytest.raises(ValueError):
Season(season_number=1, expected_episodes=5, aired_episodes=6)
def test_add_episode_rejects_mismatched_season(self):
s = Season(season_number=1)
ep = Episode(season_number=2, episode_number=1, title="x")
with pytest.raises(ValueError):
s.add_episode(ep)
def test_add_episode_replaces_same_number(self):
s = Season(season_number=1)
s.add_episode(self._ep(1))
s.add_episode(Episode(season_number=1, episode_number=1, title="Replaced"))
assert s.episodes[EpisodeNumber(1)].title == "Replaced"
def test_str_uses_name_when_present(self):
s = Season(season_number=1, name="Pilot Season")
assert "Pilot Season" in str(s)
# ── Completeness vs aired ──────────────────────────────────────────
def test_is_complete_unknown_aired_is_false(self):
# Conservative: no aired count → cannot claim complete
s = Season(season_number=1)
s.add_episode(self._ep(1))
assert s.is_complete() is False
def test_is_complete_when_owning_all_aired(self):
s = Season(season_number=1, aired_episodes=3)
for i in (1, 2, 3):
s.add_episode(self._ep(i))
assert s.is_complete() is True
def test_is_complete_zero_aired_is_trivially_true(self):
s = Season(season_number=1, aired_episodes=0)
assert s.is_complete() is True
def test_partial_when_missing_aired_episodes(self):
s = Season(season_number=1, aired_episodes=3)
s.add_episode(self._ep(1))
assert s.is_complete() is False
def test_is_fully_aired(self):
s = Season(season_number=1, expected_episodes=10, aired_episodes=10)
assert s.is_fully_aired() is True
def test_is_fully_aired_false_when_in_flight(self):
s = Season(season_number=1, expected_episodes=10, aired_episodes=4)
assert s.is_fully_aired() is False
def test_is_fully_aired_false_with_unknowns(self):
assert Season(season_number=1).is_fully_aired() is False
def test_missing_episodes_when_partial(self):
s = Season(season_number=1, aired_episodes=5)
s.add_episode(self._ep(1))
s.add_episode(self._ep(3))
missing = [n.value for n in s.missing_episodes()]
assert missing == [2, 4, 5]
def test_missing_episodes_empty_when_complete(self):
s = Season(season_number=1, aired_episodes=2)
s.add_episode(self._ep(1))
s.add_episode(self._ep(2))
assert s.missing_episodes() == []
def test_missing_episodes_empty_when_unknown_aired(self):
# Without an aired count we cannot reason about gaps
s = Season(season_number=1)
s.add_episode(self._ep(2))
assert s.missing_episodes() == []
# ---------------------------------------------------------------------------
# Episode entity
# TVShow aggregate root
# ---------------------------------------------------------------------------
class TestEpisode:
class TestTVShow:
def _show(self, **kwargs) -> TVShow:
defaults = dict(
imdb_id="tt0903747",
title="Breaking Bad",
status="ended",
)
defaults.update(kwargs)
return TVShow(**defaults)
# ── Construction & coercion ────────────────────────────────────────
def test_basic_creation(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=1,
episode_number=1,
title="Pilot",
)
assert e.title == "Pilot"
show = self._show(expected_seasons=5)
assert show.title == "Breaking Bad"
assert show.expected_seasons == 5
assert show.seasons == {}
assert show.seasons_count == 0
def test_get_filename_format(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=1,
episode_number=5,
title="Gray Matter",
)
filename = e.get_filename()
assert filename.startswith("S01E05")
assert "Gray.Matter" in filename
def test_coerces_string_imdb_id(self):
assert isinstance(self._show().imdb_id, ImdbId)
def test_has_file_false_when_no_path(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=1,
episode_number=1,
title="Pilot",
)
assert not e.has_file()
assert not e.is_downloaded()
def test_coerces_string_status(self):
assert self._show(status="ongoing").status == ShowStatus.ONGOING
def test_str_format(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=2,
episode_number=3,
title="Bit by a Dead Bee",
)
s = str(e)
assert "S02E03" in s
assert "Bit by a Dead Bee" in s
def test_is_ongoing_and_is_ended(self):
assert self._show(status="ongoing").is_ongoing()
assert self._show(status="ended").is_ended()
def test_negative_expected_seasons_raises(self):
with pytest.raises(ValueError):
self._show(expected_seasons=-1)
def test_invalid_imdb_id_type_raises(self):
with pytest.raises(ValueError):
TVShow(imdb_id=12345, title="X", status="ended") # type: ignore
def test_get_folder_name_replaces_spaces(self):
assert self._show(title="Breaking Bad").get_folder_name() == "Breaking.Bad"
def test_get_folder_name_strips_special_chars(self):
name = self._show(title="It's Always Sunny").get_folder_name()
assert "'" not in name
def test_str_repr(self):
show = self._show()
assert "Breaking Bad" in str(show)
assert "tt0903747" in repr(show)
# ── add_episode — the only sanctioned mutation ─────────────────────
def test_add_episode_creates_missing_season(self):
show = self._show()
show.add_episode(Episode(season_number=1, episode_number=1, title="Pilot"))
assert SeasonNumber(1) in show.seasons
assert show.seasons_count == 1
assert show.episode_count == 1
def test_add_episode_reuses_existing_season(self):
show = self._show()
show.add_episode(Episode(season_number=1, episode_number=1, title="A"))
show.add_episode(Episode(season_number=1, episode_number=2, title="B"))
assert show.seasons_count == 1
assert show.episode_count == 2
def test_add_season_replaces_existing(self):
show = self._show()
s1 = Season(season_number=1, aired_episodes=10)
show.add_season(s1)
s1bis = Season(season_number=1, aired_episodes=5)
show.add_season(s1bis)
assert show.seasons[SeasonNumber(1)] is s1bis
# ── Collection status ──────────────────────────────────────────────
def test_collection_status_empty(self):
assert self._show().collection_status() == CollectionStatus.EMPTY
def test_collection_status_partial_missing_episode(self):
show = self._show()
s = Season(season_number=1, aired_episodes=3)
s.add_episode(Episode(season_number=1, episode_number=1, title="x"))
show.add_season(s)
assert show.collection_status() == CollectionStatus.PARTIAL
def test_collection_status_complete(self):
show = self._show(expected_seasons=1)
s = Season(season_number=1, aired_episodes=2)
for n in (1, 2):
s.add_episode(Episode(season_number=1, episode_number=n, title=f"e{n}"))
show.add_season(s)
assert show.collection_status() == CollectionStatus.COMPLETE
def test_collection_status_partial_when_seasons_missing(self):
# Seasons we own are complete, but expected_seasons says more exist.
show = self._show(expected_seasons=2)
s = Season(season_number=1, aired_episodes=1)
s.add_episode(Episode(season_number=1, episode_number=1, title="x"))
show.add_season(s)
assert show.collection_status() == CollectionStatus.PARTIAL
def test_is_complete_series_requires_ended_and_complete(self):
show = self._show(status="ongoing", expected_seasons=1)
s = Season(season_number=1, aired_episodes=1)
s.add_episode(Episode(season_number=1, episode_number=1, title="x"))
show.add_season(s)
# Ongoing → never "complete series" even if collection is COMPLETE
assert show.is_complete_series() is False
show.status = ShowStatus.ENDED
assert show.is_complete_series() is True
# ── missing_episodes traversal ─────────────────────────────────────
def test_missing_episodes_walks_seasons_in_order(self):
show = self._show()
s2 = Season(season_number=2, aired_episodes=2)
s1 = Season(season_number=1, aired_episodes=3)
s1.add_episode(Episode(season_number=1, episode_number=2, title="x"))
show.add_season(s2)
show.add_season(s1)
missing = [(s.value, e.value) for s, e in show.missing_episodes()]
assert missing == [(1, 1), (1, 3), (2, 1), (2, 2)]