chore: sprint cleanup — language unification, parser unification, fossils removal

Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects truly
  forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
This commit is contained in:
2026-05-17 23:38:00 +02:00
parent ba6f016d49
commit e07c9ec77b
99 changed files with 8833 additions and 6533 deletions
+142
View File
@@ -0,0 +1,142 @@
"""Tests for ``alfred.domain.shared.media`` — pure ffprobe dataclasses.
Exercises:
- ``AudioTrack`` / ``SubtitleTrack`` / ``VideoTrack`` — simple dataclass construction.
- ``VideoTrack.resolution`` — width-priority resolution detection (handles
widescreen/scope crops where width > height bucket), with height fallback
when width is missing.
- ``MediaInfo.resolution`` — delegates to the primary video track.
- ``MediaInfo.audio_languages`` — order-preserving deduplication.
- ``MediaInfo.is_multi_audio`` — multi-language detection.
"""
from __future__ import annotations
import pytest
from alfred.domain.shared.media import AudioTrack, MediaInfo, SubtitleTrack, VideoTrack
class TestTracks:
def test_audio_track_defaults(self):
t = AudioTrack(index=0, codec="aac", channels=2, channel_layout="stereo",
language="eng")
assert t.is_default is False
def test_subtitle_track_defaults(self):
t = SubtitleTrack(index=2, codec="subrip", language="fre")
assert t.is_default is False
assert t.is_forced is False
def test_video_track_defaults(self):
v = VideoTrack(index=0, codec="hevc", width=1920, height=1080)
assert v.is_default is False
class TestVideoTrackResolution:
def test_no_dimensions(self):
assert VideoTrack(index=0, codec=None, width=None, height=None).resolution is None
@pytest.mark.parametrize(
"w,expected",
[
(3840, "2160p"), # UHD lower bound
(3996, "2160p"), # cinema 4K
(1920, "1080p"),
(1280, "720p"),
(720, "576p"),
(640, "480p"),
],
)
def test_width_priority(self, w, expected):
assert VideoTrack(index=0, codec=None, width=w, height=1080).resolution == expected
def test_widescreen_scope_crop(self):
# 1920x960 (scope crop) → still 1080p because width-priority
assert VideoTrack(index=0, codec=None, width=1920, height=960).resolution == "1080p"
@pytest.mark.parametrize(
"h,expected",
[
(2160, "2160p"),
(1080, "1080p"),
(720, "720p"),
(576, "576p"),
(480, "480p"),
],
)
def test_height_fallback_when_width_missing(self, h, expected):
assert VideoTrack(index=0, codec=None, width=None, height=h).resolution == expected
def test_width_below_buckets_falls_to_height(self):
# width=320 falls below every bucket; falls back to f"{h}p"
assert VideoTrack(index=0, codec=None, width=320, height=240).resolution == "240p"
def test_width_only_below_buckets(self):
# width=200, no height → f"{w}w" sentinel
result = VideoTrack(index=0, codec=None, width=200, height=None).resolution
assert result == "200w"
class TestMediaInfoResolutionDelegation:
def test_no_video_track(self):
assert MediaInfo().resolution is None
def test_delegates_to_primary_video(self):
m = MediaInfo(
video_tracks=[VideoTrack(index=0, codec="hevc", width=1920, height=1080)]
)
assert m.resolution == "1080p"
assert m.width == 1920
assert m.height == 1080
assert m.video_codec == "hevc"
def test_multiple_video_tracks_uses_first(self):
m = MediaInfo(
video_tracks=[
VideoTrack(index=0, codec="hevc", width=3840, height=2160),
VideoTrack(index=1, codec="mjpeg", width=320, height=240), # cover art
]
)
assert m.resolution == "2160p"
class TestAudioLanguages:
def test_empty(self):
assert MediaInfo().audio_languages == []
def test_dedup_preserves_order(self):
m = MediaInfo(
audio_tracks=[
AudioTrack(0, "eac3", 6, "5.1", "eng"),
AudioTrack(1, "ac3", 6, "5.1", "fre"),
AudioTrack(2, "ac3", 2, "stereo", "eng"), # duplicate eng
AudioTrack(3, "aac", 2, "stereo", None), # ignored
]
)
assert m.audio_languages == ["eng", "fre"]
def test_all_none_languages(self):
m = MediaInfo(
audio_tracks=[
AudioTrack(0, "aac", 2, "stereo", None),
AudioTrack(1, "aac", 2, "stereo", None),
]
)
assert m.audio_languages == []
def test_is_multi_audio_false_single_lang(self):
m = MediaInfo(
audio_tracks=[AudioTrack(0, "aac", 2, "stereo", "eng")]
)
assert m.is_multi_audio is False
def test_is_multi_audio_true(self):
m = MediaInfo(
audio_tracks=[
AudioTrack(0, "aac", 2, "stereo", "eng"),
AudioTrack(1, "aac", 2, "stereo", "fre"),
]
)
assert m.is_multi_audio is True
+283
View File
@@ -0,0 +1,283 @@
"""Tests for ``alfred.domain.release`` — release-name parser.
Covers the public surface used by the resolver / move pipeline:
- ``parse_release`` — well-formed scene names (TV episodes, season packs,
movies), site-tagged names, malformed names recovered via sanitization,
and irrecoverable names that fall back to ``media_type="unknown"``.
- ``ParsedRelease`` — derived properties (``is_season_pack``,
``show_folder_name``, ``season_folder_name``, ``episode_filename``,
``movie_folder_name``, ``movie_filename``) including the Windows-forbidden
character sanitizer and the episode-stripping helper for season folders.
These tests exercise the parser end-to-end through real YAML knowledge
files; no monkeypatching of the knowledge layer is performed.
"""
from __future__ import annotations
import pytest
from alfred.domain.release.services import parse_release
from alfred.domain.release.value_objects import ParsedRelease
class TestParseTVEpisode:
"""Single-episode TV releases."""
def test_basic_tv_episode(self):
r = parse_release("Oz.S03E01.1080p.WEBRip.x265-KONTRAST")
assert r.title == "Oz"
assert r.season == 3
assert r.episode == 1
assert r.episode_end is None
assert r.quality == "1080p"
assert r.source == "WEBRip"
assert r.codec == "x265"
assert r.group == "KONTRAST"
assert r.media_type == "tv_show"
assert r.parse_path == "direct"
assert r.is_season_pack is False
def test_multi_episode(self):
r = parse_release("Archer.S14E09E10.1080p.WEB.x265-GRP")
assert r.season == 14
assert r.episode == 9
assert r.episode_end == 10
def test_nxnn_alt_form(self):
# Alt season/episode form: 1x05 instead of S01E05.
r = parse_release("Some.Show.1x05.720p.HDTV.x264-GRP")
assert r.season == 1
assert r.episode == 5
assert r.episode_end is None
assert r.media_type == "tv_show"
def test_nxnnxnn_multi_episode_alt_form(self):
r = parse_release("Some.Show.2x07x08.1080p.WEB.x265-GRP")
assert r.season == 2
assert r.episode == 7
assert r.episode_end == 8
def test_season_pack(self):
r = parse_release("Oz.S03.1080p.WEBRip.x265-KONTRAST")
assert r.season == 3
assert r.episode is None
assert r.is_season_pack is True
assert r.media_type == "tv_show"
class TestParseMovie:
"""Movie releases."""
def test_basic_movie(self):
r = parse_release("Inception.2010.1080p.BluRay.x264-GROUP")
assert r.title == "Inception"
assert r.year == 2010
assert r.season is None
assert r.episode is None
assert r.quality == "1080p"
assert r.source == "BluRay"
assert r.codec == "x264"
assert r.group == "GROUP"
assert r.media_type == "movie"
def test_movie_multi_word_title(self):
r = parse_release("The.Dark.Knight.2008.2160p.UHD.BluRay.x265-TERMINAL")
assert r.title == "The.Dark.Knight"
assert r.year == 2008
assert r.quality == "2160p"
def test_movie_without_year_still_movie_if_tech_present(self):
r = parse_release("UntitledFilm.1080p.WEBRip.x264-GRP")
# No season, no year, but tech markers → still movie
assert r.media_type == "movie"
assert r.year is None
class TestParseEdgeCases:
"""Site tags, malformed names, and unknown media types."""
def test_site_tag_prefix_stripped(self):
r = parse_release("[ OxTorrent.vc ] The.Title.S01E01.1080p.WEB.x265-GRP")
assert r.site_tag == "OxTorrent.vc"
assert r.parse_path == "sanitized"
assert r.season == 1
assert r.episode == 1
def test_site_tag_suffix_stripped(self):
r = parse_release("The.Title.S01E01.1080p.WEB.x265-NTb[TGx]")
assert r.site_tag == "TGx"
# Suffix-tagged names are well-formed (only [] in tag → after strip clean)
assert r.season == 1
def test_irrecoverably_malformed(self):
# @ is a forbidden char and not stripped by _sanitize → stays malformed
r = parse_release("foo@bar@baz")
assert r.media_type == "unknown"
assert r.parse_path == "ai"
assert r.group == "UNKNOWN"
def test_empty_unknown_when_no_evidence(self):
r = parse_release("Some.Random.Title")
# No season, no year, no tech markers → unknown
assert r.media_type == "unknown"
def test_missing_group_defaults_to_unknown(self):
r = parse_release("Movie.2020.1080p.WEBRip.x265")
# No "-GROUP" suffix → group = "UNKNOWN"
assert r.group == "UNKNOWN"
def test_yts_bracket_release(self):
# YTS-style: spaces, parens for year, multiple bracketed tech tokens.
# The tokenizer must handle ' ', '(', ')', '[', ']' transparently.
r = parse_release("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]")
assert r.title == "The.Father"
assert r.year == 2020
assert r.quality == "1080p"
assert r.source == "WEBRip"
assert r.audio_channels == "5.1"
assert r.media_type == "movie"
def test_human_friendly_spaces(self):
# Spaces as separators (no brackets).
r = parse_release("Inception 2010 1080p BluRay x264-GROUP")
assert r.title == "Inception"
assert r.year == 2010
assert r.quality == "1080p"
assert r.codec == "x264"
assert r.group == "GROUP"
assert r.media_type == "movie"
def test_underscore_separators(self):
# Old usenet style: underscores between tokens.
r = parse_release("Some_Show_S01E01_1080p_WEB_x265-GRP")
assert r.season == 1
assert r.episode == 1
assert r.quality == "1080p"
assert r.group == "GRP"
class TestParseAudioVideoEdition:
"""Audio, video metadata, edition extraction."""
def test_audio_codec_and_channels(self):
r = parse_release("Movie.2020.1080p.BluRay.DTS.5.1.x264-GRP")
assert r.audio_channels == "5.1"
def test_language_token(self):
r = parse_release("Movie.2020.MULTI.1080p.WEBRip.x265-GRP")
assert "MULTI" in r.languages
def test_edition_token(self):
r = parse_release("Movie.2020.UNRATED.1080p.BluRay.x264-GRP")
assert r.edition == "UNRATED"
class TestParsedReleaseFolderNames:
"""Helpers that build filesystem-safe folder/filenames."""
def _parsed_tv(self) -> ParsedRelease:
return parse_release("Oz.S03E01.1080p.WEBRip.x265-KONTRAST")
def _parsed_movie(self) -> ParsedRelease:
return parse_release("Inception.2010.1080p.BluRay.x264-GROUP")
def test_show_folder_name(self):
r = self._parsed_tv()
assert r.show_folder_name("Oz", 1997) == "Oz.1997.1080p.WEBRip.x265-KONTRAST"
def test_show_folder_name_strips_windows_chars(self):
r = self._parsed_tv()
# Colons and question marks are Windows-forbidden — must be stripped.
result = r.show_folder_name("Oz: The Series?", 1997)
assert ":" not in result
assert "?" not in result
def test_season_folder_name_strips_episode(self):
r = self._parsed_tv()
# Episode token Exx is stripped, Sxx stays
result = r.season_folder_name()
assert "S03" in result
assert "E01" not in result
def test_season_folder_name_multi_episode(self):
r = parse_release("Archer.S14E09E10E11.1080p.WEB.x265-GRP")
result = r.season_folder_name()
assert "S14" in result
assert "E09" not in result
assert "E10" not in result
assert "E11" not in result
def test_episode_filename_with_title(self):
r = self._parsed_tv()
fname = r.episode_filename("The Routine", "mkv")
assert fname.endswith(".mkv")
assert "S03E01" in fname
assert "The.Routine" in fname
assert "KONTRAST" in fname
def test_episode_filename_without_title(self):
r = self._parsed_tv()
fname = r.episode_filename(None, "mkv")
assert fname.endswith(".mkv")
assert "S03E01" in fname
def test_episode_filename_strips_ext_dot(self):
r = self._parsed_tv()
# Whether the caller passes "mkv" or ".mkv", we get a single dot.
a = r.episode_filename(None, "mkv")
b = r.episode_filename(None, ".mkv")
assert a == b
assert "..mkv" not in a
def test_movie_folder_name(self):
r = self._parsed_movie()
assert (
r.movie_folder_name("Inception", 2010)
== "Inception.2010.1080p.BluRay.x264-GROUP"
)
def test_movie_filename(self):
r = self._parsed_movie()
assert (
r.movie_filename("Inception", 2010, "mkv")
== "Inception.2010.1080p.BluRay.x264-GROUP.mkv"
)
class TestParsedReleaseInvariants:
"""Structural invariants of ParsedRelease."""
def test_raw_is_preserved(self):
raw = "Oz.S03E01.1080p.WEBRip.x265-KONTRAST"
r = parse_release(raw)
assert r.raw == raw
def test_languages_defaults_to_empty_list_not_none(self):
r = parse_release("Movie.2020.1080p.BluRay.x264-GRP")
# __post_init__ ensures languages is a list, never None
assert r.languages == []
def test_tech_string_joined(self):
r = parse_release("Movie.2020.1080p.BluRay.x264-GRP")
assert r.tech_string == "1080p.BluRay.x264"
def test_tech_string_partial(self):
# Codec-only release (no quality/source): tech_string == codec
r = parse_release("Show.S01E01.x265-GRP")
assert r.tech_string == "x265"
assert r.codec == "x265"
assert r.quality is None
assert r.source is None
@pytest.mark.parametrize(
"name,expected_type",
[
("Show.S01E01.1080p.WEB.x265-GRP", "tv_show"),
("Movie.2020.1080p.BluRay.x264-GRP", "movie"),
("Random.Title.With.Nothing", "unknown"),
],
)
def test_media_type_inference(self, name, expected_type):
assert parse_release(name).media_type == expected_type
-504
View File
@@ -1,504 +0,0 @@
"""
Tests for alfred.domain.release.release_parser
Real-data cases sourced from /mnt/testipool/downloads/.
Covers: parsing, normalisation, naming methods, edge cases.
"""
from alfred.domain.release import parse_release
from alfred.domain.release.services import _normalise
from alfred.domain.release.value_objects import (
_sanitise_for_fs,
_strip_episode_from_normalised,
)
# ---------------------------------------------------------------------------
# _normalise
# ---------------------------------------------------------------------------
class TestNormalise:
def test_dots_unchanged(self):
assert (
_normalise("Oz.S01.1080p.WEBRip.x265-KONTRAST")
== "Oz.S01.1080p.WEBRip.x265-KONTRAST"
)
def test_spaces_become_dots(self):
assert (
_normalise("Oz S01 1080p WEBRip x265-KONTRAST")
== "Oz.S01.1080p.WEBRip.x265-KONTRAST"
)
def test_double_dots_collapsed(self):
assert _normalise("Oz..S01..1080p") == "Oz.S01.1080p"
def test_leading_trailing_dots_stripped(self):
assert _normalise(".Oz.S01.") == "Oz.S01"
def test_mixed_spaces_and_dots(self):
# "Archer 2009 S14E09E10E11 Into the Cold 1080p HULU WEB-DL DDP5 1 H 264-NTb"
result = _normalise(
"Archer 2009 S14E09E10E11 Into the Cold 1080p HULU WEB-DL DDP5 1 H 264-NTb"
)
assert " " not in result
assert ".." not in result
# ---------------------------------------------------------------------------
# _sanitise_for_fs
# ---------------------------------------------------------------------------
class TestSanitiseForFs:
def test_clean_string_unchanged(self):
assert _sanitise_for_fs("Oz.S01.1080p-KONTRAST") == "Oz.S01.1080p-KONTRAST"
def test_removes_question_mark(self):
assert _sanitise_for_fs("What's Up?") == "What's Up"
def test_removes_colon(self):
assert _sanitise_for_fs("He Said: She Said") == "He Said She Said"
def test_removes_all_forbidden(self):
assert _sanitise_for_fs('a?b:c*d"e<f>g|h\\i') == "abcdefghi"
def test_apostrophe_kept(self):
# apostrophe is not in the forbidden set
assert _sanitise_for_fs("What's Up") == "What's Up"
def test_ellipsis_kept(self):
assert _sanitise_for_fs("What If...") == "What If..."
# ---------------------------------------------------------------------------
# _strip_episode_from_normalised
# ---------------------------------------------------------------------------
class TestStripEpisode:
def test_strips_single_episode(self):
assert (
_strip_episode_from_normalised("Oz.S01E01.1080p.WEBRip.x265-KONTRAST")
== "Oz.S01.1080p.WEBRip.x265-KONTRAST"
)
def test_strips_multi_episode(self):
assert (
_strip_episode_from_normalised("Archer.S14E09E10E11.1080p.HULU.WEB-DL-NTb")
== "Archer.S14.1080p.HULU.WEB-DL-NTb"
)
def test_season_pack_unchanged(self):
assert (
_strip_episode_from_normalised("Oz.S01.1080p.WEBRip.x265-KONTRAST")
== "Oz.S01.1080p.WEBRip.x265-KONTRAST"
)
def test_case_insensitive(self):
assert (
_strip_episode_from_normalised("oz.s01e01.1080p-KONTRAST")
== "oz.s01.1080p-KONTRAST"
)
# ---------------------------------------------------------------------------
# parse_release — Season packs (dots)
# ---------------------------------------------------------------------------
class TestSeasonPackDots:
"""Real cases: Oz.S01-S06 KONTRAST, Archer S03 EDGE2020, etc."""
def test_oz_s01_kontrast(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
assert p.title == "Oz"
assert p.season == 1
assert p.episode is None
assert p.quality == "1080p"
assert p.source == "WEBRip"
assert p.codec == "x265"
assert p.group == "KONTRAST"
assert p.is_season_pack
assert not p.is_movie
def test_fallout_s02_kontrast(self):
p = parse_release("Fallout.2024.S02.1080p.WEBRip.x265-KONTRAST")
assert p.title == "Fallout"
assert p.year == 2024
assert p.season == 2
assert p.episode is None
assert p.group == "KONTRAST"
def test_archer_s03_edge2020(self):
p = parse_release("Archer.2009.S03.1080p.BluRay.DDP.5.1.x265-EDGE2020")
assert p.title == "Archer"
assert p.year == 2009
assert p.season == 3
assert p.quality == "1080p"
assert p.source == "BluRay"
assert p.codec == "x265"
assert p.group == "EDGE2020"
def test_fargo_s05_hulu_webdl(self):
p = parse_release("Fargo.S05.1080p.HULU.WEB-DL.x265.10bit-Protozoan")
assert p.title == "Fargo"
assert p.season == 5
assert p.quality == "1080p"
assert p.group == "Protozoan"
def test_xfiles_s01_bluray_rarbg(self):
p = parse_release("The.X-Files.S01.1080p.BluRay.x265-RARBG")
assert p.title == "The.X-Files"
assert p.season == 1
assert p.source == "BluRay"
assert p.group == "RARBG"
def test_gilmore_girls_s01_s07_repack(self):
p = parse_release(
"Gilmore.Girls.Complete.S01-S07.REPACK.1080p.WEB-DL.x265.10bit.HEVC-MONOLITH"
)
# Season range — we parse the first season number found
assert p.season == 1
assert p.group == "MONOLITH"
def test_plot_against_america_4k(self):
p = parse_release(
"The.Plot.Against.America.S01.2160p.MAX.WEB-DL.x265.10bit.HDR.DDP5.1.x265-SH3LBY"
)
assert p.title == "The.Plot.Against.America"
assert p.season == 1
assert p.quality == "2160p"
assert p.group == "SH3LBY"
def test_foundation_with_year_in_title(self):
p = parse_release("Foundation.2021.S01.1080p.WEBRip.x265-RARBG")
assert p.title == "Foundation"
assert p.year == 2021
assert p.season == 1
assert p.group == "RARBG"
def test_gen_v_s02(self):
p = parse_release("Gen.V.S02.1080p.WEBRip.x265-KONTRAST")
assert p.title == "Gen.V"
assert p.season == 2
assert p.group == "KONTRAST"
# ---------------------------------------------------------------------------
# parse_release — Single episodes (dots)
# ---------------------------------------------------------------------------
class TestSingleEpisodeDots:
"""Real cases: Fallout S02Exx ELiTE, Mare of Easttown PSA, etc."""
def test_fallout_s02e01_elite(self):
p = parse_release("Fallout.2024.S02E01.1080p.x265-ELiTE")
assert p.title == "Fallout"
assert p.year == 2024
assert p.season == 2
assert p.episode == 1
assert p.episode_end is None
assert p.group == "ELiTE"
assert not p.is_season_pack
def test_mare_of_easttown_with_episode_title_in_filename(self):
# Episode filenames often embed the title — we parse the release folder name
p = parse_release("Mare.of.Easttown.S01.1080p.10bit.WEBRip.6CH.x265.HEVC-PSA")
assert p.title == "Mare.of.Easttown"
assert p.season == 1
assert p.group == "PSA"
def test_it_welcome_to_derry_s01e01(self):
p = parse_release("IT.Welcome.to.Derry.S01E01.1080p.x265-ELiTE")
assert p.title == "IT.Welcome.to.Derry"
assert p.season == 1
assert p.episode == 1
assert p.group == "ELiTE"
def test_landman_s02e01(self):
p = parse_release("Landman.S02E01.1080p.x265-ELiTE")
assert p.title == "Landman"
assert p.season == 2
assert p.episode == 1
def test_prodiges_episode_with_number_in_title(self):
# "Prodiges.S12E01.1ere.demi-finale..." — accented chars in episode title
p = parse_release("Prodiges.S12E01.1080p.WEB.H264-THESYNDiCATE")
assert p.title == "Prodiges"
assert p.season == 12
assert p.episode == 1
assert p.group == "THESYNDiCATE"
# ---------------------------------------------------------------------------
# parse_release — Multi-episode
# ---------------------------------------------------------------------------
class TestMultiEpisode:
def test_archer_triple_episode(self):
# "Archer 2009 S14E09E10E11 Into the Cold 1080p HULU WEB-DL DDP5 1 H 264-NTb"
p = parse_release(
"Archer.2009.S14E09E10E11.Into.the.Cold.1080p.HULU.WEB-DL.DDP5.1.H.264-NTb"
)
assert p.season == 14
assert p.episode == 9
assert p.episode_end == 10 # only first E-pair captured by regex group 2+3
# ---------------------------------------------------------------------------
# parse_release — Movies
# ---------------------------------------------------------------------------
class TestMovies:
def test_another_round_yts(self):
# "Another Round (2020) [1080p] [BluRay] [YTS.MX]" → normalised
p = parse_release("Another.Round.2020.1080p.BluRay.x264-YTS")
assert p.is_movie
assert p.title == "Another.Round"
assert p.year == 2020
assert p.quality == "1080p"
assert p.source == "BluRay"
assert p.group == "YTS"
def test_godzilla_minus_one(self):
p = parse_release("Godzilla.Minus.One.2023.1080p.BluRay.x265.10bit.AAC5.1-YTS")
assert p.title == "Godzilla.Minus.One"
assert p.year == 2023
assert p.is_movie
assert p.group == "YTS"
def test_deadwood_movie_2019(self):
p = parse_release("Deadwood.The.Movie.2019.1080p.BluRay.x265-RARBG")
assert p.year == 2019
assert p.is_movie
assert p.group == "RARBG"
def test_revolver_2005_bluray(self):
p = parse_release("Revolver.2005.1080p.BluRay.x265-RARBG")
assert p.title == "Revolver"
assert p.year == 2005
assert p.is_movie
def test_the_xfiles_movie_1998(self):
p = parse_release("The.X.Files.1998.1080p.BluRay.x265-RARBG")
assert p.year == 1998
assert p.is_movie
assert p.group == "RARBG"
def test_movie_no_group(self):
p = parse_release("Jurassic.Park.1993.1080p.BluRay.x265")
assert p.is_movie
assert p.year == 1993
assert p.group == "UNKNOWN"
def test_multi_language_movie(self):
p = parse_release("Jumanji.1995.MULTi.1080p.DSNP.WEB.H265-THESYNDiCATE")
assert p.year == 1995
assert p.group == "THESYNDiCATE"
# ---------------------------------------------------------------------------
# parse_release — Space-separated (no dots)
# ---------------------------------------------------------------------------
class TestSpaceSeparated:
def test_oz_spaces(self):
p = parse_release("Oz S01 1080p WEBRip x265-KONTRAST")
assert p.title == "Oz"
assert p.season == 1
assert p.quality == "1080p"
assert p.group == "KONTRAST"
def test_archer_spaces(self):
p = parse_release(
"Archer 2009 S14E09E10E11 Into the Cold 1080p HULU WEB-DL DDP5 1 H 264-NTb"
)
assert p.season == 14
assert p.episode == 9
assert p.group == "NTb"
# ---------------------------------------------------------------------------
# parse_release — tech_string
# ---------------------------------------------------------------------------
class TestTechString:
def test_full_tech(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
assert p.tech_string == "1080p.WEBRip.x265"
def test_tech_string_used_in_folder_name(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
folder = p.show_folder_name("Oz", 1997)
assert "1080p.WEBRip.x265" in folder
def test_no_tech_fallback(self):
p = parse_release("SomeShow.S01")
# tech_string is empty, show_folder_name uses "Unknown"
folder = p.show_folder_name("SomeShow", 2020)
assert "Unknown" in folder
def test_4k_hdr(self):
p = parse_release(
"The.Plot.Against.America.S01.2160p.MAX.WEB-DL.x265.10bit.HDR.DDP5.1-SH3LBY"
)
assert p.quality == "2160p"
# ---------------------------------------------------------------------------
# ParsedRelease — naming methods
# ---------------------------------------------------------------------------
class TestNamingMethods:
def test_show_folder_name(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
assert p.show_folder_name("Oz", 1997) == "Oz.1997.1080p.WEBRip.x265-KONTRAST"
def test_show_folder_name_sanitises_title(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
# Colon in TMDB title should be stripped, spaces become dots
folder = p.show_folder_name("Star Wars: Andor", 2022)
assert ":" not in folder
assert "Star.Wars.Andor" in folder
def test_season_folder_name_from_season_pack(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
assert p.season_folder_name() == "Oz.S01.1080p.WEBRip.x265-KONTRAST"
def test_season_folder_name_strips_episode(self):
p = parse_release("Fallout.2024.S02E01.1080p.x265-ELiTE")
assert p.season_folder_name() == "Fallout.2024.S02.1080p.x265-ELiTE"
def test_episode_filename_with_title(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
fname = p.episode_filename("The Routine", ".mkv")
assert fname == "Oz.S01.The.Routine.1080p.WEBRip.x265-KONTRAST.mkv"
def test_episode_filename_with_episode_number(self):
p = parse_release("Fallout.2024.S02E01.1080p.x265-ELiTE")
fname = p.episode_filename("The Beginning", ".mkv")
assert fname == "Fallout.S02E01.The.Beginning.1080p.x265-ELiTE.mkv"
def test_episode_filename_without_episode_title(self):
p = parse_release("Oz.S01E01.1080p.WEBRip.x265-KONTRAST")
fname = p.episode_filename(None, ".mp4")
assert fname == "Oz.S01E01.1080p.WEBRip.x265-KONTRAST.mp4"
def test_episode_filename_sanitises_episode_title(self):
p = parse_release("Oz.S01E01.1080p.WEBRip.x265-KONTRAST")
fname = p.episode_filename("What's Up?", ".mkv")
assert "?" not in fname
assert "What's.Up" in fname
def test_episode_filename_strips_leading_dot_from_ext(self):
p = parse_release("Oz.S01E01.1080p.WEBRip.x265-KONTRAST")
fname_with = p.episode_filename(None, ".mkv")
fname_without = p.episode_filename(None, "mkv")
assert fname_with == fname_without
def test_movie_folder_name(self):
p = parse_release("Another.Round.2020.1080p.BluRay.x264-YTS")
assert (
p.movie_folder_name("Another Round", 2020)
== "Another.Round.2020.1080p.BluRay.x264-YTS"
)
def test_movie_filename(self):
p = parse_release("Another.Round.2020.1080p.BluRay.x264-YTS")
fname = p.movie_filename("Another Round", 2020, ".mp4")
assert fname == "Another.Round.2020.1080p.BluRay.x264-YTS.mp4"
def test_movie_folder_same_as_show_folder(self):
p = parse_release("Revolver.2005.1080p.BluRay.x265-RARBG")
assert p.movie_folder_name("Revolver", 2005) == p.show_folder_name(
"Revolver", 2005
)
# ---------------------------------------------------------------------------
# ParsedRelease — is_movie / is_season_pack
# ---------------------------------------------------------------------------
class TestMediaTypeFlags:
def test_season_pack_is_not_movie(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265-KONTRAST")
assert not p.is_movie
assert p.is_season_pack
def test_single_episode_is_not_season_pack(self):
p = parse_release("Oz.S01E01.1080p.WEBRip.x265-KONTRAST")
assert not p.is_movie
assert not p.is_season_pack
def test_movie_is_not_season_pack(self):
p = parse_release("Revolver.2005.1080p.BluRay.x265-RARBG")
assert p.is_movie
assert not p.is_season_pack
def test_no_season_no_year_treated_as_movie(self):
# No S/E marker → is_movie = True
p = parse_release("SomeContent.1080p.WEBRip.x265-GROUP")
assert p.is_movie
# ---------------------------------------------------------------------------
# Tricky real-world releases
# ---------------------------------------------------------------------------
class TestRealWorldEdgeCases:
def test_angel_integrale_multi(self):
# "Angel.1999.INTEGRALE.MULTI.1080p.WEBRip.10bits.x265.DD-Jarod"
p = parse_release(
"Angel.1999.INTEGRALE.MULTI.1080p.WEBRip.10bits.x265.DD-Jarod"
)
assert p.year == 1999
assert p.quality == "1080p"
assert p.source == "WEBRip"
def test_group_unknown_when_no_dash(self):
p = parse_release("Oz.S01.1080p.WEBRip.x265")
assert p.group == "UNKNOWN"
def test_normalised_stored_on_parsed(self):
p = parse_release("Oz S01 1080p WEBRip x265-KONTRAST")
assert p.normalised == "Oz.S01.1080p.WEBRip.x265-KONTRAST"
def test_raw_stored_as_is(self):
raw = "Oz S01 1080p WEBRip x265-KONTRAST"
p = parse_release(raw)
assert p.raw == raw
def test_hevc_codec(self):
# "Mare.of.Easttown.S01.1080p.10bit.WEBRip.6CH.x265.HEVC-PSA"
p = parse_release("Mare.of.Easttown.S01.1080p.10bit.WEBRip.6CH.x265.HEVC-PSA")
assert p.codec in ("x265", "HEVC")
assert p.group == "PSA"
def test_xfiles_hyphen_in_title(self):
p = parse_release("The.X-Files.S01.1080p.BluRay.x265-RARBG")
# Title should preserve the hyphen
assert "X-Files" in p.title
def test_foundation_s02_no_year(self):
# Foundation.S02 has no year in release name — year is None
p = parse_release("Foundation.S02.1080p.x265-ELiTE")
assert p.year is None
assert p.season == 2
assert p.group == "ELiTE"
def test_slow_horses_two_groups_same_show(self):
# Same show, different groups across seasons
s01 = parse_release("Slow.Horses.S01.1080p.WEBRip.x265-RARBG")
s04 = parse_release("Slow.Horses.S04.1080p.WEBRip.x265-KONTRAST")
assert s01.title == s04.title == "Slow.Horses"
assert s01.group == "RARBG"
assert s04.group == "KONTRAST"
+345
View File
@@ -0,0 +1,345 @@
"""Tests for ``alfred.domain.subtitles.services.identifier``.
Coverage:
- ``TestTokenize`` — ``_tokenize`` strips parentheses and splits on
``[.\\s_-]``; ``_tokenize_suffix`` peels the episode stem prefix.
- ``TestCountEntries`` — last-cue-number heuristic for SRT files.
- ``TestEmbedded`` — ffprobe is mocked; dispositions map to SDH/FORCED
/ STANDARD; non-existent file → empty list; ffprobe error → empty.
- ``TestAdjacent`` — adjacent strategy: only known extensions, excludes
the video file itself.
- ``TestFlat`` — Subs/ folder adjacent or at release root.
- ``TestEpisodeSubfolder`` — Subs/{stem}/*.srt; tokens after prefix.
- ``TestClassify`` — language + type token detection, confidence math.
- ``TestSizeDisambiguation`` — size_and_count post-processing rules
(2-track → standard+sdh; 3+ → forced + standard + sdh).
"""
from __future__ import annotations
from unittest.mock import patch
import pytest
from alfred.domain.subtitles.entities import SubtitleCandidate
from alfred.domain.subtitles.knowledge.base import SubtitleKnowledgeBase
from alfred.domain.subtitles.services.identifier import (
SubtitleIdentifier,
_count_entries,
_tokenize,
_tokenize_suffix,
)
from alfred.domain.subtitles.value_objects import (
ScanStrategy,
SubtitleLanguage,
SubtitlePattern,
SubtitleType,
TypeDetectionMethod,
)
@pytest.fixture(scope="module")
def kb():
return SubtitleKnowledgeBase()
@pytest.fixture
def identifier(kb):
return SubtitleIdentifier(kb)
def _pattern(strategy: ScanStrategy, root_folder: str | None = None,
detection: TypeDetectionMethod = TypeDetectionMethod.TOKEN_IN_NAME) -> SubtitlePattern:
return SubtitlePattern(
id=f"test-{strategy.value}",
description="",
scan_strategy=strategy,
root_folder=root_folder,
type_detection=detection,
)
# --------------------------------------------------------------------------- #
# _tokenize / _tokenize_suffix #
# --------------------------------------------------------------------------- #
class TestTokenize:
def test_basic_dotted(self):
assert _tokenize("Show.S01E01.French") == ["show", "s01e01", "french"]
def test_mixed_separators(self):
assert _tokenize("Show_S01-E01 French") == [
"show", "s01", "e01", "french"
]
def test_strips_parenthesized(self):
assert _tokenize("episode (Brazil).French") == ["episode", "french"]
def test_empty_string(self):
assert _tokenize("") == []
def test_suffix_strips_episode_prefix(self):
out = _tokenize_suffix("Show.S01E01.English", "Show.S01E01")
assert out == ["english"]
def test_suffix_falls_back_when_no_prefix(self):
# filename doesn't start with episode_stem → full tokenize.
out = _tokenize_suffix("Other.srt", "Show.S01E01")
assert "other" in out
def test_suffix_falls_back_when_suffix_is_empty(self):
# Suffix would tokenize to nothing → fall back to full stem.
out = _tokenize_suffix("Show.S01E01", "Show.S01E01")
# full tokenize of "Show.S01E01" → ['show', 's01e01']
assert out == ["show", "s01e01"]
# --------------------------------------------------------------------------- #
# _count_entries #
# --------------------------------------------------------------------------- #
class TestCountEntries:
def test_last_cue_number(self, tmp_path):
srt = tmp_path / "x.srt"
srt.write_text(
"1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
"2\n00:00:03,000 --> 00:00:04,000\nWorld\n\n"
"42\n00:00:05,000 --> 00:00:06,000\nLast\n",
encoding="utf-8",
)
assert _count_entries(srt) == 42
def test_missing_file_returns_zero(self, tmp_path):
assert _count_entries(tmp_path / "nope.srt") == 0
def test_empty_file_returns_zero(self, tmp_path):
f = tmp_path / "x.srt"
f.write_text("")
assert _count_entries(f) == 0
# --------------------------------------------------------------------------- #
# Embedded scan #
# --------------------------------------------------------------------------- #
class TestEmbedded:
def test_missing_file_returns_empty(self, identifier, tmp_path):
assert identifier._scan_embedded(tmp_path / "missing.mkv") == []
def test_ffprobe_failure_returns_empty(self, identifier, tmp_path):
video = tmp_path / "v.mkv"
video.write_bytes(b"")
with patch(
"alfred.domain.subtitles.services.identifier.subprocess.run",
side_effect=FileNotFoundError("no ffprobe"),
):
assert identifier._scan_embedded(video) == []
def test_disposition_to_subtitle_type(self, identifier, tmp_path):
video = tmp_path / "v.mkv"
video.write_bytes(b"")
fake_output = (
'{"streams":['
'{"tags":{"language":"eng"},"disposition":{"hearing_impaired":1}},'
'{"tags":{"language":"fre"},"disposition":{"forced":1}},'
'{"tags":{"language":"spa"},"disposition":{}},'
'{"tags":{},"disposition":{}}'
"]}"
)
class FakeResult:
stdout = fake_output
with patch(
"alfred.domain.subtitles.services.identifier.subprocess.run",
return_value=FakeResult(),
):
tracks = identifier._scan_embedded(video)
assert len(tracks) == 4
assert tracks[0].subtitle_type == SubtitleType.SDH
assert tracks[0].language.code == "eng"
assert tracks[1].subtitle_type == SubtitleType.FORCED
assert tracks[1].language.code == "fre"
assert tracks[2].subtitle_type == SubtitleType.STANDARD
assert tracks[3].language is None # no language tag
for t in tracks:
assert t.is_embedded is True
# --------------------------------------------------------------------------- #
# Adjacent / Flat / Episode subfolder discovery #
# --------------------------------------------------------------------------- #
class TestAdjacent:
def test_finds_only_known_subtitle_extensions(self, identifier, tmp_path):
video = tmp_path / "Show.S01E01.mkv"
video.write_bytes(b"")
(tmp_path / "Show.S01E01.English.srt").write_text("")
(tmp_path / "Show.S01E01.French.ass").write_text("")
# Non-subtitle files must be ignored.
(tmp_path / "Show.S01E01.nfo").write_text("")
(tmp_path / "cover.jpg").write_bytes(b"")
result = identifier._find_adjacent(video)
names = sorted(p.name for p in result)
assert names == ["Show.S01E01.English.srt", "Show.S01E01.French.ass"]
def test_excludes_the_video_file(self, identifier, tmp_path):
# An adjacent file with the *same stem* as the video would be the
# video itself (e.g. a .mkv named like the .srt). Not expected here,
# but the implementation guards via `p.stem != video.stem`.
video = tmp_path / "Show.S01E01.mkv"
video.write_bytes(b"")
(tmp_path / "Show.S01E01.srt").write_text("") # same stem
# Same stem → excluded; only subs with a different stem are returned.
assert identifier._find_adjacent(video) == []
class TestFlat:
def test_subs_folder_adjacent(self, identifier, tmp_path):
video = tmp_path / "Show.S01E01.mkv"
video.write_bytes(b"")
subs = tmp_path / "Subs"
subs.mkdir()
(subs / "English.srt").write_text("")
result = identifier._find_flat(video, "Subs")
assert len(result) == 1
def test_subs_folder_at_release_root_fallback(self, identifier, tmp_path):
season = tmp_path / "Season.1"
season.mkdir()
video = season / "Show.S01E01.mkv"
video.write_bytes(b"")
subs = tmp_path / "Subs"
subs.mkdir()
(subs / "English.srt").write_text("")
result = identifier._find_flat(video, "Subs")
assert len(result) == 1
def test_no_subs_folder_returns_empty(self, identifier, tmp_path):
video = tmp_path / "v.mkv"
video.write_bytes(b"")
assert identifier._find_flat(video, "Subs") == []
class TestEpisodeSubfolder:
def test_found_and_stem_returned(self, identifier, tmp_path):
video = tmp_path / "Show.S01E01.mkv"
video.write_bytes(b"")
subs = tmp_path / "Subs" / "Show.S01E01"
subs.mkdir(parents=True)
(subs / "2_English.srt").write_text("")
files, stem = identifier._find_episode_subfolder(video, "Subs")
assert len(files) == 1
assert stem == "Show.S01E01"
def test_not_found(self, identifier, tmp_path):
video = tmp_path / "Show.S01E01.mkv"
video.write_bytes(b"")
files, stem = identifier._find_episode_subfolder(video, "Subs")
assert files == []
assert stem == "Show.S01E01"
# --------------------------------------------------------------------------- #
# Classification #
# --------------------------------------------------------------------------- #
class TestClassify:
def test_classifies_language_and_format(self, identifier, tmp_path):
f = tmp_path / "Show.S01E01.English.srt"
f.write_text("1\n00:00:01,000 --> 00:00:02,000\nHi\n")
track = identifier._classify_single(f)
assert track.language.code == "eng"
assert track.format.id == "srt"
assert track.confidence > 0
assert track.is_embedded is False
def test_classifies_type_token(self, identifier, tmp_path):
f = tmp_path / "Show.S01E01.English.sdh.srt"
f.write_text("")
track = identifier._classify_single(f)
assert track.subtitle_type == SubtitleType.SDH
def test_unknown_tokens_lower_confidence(self, identifier, tmp_path):
f = tmp_path / "Show.S01E01.gibberish.srt"
f.write_text("")
track = identifier._classify_single(f)
# No lang/type recognized → confidence is 0 or very low.
assert track.language is None
assert track.confidence < 0.5
def test_episode_stem_prefix_stripped(self, identifier, tmp_path):
f = tmp_path / "Show.S01E01.English.srt"
f.write_text("")
track = identifier._classify_single(f, episode_stem="Show.S01E01")
# Only "english" remains as meaningful token → confidence == 1.0
assert track.language.code == "eng"
assert track.confidence == 1.0
# --------------------------------------------------------------------------- #
# size_and_count post-processing #
# --------------------------------------------------------------------------- #
class TestSizeDisambiguation:
@pytest.fixture
def pattern_size(self):
return _pattern(
ScanStrategy.FLAT,
root_folder="Subs",
detection=TypeDetectionMethod.SIZE_AND_COUNT,
)
def _track(self, lang_code: str, entries: int) -> SubtitleCandidate:
return SubtitleCandidate(
language=SubtitleLanguage(code=lang_code, tokens=[lang_code]),
format=None,
subtitle_type=SubtitleType.UNKNOWN,
entry_count=entries,
)
def test_two_tracks_split_into_standard_and_sdh(self, identifier, pattern_size):
t1 = self._track("eng", 800)
t2 = self._track("eng", 1200)
result = identifier._disambiguate_by_size([t1, t2])
# Sorted ascending → smaller=standard, larger=sdh
types = sorted([t.subtitle_type for t in result], key=lambda s: s.value)
assert SubtitleType.STANDARD in types
assert SubtitleType.SDH in types
def test_three_tracks_split_into_forced_standard_sdh(self, identifier):
t_small = self._track("eng", 50)
t_mid = self._track("eng", 600)
t_large = self._track("eng", 1200)
result = identifier._disambiguate_by_size([t_large, t_small, t_mid])
# Sorted ascending → smallest=forced, middle=standard, largest=sdh
by_count = sorted(result, key=lambda t: t.entry_count)
assert by_count[0].subtitle_type == SubtitleType.FORCED
assert by_count[1].subtitle_type == SubtitleType.STANDARD
assert by_count[2].subtitle_type == SubtitleType.SDH
def test_single_track_untouched(self, identifier):
t = self._track("eng", 800)
result = identifier._disambiguate_by_size([t])
assert result == [t]
assert t.subtitle_type == SubtitleType.UNKNOWN
def test_different_languages_grouped_independently(self, identifier):
# Two eng + one fra → fra is alone, eng pair gets split.
eng_small = self._track("eng", 800)
eng_large = self._track("eng", 1500)
fra_solo = self._track("fra", 1000)
result = identifier._disambiguate_by_size([eng_small, eng_large, fra_solo])
# fra solo stays UNKNOWN
assert fra_solo.subtitle_type == SubtitleType.UNKNOWN
# eng pair gets STANDARD + SDH
assert eng_small.subtitle_type == SubtitleType.STANDARD
assert eng_large.subtitle_type == SubtitleType.SDH
+281
View File
@@ -0,0 +1,281 @@
"""Tests for ``alfred.domain.subtitles.knowledge`` (loader + base).
Covers:
- ``TestMerge`` — the internal ``_merge`` deep-merge function:
scalar override, dict merge, list extension+dedup.
- ``TestLoader`` — builtin loads alone, learned overlays add tokens,
learned-only pattern is picked up, missing files don't crash.
- ``TestKnowledgeBase`` — typed view: formats / languages /
type-token lookup, default rules, ``patterns_for_group``.
Uses ``monkeypatch`` to override the module-level ``_BUILTIN_ROOT`` and
``_LEARNED_ROOT`` constants so we can drive the loader from a temp dir.
"""
from __future__ import annotations
from pathlib import Path
import pytest
from alfred.domain.subtitles.knowledge import loader as loader_mod
from alfred.domain.subtitles.knowledge.base import SubtitleKnowledgeBase
from alfred.domain.subtitles.knowledge.loader import KnowledgeLoader, _merge
from alfred.domain.subtitles.value_objects import (
ScanStrategy,
SubtitleType,
TypeDetectionMethod,
)
# --------------------------------------------------------------------------- #
# _merge — pure dict merger #
# --------------------------------------------------------------------------- #
class TestMerge:
def test_scalar_override(self):
assert _merge({"a": 1}, {"a": 2}) == {"a": 2}
def test_new_key_added(self):
assert _merge({"a": 1}, {"b": 2}) == {"a": 1, "b": 2}
def test_nested_dict_merged(self):
out = _merge({"a": {"x": 1}}, {"a": {"y": 2}})
assert out == {"a": {"x": 1, "y": 2}}
def test_list_extended_and_deduped(self):
out = _merge({"a": [1, 2]}, {"a": [2, 3]})
assert out == {"a": [1, 2, 3]}
def test_list_preserves_order(self):
out = _merge({"a": ["x", "y"]}, {"a": ["z", "x"]})
assert out == {"a": ["x", "y", "z"]}
def test_type_mismatch_override_wins(self):
# If shapes differ, override replaces wholesale.
out = _merge({"a": [1, 2]}, {"a": {"new": True}})
assert out == {"a": {"new": True}}
# --------------------------------------------------------------------------- #
# Loader helpers #
# --------------------------------------------------------------------------- #
def _write(path: Path, content: str) -> None:
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(content, encoding="utf-8")
@pytest.fixture
def isolated_loader(tmp_path: Path, monkeypatch):
"""Redirect _BUILTIN_ROOT and _LEARNED_ROOT to temp dirs."""
builtin = tmp_path / "builtin"
learned = tmp_path / "learned"
builtin.mkdir()
learned.mkdir()
monkeypatch.setattr(loader_mod, "_BUILTIN_ROOT", builtin)
monkeypatch.setattr(loader_mod, "_LEARNED_ROOT", learned)
return builtin, learned
class TestLoader:
def test_builtin_only(self, isolated_loader):
builtin, _ = isolated_loader
_write(
builtin / "subtitles.yaml",
"languages:\n fra:\n tokens: [fr, fre]\n",
)
ldr = KnowledgeLoader()
assert ldr.subtitles()["languages"]["fra"]["tokens"] == ["fr", "fre"]
def test_learned_adds_tokens_additively(self, isolated_loader):
builtin, learned = isolated_loader
_write(
builtin / "subtitles.yaml",
"languages:\n fra:\n tokens: [fr, fre]\n",
)
_write(
learned / "subtitles_learned.yaml",
"languages:\n fra:\n tokens: [vff, custom]\n",
)
ldr = KnowledgeLoader()
tokens = ldr.subtitles()["languages"]["fra"]["tokens"]
assert tokens == ["fr", "fre", "vff", "custom"]
def test_missing_files_dont_crash(self, isolated_loader):
# No files written → loader still produces empty structures.
ldr = KnowledgeLoader()
assert ldr.subtitles() == {}
assert ldr.patterns() == {}
assert ldr.release_groups() == {}
def test_builtin_pattern_loaded(self, isolated_loader):
builtin, _ = isolated_loader
_write(
builtin / "patterns" / "adjacent.yaml",
"id: adjacent\nscan_strategy: adjacent\ndescription: test\n",
)
ldr = KnowledgeLoader()
assert "adjacent" in ldr.patterns()
assert ldr.pattern("adjacent")["scan_strategy"] == "adjacent"
def test_learned_pattern_overlays_builtin(self, isolated_loader):
builtin, learned = isolated_loader
_write(
builtin / "patterns" / "p.yaml",
"id: p\nscan_strategy: flat\ndescription: old\n",
)
_write(
learned / "patterns" / "p.yaml",
"id: p\ndescription: new\n",
)
ldr = KnowledgeLoader()
# learned replaces scalar 'description', keeps scan_strategy from builtin
assert ldr.pattern("p")["description"] == "new"
assert ldr.pattern("p")["scan_strategy"] == "flat"
def test_learned_only_pattern_added(self, isolated_loader):
_, learned = isolated_loader
_write(
learned / "patterns" / "neo.yaml",
"id: neo\nscan_strategy: embedded\n",
)
ldr = KnowledgeLoader()
assert "neo" in ldr.patterns()
def test_release_group_case_insensitive_lookup(self, isolated_loader):
builtin, _ = isolated_loader
_write(
builtin / "release_groups" / "kontrast.yaml",
"name: KONTRAST\nknown_patterns: [adjacent]\n",
)
ldr = KnowledgeLoader()
# Stored under "KONTRAST" but case-insensitive match must work.
assert ldr.release_group("kontrast") is not None
assert ldr.release_group("Kontrast")["name"] == "KONTRAST"
assert ldr.release_group("unknown_group") is None
def test_pattern_id_falls_back_to_filename(self, isolated_loader):
# File without 'id' field — uses the stem.
builtin, _ = isolated_loader
_write(
builtin / "patterns" / "no_id.yaml",
"scan_strategy: adjacent\n",
)
ldr = KnowledgeLoader()
assert "no_id" in ldr.patterns()
# --------------------------------------------------------------------------- #
# SubtitleKnowledgeBase #
# --------------------------------------------------------------------------- #
class TestKnowledgeBase:
@pytest.fixture
def kb(self, isolated_loader):
builtin, _ = isolated_loader
_write(
builtin / "subtitles.yaml",
"""
formats:
srt:
extensions: [".srt"]
description: "SubRip"
ass:
extensions: [".ass", ".ssa"]
language_tokens:
fre: ["vostfr"]
types:
sdh:
tokens: ["sdh", "cc"]
forced:
tokens: ["forced"]
defaults:
languages: ["fre"]
formats: ["srt"]
types: ["standard"]
format_priority: ["srt"]
min_confidence: 0.8
""",
)
_write(
builtin / "patterns" / "adj.yaml",
"id: adj\nscan_strategy: adjacent\ndescription: d\n",
)
_write(
builtin / "patterns" / "bad.yaml",
# invalid scan_strategy → skipped at build time
"id: bad\nscan_strategy: not_a_real_strategy\n",
)
_write(
builtin / "release_groups" / "group_a.yaml",
"name: GroupA\nknown_patterns: [adj]\n",
)
return SubtitleKnowledgeBase()
def test_formats_loaded(self, kb):
formats = kb.formats()
assert "srt" in formats and "ass" in formats
assert kb.format_for_extension(".srt").id == "srt"
assert kb.format_for_extension(".ssa").id == "ass"
assert kb.format_for_extension(".unknown") is None
def test_known_extensions_aggregates(self, kb):
exts = kb.known_extensions()
assert ".srt" in exts and ".ass" in exts and ".ssa" in exts
def test_language_for_token(self, kb):
# Canonical ISO 639-2/B codes are sourced from LanguageRegistry.
assert kb.language_for_token("french").code == "fre"
assert kb.language_for_token("FR").code == "fre"
assert kb.language_for_token("xxx") is None
assert kb.is_known_lang_token("eng") is True
assert kb.is_known_lang_token("ghost") is False
def test_subtitle_specific_token_recognized(self, kb):
# ``vostfr`` is subtitle-specific and lives in subtitles.yaml's
# ``language_tokens`` block — still resolves to canonical "fre".
assert kb.language_for_token("vostfr").code == "fre"
def test_type_for_token(self, kb):
assert kb.type_for_token("sdh") == SubtitleType.SDH
assert kb.type_for_token("FORCED") == SubtitleType.FORCED
assert kb.type_for_token("nope") is None
# 'hi' must NOT be a SDH token any more (it collides with Hindi).
assert kb.is_known_type_token("hi") is False
assert kb.is_known_type_token("cc") is True
def test_default_rules(self, kb):
r = kb.default_rules()
assert r.preferred_languages == ["fre"]
assert r.preferred_formats == ["srt"]
assert r.min_confidence == 0.8
def test_patterns_valid_kept_invalid_skipped(self, kb):
patterns = kb.patterns()
assert "adj" in patterns
# 'bad' had an invalid scan_strategy → quietly dropped.
assert "bad" not in patterns
def test_pattern_typed_view(self, kb):
p = kb.pattern("adj")
assert p.scan_strategy == ScanStrategy.ADJACENT
assert p.type_detection == TypeDetectionMethod.TOKEN_IN_NAME
def test_patterns_for_group(self, kb):
ps = kb.patterns_for_group("GroupA")
assert len(ps) == 1 and ps[0].id == "adj"
assert kb.patterns_for_group("unknown") == []
def test_reload_picks_up_changes(self, kb, isolated_loader):
# Add a new pattern, reload, check it's visible.
builtin, _ = isolated_loader
_write(
builtin / "patterns" / "new.yaml",
"id: new\nscan_strategy: flat\n",
)
kb.reload()
assert "new" in kb.patterns()
+220
View File
@@ -0,0 +1,220 @@
"""Tests for ``alfred.domain.subtitles.services.matcher.SubtitleMatcher``.
The matcher filters classified subtitle tracks against effective rules,
returning ``(matched, unresolved)``. Coverage:
- ``TestUnresolved`` — None language or low confidence → unresolved.
- ``TestLanguageFilter`` / ``TestFormatFilter`` / ``TestTypeFilter`` —
rule-based exclusion.
- ``TestEmbeddedTracks`` — embedded tracks are skipped entirely.
- ``TestFormatPriority`` — conflict between two same-(lang, type) tracks
is resolved by ``format_priority``.
- ``TestNoConflict`` — different (lang, type) keys never collide.
Uses lightweight, hand-built value objects — no KB dependency.
"""
from __future__ import annotations
import pytest
from alfred.domain.subtitles.entities import SubtitleCandidate
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
from alfred.domain.subtitles.value_objects import (
SubtitleFormat,
SubtitleLanguage,
SubtitleMatchingRules,
SubtitleType,
)
SRT = SubtitleFormat(id="srt", extensions=[".srt"])
ASS = SubtitleFormat(id="ass", extensions=[".ass"])
FRA = SubtitleLanguage(code="fra", tokens=["fr"])
ENG = SubtitleLanguage(code="eng", tokens=["en"])
SPA = SubtitleLanguage(code="spa", tokens=["es"])
def _track(
lang: SubtitleLanguage | None = FRA,
fmt: SubtitleFormat | None = SRT,
stype: SubtitleType = SubtitleType.STANDARD,
confidence: float = 1.0,
is_embedded: bool = False,
) -> SubtitleCandidate:
return SubtitleCandidate(
language=lang,
format=fmt,
subtitle_type=stype,
is_embedded=is_embedded,
confidence=confidence,
)
@pytest.fixture
def matcher():
return SubtitleMatcher()
# --------------------------------------------------------------------------- #
# Unresolved #
# --------------------------------------------------------------------------- #
class TestUnresolved:
def test_none_language_unresolved(self, matcher):
t = _track(lang=None)
rules = SubtitleMatchingRules(min_confidence=0.7)
matched, unresolved = matcher.match([t], rules)
assert matched == []
assert unresolved == [t]
def test_low_confidence_unresolved(self, matcher):
t = _track(confidence=0.3)
rules = SubtitleMatchingRules(min_confidence=0.7)
matched, unresolved = matcher.match([t], rules)
assert matched == []
assert unresolved == [t]
def test_threshold_exact_passes(self, matcher):
t = _track(confidence=0.7)
rules = SubtitleMatchingRules(
min_confidence=0.7, preferred_languages=["fra"]
)
matched, unresolved = matcher.match([t], rules)
assert matched == [t]
# --------------------------------------------------------------------------- #
# Filters #
# --------------------------------------------------------------------------- #
class TestLanguageFilter:
def test_preferred_languages_filters_out(self, matcher):
t_eng = _track(lang=ENG)
rules = SubtitleMatchingRules(
preferred_languages=["fra"], min_confidence=0.0
)
matched, _ = matcher.match([t_eng], rules)
assert matched == []
def test_preferred_language_match_passes(self, matcher):
t_fra = _track(lang=FRA)
rules = SubtitleMatchingRules(
preferred_languages=["fra"], min_confidence=0.0
)
matched, _ = matcher.match([t_fra], rules)
assert matched == [t_fra]
def test_empty_preferred_allows_all(self, matcher):
t_fra = _track(lang=FRA)
t_eng = _track(lang=ENG)
rules = SubtitleMatchingRules(min_confidence=0.0)
matched, _ = matcher.match([t_fra, t_eng], rules)
# No language filter → both pass (different keys → no conflict).
assert len(matched) == 2
class TestFormatFilter:
def test_format_outside_preferred_filtered(self, matcher):
t = _track(fmt=ASS)
rules = SubtitleMatchingRules(
preferred_formats=["srt"], min_confidence=0.0
)
matched, _ = matcher.match([t], rules)
assert matched == []
def test_no_format_attribute_filtered_when_pref_set(self, matcher):
t = _track(fmt=None)
rules = SubtitleMatchingRules(
preferred_formats=["srt"], min_confidence=0.0
)
matched, _ = matcher.match([t], rules)
assert matched == []
class TestTypeFilter:
def test_disallowed_type_excluded(self, matcher):
t = _track(stype=SubtitleType.SDH)
rules = SubtitleMatchingRules(
allowed_types=["standard", "forced"], min_confidence=0.0
)
matched, _ = matcher.match([t], rules)
assert matched == []
def test_allowed_type_passes(self, matcher):
t = _track(stype=SubtitleType.STANDARD)
rules = SubtitleMatchingRules(
allowed_types=["standard"], min_confidence=0.0
)
matched, _ = matcher.match([t], rules)
assert matched == [t]
# --------------------------------------------------------------------------- #
# Embedded handling #
# --------------------------------------------------------------------------- #
class TestEmbeddedTracks:
def test_embedded_track_skipped_entirely(self, matcher):
e = _track(is_embedded=True)
rules = SubtitleMatchingRules(min_confidence=0.0)
matched, unresolved = matcher.match([e], rules)
# Embedded tracks are not the matcher's concern.
assert matched == []
assert unresolved == []
# --------------------------------------------------------------------------- #
# Conflict resolution #
# --------------------------------------------------------------------------- #
class TestFormatPriority:
def test_higher_priority_format_wins(self, matcher):
# Same (lang, type) but different formats → priority decides.
t_srt = _track(fmt=SRT)
t_ass = _track(fmt=ASS)
rules = SubtitleMatchingRules(
min_confidence=0.0,
format_priority=["srt", "ass"],
)
matched, _ = matcher.match([t_ass, t_srt], rules)
assert len(matched) == 1
assert matched[0].format.id == "srt"
def test_first_seen_kept_when_no_priority(self, matcher):
t_srt = _track(fmt=SRT)
t_ass = _track(fmt=ASS)
rules = SubtitleMatchingRules(min_confidence=0.0)
matched, _ = matcher.match([t_ass, t_srt], rules)
# No priority → ass came first → kept.
assert len(matched) == 1
assert matched[0].format.id == "ass"
def test_priority_order_reversed(self, matcher):
t_srt = _track(fmt=SRT)
t_ass = _track(fmt=ASS)
rules = SubtitleMatchingRules(
min_confidence=0.0,
format_priority=["ass", "srt"],
)
matched, _ = matcher.match([t_srt, t_ass], rules)
assert matched[0].format.id == "ass"
class TestNoConflict:
def test_different_languages_both_kept(self, matcher):
t_fra = _track(lang=FRA)
t_eng = _track(lang=ENG)
rules = SubtitleMatchingRules(min_confidence=0.0)
matched, _ = matcher.match([t_fra, t_eng], rules)
assert len(matched) == 2
def test_different_types_both_kept(self, matcher):
t_std = _track(stype=SubtitleType.STANDARD)
t_sdh = _track(stype=SubtitleType.SDH)
rules = SubtitleMatchingRules(min_confidence=0.0)
matched, _ = matcher.match([t_std, t_sdh], rules)
assert len(matched) == 2
@@ -0,0 +1,190 @@
"""Tests for ``alfred.domain.subtitles.services.pattern_detector.PatternDetector``.
The detector inspects a release folder and returns the best-matching known
pattern + a confidence score.
Coverage:
- ``TestEmbeddedDetection`` — ffprobe is mocked; ``embedded`` pattern wins
when no external subs and ffprobe reports tracks.
- ``TestAdjacentDetection`` — .srt next to the video → ``adjacent``.
- ``TestFlatSubsFolder`` — ``Subs/*.srt`` → ``subs_flat``.
- ``TestEpisodeSubfolder`` — ``Subs/{ep}/*.srt`` → ``episode_subfolder``.
- ``TestNothingFound`` — empty release returns no pattern.
- ``TestDescribe`` — human-readable description mentions the right cues.
Uses the real ``SubtitleKnowledgeBase`` (loaded from the live builtin
``patterns/`` folder) since rebuilding all four patterns by hand would
just duplicate fixture state.
"""
from __future__ import annotations
from pathlib import Path
from unittest.mock import patch
import pytest
from alfred.domain.subtitles.knowledge.base import SubtitleKnowledgeBase
from alfred.domain.subtitles.services.pattern_detector import PatternDetector
@pytest.fixture(scope="module")
def kb():
return SubtitleKnowledgeBase()
@pytest.fixture
def detector(kb):
return PatternDetector(kb)
def _make_video(folder: Path, name: str = "Show.S01E01.mkv") -> Path:
v = folder / name
v.write_bytes(b"")
return v
# --------------------------------------------------------------------------- #
# Embedded #
# --------------------------------------------------------------------------- #
class TestEmbeddedDetection:
def test_embedded_only(self, detector, tmp_path):
# Folder has video but no external .srt files anywhere.
video = _make_video(tmp_path)
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=True
):
result = detector.detect(tmp_path, video)
assert result["detected"] is not None
assert result["detected"].id == "embedded"
assert result["confidence"] > 0
assert "embedded" in result["description"].lower()
# --------------------------------------------------------------------------- #
# Adjacent #
# --------------------------------------------------------------------------- #
class TestAdjacentDetection:
def test_srt_next_to_video(self, detector, tmp_path):
video = _make_video(tmp_path)
(tmp_path / "Show.S01E01.English.srt").write_text("")
(tmp_path / "Show.S01E01.French.srt").write_text("")
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=False
):
result = detector.detect(tmp_path, video)
assert result["detected"] is not None
assert result["detected"].id == "adjacent"
assert "adjacent" in result["description"]
# --------------------------------------------------------------------------- #
# Subs flat folder #
# --------------------------------------------------------------------------- #
class TestFlatSubsFolder:
def test_flat_subs_folder_adjacent_to_video(self, detector, tmp_path):
video = _make_video(tmp_path)
subs = tmp_path / "Subs"
subs.mkdir()
(subs / "Show.S01E01.English.srt").write_text("")
(subs / "Show.S01E01.French.srt").write_text("")
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=False
):
result = detector.detect(tmp_path, video)
assert result["detected"] is not None
assert result["detected"].id == "subs_flat"
assert "flat" in result["description"]
def test_flat_subs_folder_at_release_root(self, detector, tmp_path):
# Sample video lives one level deep; Subs/ is at the release root.
season_dir = tmp_path / "Season.01"
season_dir.mkdir()
video = _make_video(season_dir)
subs = tmp_path / "Subs"
subs.mkdir()
(subs / "ep01.English.srt").write_text("")
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=False
):
result = detector.detect(tmp_path, video)
assert result["detected"] is not None
assert result["detected"].id == "subs_flat"
# --------------------------------------------------------------------------- #
# Episode subfolder #
# --------------------------------------------------------------------------- #
class TestEpisodeSubfolder:
def test_per_episode_subfolder(self, detector, tmp_path):
video = _make_video(tmp_path, name="Show.S01E01.mkv")
subs = tmp_path / "Subs" / "Show.S01E01"
subs.mkdir(parents=True)
(subs / "2_English.srt").write_text("")
(subs / "3_French.srt").write_text("")
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=False
):
result = detector.detect(tmp_path, video)
assert result["detected"] is not None
assert result["detected"].id == "episode_subfolder"
desc = result["description"]
assert "episode_subfolder" in desc
# Numeric-prefix cue should be reported.
assert "numeric prefix" in desc
# --------------------------------------------------------------------------- #
# Nothing #
# --------------------------------------------------------------------------- #
class TestNothingFound:
def test_empty_release_no_pattern(self, detector, tmp_path):
video = _make_video(tmp_path)
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=False
):
result = detector.detect(tmp_path, video)
# No external subs and no embedded → adjacent strategy still scores
# 0.5 (no Subs folder bonus). Best pattern may exist or not depending
# on threshold (0.4). Either way the description must reflect emptiness.
assert "no external subtitle files" in result["description"]
# --------------------------------------------------------------------------- #
# Describe #
# --------------------------------------------------------------------------- #
class TestDescribe:
def test_describe_includes_language_token_cue(self, detector, tmp_path):
video = _make_video(tmp_path)
subs = tmp_path / "Subs"
subs.mkdir()
(subs / "ep01.English.srt").write_text("")
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=False
):
result = detector.detect(tmp_path, video)
assert "language tokens" in result["description"]
def test_describe_combines_external_and_embedded(self, detector, tmp_path):
video = _make_video(tmp_path)
(tmp_path / "Show.S01E01.English.srt").write_text("")
with patch.object(
PatternDetector, "_has_embedded_subtitles", return_value=True
):
result = detector.detect(tmp_path, video)
desc = result["description"]
assert "adjacent" in desc
assert "embedded" in desc.lower()
+221
View File
@@ -0,0 +1,221 @@
"""Tests for ``alfred.domain.subtitles.services.placer.SubtitlePlacer``.
The placer hard-links subtitle files next to a destination video, naming
them ``{video_stem}.{lang}[.sdh|.forced].{ext}``.
Coverage:
- ``TestBuildDestName`` — name construction for standard / SDH / forced;
errors on missing language or format.
- ``TestPlace`` — happy path: link is created, ``PlacedTrack`` populated.
- ``TestSkipReasons`` — embedded, missing source, missing language/format,
destination already exists.
- ``TestOSError`` — ``os.link`` failures are captured as ``skipped``.
- ``TestPlaceResultCounts`` — ``placed_count`` / ``skipped_count`` properties.
"""
from __future__ import annotations
from pathlib import Path
from unittest.mock import patch
import pytest
from alfred.domain.subtitles.entities import SubtitleCandidate
from alfred.domain.subtitles.services.placer import (
PlacedTrack,
PlaceResult,
SubtitlePlacer,
_build_dest_name,
)
from alfred.domain.subtitles.value_objects import (
SubtitleFormat,
SubtitleLanguage,
SubtitleType,
)
SRT = SubtitleFormat(id="srt", extensions=[".srt"])
ASS = SubtitleFormat(id="ass", extensions=[".ass", ".ssa"])
FRA = SubtitleLanguage(code="fra", tokens=["fr"])
def _track(
file_path: Path | None,
*,
lang=FRA,
fmt=SRT,
stype=SubtitleType.STANDARD,
is_embedded: bool = False,
) -> SubtitleCandidate:
return SubtitleCandidate(
language=lang,
format=fmt,
subtitle_type=stype,
file_path=file_path,
is_embedded=is_embedded,
)
# --------------------------------------------------------------------------- #
# _build_dest_name #
# --------------------------------------------------------------------------- #
class TestBuildDestName:
def test_standard(self):
t = _track(None, stype=SubtitleType.STANDARD)
assert _build_dest_name(t, "Movie.2010") == "Movie.2010.fra.srt"
def test_sdh(self):
t = _track(None, stype=SubtitleType.SDH)
assert _build_dest_name(t, "Movie.2010") == "Movie.2010.fra.sdh.srt"
def test_forced(self):
t = _track(None, stype=SubtitleType.FORCED)
assert _build_dest_name(t, "Movie.2010") == "Movie.2010.fra.forced.srt"
def test_uses_first_extension_of_multi_ext_format(self):
t = _track(None, fmt=ASS)
# ASS has [.ass, .ssa] — first wins.
assert _build_dest_name(t, "x").endswith(".ass")
def test_missing_lang_raises(self):
t = _track(None, lang=None)
with pytest.raises(ValueError, match="language or format"):
_build_dest_name(t, "x")
def test_missing_format_raises(self):
t = _track(None, fmt=None)
with pytest.raises(ValueError, match="language or format"):
_build_dest_name(t, "x")
# --------------------------------------------------------------------------- #
# Place — happy path #
# --------------------------------------------------------------------------- #
@pytest.fixture
def placer():
return SubtitlePlacer()
class TestPlace:
def test_creates_hard_link_with_correct_name(self, placer, tmp_path):
src = tmp_path / "input.srt"
src.write_text("subs")
video = tmp_path / "lib" / "Movie.2010.mkv"
video.parent.mkdir()
video.write_bytes(b"")
track = _track(src)
result = placer.place([track], video)
assert result.placed_count == 1
assert result.skipped_count == 0
placed = result.placed[0]
assert placed.filename == "Movie.2010.fra.srt"
assert placed.destination.exists()
# Hard link → same inode as source.
assert placed.destination.stat().st_ino == src.stat().st_ino
def test_multiple_tracks_distinct_destinations(self, placer, tmp_path):
s1 = tmp_path / "a.srt"
s1.write_text("")
s2 = tmp_path / "b.srt"
s2.write_text("")
video = tmp_path / "lib" / "Movie.mkv"
video.parent.mkdir()
video.write_bytes(b"")
ENG = SubtitleLanguage(code="eng", tokens=["en"])
t1 = _track(s1, lang=FRA)
t2 = _track(s2, lang=ENG, stype=SubtitleType.SDH)
result = placer.place([t1, t2], video)
assert result.placed_count == 2
names = {p.filename for p in result.placed}
assert names == {"Movie.fra.srt", "Movie.eng.sdh.srt"}
# --------------------------------------------------------------------------- #
# Skip reasons #
# --------------------------------------------------------------------------- #
class TestSkipReasons:
def test_embedded_skipped(self, placer, tmp_path):
video = tmp_path / "Movie.mkv"
video.write_bytes(b"")
track = _track(None, is_embedded=True)
result = placer.place([track], video)
assert result.placed == []
assert len(result.skipped) == 1
assert "embedded" in result.skipped[0][1]
def test_missing_source_file(self, placer, tmp_path):
video = tmp_path / "Movie.mkv"
video.write_bytes(b"")
track = _track(tmp_path / "ghost.srt")
result = placer.place([track], video)
assert result.placed == []
assert "not found" in result.skipped[0][1]
def test_missing_lang_or_format_skipped(self, placer, tmp_path):
video = tmp_path / "Movie.mkv"
video.write_bytes(b"")
src = tmp_path / "x.srt"
src.write_text("")
track = _track(src, lang=None)
result = placer.place([track], video)
assert result.placed == []
assert "language or format" in result.skipped[0][1]
def test_destination_already_exists(self, placer, tmp_path):
src = tmp_path / "x.srt"
src.write_text("a")
video = tmp_path / "lib" / "Movie.mkv"
video.parent.mkdir()
video.write_bytes(b"")
# Pre-create destination
(video.parent / "Movie.fra.srt").write_text("preexisting")
track = _track(src)
result = placer.place([track], video)
assert result.placed == []
assert "already exists" in result.skipped[0][1]
# --------------------------------------------------------------------------- #
# OSError handling #
# --------------------------------------------------------------------------- #
class TestOSError:
def test_link_failure_captured_as_skipped(self, placer, tmp_path):
src = tmp_path / "x.srt"
src.write_text("")
video = tmp_path / "lib" / "Movie.mkv"
video.parent.mkdir()
video.write_bytes(b"")
track = _track(src)
with patch(
"alfred.domain.subtitles.services.placer.os.link",
side_effect=OSError("cross-device link"),
):
result = placer.place([track], video)
assert result.placed == []
assert "cross-device" in result.skipped[0][1]
# --------------------------------------------------------------------------- #
# PlaceResult counters #
# --------------------------------------------------------------------------- #
class TestPlaceResultCounts:
def test_counts(self):
# Synthesize a PlaceResult directly for property check.
pt = PlacedTrack(source=Path("/a"), destination=Path("/b"), filename="b")
st = _track(None, is_embedded=True)
r = PlaceResult(placed=[pt], skipped=[(st, "x")])
assert r.placed_count == 1
assert r.skipped_count == 1
+54 -37
View File
@@ -14,11 +14,12 @@ from alfred.domain.subtitles.scanner import (
class TestClassify:
def test_iso_lang_code(self, tmp_path):
def test_iso_lang_code_639_1_alias(self, tmp_path):
# ``fr`` is an alias of the canonical ISO 639-2/B code ``fre``.
p = tmp_path / "fr.srt"
p.write_text("")
lang, is_sdh, is_forced = _classify(p)
assert lang == "fr"
assert lang == "fre"
assert not is_sdh
assert not is_forced
@@ -26,35 +27,39 @@ class TestClassify:
p = tmp_path / "english.srt"
p.write_text("")
lang, _, _ = _classify(p)
assert lang == "en"
assert lang == "eng"
def test_french_keyword(self, tmp_path):
p = tmp_path / "Show.S01E01.French.srt"
p.write_text("")
lang, _, _ = _classify(p)
assert lang == "fr"
assert lang == "fre"
def test_vostfr_is_french(self, tmp_path):
p = tmp_path / "Show.S01E01.VOSTFR.srt"
p.write_text("")
lang, _, _ = _classify(p)
assert lang == "fr"
assert lang == "fre"
def test_sdh_token(self, tmp_path):
p = tmp_path / "fr.sdh.srt"
p = tmp_path / "fre.sdh.srt"
p.write_text("")
lang, is_sdh, _ = _classify(p)
assert lang == "fr"
assert lang == "fre"
assert is_sdh
def test_hi_alias_for_sdh(self, tmp_path):
def test_hi_no_longer_marks_sdh(self, tmp_path):
# ``hi`` is the ISO 639-1 alias for Hindi; it must not mark a file as
# SDH any more (regression of the previous collision between SDH and
# Hindi tokens). Use ``sdh`` / ``cc`` / ``hearing`` to flag SDH instead.
p = tmp_path / "en.hi.srt"
p.write_text("")
_, is_sdh, _ = _classify(p)
assert is_sdh
lang, is_sdh, _ = _classify(p)
assert lang == "eng"
assert not is_sdh
def test_forced_token(self, tmp_path):
p = tmp_path / "fr.forced.srt"
p = tmp_path / "fre.forced.srt"
p.write_text("")
_, _, is_forced = _classify(p)
assert is_forced
@@ -66,17 +71,17 @@ class TestClassify:
assert lang is None
def test_dot_separator(self, tmp_path):
p = tmp_path / "fr.sdh.srt"
p = tmp_path / "fre.sdh.srt"
p.write_text("")
lang, is_sdh, _ = _classify(p)
assert lang == "fr"
assert lang == "fre"
assert is_sdh
def test_hyphen_separator(self, tmp_path):
p = tmp_path / "fr-forced.srt"
p = tmp_path / "fre-forced.srt"
p.write_text("")
lang, _, is_forced = _classify(p)
assert lang == "fr"
assert lang == "fre"
assert is_forced
@@ -86,9 +91,9 @@ class TestClassify:
class TestSubtitleCandidateDestinationName:
def _make(self, lang="fr", is_sdh=False, is_forced=False, ext=".srt", path=None):
def _make(self, lang="fre", is_sdh=False, is_forced=False, ext=".srt", path=None):
return SubtitleCandidate(
source_path=path or Path("/fake/fr.srt"),
source_path=path or Path("/fake/fre.srt"),
language=lang,
is_sdh=is_sdh,
is_forced=is_forced,
@@ -96,19 +101,19 @@ class TestSubtitleCandidateDestinationName:
)
def test_standard(self):
assert self._make().destination_name == "fr.srt"
assert self._make().destination_name == "fre.srt"
def test_sdh(self):
assert self._make(is_sdh=True).destination_name == "fr.sdh.srt"
assert self._make(is_sdh=True).destination_name == "fre.sdh.srt"
def test_forced(self):
assert self._make(is_forced=True).destination_name == "fr.forced.srt"
assert self._make(is_forced=True).destination_name == "fre.forced.srt"
def test_ass_extension(self):
assert self._make(ext=".ass").destination_name == "fr.ass"
assert self._make(ext=".ass").destination_name == "fre.ass"
def test_english_standard(self):
assert self._make(lang="en").destination_name == "en.srt"
assert self._make(lang="eng").destination_name == "eng.srt"
# ---------------------------------------------------------------------------
@@ -119,7 +124,7 @@ class TestSubtitleCandidateDestinationName:
class TestSubtitleScanner:
def _scanner(self, languages=None, min_size_kb=0, keep_sdh=True, keep_forced=True):
return SubtitleScanner(
languages=languages or ["fr", "en"],
languages=languages or ["fre", "eng"],
min_size_kb=min_size_kb,
keep_sdh=keep_sdh,
keep_forced=keep_forced,
@@ -131,31 +136,43 @@ class TestSubtitleScanner:
return video
def test_finds_adjacent_subtitle(self, tmp_path):
video = self._video(tmp_path)
(tmp_path / "fre.srt").write_text("subtitle content")
candidates = self._scanner().scan(video)
assert len(candidates) == 1
assert candidates[0].language == "fre"
def test_finds_adjacent_subtitle_legacy_639_1(self, tmp_path):
# Reading existing media libraries: ``fr.srt`` is still recognized as
# French and classified canonically as ``fre`` — covers user libraries
# written before the ISO 639-2/B migration.
video = self._video(tmp_path)
(tmp_path / "fr.srt").write_text("subtitle content")
candidates = self._scanner().scan(video)
assert len(candidates) == 1
assert candidates[0].language == "fr"
assert candidates[0].language == "fre"
def test_finds_multiple_languages(self, tmp_path):
video = self._video(tmp_path)
(tmp_path / "fr.srt").write_text("fr subtitle")
(tmp_path / "en.srt").write_text("en subtitle")
(tmp_path / "fre.srt").write_text("fr subtitle")
(tmp_path / "eng.srt").write_text("en subtitle")
candidates = self._scanner().scan(video)
langs = {c.language for c in candidates}
assert langs == {"fr", "en"}
assert langs == {"fre", "eng"}
def test_scans_subs_subfolder(self, tmp_path):
video = self._video(tmp_path)
subs = tmp_path / "Subs"
subs.mkdir()
(subs / "fr.srt").write_text("subtitle")
(subs / "fre.srt").write_text("subtitle")
candidates = self._scanner().scan(video)
assert any(c.language == "fr" for c in candidates)
assert any(c.language == "fre" for c in candidates)
def test_filters_unknown_language(self, tmp_path):
video = self._video(tmp_path)
@@ -166,14 +183,14 @@ class TestSubtitleScanner:
def test_filters_wrong_language(self, tmp_path):
video = self._video(tmp_path)
(tmp_path / "de.srt").write_text("german subtitle")
(tmp_path / "ger.srt").write_text("german subtitle")
candidates = self._scanner(languages=["fr"]).scan(video)
candidates = self._scanner(languages=["fre"]).scan(video)
assert len(candidates) == 0
def test_filters_too_small_file(self, tmp_path):
video = self._video(tmp_path)
small = tmp_path / "fr.srt"
small = tmp_path / "fre.srt"
small.write_bytes(b"x") # 1 byte, well below any min_size_kb
candidates = self._scanner(min_size_kb=10).scan(video)
@@ -181,21 +198,21 @@ class TestSubtitleScanner:
def test_filters_sdh_when_not_wanted(self, tmp_path):
video = self._video(tmp_path)
(tmp_path / "fr.sdh.srt").write_text("sdh subtitle")
(tmp_path / "fre.sdh.srt").write_text("sdh subtitle")
candidates = self._scanner(keep_sdh=False).scan(video)
assert len(candidates) == 0
def test_filters_forced_when_not_wanted(self, tmp_path):
video = self._video(tmp_path)
(tmp_path / "fr.forced.srt").write_text("forced subtitle")
(tmp_path / "fre.forced.srt").write_text("forced subtitle")
candidates = self._scanner(keep_forced=False).scan(video)
assert len(candidates) == 0
def test_keeps_sdh_when_wanted(self, tmp_path):
video = self._video(tmp_path)
(tmp_path / "fr.sdh.srt").write_text("sdh subtitle")
(tmp_path / "fre.sdh.srt").write_text("sdh subtitle")
candidates = self._scanner(keep_sdh=True).scan(video)
assert len(candidates) == 1
@@ -203,8 +220,8 @@ class TestSubtitleScanner:
def test_ignores_non_subtitle_files(self, tmp_path):
video = self._video(tmp_path)
(tmp_path / "fr.nfo").write_text("nfo file")
(tmp_path / "fr.jpg").write_bytes(b"image")
(tmp_path / "fre.nfo").write_text("nfo file")
(tmp_path / "fre.jpg").write_bytes(b"image")
candidates = self._scanner().scan(video)
assert len(candidates) == 0
+277
View File
@@ -0,0 +1,277 @@
"""Tests for subtitle value objects, entities, and the ``utils`` service.
Targets the quick-win surface of the subtitle domain that was largely
uncovered:
- ``TestSubtitleFormat`` — extension matching (case-insensitive).
- ``TestSubtitleLanguage`` — token matching (case-insensitive).
- ``TestSubtitleCandidateDestName`` — ``destination_name`` property:
standard / SDH / forced naming, error on missing language or format.
- ``TestSubtitleCandidateRepr`` — debug repr for embedded vs external.
- ``TestMediaSubtitleMetadata`` — ``all_tracks`` / ``total_count`` /
``unresolved_tracks``.
- ``TestAvailableSubtitles`` — utility dedup by (lang, type).
- ``TestSubtitleRuleSet`` — scope inheritance + ``override`` mutation +
``to_dict`` shape.
All pure-Python — no I/O.
"""
from __future__ import annotations
from pathlib import Path
import pytest
from alfred.domain.subtitles.aggregates import SubtitleRuleSet
from alfred.domain.subtitles.entities import MediaSubtitleMetadata, SubtitleCandidate
from alfred.domain.subtitles.services.utils import available_subtitles
from alfred.domain.subtitles.value_objects import (
RuleScope,
SubtitleFormat,
SubtitleLanguage,
SubtitleType,
)
# --------------------------------------------------------------------------- #
# Value objects #
# --------------------------------------------------------------------------- #
class TestSubtitleFormat:
def test_matches_extension_case_insensitive(self):
fmt = SubtitleFormat(id="srt", extensions=[".srt"])
assert fmt.matches_extension(".srt")
assert fmt.matches_extension(".SRT")
assert not fmt.matches_extension(".ass")
def test_multiple_extensions(self):
fmt = SubtitleFormat(id="ass", extensions=[".ass", ".ssa"])
assert fmt.matches_extension(".ass")
assert fmt.matches_extension(".ssa")
assert fmt.matches_extension(".SSA")
assert not fmt.matches_extension(".srt")
class TestSubtitleLanguage:
def test_matches_token_case_insensitive(self):
lang = SubtitleLanguage(code="fra", tokens=["fr", "fre", "french"])
assert lang.matches_token("fr")
assert lang.matches_token("FRENCH")
assert lang.matches_token("French")
assert not lang.matches_token("eng")
# --------------------------------------------------------------------------- #
# SubtitleCandidate #
# --------------------------------------------------------------------------- #
SRT = SubtitleFormat(id="srt", extensions=[".srt"])
FRA = SubtitleLanguage(code="fra", tokens=["fr", "fre"])
class TestSubtitleCandidateDestName:
def test_standard(self):
t = SubtitleCandidate(
language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD
)
assert t.destination_name == "fra.srt"
def test_sdh(self):
t = SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH)
assert t.destination_name == "fra.sdh.srt"
def test_forced(self):
t = SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.FORCED)
assert t.destination_name == "fra.forced.srt"
def test_unknown_treated_as_standard(self):
t = SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.UNKNOWN)
# UNKNOWN doesn't add a suffix → same as standard.
assert t.destination_name == "fra.srt"
def test_missing_language_raises(self):
t = SubtitleCandidate(language=None, format=SRT)
with pytest.raises(ValueError, match="language or format missing"):
t.destination_name
def test_missing_format_raises(self):
t = SubtitleCandidate(language=FRA, format=None)
with pytest.raises(ValueError, match="language or format missing"):
t.destination_name
def test_extension_dot_stripped(self):
# Format extension is ".srt" — leading dot must not be duplicated.
t = SubtitleCandidate(language=FRA, format=SRT)
assert t.destination_name.endswith(".srt")
assert ".." not in t.destination_name
class TestSubtitleCandidateRepr:
def test_embedded_repr(self):
t = SubtitleCandidate(language=FRA, format=None, is_embedded=True, confidence=1.0)
r = repr(t)
assert "fra" in r
assert "embedded" in r
def test_external_repr_uses_filename(self, tmp_path):
f = tmp_path / "fr.srt"
f.write_text("")
t = SubtitleCandidate(
language=FRA, format=SRT, file_path=f, confidence=0.85
)
r = repr(t)
assert "fra" in r
assert "fr.srt" in r
assert "0.85" in r
def test_unresolved_repr(self):
t = SubtitleCandidate(language=None, format=None)
r = repr(t)
assert "?" in r
# --------------------------------------------------------------------------- #
# MediaSubtitleMetadata #
# --------------------------------------------------------------------------- #
class TestMediaSubtitleMetadata:
def test_empty(self):
m = MediaSubtitleMetadata(media_id=None, media_type="movie")
assert m.all_tracks == []
assert m.total_count == 0
assert m.unresolved_tracks == []
def test_aggregates_embedded_and_external(self):
e = SubtitleCandidate(language=FRA, format=None, is_embedded=True)
x = SubtitleCandidate(language=FRA, format=SRT, file_path=Path("/x.srt"))
m = MediaSubtitleMetadata(
media_id=None,
media_type="movie",
embedded_tracks=[e],
external_tracks=[x],
)
assert m.total_count == 2
assert m.all_tracks == [e, x]
def test_unresolved_tracks_only_external_with_none_lang(self):
# An embedded with None language must NOT appear in unresolved_tracks
# (the property only iterates external_tracks).
embedded_unknown = SubtitleCandidate(language=None, format=None, is_embedded=True)
external_known = SubtitleCandidate(
language=FRA, format=SRT, file_path=Path("/a.srt")
)
external_unknown = SubtitleCandidate(
language=None, format=SRT, file_path=Path("/b.srt")
)
m = MediaSubtitleMetadata(
media_id=None,
media_type="movie",
embedded_tracks=[embedded_unknown],
external_tracks=[external_known, external_unknown],
)
assert m.unresolved_tracks == [external_unknown]
# --------------------------------------------------------------------------- #
# available_subtitles utility #
# --------------------------------------------------------------------------- #
class TestAvailableSubtitles:
def test_dedup_by_lang_and_type(self):
ENG = SubtitleLanguage(code="eng", tokens=["en"])
tracks = [
SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD),
SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.STANDARD),
SubtitleCandidate(language=FRA, format=SRT, subtitle_type=SubtitleType.SDH),
SubtitleCandidate(language=ENG, format=SRT, subtitle_type=SubtitleType.STANDARD),
]
result = available_subtitles(tracks)
keys = [(t.language.code, t.subtitle_type) for t in result]
assert keys == [
("fra", SubtitleType.STANDARD),
("fra", SubtitleType.SDH),
("eng", SubtitleType.STANDARD),
]
def test_none_language_treated_as_key(self):
# Tracks with no language form a single None-keyed bucket.
t1 = SubtitleCandidate(
language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
)
t2 = SubtitleCandidate(
language=None, format=SRT, subtitle_type=SubtitleType.UNKNOWN
)
result = available_subtitles([t1, t2])
assert len(result) == 1
def test_empty(self):
assert available_subtitles([]) == []
# --------------------------------------------------------------------------- #
# SubtitleRuleSet inheritance #
# --------------------------------------------------------------------------- #
class TestSubtitleRuleSet:
def test_global_default_uses_kb_defaults(self):
rs = SubtitleRuleSet.global_default()
rules = rs.resolve()
# Loaded from subtitles.yaml — defaults must be non-empty.
assert rules.preferred_languages
assert rules.preferred_formats
assert 0 < rules.min_confidence <= 1
def test_override_persists(self):
rs = SubtitleRuleSet.global_default()
rs.override(languages=["eng"], min_confidence=0.9)
rules = rs.resolve()
assert rules.preferred_languages == ["eng"]
assert rules.min_confidence == 0.9
def test_override_partial_keeps_parent_for_unset_fields(self):
parent = SubtitleRuleSet.global_default()
child = SubtitleRuleSet(
scope=RuleScope(level="show", identifier="tt1"),
parent=parent,
)
child.override(languages=["jpn"])
rules = child.resolve()
assert rules.preferred_languages == ["jpn"]
# min_confidence not overridden at child or parent → falls back to defaults
assert rules.min_confidence == parent.resolve().min_confidence
def test_to_dict_only_emits_set_deltas(self):
rs = SubtitleRuleSet(scope=RuleScope(level="show", identifier="tt1"))
rs.override(languages=["fra"])
out = rs.to_dict()
assert out["scope"] == {"level": "show", "identifier": "tt1"}
assert out["override"] == {"languages": ["fra"]}
def test_to_dict_full_override(self):
rs = SubtitleRuleSet(scope=RuleScope(level="global"))
rs.override(
languages=["fra"],
formats=["srt"],
types=["standard"],
format_priority=["srt", "ass"],
min_confidence=0.8,
)
out = rs.to_dict()
ov = out["override"]
assert ov["languages"] == ["fra"]
assert ov["formats"] == ["srt"]
assert ov["types"] == ["standard"]
assert ov["format_priority"] == ["srt", "ass"]
assert ov["min_confidence"] == 0.8
def test_min_confidence_zero_is_respected(self):
# `_min_confidence or base.min_confidence` would be a bug here — the
# code uses `is not None` explicitly. Verify 0.0 doesn't fall back.
rs = SubtitleRuleSet.global_default()
rs.override(min_confidence=0.0)
assert rs.resolve().min_confidence == 0.0
+341 -107
View File
@@ -1,10 +1,40 @@
"""Tests for TV Show domain — entities and value objects."""
"""Tests for the TV Show domain — entities, value objects, aggregate behavior.
Rewritten for the post-refactor aggregate:
* ``TVShow`` is the root, owning ``seasons: dict[SeasonNumber, Season]``.
* ``Season`` owns ``episodes: dict[EpisodeNumber, Episode]`` and tracks
``expected_episodes`` + ``aired_episodes``.
* ``Episode`` carries ``audio_tracks`` + ``subtitle_tracks`` and exposes
language helpers following contract C+ (``str`` direct compare, ``Language``
cross-format).
* No back-references on Season/Episode — they are reached through the root.
* Sole sanctioned mutation entry point: ``TVShow.add_episode(ep)``.
Coverage:
* ``TestShowStatus`` — including the extended TMDB string mapping.
* ``TestSeasonNumber`` / ``TestEpisodeNumber`` — value-object validation.
* ``TestEpisode`` — basic shape, file presence, audio/subtitle helpers.
* ``TestSeason`` — episode insertion, completeness vs aired, missing list.
* ``TestTVShow`` — aggregate invariants, ``add_episode``, ``collection_status``,
``missing_episodes``, ``is_complete_series``.
"""
from __future__ import annotations
import pytest
from alfred.domain.shared.exceptions import ValidationError
from alfred.domain.shared.media import AudioTrack, SubtitleTrack
from alfred.domain.shared.value_objects import ImdbId, Language
from alfred.domain.tv_shows.entities import Episode, Season, TVShow
from alfred.domain.tv_shows.value_objects import EpisodeNumber, SeasonNumber, ShowStatus
from alfred.domain.tv_shows.value_objects import (
CollectionStatus,
EpisodeNumber,
SeasonNumber,
ShowStatus,
)
# ---------------------------------------------------------------------------
# ShowStatus
@@ -20,11 +50,25 @@ class TestShowStatus:
def test_from_string_case_insensitive(self):
assert ShowStatus.from_string("ONGOING") == ShowStatus.ONGOING
assert ShowStatus.from_string("Ended") == ShowStatus.ENDED
assert ShowStatus.from_string(" Ended ") == ShowStatus.ENDED
def test_from_string_unknown(self):
assert ShowStatus.from_string("cancelled") == ShowStatus.UNKNOWN
@pytest.mark.parametrize(
"raw,expected",
[
("Returning Series", ShowStatus.ONGOING),
("In Production", ShowStatus.ONGOING),
("Pilot", ShowStatus.ONGOING),
("Planned", ShowStatus.ONGOING),
("Canceled", ShowStatus.ENDED),
("Cancelled", ShowStatus.ENDED),
],
)
def test_from_string_tmdb_mappings(self, raw, expected):
assert ShowStatus.from_string(raw) == expected
def test_from_string_empty_or_unknown(self):
assert ShowStatus.from_string("") == ShowStatus.UNKNOWN
assert ShowStatus.from_string("borked") == ShowStatus.UNKNOWN
# ---------------------------------------------------------------------------
@@ -34,12 +78,10 @@ class TestShowStatus:
class TestSeasonNumber:
def test_valid_season(self):
s = SeasonNumber(1)
assert s.value == 1
assert SeasonNumber(1).value == 1
def test_season_zero_is_specials(self):
s = SeasonNumber(0)
assert s.is_special()
assert SeasonNumber(0).is_special()
def test_normal_season_not_special(self):
assert not SeasonNumber(3).is_special()
@@ -69,8 +111,7 @@ class TestSeasonNumber:
class TestEpisodeNumber:
def test_valid_episode(self):
e = EpisodeNumber(1)
assert e.value == 1
assert EpisodeNumber(1).value == 1
def test_zero_raises(self):
with pytest.raises(ValidationError):
@@ -91,64 +132,107 @@ class TestEpisodeNumber:
# ---------------------------------------------------------------------------
# TVShow entity
# Episode entity
# ---------------------------------------------------------------------------
class TestTVShow:
def _make(
self, imdb_id="tt0903747", title="Breaking Bad", seasons=5, status="ended"
):
return TVShow(
imdb_id=imdb_id, title=title, seasons_count=seasons, status=status
class TestEpisode:
def _ep(self, *, season=1, episode=1, title="Pilot", **kwargs) -> Episode:
return Episode(
season_number=season,
episode_number=episode,
title=title,
**kwargs,
)
def test_basic_creation(self):
show = self._make()
assert show.title == "Breaking Bad"
assert show.seasons_count == 5
def test_basic_creation_coerces_numbers(self):
e = self._ep()
assert e.title == "Pilot"
assert isinstance(e.season_number, SeasonNumber)
assert isinstance(e.episode_number, EpisodeNumber)
def test_coerces_string_imdb_id(self):
show = self._make()
from alfred.domain.shared.value_objects import ImdbId
def test_get_filename_format(self):
e = self._ep(season=1, episode=5, title="Gray Matter")
filename = e.get_filename()
assert filename.startswith("S01E05")
assert "Gray.Matter" in filename
assert isinstance(show.imdb_id, ImdbId)
def test_has_file_false_when_no_path(self):
e = self._ep()
assert not e.has_file()
assert not e.is_downloaded()
def test_coerces_string_status(self):
show = self._make(status="ongoing")
assert show.status == ShowStatus.ONGOING
def test_str_format(self):
e = self._ep(season=2, episode=3, title="Bit by a Dead Bee")
s = str(e)
assert "S02E03" in s
assert "Bit by a Dead Bee" in s
def test_is_ongoing(self):
show = self._make(status="ongoing")
assert show.is_ongoing()
assert not show.is_ended()
# ── Audio helpers ──────────────────────────────────────────────────
def test_is_ended(self):
show = self._make(status="ended")
assert show.is_ended()
assert not show.is_ongoing()
def test_has_audio_in_with_str(self):
e = self._ep(
audio_tracks=[
AudioTrack(0, "eac3", 6, "5.1", "eng"),
AudioTrack(1, "ac3", 6, "5.1", "fre"),
]
)
assert e.has_audio_in("eng") is True
assert e.has_audio_in("ENG") is True # case-insensitive
assert e.has_audio_in("ger") is False
def test_negative_seasons_raises(self):
with pytest.raises(ValueError):
TVShow(imdb_id="tt0903747", title="X", seasons_count=-1, status="ended")
def test_has_audio_in_with_language(self):
lang = Language(iso="fre", english_name="French", native_name="Français",
aliases=("fr", "fra", "french"))
e = self._ep(
audio_tracks=[AudioTrack(0, "ac3", 6, "5.1", "fr")]
)
# str query "fre" wouldn't match "fr" directly — but Language does cross-format
assert e.has_audio_in(lang) is True
assert e.has_audio_in("fre") is False # direct compare misses
def test_invalid_imdb_id_type_raises(self):
with pytest.raises(ValueError):
TVShow(imdb_id=12345, title="X", seasons_count=1, status="ended") # type: ignore
def test_audio_languages_dedup_in_order(self):
e = self._ep(
audio_tracks=[
AudioTrack(0, "ac3", 6, "5.1", "eng"),
AudioTrack(1, "ac3", 6, "5.1", "fre"),
AudioTrack(2, "aac", 2, "stereo", "eng"), # dupe
AudioTrack(3, "aac", 2, "stereo", None), # skipped
]
)
assert e.audio_languages() == ["eng", "fre"]
def test_get_folder_name_replaces_spaces(self):
show = self._make(title="Breaking Bad")
assert show.get_folder_name() == "Breaking.Bad"
# ── Subtitle helpers ───────────────────────────────────────────────
def test_get_folder_name_strips_special_chars(self):
show = self._make(title="It's Always Sunny")
name = show.get_folder_name()
assert "'" not in name
def test_has_subtitles_in(self):
e = self._ep(
subtitle_tracks=[SubtitleTrack(0, "subrip", "fre")]
)
assert e.has_subtitles_in("fre") is True
assert e.has_subtitles_in("eng") is False
def test_str_repr(self):
show = self._make()
assert "Breaking Bad" in str(show)
assert "tt0903747" in repr(show)
def test_has_forced_subs(self):
e = self._ep(
subtitle_tracks=[
SubtitleTrack(0, "subrip", "eng", is_forced=False),
SubtitleTrack(1, "subrip", "eng", is_forced=True),
]
)
assert e.has_forced_subs() is True
def test_has_forced_subs_false_when_none(self):
e = self._ep(subtitle_tracks=[SubtitleTrack(0, "subrip", "eng")])
assert e.has_forced_subs() is False
def test_subtitle_languages_dedup_in_order(self):
e = self._ep(
subtitle_tracks=[
SubtitleTrack(0, "subrip", "eng"),
SubtitleTrack(1, "subrip", "fre"),
SubtitleTrack(2, "subrip", "eng"),
]
)
assert e.subtitle_languages() == ["eng", "fre"]
# ---------------------------------------------------------------------------
@@ -157,76 +241,226 @@ class TestTVShow:
class TestSeason:
def test_basic_creation(self):
s = Season(show_imdb_id="tt0903747", season_number=1, episode_count=7)
assert s.episode_count == 7
def _ep(self, episode: int) -> Episode:
return Episode(season_number=1, episode_number=episode, title=f"Ep {episode}")
def test_basic_creation_coerces_season_number(self):
s = Season(season_number=1)
assert isinstance(s.season_number, SeasonNumber)
assert s.episode_count == 0
assert s.episodes == {}
def test_get_folder_name_normal(self):
s = Season(show_imdb_id="tt0903747", season_number=2, episode_count=13)
assert s.get_folder_name() == "Season 02"
assert Season(season_number=2).get_folder_name() == "Season 02"
def test_get_folder_name_specials(self):
s = Season(show_imdb_id="tt0903747", season_number=0, episode_count=3)
s = Season(season_number=0)
assert s.get_folder_name() == "Specials"
assert s.is_special()
def test_negative_episode_count_raises(self):
def test_negative_aired_raises(self):
with pytest.raises(ValueError):
Season(show_imdb_id="tt0903747", season_number=1, episode_count=-1)
Season(season_number=1, aired_episodes=-1)
def test_str(self):
s = Season(
show_imdb_id="tt0903747",
season_number=1,
episode_count=7,
name="Pilot Season",
)
def test_aired_cannot_exceed_expected(self):
with pytest.raises(ValueError):
Season(season_number=1, expected_episodes=5, aired_episodes=6)
def test_add_episode_rejects_mismatched_season(self):
s = Season(season_number=1)
ep = Episode(season_number=2, episode_number=1, title="x")
with pytest.raises(ValueError):
s.add_episode(ep)
def test_add_episode_replaces_same_number(self):
s = Season(season_number=1)
s.add_episode(self._ep(1))
s.add_episode(Episode(season_number=1, episode_number=1, title="Replaced"))
assert s.episodes[EpisodeNumber(1)].title == "Replaced"
def test_str_uses_name_when_present(self):
s = Season(season_number=1, name="Pilot Season")
assert "Pilot Season" in str(s)
# ── Completeness vs aired ──────────────────────────────────────────
def test_is_complete_unknown_aired_is_false(self):
# Conservative: no aired count → cannot claim complete
s = Season(season_number=1)
s.add_episode(self._ep(1))
assert s.is_complete() is False
def test_is_complete_when_owning_all_aired(self):
s = Season(season_number=1, aired_episodes=3)
for i in (1, 2, 3):
s.add_episode(self._ep(i))
assert s.is_complete() is True
def test_is_complete_zero_aired_is_trivially_true(self):
s = Season(season_number=1, aired_episodes=0)
assert s.is_complete() is True
def test_partial_when_missing_aired_episodes(self):
s = Season(season_number=1, aired_episodes=3)
s.add_episode(self._ep(1))
assert s.is_complete() is False
def test_is_fully_aired(self):
s = Season(season_number=1, expected_episodes=10, aired_episodes=10)
assert s.is_fully_aired() is True
def test_is_fully_aired_false_when_in_flight(self):
s = Season(season_number=1, expected_episodes=10, aired_episodes=4)
assert s.is_fully_aired() is False
def test_is_fully_aired_false_with_unknowns(self):
assert Season(season_number=1).is_fully_aired() is False
def test_missing_episodes_when_partial(self):
s = Season(season_number=1, aired_episodes=5)
s.add_episode(self._ep(1))
s.add_episode(self._ep(3))
missing = [n.value for n in s.missing_episodes()]
assert missing == [2, 4, 5]
def test_missing_episodes_empty_when_complete(self):
s = Season(season_number=1, aired_episodes=2)
s.add_episode(self._ep(1))
s.add_episode(self._ep(2))
assert s.missing_episodes() == []
def test_missing_episodes_empty_when_unknown_aired(self):
# Without an aired count we cannot reason about gaps
s = Season(season_number=1)
s.add_episode(self._ep(2))
assert s.missing_episodes() == []
# ---------------------------------------------------------------------------
# Episode entity
# TVShow aggregate root
# ---------------------------------------------------------------------------
class TestEpisode:
class TestTVShow:
def _show(self, **kwargs) -> TVShow:
defaults = dict(
imdb_id="tt0903747",
title="Breaking Bad",
status="ended",
)
defaults.update(kwargs)
return TVShow(**defaults)
# ── Construction & coercion ────────────────────────────────────────
def test_basic_creation(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=1,
episode_number=1,
title="Pilot",
)
assert e.title == "Pilot"
show = self._show(expected_seasons=5)
assert show.title == "Breaking Bad"
assert show.expected_seasons == 5
assert show.seasons == {}
assert show.seasons_count == 0
def test_get_filename_format(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=1,
episode_number=5,
title="Gray Matter",
)
filename = e.get_filename()
assert filename.startswith("S01E05")
assert "Gray.Matter" in filename
def test_coerces_string_imdb_id(self):
assert isinstance(self._show().imdb_id, ImdbId)
def test_has_file_false_when_no_path(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=1,
episode_number=1,
title="Pilot",
)
assert not e.has_file()
assert not e.is_downloaded()
def test_coerces_string_status(self):
assert self._show(status="ongoing").status == ShowStatus.ONGOING
def test_str_format(self):
e = Episode(
show_imdb_id="tt0903747",
season_number=2,
episode_number=3,
title="Bit by a Dead Bee",
)
s = str(e)
assert "S02E03" in s
assert "Bit by a Dead Bee" in s
def test_is_ongoing_and_is_ended(self):
assert self._show(status="ongoing").is_ongoing()
assert self._show(status="ended").is_ended()
def test_negative_expected_seasons_raises(self):
with pytest.raises(ValueError):
self._show(expected_seasons=-1)
def test_invalid_imdb_id_type_raises(self):
with pytest.raises(ValueError):
TVShow(imdb_id=12345, title="X", status="ended") # type: ignore
def test_get_folder_name_replaces_spaces(self):
assert self._show(title="Breaking Bad").get_folder_name() == "Breaking.Bad"
def test_get_folder_name_strips_special_chars(self):
name = self._show(title="It's Always Sunny").get_folder_name()
assert "'" not in name
def test_str_repr(self):
show = self._show()
assert "Breaking Bad" in str(show)
assert "tt0903747" in repr(show)
# ── add_episode — the only sanctioned mutation ─────────────────────
def test_add_episode_creates_missing_season(self):
show = self._show()
show.add_episode(Episode(season_number=1, episode_number=1, title="Pilot"))
assert SeasonNumber(1) in show.seasons
assert show.seasons_count == 1
assert show.episode_count == 1
def test_add_episode_reuses_existing_season(self):
show = self._show()
show.add_episode(Episode(season_number=1, episode_number=1, title="A"))
show.add_episode(Episode(season_number=1, episode_number=2, title="B"))
assert show.seasons_count == 1
assert show.episode_count == 2
def test_add_season_replaces_existing(self):
show = self._show()
s1 = Season(season_number=1, aired_episodes=10)
show.add_season(s1)
s1bis = Season(season_number=1, aired_episodes=5)
show.add_season(s1bis)
assert show.seasons[SeasonNumber(1)] is s1bis
# ── Collection status ──────────────────────────────────────────────
def test_collection_status_empty(self):
assert self._show().collection_status() == CollectionStatus.EMPTY
def test_collection_status_partial_missing_episode(self):
show = self._show()
s = Season(season_number=1, aired_episodes=3)
s.add_episode(Episode(season_number=1, episode_number=1, title="x"))
show.add_season(s)
assert show.collection_status() == CollectionStatus.PARTIAL
def test_collection_status_complete(self):
show = self._show(expected_seasons=1)
s = Season(season_number=1, aired_episodes=2)
for n in (1, 2):
s.add_episode(Episode(season_number=1, episode_number=n, title=f"e{n}"))
show.add_season(s)
assert show.collection_status() == CollectionStatus.COMPLETE
def test_collection_status_partial_when_seasons_missing(self):
# Seasons we own are complete, but expected_seasons says more exist.
show = self._show(expected_seasons=2)
s = Season(season_number=1, aired_episodes=1)
s.add_episode(Episode(season_number=1, episode_number=1, title="x"))
show.add_season(s)
assert show.collection_status() == CollectionStatus.PARTIAL
def test_is_complete_series_requires_ended_and_complete(self):
show = self._show(status="ongoing", expected_seasons=1)
s = Season(season_number=1, aired_episodes=1)
s.add_episode(Episode(season_number=1, episode_number=1, title="x"))
show.add_season(s)
# Ongoing → never "complete series" even if collection is COMPLETE
assert show.is_complete_series() is False
show.status = ShowStatus.ENDED
assert show.is_complete_series() is True
# ── missing_episodes traversal ─────────────────────────────────────
def test_missing_episodes_walks_seasons_in_order(self):
show = self._show()
s2 = Season(season_number=2, aired_episodes=2)
s1 = Season(season_number=1, aired_episodes=3)
s1.add_episode(Episode(season_number=1, episode_number=2, title="x"))
show.add_season(s2)
show.add_season(s1)
missing = [(s.value, e.value) for s, e in show.missing_episodes()]
assert missing == [(1, 1), (1, 3), (2, 1), (2, 2)]