Files

T

francwa df798f55cc refactor(subtitles): introduce SubtitleKnowledge Protocol port

Domain services (SubtitleIdentifier, PatternDetector) used to import the
concrete SubtitleKnowledgeBase class directly from infrastructure for
their type hint. With this commit they depend on a structural Protocol
in alfred/domain/subtitles/ports/knowledge.py declaring just the 7
read-only query methods the domain actually consumes.

The concrete YAML-backed SubtitleKnowledgeBase in infrastructure remains
the sole adapter — no rename, no shim. With this change
alfred/domain/subtitles/ has zero imports from alfred/infrastructure/.

Also extend the changelog entry covering the full domain-io-extraction
branch.

2026-05-19 15:15:43 +02:00

17 KiB

Raw Blame History

Changelog

All notable changes to Alfred are documented here.

The format is loosely based on Keep a Changelog. Alfred is not yet on SemVer — entries are grouped by dated work blocks instead of release numbers. Granularity targets behavioral or API-visible changes; refer to git log for commit-level detail.

Sections used per block: Added / Changed / Deprecated / Removed / Fixed / Internal (for tech-debt and refactor noise that doesn't affect callers).

[Unreleased]

Added

Real-world release fixtures under tests/fixtures/releases/{easy,shitty,path_of_pain}/, each documenting an expected ParsedRelease plus the future routing (library / torrents / seed_hardlinks) for the upcoming organize_media refactor. EASY bucket seeded with 5 cases (movie, single-episode, season pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15 anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel), French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi), multi-episode chain S14E09E10E11 (Archer, captures E11 loss), lowercase s01e01 (Notre Planète), NxNN with - separators (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83), season-range S01-06 (Tatortreiniger, captures movie misclassification), bare folder name (Jurassic Park, media_type=unknown), apostrophe-in-name (Honey Don't, captures full AI-path degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands, captures group=UNKNOWN), subs-only release (Westworld S04). PATH OF PAIN bucket seeded with 10 worst-case fixtures covering: UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set with double season range and parens-wrapped tech (Deutschland 83-86-89, captures group=S03 misdetection), accented chars in title (Chérie BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag prefix + XviD (OxTorrent), episode title + air-date silently lost (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i + multi-word audio codec (The Prodigy, full AI-path degeneration), yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual [FR-EN] tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range + REPACK + HEVC (Gilmore Girls, the well-behaved exception). Parametrized over tests/domain/test_release_fixtures.py for anti-regression.
NxNN alt season/episode form supported by parse_release. Releases like Show.1x05.720p.HDTV.x264-GRP and Show.2x07x08.1080p.WEB.x265-GRP (multi-ep alt form) now parse as TV shows.
alfred/knowledge/release/separators.yaml declares the token separators used by the release-name tokenizer (., , [, ], (, ), _). New conventions can be added without code changes. The canonical . is always present even if missing from YAML.

Changed

parse_release tokenizer is now data-driven: it splits on any character listed in separators.yaml (regex character class) instead of name.split("."). This makes YTS-style releases (The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]), space-separated names (Inception 2010 1080p BluRay x264-GROUP), and underscore-separated names parse correctly via the direct path — no more fallback through sanitization.
parse_release flow simplified: site-tag extraction always runs first (so parse_path == "sanitized" now reliably indicates a stripped [tag]), then well-formedness is checked only against truly forbidden chars (anything not in the configured separator set).
ISO 639-2/B is now the canonical language code project-wide (was a mix of 639-1 and 639-2/T):
- SubtitlePreferences.languages default is now ["fre", "eng"] (was ["fr", "en"]). Old LTM files are not auto-migrated — delete data/memory/ltm.json to regenerate with the new defaults.
- Subtitle output filenames are now {iso639_2b}.srt (e.g. fre.srt, fre.sdh.srt). Existing fr.srt files are still read correctly (recognized as French via alias) but new files are written canonically.
- Language value object docstring corrected: it has always stored 639-2/B (matching what ffprobe emits), not 639-2/T as previously documented.
MovieService.validate_movie_file minimum size is now configurable via settings.min_movie_size_bytes (default unchanged: 100 MB). Constructor accepts an optional min_movie_size_bytes override for tests.
SubtitleKnowledgeBase delegates language lookup to LanguageRegistry rather than duplicating tokens. subtitles.yaml now only declares subtitle-specific tokens (e.g. vostfr, vf, vff) under a new language_tokens section.

Removed

alfred/domain/tv_shows/services.py and alfred/domain/movies/services.py deleted entirely. They held fossil parsers (parse_episode_filename, extract_movie_metadata, …) with zero production callers — superseded by parse_release as the single source of truth for release-name parsing. Associated tests (tests/domain/test_movies.py, tests/domain/test_tv_shows_service.py) removed as well.
_sanitize and _normalize helpers in alfred/domain/release/services.py — the new tokenizer makes them redundant.
_LANG_KEYWORDS, _SDH_TOKENS, _FORCED_TOKENS, SUBTITLE_EXTENSIONS hardcoded dicts in alfred/domain/subtitles/scanner.py — all knowledge now lives in YAML (CLAUDE.md compliance).
_MIN_MOVIE_SIZE_BYTES module-level constant in alfred/domain/movies/services.py — replaced by the new setting.
Top-level languages: block in subtitles.yaml — superseded by language_tokens: (subtitle-specific only) since iso_languages.yaml is the canonical source.

Fixed

hi token no longer marks a subtitle as SDH (it conflicted with the ISO 639-1 alias for Hindi). SDH is now detected only via sdh, cc, and hearing tokens.
SubtitleKnowledgeBase default rules used "fra" while iso_languages.yaml exposes French as "fre" — preferred languages defaults now match the canonical form.

Internal

Domain I/O extraction (refactor/domain-io-extraction): the domain layer no longer performs subprocess calls, filesystem scans, or YAML loading. Achieved in a series of focused commits:
- Knowledge YAML loaders moved to infrastructure: alfred/domain/release/knowledge.py, alfred/domain/shared/knowledge/language_registry.py, and alfred/domain/subtitles/knowledge/{base,loader}.py relocated to alfred/infrastructure/knowledge/. Re-exports were dropped — callers import directly from the new location.
- MediaProber and FilesystemScanner Protocol ports introduced at alfred/domain/shared/ports/ with frozen-dataclass DTOs (SubtitleStreamInfo, FileEntry). SubtitleIdentifier and PatternDetector are now constructor-injected with concrete adapters (FfprobeMediaProber wrapping subprocess.run(ffprobe) and PathlibFilesystemScanner wrapping pathlib). No more direct subprocess/pathlib usage from the subtitle domain services.
- Live filesystem methods removed from VOs and entities: FilePath.exists() / .is_file() / .is_dir() deleted — FilePath is now a pure address VO. Movie.has_file() and Episode.is_downloaded() dropped. Callers either rely on a prior detection step or use try/except over pre-checks (eliminates TOCTOU races).
- SubtitlePlacer moved to the application layer at alfred/application/subtitles/placer.py — it performs os.link I/O, which doesn't belong in the domain. Pre-checks replaced with try/except for FileNotFoundError/FileExistsError.
- SubtitleRuleSet.resolve() no longer reaches into the knowledge base: the implicit DEFAULT_RULES() helper is gone, replaced by an explicit default_rules: SubtitleMatchingRules parameter. The ManageSubtitles use case loads defaults from the KB once and passes them in.
- SubtitleKnowledge Protocol port at alfred/domain/subtitles/ports/knowledge.py declares the read-only query surface domain services consume (7 methods: known_extensions, format_for_extension, language_for_token, is_known_lang_token, type_for_token, is_known_type_token, patterns). SubtitleIdentifier and PatternDetector depend on this Protocol instead of the concrete SubtitleKnowledgeBase from infrastructure — domain/subtitles/ now has zero imports from infrastructure/. The remaining domain → infra leak (domain/release/ loading separator YAML at import-time) is documented in tech-debt and scheduled for its own branch.
to_dot_folder_name(title) helper in alfred/domain/shared/value_objects.py — extracts the re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".") pattern that was duplicated between MovieTitle.normalized() and TVShow.get_folder_name().
ParsedRelease.languages uses field(default_factory=list) instead of a manual __post_init__ that assigned [] via object.__setattr__.
file_extensions.yaml splits subtitle sidecars (.srt, .sub, .idx, .ass, .ssa) into a dedicated subtitle: category instead of lumping them under metadata:. The _METADATA_EXTENSIONS set used by detect_media_type remains the union of both (same behavior — subtitles are still ignored when deciding the media type of a folder), but a new load_subtitle_extensions() loader is now available for the subtitles domain. Sematic clarity, no functional change.
tv_shows/entities.py module docstring now shows the aggregate ownership as an ASCII tree before the rule text — quicker visual scan of the DDD structure.
Removed backward-compat shims _sanitise_for_fs / _strip_episode_from_normalised from domain/release/value_objects.py (zero callers).
Cleaned ruff warnings across the codebase: subprocess.run calls now pass explicit check=False (PLW1510); lazy imports promoted to module top where there was no cycle (PLC0415 in manage_subtitles.py, placer.py, qbittorrent/client.py, file_manager.py); fixed module-level import ordering (E402) in language_registry.py and subtitles/knowledge/loader.py; removed unused locals (F841 / B007); replaced unnecessary set comprehension with set() in release/knowledge.py (C416).
Ruff config: ignore PLR0911 / PLR0912 (too-many-returns / too-many-branches) globally — noisy on parser mappers and orchestrator use-cases where early-return validation is essential complexity. Ignore PLW0603 for the documented memory singleton (infrastructure/persistence/context.py).

[2026-05-17] — TVShow & Movie aggregate refactor

Multi-phase refonte of the TV show domain into a real DDD aggregate, with matching parity work on Movie, a language knowledge system, and the shared/media restructure that supports both.

Added

Language knowledge system (alfred/knowledge/iso_languages.yaml + 42 languages including und for undetermined).
- Language value object (frozen dataclass) with iso, english_name, native_name, aliases, and a matches(raw) cross-format helper.
- LanguageRegistry loader (alfred/domain/shared/knowledge/) merging builtin + learned YAML. Not a singleton — the application layer instantiates it.
- ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English name, native name, and common spellings.
VideoTrack dataclass (alfred/domain/shared/media/video.py) with a resolution property using width-priority bucket detection (handles cinema/scope crops like 1920×960 → 1080p).
shared/media/matching.py — track_lang_matches helper shared by Episode and Movie. Implements the "C+" contract for language helpers:
- Language query → cross-format match via Language.matches()
- str query → case-insensitive direct comparison (no normalization)
TVShow aggregate composition:
- TVShow.seasons: dict[SeasonNumber, Season]
- Season.episodes: dict[EpisodeNumber, Episode]
- Season.expected_episodes / Season.aired_episodes (split so collection state can compare "owned vs aired today" without confusing in-flight seasons with future ones)
Aggregate methods on TVShow:
- add_episode(ep) — sole sanctioned mutation entry point (creates the season if missing)
- add_season(season) — replaces a season wholesale
- collection_status() → CollectionStatus.{EMPTY, PARTIAL, COMPLETE}
- is_complete_series() — true iff ENDED + COMPLETE
- missing_episodes() — flat list of all aired-but-not-owned (season, episode) pairs
CollectionStatus enum (orthogonal to ShowStatus).
Episode track helpers (has_audio_in, has_subtitles_in, has_forced_subs, audio_languages, subtitle_languages), driven by Episode.audio_tracks / Episode.subtitle_tracks.
Movie aggregate parity — Movie now carries audio_tracks / subtitle_tracks and exposes the same helpers as Episode (same C+ contract).
CHANGELOG.md (this file).

Changed

shared/media_info.py exploded into shared/media/{audio,video,subtitle,info,matching}.py. MediaInfo is now symmetric: every stream type is a list[Track]. Flat accessors (width, height, video_codec, resolution) remain as properties that read the first video track.
MediaInfo.duration_seconds / bitrate_kbps moved from VideoTrack to MediaInfo (file-level — they come from the ffprobe format block, not a stream). Files without a video stream now correctly expose duration.
ShowStatus.from_string extended to map TMDB strings (Returning Series, In Production, Pilot, Planned, Canceled, Cancelled). Comparison is whitespace-trimmed and case-insensitive.
Season / Episode dropped their show_imdb_id back-references. They are owned by TVShow and reached only through it.
TVShow.seasons_count and episode_count are now @property (computed from the dict) instead of stored ints.
TVShowService.parse_episode_from_filename rewritten in string operations (no regex). Supports S01E05 / s1e5 and 1x05 / 01x5 forms.
TVShowService.find_next_episode now drives off show.missing_episodes() instead of the hardcoded "max 50 episodes per season" heuristic.
TVShowService constructor no longer takes season_repository / episode_repository — the aggregate persists in one block via TVShowRepository only.
SubtitleTrack in alfred.domain.subtitles.entities renamed to SubtitleCandidate. Coexists with the shared.media.SubtitleTrack ffprobe-view dataclass (different bounded contexts, kept separate intentionally).
tv_shows/services.py _VIDEO_EXTENSIONS now loaded from knowledge/release/file_extensions.yaml via load_video_extensions() (single source of truth).
CLAUDE.md updated with three new policy sections:
- "Tests" — small updates OK during normal work, no mass-update sprees
- "Backwards-compatibility shims" — prefer clean migration over shims
- "Regex" — not forbidden, use judgment when string ops would be fragile

Removed

Legacy Season N Episode N filename form in TVShowService.parse_episode_from_filename. It never appears in the release names Alfred handles, and supporting it forced a regex.
SeasonRepository and EpisodeRepository — only the aggregate root has a repository (DDD rule: one repo per aggregate).
shared/media_info.py compatibility shim — callers updated.
SubtitleTrack compatibility alias in subtitles.entities — callers updated to SubtitleCandidate.

Fixed

MediaInfo.duration_seconds returns None on audio-only files instead of crashing through primary_video.duration_seconds (see the duration/bitrate move under Changed).
MediaOrganizer (infrastructure/filesystem/organizer.py) no longer passes the removed show_imdb_id / episode_count kwargs when constructing a Season for folder-name generation.

Internal

Test suite rewritten where the aggregate redesign broke fixtures: tests/domain/test_tv_shows.py (69 tests), tests/domain/test_media_info.py (rewritten for VideoTrack), tests/application/test_enrich_from_probe.py (helper added), tests/infrastructure/test_filesystem_extras.py (fixtures), tests/domain/test_tv_shows_service.py (find_next_episode driven by real aggregate state).
Subtitle services internal migration: matcher.py, utils.py, placer.py, identifier.py updated to import SubtitleCandidate.
Suite status at end of block: 1066 passed, 8 skipped, 0 failed.

17 KiB Raw Blame History Unescape Escape

Changelog

[Unreleased]

Added

Changed

Removed

Fixed

Internal

[2026-05-17] — TVShow & Movie aggregate refactor

Added

Changed

Removed

Fixed

Internal

17 KiB

Raw Blame History