francwa 075a827b0e feat(release): wire v2 EASY path for known release groups

The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup
  via the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
  *.yaml at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with
  optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)

2026-05-20 00:21:11 +02:00


Changelog

All notable changes to Alfred are documented here.

The format is loosely based on Keep a Changelog. Alfred is not yet on SemVer — entries are grouped by dated work blocks instead of release numbers. Granularity targets behavioral or API-visible changes; refer to git log for commit-level detail.

Sections used per block: Added / Changed / Deprecated / Removed / Fixed / Internal (for tech-debt and refactor noise that doesn't affect callers).

[Unreleased]

Added

Release parser v2 — EASY path live (alfred/domain/release/parser/): new annotate-based pipeline (tokenize → annotate → assemble) drives releases from known groups. Exposes Token (frozen VO with index + role + extra), TokenRole enum (structural/technical/meta families), and GroupSchema / SchemaChunk value objects.
- pipeline.tokenize: string-ops separator split (no regex), strips a [site.tag] prefix/suffix first.
- pipeline.annotate: detects the trailing group right-to-left (priority to codec-GROUP shape, fallback to any non-source dashed token), looks up its GroupSchema, then walks tokens and schema chunks in lockstep — optional chunks that don't match are skipped, mandatory mismatches abort EASY and return None so the caller can fall back to SHITTY.
- pipeline.assemble: folds annotated tokens into a ParsedRelease-compatible dict.
- parse_release (in release.services) tries the v2 EASY path first and falls through to the legacy SHITTY heuristic on None. Legacy SHITTY/PATH OF PAIN behavior is unchanged.
- Knowledge: alfred/knowledge/release/release_groups/{kontrast,elite,rarbg}.yaml declare the canonical chunk order per group, loaded via the new ReleaseKnowledge.group_schema(name) port method.
- Tests in tests/domain/release/test_parser_v2_{scaffolding,easy}.py cover token VOs, site-tag stripping, group detection, schema-driven annotation (movie, TV episode, season pack with optional source), and field assembly.
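The tokenize → annotate → assemble flow described above can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the token sets and the two schemas are tiny hand-written stand-ins for the YAML knowledge, the schema is reduced to (role, optional) pairs that follow the title, only a leading [site.tag] is stripped, and the function signatures do not claim to match the real pipeline module.

```python
from __future__ import annotations

# Tiny stand-ins for the YAML knowledge (all hypothetical).
CODECS = {"x264", "x265", "h264", "hevc"}
SOURCES = {"webrip", "bluray", "web-dl", "hdtv"}
RESOLUTIONS = {"720p", "1080p", "2160p"}
# Ordered (role, optional) chunks that follow the title, one list per known group.
SCHEMAS = {
    "kontrast": [("year", True), ("episode", True), ("resolution", False),
                 ("source", False), ("codec", False)],
    "elite": [("year", True), ("episode", True), ("resolution", False),
              ("source", True), ("codec", False)],
}


def tokenize(name: str) -> list[str]:
    """Split on separators with plain string ops, after dropping a leading [tag]."""
    if name.startswith("[") and "]" in name:
        name = name[name.index("]") + 1:]
    for sep in ("_", " "):
        name = name.replace(sep, ".")
    return [tok for tok in name.split(".") if tok]


def detect_group(tokens: list[str]) -> tuple[int, str, str] | None:
    """Scan right-to-left and return (index, head, group), or None if no dash."""
    dashed = []
    for i in range(len(tokens) - 1, -1, -1):
        head, dash, tail = tokens[i].rpartition("-")
        if dash and tail:
            dashed.append((i, head, tail))
    for i, head, tail in dashed:          # priority: codec-GROUP shape
        if head.lower() in CODECS:
            return i, head, tail
    for i, head, tail in dashed:          # fallback: dashed token that is not a source
        if tokens[i].lower() not in SOURCES:
            return i, head, tail
    return None


def matches(role: str, token: str) -> bool:
    low = token.lower()
    if role == "year":
        return len(token) == 4 and token.isdigit()
    if role == "episode":
        return low.startswith("s") and low[1:3].isdigit()
    return low in {"resolution": RESOLUTIONS, "source": SOURCES, "codec": CODECS}[role]


def annotate(tokens: list[str]) -> list[tuple[str, str]] | None:
    found = detect_group(tokens)
    if found is None:
        return None
    idx, head, group = found
    schema = SCHEMAS.get(group.lower())   # case-insensitive lookup
    if schema is None:
        return None                       # unknown group: caller falls back
    codec = head if head.lower() in CODECS else None
    body = tokens[:idx] + ([head] if head and codec is None else [])
    annotated, pos = [], 0
    # Title: consume tokens until one matches any schema chunk.
    while pos < len(body) and not any(matches(r, body[pos]) for r, _ in schema):
        annotated.append(("title", body[pos]))
        pos += 1
    if not annotated:
        return None
    for role, optional in schema:         # lockstep walk over the chunks
        if role == "codec" and codec is not None:
            annotated.append(("codec", codec))   # pre-consumed by codec-GROUP
        elif pos < len(body) and matches(role, body[pos]):
            annotated.append((role, body[pos]))
            pos += 1
        elif not optional:
            return None                   # mandatory mismatch aborts the EASY path
    if pos != len(body):
        return None                       # leftover tokens the schema cannot explain
    return annotated + [("group", group)]


def assemble(annotated: list[tuple[str, str]]) -> dict[str, str]:
    parsed = {"title": ".".join(tok for role, tok in annotated if role == "title")}
    parsed.update({role: tok for role, tok in annotated if role != "title"})
    return parsed


def parse_easy(name: str) -> dict[str, str] | None:
    annotated = annotate(tokenize(name))
    return None if annotated is None else assemble(annotated)
```

In this sketch, parse_easy("Foundation.S02.1080p.x265-ELiTE") succeeds because the stand-in elite schema marks source as optional, while a name ending in an unlisted group returns None, which is the signal the caller would use to fall back to the legacy path.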
Real-world release fixtures under tests/fixtures/releases/{easy,shitty,path_of_pain}/, each documenting an expected ParsedRelease plus the future routing (library / torrents / seed_hardlinks) for the upcoming organize_media refactor. Parametrized over tests/domain/test_release_fixtures.py for anti-regression.
- EASY bucket seeded with 5 cases (movie, single-episode, season pack, movie + noise, YTS bracket-heavy).
- SHITTY bucket seeded with 15 anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel), French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi), multi-episode chain S14E09E10E11 (Archer, captures E11 loss), lowercase s01e01 (Notre Planète), NxNN with - separators (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83), season-range S01-06 (Tatortreiniger, captures movie misclassification), bare folder name (Jurassic Park, media_type=unknown), apostrophe-in-name (Honey Don't, captures full AI-path degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands, captures group=UNKNOWN), subs-only release (Westworld S04).
- PATH OF PAIN bucket seeded with 10 worst-case fixtures covering: UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set with double season range and parens-wrapped tech (Deutschland 83-86-89, captures group=S03 misdetection), accented chars in title (Chérie BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag prefix + XviD (OxTorrent), episode title + air-date silently lost (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i + multi-word audio codec (The Prodigy, full AI-path degeneration), yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual [FR-EN] tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range + REPACK + HEVC (Gilmore Girls, the well-behaved exception).
NxNN alt season/episode form supported by parse_release. Releases like Show.1x05.720p.HDTV.x264-GRP and Show.2x07x08.1080p.WEB.x265-GRP (multi-ep alt form) now parse as TV shows.
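As a rough illustration of the NxNN form, the sketch below recognizes a single token such as 1x05 or 2x07x08 with string operations only. The function name parse_nxnn and its return shape are hypothetical and are not taken from parse_release.

```python
from __future__ import annotations


def parse_nxnn(token: str) -> tuple[int, list[int]] | None:
    """Return (season, [episodes]) for '1x05' / '2x07x08', else None."""
    parts = token.lower().split("x")
    # Need a season plus at least one episode, all purely numeric.
    if len(parts) < 2 or not all(part.isdecimal() for part in parts):
        return None
    season, *episodes = (int(part) for part in parts)
    return season, episodes
```

Tokens like x264 or 1080p are rejected here because splitting on "x" leaves an empty or non-numeric part, so codec and resolution tokens are not mistaken for episodes.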
alfred/knowledge/release/separators.yaml declares the token separators used by the release-name tokenizer (., space, [, ], (, ), _). New conventions can be added without code changes. The canonical . is always present even if missing from YAML.

Changed

parse_release tokenizer is now data-driven: it splits on any character listed in separators.yaml (regex character class) instead of name.split("."). This makes YTS-style releases (The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]), space-separated names (Inception 2010 1080p BluRay x264-GROUP), and underscore-separated names parse correctly via the direct path — no more fallback through sanitization.
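A data-driven split of this kind can be sketched as follows. The helper names are hypothetical, the separator list is passed in directly instead of being read from separators.yaml, and the real tokenizer also handles site tags, which this sketch ignores.

```python
import re


def build_splitter(separators: list[str]) -> re.Pattern[str]:
    """Compile a character class from the configured separators."""
    chars = set(separators) | {"."}   # the canonical "." is always present
    escaped = "".join(re.escape(char) for char in sorted(chars))
    return re.compile("[" + escaped + "]+")


def split_release_name(name: str, splitter: re.Pattern[str]) -> list[str]:
    """Split on any run of separators and drop empty tokens."""
    return [token for token in splitter.split(name) if token]
```

With a list such as [" ", "[", "]", "(", ")", "_"], a space-separated name like Inception 2010 1080p BluRay x264-GROUP splits into five tokens, and the dash inside x264-GROUP is left alone because it is not a configured separator.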
parse_release flow simplified: site-tag extraction always runs first (so parse_path == "sanitized" now reliably indicates a stripped [tag]), then well-formedness is checked only against truly forbidden chars (anything not in the configured separator set).
ISO 639-2/B is now the canonical language code project-wide (was a mix of 639-1 and 639-2/T):
- SubtitlePreferences.languages default is now ["fre", "eng"] (was ["fr", "en"]). Old LTM files are not auto-migrated — delete data/memory/ltm.json to regenerate with the new defaults.
- Subtitle output filenames are now {iso639_2b}.srt (e.g. fre.srt, fre.sdh.srt). Existing fr.srt files are still read correctly (recognized as French via alias) but new files are written canonically.
- Language value object docstring corrected: it has always stored 639-2/B (matching what ffprobe emits), not 639-2/T as previously documented.
MovieService.validate_movie_file minimum size is now configurable via settings.min_movie_size_bytes (default unchanged: 100 MB). Constructor accepts an optional min_movie_size_bytes override for tests.
SubtitleKnowledgeBase delegates language lookup to LanguageRegistry rather than duplicating tokens. subtitles.yaml now only declares subtitle-specific tokens (e.g. vostfr, vf, vff) under a new language_tokens section.

Removed

alfred/domain/tv_shows/services.py and alfred/domain/movies/services.py deleted entirely. They held fossil parsers (parse_episode_filename, extract_movie_metadata, …) with zero production callers — superseded by parse_release as the single source of truth for release-name parsing. Associated tests (tests/domain/test_movies.py, tests/domain/test_tv_shows_service.py) removed as well.
_sanitize and _normalize helpers in alfred/domain/release/services.py — the new tokenizer makes them redundant.
_LANG_KEYWORDS, _SDH_TOKENS, _FORCED_TOKENS, SUBTITLE_EXTENSIONS hardcoded dicts in alfred/domain/subtitles/scanner.py — all knowledge now lives in YAML (CLAUDE.md compliance).
_MIN_MOVIE_SIZE_BYTES module-level constant in alfred/domain/movies/services.py — replaced by the new setting.
Top-level languages: block in subtitles.yaml — superseded by language_tokens: (subtitle-specific only) since iso_languages.yaml is the canonical source.

Fixed

hi token no longer marks a subtitle as SDH (it conflicted with the ISO 639-1 alias for Hindi). SDH is now detected only via sdh, cc, and hearing tokens.
SubtitleKnowledgeBase default rules used "fra" while iso_languages.yaml exposes French as "fre" — preferred languages defaults now match the canonical form.
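The SDH fix above comes down to which tokens count as SDH markers. A minimal sketch, assuming the subtitle filename is split on dots and using a hard-coded token set where the project loads one from YAML:

```python
# Hypothetical stand-in for the YAML-declared tokens; "hi" is deliberately absent
# because it collides with the ISO 639-1 alias for Hindi.
SDH_TOKENS = {"sdh", "cc", "hearing"}


def is_sdh(filename: str) -> bool:
    """True when any dot-separated token of the filename is an SDH marker."""
    return any(token in SDH_TOKENS for token in filename.lower().split("."))
```

Under these assumptions fre.sdh.srt is flagged as SDH while hi.srt is not, so a Hindi subtitle is no longer misread as hearing-impaired.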

Internal

Domain I/O extraction (refactor/domain-io-extraction): the domain layer no longer performs subprocess calls, filesystem scans, or YAML loading. Achieved in a series of focused commits:
- Knowledge YAML loaders moved to infrastructure: alfred/domain/release/knowledge.py, alfred/domain/shared/knowledge/language_registry.py, and alfred/domain/subtitles/knowledge/{base,loader}.py relocated to alfred/infrastructure/knowledge/. Re-exports were dropped — callers import directly from the new location.
- MediaProber and FilesystemScanner Protocol ports introduced at alfred/domain/shared/ports/ with frozen-dataclass DTOs (SubtitleStreamInfo, FileEntry). SubtitleIdentifier and PatternDetector are now constructor-injected with concrete adapters (FfprobeMediaProber wrapping subprocess.run(ffprobe) and PathlibFilesystemScanner wrapping pathlib). No more direct subprocess/pathlib usage from the subtitle domain services.
- Live filesystem methods removed from VOs and entities: FilePath.exists() / .is_file() / .is_dir() deleted — FilePath is now a pure address VO. Movie.has_file() and Episode.is_downloaded() dropped. Callers either rely on a prior detection step or use try/except over pre-checks (eliminates TOCTOU races).
- SubtitlePlacer moved to the application layer at alfred/application/subtitles/placer.py — it performs os.link I/O, which doesn't belong in the domain. Pre-checks replaced with try/except for FileNotFoundError/FileExistsError.
- SubtitleRuleSet.resolve() no longer reaches into the knowledge base: the implicit DEFAULT_RULES() helper is gone, replaced by an explicit default_rules: SubtitleMatchingRules parameter. The ManageSubtitles use case loads defaults from the KB once and passes them in.
- SubtitleKnowledge Protocol port at alfred/domain/subtitles/ports/knowledge.py declares the read-only query surface domain services consume (7 methods: known_extensions, format_for_extension, language_for_token, is_known_lang_token, type_for_token, is_known_type_token, patterns). SubtitleIdentifier and PatternDetector depend on this Protocol instead of the concrete SubtitleKnowledgeBase from infrastructure — domain/subtitles/ now has zero imports from infrastructure/. The remaining domain → infra leak (domain/release/ loading separator YAML at import-time) is documented in tech-debt and scheduled for its own branch.
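The try/except-over-pre-checks approach mentioned for SubtitlePlacer can be sketched like this. The function name and the string results are hypothetical; only os.link and the two exception types come from the entries above.

```python
import os
from pathlib import Path


def place_hardlink(source: Path, destination: Path) -> str:
    """Hard-link source to destination without checking existence first."""
    try:
        os.link(source, destination)
    except FileNotFoundError:
        # Source (or the destination directory) vanished or never existed.
        return "missing"
    except FileExistsError:
        # Something already sits at the destination path.
        return "exists"
    return "linked"
```

Because the outcome is decided by the link call itself, there is no window between an exists() check and the operation in which another process could change the filesystem.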
to_dot_folder_name(title) helper in alfred/domain/shared/value_objects.py — extracts the re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".") pattern that was duplicated between MovieTitle.normalized() and TVShow.get_folder_name().
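For reference, the extracted pattern behaves as in this small sketch. The function body is built directly from the expression quoted above; the real helper may differ in details such as type hints or docstring.

```python
import re


def to_dot_folder_name(title: str) -> str:
    """Drop characters other than word chars, whitespace, dots and dashes, then dot-join."""
    return re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")
```

For example, a title such as Spider-Man: No Way Home loses its colon and has each space replaced by a dot.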
ParsedRelease.languages uses field(default_factory=list) instead of a manual __post_init__ that assigned [] via object.__setattr__.
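The default_factory idiom looks like this on a reduced stand-in class. ParsedReleaseSketch is hypothetical, it carries only two fields, and treating it as frozen is an assumption drawn from the old object.__setattr__ workaround.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParsedReleaseSketch:
    title: str
    # Each instance gets its own fresh list; no __post_init__ needed.
    languages: list[str] = field(default_factory=list)
```

Two instances built without an explicit languages argument end up with distinct empty lists, which is the guarantee the manual __post_init__ used to provide.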
file_extensions.yaml splits subtitle sidecars (.srt, .sub, .idx, .ass, .ssa) into a dedicated subtitle: category instead of lumping them under metadata:. The _METADATA_EXTENSIONS set used by detect_media_type remains the union of both (same behavior: subtitles are still ignored when deciding the media type of a folder), but a new load_subtitle_extensions() loader is now available for the subtitles domain. Semantic clarity, no functional change.
tv_shows/entities.py module docstring now shows the aggregate ownership as an ASCII tree before the rule text — quicker visual scan of the DDD structure.
Removed backward-compat shims _sanitise_for_fs / _strip_episode_from_normalised from domain/release/value_objects.py (zero callers).
Cleaned ruff warnings across the codebase: subprocess.run calls now pass explicit check=False (PLW1510); lazy imports promoted to module top where there was no cycle (PLC0415 in manage_subtitles.py, placer.py, qbittorrent/client.py, file_manager.py); fixed module-level import ordering (E402) in language_registry.py and subtitles/knowledge/loader.py; removed unused locals (F841 / B007); replaced unnecessary set comprehension with set() in release/knowledge.py (C416).
Ruff config: ignore PLR0911 / PLR0912 (too-many-returns / too-many-branches) globally — noisy on parser mappers and orchestrator use-cases where early-return validation is essential complexity. Ignore PLW0603 for the documented memory singleton (infrastructure/persistence/context.py).
Release-knowledge DDD purification (refactor/domain-release-knowledge): the last domain → infrastructure leak (domain/release/value_objects.py loading YAML at import-time) is gone. Achieved via:
- ReleaseKnowledge Protocol port at alfred/domain/release/ports/knowledge.py declares the read-only query surface release parsing needs (token sets for resolutions, sources, codecs, languages, hdr extras; structured dicts for audio, video_meta, editions, media_type_tokens; separators list; file-extension sets used by application/infra callers; sanitize_for_fs(text) method).
- YamlReleaseKnowledge adapter at alfred/infrastructure/knowledge/release_kb.py loads every YAML constant once at construction. Builds an immutable str.maketrans translation table for filesystem sanitization.
- parse_release(name, kb) takes the knowledge as an explicit parameter — no more module-level YAML loading inside the domain. Every internal helper (_tokenize, _extract_tech, _extract_languages, _extract_audio, _extract_video_meta, _extract_edition, _extract_title, _infer_media_type, _is_well_formed) takes kb.
- ParsedRelease Option B: sanitization happens once at parse time and is stored on a new title_sanitized: str field. Builder methods (show_folder_name, season_folder_name, episode_filename, movie_folder_name, movie_filename) are now pure — they accept already-sanitized tmdb_title_safe / tmdb_episode_title_safe arguments. Callers at the use-case boundary sanitize TMDB strings via kb.sanitize_for_fs(...) before passing them in.
- All domain-knowledge constants removed from value_objects.py: _RESOLUTIONS, _SOURCES, _CODECS, _AUDIO, _VIDEO_META, _EDITIONS, _HDR_EXTRA, _MEDIA_TYPE_TOKENS, _LANGUAGE_TOKENS, _FORBIDDEN_CHARS, _VIDEO_EXTENSIONS, _NON_VIDEO_EXTENSIONS, _SUBTITLE_EXTENSIONS, _METADATA_EXTENSIONS, _WIN_FORBIDDEN_TABLE, and the _sanitize_for_fs helper. The domain module is now pure.
- Application-layer KB singleton: resolve_destination.py instantiates a module-level _KB: ReleaseKnowledge = YamlReleaseKnowledge() and threads it through every parse_release(...) call. The local _sanitize helper and _WIN_FORBIDDEN regex were dropped in favor of _KB.sanitize_for_fs(...).
- detect_media_type(parsed, source_path, kb) and find_video_file(path, kb) now take the knowledge explicitly instead of importing _*_EXTENSIONS constants from the domain. agent/tools/filesystem.py::analyze_release imports the application KB singleton and passes it through.
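The port-plus-adapter shape described above might look roughly like the sketch below. It covers only sanitize_for_fs, the class names are hypothetical, the forbidden-character set is an assumed Windows-style list, and deleting those characters (instead of replacing them) is also an assumption.

```python
from typing import Protocol


class ReleaseKnowledgeSketch(Protocol):
    """Reduced port: the domain depends on this shape, not on a YAML loader."""

    def sanitize_for_fs(self, text: str) -> str: ...


class InMemoryKnowledge:
    """Stand-in adapter; the table is built once at construction."""

    def __init__(self, forbidden: str = '<>:"/\\|?*') -> None:
        # Three-argument maketrans maps every listed character to None (deletion).
        self._table = str.maketrans("", "", forbidden)

    def sanitize_for_fs(self, text: str) -> str:
        return text.translate(self._table)


def safe_title(kb: ReleaseKnowledgeSketch, tmdb_title: str) -> str:
    """Caller-side sanitization at the use-case boundary."""
    return kb.sanitize_for_fs(tmdb_title)
```

Passing the knowledge object explicitly is what lets tests swap in an in-memory stand-in like this one without touching any YAML file.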

[2026-05-17] — TVShow & Movie aggregate refactor

Multi-phase overhaul of the TV show domain into a real DDD aggregate, with matching parity work on Movie, a language knowledge system, and the shared/media restructure that supports both.

Added

Language knowledge system (alfred/knowledge/iso_languages.yaml + 42 languages including und for undetermined).
- Language value object (frozen dataclass) with iso, english_name, native_name, aliases, and a matches(raw) cross-format helper.
- LanguageRegistry loader (alfred/domain/shared/knowledge/) merging builtin + learned YAML. Not a singleton — the application layer instantiates it.
- ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English name, native name, and common spellings.
VideoTrack dataclass (alfred/domain/shared/media/video.py) with a resolution property using width-priority bucket detection (handles cinema/scope crops like 1920×960 → 1080p).
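Width-priority bucket detection can be sketched as below. The thresholds, the label set, and the height fallback are assumptions chosen for illustration; only the 1920×960 → 1080p behavior is taken from the entry above.

```python
# Hypothetical (minimum width, label) buckets, widest first.
WIDTH_BUCKETS = ((3840, "2160p"), (1920, "1080p"), (1280, "720p"), (720, "480p"))


def resolution_bucket(width: int, height: int) -> str:
    """Pick the label from the frame width; fall back to the raw height."""
    for min_width, label in WIDTH_BUCKETS:
        if width >= min_width:
            return label
    return f"{height}p"
```

Keying on width means a scope crop that trims the height still lands in the bucket of its full-frame counterpart.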
shared/media/matching.py — track_lang_matches helper shared by Episode and Movie. Implements the "C+" contract for language helpers:
- Language query → cross-format match via Language.matches()
- str query → case-insensitive direct comparison (no normalization)
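The two branches of the C+ contract can be illustrated with a reduced sketch. The Language class here is a simplified stand-in (iso and aliases only), and the None handling and exact signature of track_lang_matches are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Simplified stand-in for the Language value object."""

    iso: str
    aliases: tuple[str, ...] = ()

    def matches(self, raw: str) -> bool:
        needle = raw.strip().lower()
        return needle == self.iso or needle in self.aliases


def track_lang_matches(track_language: str | None, query: Language | str) -> bool:
    if track_language is None:
        return False
    if isinstance(query, Language):
        # Language query: cross-format match through aliases.
        return query.matches(track_language)
    # str query: direct case-insensitive comparison, no normalization.
    return track_language.lower() == query.lower()
```

The practical difference is that a track tagged fr matches a Language query for French, but does not match the plain string "fre".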
TVShow aggregate composition:
- TVShow.seasons: dict[SeasonNumber, Season]
- Season.episodes: dict[EpisodeNumber, Episode]
- Season.expected_episodes / Season.aired_episodes (split so collection state can compare "owned vs aired today" without confusing in-flight seasons with future ones)
Aggregate methods on TVShow:
- add_episode(ep) — sole sanctioned mutation entry point (creates the season if missing)
- add_season(season) — replaces a season wholesale
- collection_status() → CollectionStatus.{EMPTY, PARTIAL, COMPLETE}
- is_complete_series() — true iff ENDED + COMPLETE
- missing_episodes() — flat list of all aired-but-not-owned (season, episode) pairs
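A reduced sketch of the aggregate shape and its methods follows. Plain ints stand in for SeasonNumber and EpisodeNumber, aired_episodes is assumed to be a count of episodes numbered from 1, and ShowStatus, add_season, and is_complete_series are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CollectionStatus(Enum):
    EMPTY = "empty"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass(frozen=True)
class Episode:
    season: int
    number: int


@dataclass
class Season:
    number: int
    aired_episodes: int = 0   # assumption: count of episodes aired so far
    episodes: dict[int, Episode] = field(default_factory=dict)


@dataclass
class TVShow:
    seasons: dict[int, Season] = field(default_factory=dict)

    def add_episode(self, ep: Episode) -> None:
        """Single mutation entry point; creates the season if missing."""
        season = self.seasons.setdefault(ep.season, Season(ep.season))
        season.episodes[ep.number] = ep

    def missing_episodes(self) -> list[tuple[int, int]]:
        """Aired-but-not-owned (season, episode) pairs, in order."""
        return [
            (season.number, number)
            for season in sorted(self.seasons.values(), key=lambda s: s.number)
            for number in range(1, season.aired_episodes + 1)
            if number not in season.episodes
        ]

    def collection_status(self) -> CollectionStatus:
        owned = sum(len(season.episodes) for season in self.seasons.values())
        if owned == 0:
            return CollectionStatus.EMPTY
        if self.missing_episodes():
            return CollectionStatus.PARTIAL
        return CollectionStatus.COMPLETE
```

Since both counts are derived from the dicts on demand, there is no stored counter that could drift out of sync with the episodes actually held by the aggregate.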
CollectionStatus enum (orthogonal to ShowStatus).
Episode track helpers (has_audio_in, has_subtitles_in, has_forced_subs, audio_languages, subtitle_languages), driven by Episode.audio_tracks / Episode.subtitle_tracks.
Movie aggregate parity — Movie now carries audio_tracks / subtitle_tracks and exposes the same helpers as Episode (same C+ contract).
CHANGELOG.md (this file).

Changed

shared/media_info.py exploded into shared/media/{audio,video,subtitle,info,matching}.py. MediaInfo is now symmetric: every stream type is a list[Track]. Flat accessors (width, height, video_codec, resolution) remain as properties that read the first video track.
MediaInfo.duration_seconds / bitrate_kbps moved from VideoTrack to MediaInfo (file-level — they come from the ffprobe format block, not a stream). Files without a video stream now correctly expose duration.
ShowStatus.from_string extended to map TMDB strings (Returning Series, In Production, Pilot, Planned, Canceled, Cancelled). Comparison is whitespace-trimmed and case-insensitive.
Season / Episode dropped their show_imdb_id back-references. They are owned by TVShow and reached only through it.
TVShow.seasons_count and episode_count are now @property (computed from the dict) instead of stored ints.
TVShowService.parse_episode_from_filename rewritten in string operations (no regex). Supports S01E05 / s1e5 and 1x05 / 01x5 forms.
TVShowService.find_next_episode now drives off show.missing_episodes() instead of the hardcoded "max 50 episodes per season" heuristic.
TVShowService constructor no longer takes season_repository / episode_repository — the aggregate persists in one block via TVShowRepository only.
SubtitleTrack in alfred.domain.subtitles.entities renamed to SubtitleCandidate. Coexists with the shared.media.SubtitleTrack ffprobe-view dataclass (different bounded contexts, kept separate intentionally).
tv_shows/services.py _VIDEO_EXTENSIONS now loaded from knowledge/release/file_extensions.yaml via load_video_extensions() (single source of truth).
CLAUDE.md updated with three new policy sections:
- "Tests" — small updates OK during normal work, no mass-update sprees
- "Backwards-compatibility shims" — prefer clean migration over shims
- "Regex" — not forbidden, use judgment when string ops would be fragile

Removed

Legacy Season N Episode N filename form in TVShowService.parse_episode_from_filename. It never appears in the release names Alfred handles, and supporting it forced a regex.
SeasonRepository and EpisodeRepository — only the aggregate root has a repository (DDD rule: one repo per aggregate).
shared/media_info.py compatibility shim — callers updated.
SubtitleTrack compatibility alias in subtitles.entities — callers updated to SubtitleCandidate.

Fixed

MediaInfo.duration_seconds returns None on audio-only files instead of crashing through primary_video.duration_seconds (see the duration/bitrate move under Changed).
MediaOrganizer (infrastructure/filesystem/organizer.py) no longer passes the removed show_imdb_id / episode_count kwargs when constructing a Season for folder-name generation.

Internal

Test suite rewritten where the aggregate redesign broke fixtures: tests/domain/test_tv_shows.py (69 tests), tests/domain/test_media_info.py (rewritten for VideoTrack), tests/application/test_enrich_from_probe.py (helper added), tests/infrastructure/test_filesystem_extras.py (fixtures), tests/domain/test_tv_shows_service.py (find_next_episode driven by real aggregate state).
Subtitle services internal migration: matcher.py, utils.py, placer.py, identifier.py updated to import SubtitleCandidate.
Suite status at end of block: 1066 passed, 8 skipped, 0 failed.
