73 Commits

Author SHA1 Message Date
francwa 745dec39f5 FINAL COMMIT BEFORE REWRITE 2026-05-26 21:45:11 +02:00
francwa 42fa6139ed refactor(tools): wire filesystem tools to new use cases, drop broken ones
Updates alfred/agent/tools/filesystem.py to use the five free-function
use cases introduced in the previous commit:

  - list_folder            -> list_dir_use_case(Path(path), roots)
  - create_directory (new) -> create_dir_use_case(Path(path), roots)
  - move_media             -> move_file_use_case(src, dst, roots)
  - move_to_destination    -> create_dir_use_case(dst.parent) + move_file

A module-level _load_directory_roots() helper reads memory once per
call and builds the DirectoryRoots VO; missing roots produce an
explicit 'roots_not_configured' error.

Tools whose backing code was moved to *_OLD files are removed entirely
rather than left broken: manage_subtitles, set_path_for_folder,
create_seed_links, and the four resolve_*_destination tools. They will
come back when the matching application/domain code is rebuilt later
on this branch.

alfred/agent/tools/__init__.py shrinks accordingly. find_media_imdb_id
(already broken before this branch — name not exported by tools.api)
is dropped from the package re-exports so the package imports cleanly
again.
2026-05-26 19:46:49 +02:00
francwa 2df7843d8b refactor(filesystem): split into 5 atomic free-function ops + use cases
Replaces the monolithic FileManager class + scattered helpers in
alfred/infrastructure/filesystem with five free functions, each
single-responsibility and pathlib-native:

  list_dir / create_dir / link_file / move_file / move_dir

The infra layer now raises typed exceptions (FilesystemError base
+ SourceNotFound / DestinationExists / NotADirectory / NotAFile /
PermissionDenied / CrossDevice / FilesystemOSError) instead of
returning {status: ok|error} dicts. No more get_memory() reads
from infra.

Application layer mirrors the same split: five free use cases
(<op>_use_case) wrap each infra op, guard inputs against escaping
the new DirectoryRoots VO (downloads / torrents / movies /
tv_shows), catch infra exceptions, and return frozen DTOs. Roots
are injected — no global state.

Legacy files kept on disk with _OLD suffix for reference during
the follow-up rewiring (FileManager, MediaOrganizer,
create_folder/move helpers; CreateSeedLinks/ListFolder/MoveMedia/
ManageSubtitles use cases, resolve_destination). They are no
longer exported from __init__, which intentionally breaks current
agent tool wrappers and downstream tests — re-wiring is the next
chunk of work on the unfuck branch.
2026-05-26 19:22:09 +02:00
francwa 28304bb162 fix(releases): repair singular 'release' imports in parser
The CHOP CHOP CHOP pass left parser/{pipeline,scoring,services}.py
importing from alfred.domain.release.value_objects (singular), which
does not exist. parse_release was unimportable; all release tests
errored at collection.

Point the 3 imports at value_objects_old_question_mark.py, which still
holds ParsedRelease/ParseReport/MediaTypeToken/TokenizationRoute. The
file name is misleading (it is not 'old' — it is the active parser VO);
naming will be resolved when ParsedRelease itself is replaced. Tracked
in .claude/specs/unfuck_technical_debt.md #4.
2026-05-26 06:55:30 +02:00
francwa c62ae81275 refactor(tmdb): ACL pass — push VOs into DTOs, split search per media type
Anti-corruption boundary tightened on the TMDB adapter:

* TmdbMovieInfo / TmdbShowInfo now carry domain VOs (TmdbId, ImdbId,
  MovieTitle, ReleaseYear, ShowStatus) instead of raw scalars —
  validation happens at the boundary, not three layers later.
* ShowStatus enum added (domain/tv_shows/value_objects) with a
  from_tmdb() mapper that falls back to UNKNOWN + logs a warning on
  unrecognized values. TVShow.status is now ShowStatus, not str.
* MovieTitle cap raised from 100 to 150 chars.
* MediaResult / ExternalIds dropped. Replaced by per-media search
  DTOs: TmdbMovieSearchResult and TmdbShowSearchResult. Neither
  carries imdb_id — search no longer enriches with external_ids
  (callers needing imdb_id follow up with get_movie_info /
  get_tv_show_info on the chosen tmdb_id).
* TMDBClient: search_multi / search_media / _parse_result removed.
  search_movies (/search/movie) and search_shows (/search/tv) added,
  each parsing hits into VO-typed DTOs.
* SearchMovieUseCase returns a list of MovieHit (flattened to
  primitives for the agent). New symmetric SearchShowUseCase +
  ShowHit / SearchShowResponse DTOs.
* agent/tools/api.py: find_media_imdb_id → search_movies +
  search_shows wrappers.
* FileEntry moved from domain/shared/ports/filesystem_scanner.py to
  domain/shared/file_entry.py (it's a DTO, not a Protocol); size_kb
  (float) → size (int bytes). Scanner and SubtitleIdentifier
  updated.

Tests: 79/79 pass on tests/infrastructure/api/ +
tests/application/test_search_movie.py +
tests/application/test_search_show.py.
2026-05-26 05:54:58 +02:00
francwa cffafa2e60 CHOP CHOP CHOP 2026-05-26 05:45:07 +02:00
francwa b3abad4da4 chore(dot_alfred): Phase 5 cleanup + changelog
Delete the orphan MediaWithTracks mixin (and its only consumer,
track_lang_matches) from alfred.domain.shared.media — zero callers
since the v2 aggregates landed in Phase 3, parked for the Phase 5
decision in CHANGELOG.

Cleanup sweep across alfred/ and tests/ returns zero hits for:
* MediaWithTracks
* the v1 dot_alfred symbols
* the v1 sidecar names
* the alfred.application.library package
The v2 surface is the only one left.

CHANGELOG updated with:
* the Phase 5 sync orchestrators (sync_show / sync_movie),
* the Phase 4b PACK vs EPISODIC fix (Fixed section),
* the MediaWithTracks deletion in Removed,
* refreshed suite count (1277 passing).
2026-05-26 00:55:17 +02:00
francwa 7ff2e6bc4e feat(movies): sync_movie populates library index from TMDB
Parallel to sync_show. Calls TMDBClient.get_movie_info,
combines the TmdbMovieInfo with the on-disk MovieRelease loaded
via DotAlfredMovieReleaseRepository.load_by_tmdb_id, and upserts
into DotAlfredMovieLibraryIndex.

Policy mirrors sync_show with two adaptations specific to movies:
* placeholder signature is name == metadata.path (auto-heal writes
  them equal — the schema requires name to be non-empty so we can't
  use name == "" as the spec originally suggested),
* when the per-movie sidecar is gone but the index entry remains,
  sync warns and returns the existing entry unchanged (no upsert
  possible without a release: index.upsert requires folder/imdb_id
  from the MovieRelease itself).

Raises MovieNotFoundInLibrary when neither index nor sidecar
carry tmdb_id.
2026-05-26 00:51:43 +02:00
francwa 8f31f880aa feat(tv_shows): sync_show populates library index from TMDB
New orchestrator alfred.application.tv_shows.sync.sync_show calls
TMDBClient.get_tv_show_info, combines the response with the on-disk
release loaded via DotAlfredSeriesReleaseRepository.load_by_tmdb_id,
and upserts the result into DotAlfredTVShowLibraryIndex.

Policy:
* placeholders (auto-healed entries, status=="unknown") always
  refresh regardless of TTL,
* fresh entries within Settings.tmdb_cache_ttl_days are no-ops,
* stale entries past TTL refresh,
* force=True overrides both gates,
* indexed shows whose per-show sidecar is gone still get a fresh
  TMDB pass — slot map clears until rescan repopulates it,
* truly absent shows raise ShowNotFoundInLibrary from the new
  alfred.application.exceptions module.
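
The two refresh gates in the policy above can be sketched as a single
predicate. IndexEntry, synced_at and needs_refresh are hypothetical
names, and treating an entry exactly at the TTL as still fresh is an
assumption:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class IndexEntry:  # hypothetical shape of a library index entry
    status: str
    synced_at: datetime


def needs_refresh(
    entry: IndexEntry, *, now: datetime, ttl_days: int, force: bool = False
) -> bool:
    if force:
        return True  # force=True overrides both gates
    if entry.status == "unknown":
        return True  # placeholders refresh regardless of TTL
    return now - entry.synced_at > timedelta(days=ttl_days)  # stale past TTL


now = datetime(2026, 5, 26, tzinfo=timezone.utc)
fresh = IndexEntry(status="ended", synced_at=now - timedelta(days=1))
print(needs_refresh(fresh, now=now, ttl_days=14))  # False
```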
2026-05-26 00:49:00 +02:00
francwa 1efe9a82c1 feat(dot_alfred): load_by_tmdb_id on release repos
Series repo returns (release, folder) so the upcoming sync
orchestrator can feed the library index's upsert(..., path=...).
Movie repo returns the release alone (folder is on release.folder
by the one-folder-one-file convention) — kept as a semantic alias
of find_by_tmdb_id for symmetry with the series side.
2026-05-26 00:45:14 +02:00
francwa 0dc053881a feat(tmdb): add TmdbMovieInfo DTO and get_movie_info
Symmetric to TmdbShowInfo / get_tv_show_info — gives the upcoming
sync_movie orchestrator a typed cache snapshot for the v2 movie
library index.

* TmdbMovieInfo(tmdb_id, imdb_id, title, release_year)
* parse_movie_info(details, external_ids) — pure builder, parses
  release_year from the first 4 chars of release_date (None on
  missing/empty/non-numeric)
* TMDBClient.get_movie_info(tmdb_id) — aggregates
  /movie/{id} + /movie/{id}/external_ids and feeds the parser

Tests cover happy path, missing/null/empty imdb_id, every
release_year edge (none/empty/short/non-numeric/missing key),
and the two required-field errors (id, title).
2026-05-26 00:35:42 +02:00
francwa 97dc799a26 fix(tv_shows): correct PACK vs EPISODIC classification model
The Phase 4 walker + rescan logic classified seasons by parser
output (does the filename carry Exx?), but PACK vs EPISODIC is a
structural distinction:

* PACK = season folder with N flat SxxEyy videos directly inside
* EPISODIC = season folder with N subfolders, each holding one video

Changes:
* walker.py: descends two levels under show_root and classifies
  each season folder by FS structure. SeasonFolder now carries
  mode: ReleaseMode | None. Mixed layouts (flat + subfolders) and
  EPISODIC subfolders with >1 video log a warning and report
  mode=None.
* rescan.py: trusts walker.mode; drops the bogus 'single un-
  numbered video → PACK with empty episodes' branch. A season
  with no parseable episodes is now skipped with a warning.
* Tests rewritten against the real model: PACK with flat numbered
  files, EPISODIC with one-video-per-subfolder, malformed mixed
  layout skipped, single-un-numbered-file skipped.

Suite: 1237 → 1245 passing.
2026-05-25 21:37:34 +02:00
francwa fe9857aaed docs(changelog): Phase 4 Step 5 — record dot_alfred v2 Phase 4 work
Append Phase 4 entry under [Unreleased]:
* Added: rescan_show v2 signature + new rescan_movie + PACK empty-
  episodes semantics + Settings.tmdb_cache_ttl_days + library-index
  anchor-mismatch warning
* Removed: v1 dot_alfred stack (bridge/repository/serializer/sidecar),
  abstract domain ports (TVShowRepository / MovieRepository),
  application/library/ package, two Phase-3 quarantine test files
* Internal: 1233 → 1237 passing, 10 → 8 skips; MediaWithTracks
  mixin parked for Phase 5

Phase 3 entries left intact (historically accurate at commit time).
2026-05-25 21:17:23 +02:00
francwa cc334a7951 feat(dot_alfred/v2): Phase 4 Step 4 — settings + anchor warning
Two small additions that close out Phase 4's loose ends.

Settings — tmdb_cache_ttl_days

    class Settings(BaseSettings):
        # --- DOT_ALFRED ---
        tmdb_cache_ttl_days: int = 14

Default 14 days, matching the dot_alfred_v2 master spec. Will drive
the Phase 5 TTL policy on TVShowLibraryIndexSidecar /
MovieLibraryIndexSidecar (decide when a TMDB-cached entry is stale
and triggers a refresh sync).

Anchor-mismatch warning

DotAlfredTVShowLibraryIndex._load_or_heal and DotAlfredMovieLibraryIndex
._load_or_heal now cross-check each indexed entry's metadata.path
against the on-disk folder layout right after a successful parse.
Drift (sidecar says folder X, X no longer exists under library_root)
is surfaced as a WARNING log — one per missing folder, with the
tmdb_id for cross-reference. No auto-heal on drift; the caller
decides (the heal path remains opt-in via index.heal()).

The warning fires only on the parsed-index path. The heal path
always synthesizes entries from real folder names, so it can never
drift — silent by construction.

Tests

* TestTVShowLibraryIndexAnchorWarning — 3 scenarios:
  warn-on-drift / no-warn-on-match / no-warn-on-heal.
* TestMovieLibraryIndexAnchorWarning — symmetric coverage.

Full suite: 1237 passed / 8 skipped / 4 xfailed.
2026-05-25 21:14:18 +02:00
francwa 86222d95d1 refactor(persistence): Phase 4 Step 3 — delete v1 dot_alfred + ports
Now that rescan_show + rescan_movie run on the v2 release repositories
(Phase 4 Steps 1-2), the v1 dot_alfred stack and its abstract domain
ports have zero callers. Delete them and lift the Phase 3 quarantines.

Deleted

* alfred/infrastructure/persistence/dot_alfred/bridge.py
* alfred/infrastructure/persistence/dot_alfred/repository.py     (v1)
* alfred/infrastructure/persistence/dot_alfred/serializer.py     (v1)
* alfred/infrastructure/persistence/dot_alfred/sidecar.py        (v1)
* alfred/domain/tv_shows/repositories.py     (TVShowRepository ABC)
* alfred/domain/movies/repositories.py       (MovieRepository ABC)
* tests/infrastructure/persistence/dot_alfred/test_repository.py
* tests/infrastructure/persistence/dot_alfred/test_serializer.py

Rewrite

alfred/infrastructure/persistence/dot_alfred/__init__.py now re-
exports only the v2 surface: the four concrete repositories
(DotAlfredSeriesReleaseRepository, DotAlfredMovieReleaseRepository,
DotAlfredTVShowLibraryIndex, DotAlfredMovieLibraryIndex) plus
ShowFolderUnknown. DTO-level imports go through
alfred.infrastructure.persistence.dot_alfred.v2 directly.

No backwards-compat shims (per CLAUDE.md): the v1 names are gone,
not aliased. Test suite drops from 10 → 8 skips (the two Phase 3
module-level skips disappear with the quarantined files).

Full suite: 1233 passed / 8 skipped / 4 xfailed.

The MediaWithTracks mixin in alfred.domain.shared.media is now
orphaned (Episode lost its tracks in Phase 3, MovieRelease doesn't
inherit it). Parked for Phase 5, which will either mount it on
MovieRelease / SeasonRelease or delete it for good.
2026-05-25 21:10:32 +02:00
francwa 9e48c70b8a feat(rescan): Phase 4 Step 2 — add rescan_movie orchestrator
Mirror rescan_show for the movies library. Locates the main video via
find_video_file, runs inspect_release once (movies are one-folder-one-
main-file by convention), and writes a v2 MovieRelease sidecar via
DotAlfredMovieReleaseRepository.

Signature

    rescan_movie(
        movie_dir,
        *,
        tmdb_id: TmdbId,
        imdb_id: ImdbId | None = None,
        movie_repo: DotAlfredMovieReleaseRepository,
        prober,
        kb,
    ) -> MovieRelease

Behavior

* added_at = datetime.now(UTC) — the v2 sidecar records when the
  release was last reconciled with disk, not filesystem mtime (which
  drifts across moves and hard-links). Phase 3 made this field
  required on MovieRelease.
* No TMDB call. Index auto-heals from the new sidecar on next read.
* MovieRescanFailed raised when no video is found inside movie_dir
  (only explicit failure mode; all other adapter errors degrade
  gracefully into empty / partial fields).
* file_path is recorded relative to movie_dir so the sidecar stays
  portable across library moves.

Tests

tests/application/movies/test_rescan.py: 8 scenarios on the real v2
movie repo + real KB + stubbed prober. Covers track flattening,
sidecar round-trip, prober returning None, video in subfolder,
explicit no-video failure, imdb_id optional.

Full suite: 1233 passed / 10 skipped / 4 xfailed.
2026-05-25 21:09:02 +02:00
francwa 7da0f887e7 refactor(rescan): Phase 4 Step 1 — rescan_show on v2 release repo
Rewrite rescan_show to build a SeriesRelease (Phase 1 v2 aggregate)
and persist it via DotAlfredSeriesReleaseRepository. The orchestrator
keeps reusing inspect_release as the single source of parse/probe
truth — only the assembly target changes (SeriesRelease/SeasonRelease/
EpisodeRelease instead of TVShow/Season/Episode).

New signature

    rescan_show(
        show_root,
        *,
        tmdb_id: TmdbId,
        imdb_id: ImdbId | None = None,
        series_repo: DotAlfredSeriesReleaseRepository,
        scanner,
        prober,
        kb,
    ) -> SeriesRelease

Identity is TMDB-anchored (tmdb_id required, no coercion); imdb_id is
optional. No TMDB call from rescan — the library index auto-heals
from the new sidecar on its next read.

PACK vs EPISODIC

* Single-video + season-parsed + no-episode → SeasonRelease(
  mode=PACK, folder=<season folder>, episodes=()). The slot map stays
  empty until the Phase 5 TMDB sync supplies episode_count. We do
  not fabricate an EpisodeRange we cannot prove on disk.
* Otherwise → EPISODIC: every file with (season, episode) becomes an
  EpisodeRelease with EpisodeRange(start, end) = (E, E). Multi-episode
  files (S01E01E02) still record only the first slot — Parser does
  not yet expose episode_end (existing tech debt, unchanged).

Package move

The orchestrator moves from alfred/application/library/ to
alfred/application/tv_shows/ for symmetry with alfred/application/
movies/ (Step 2). walker.py + its tests move with it. The empty
library/ package is deleted.

Tests

tests/application/tv_shows/test_rescan.py rewritten end-to-end on
the real v2 repository, real KB, real scanner, stubbed prober.
9 happy-path + edge-case scenarios cover EPISODIC track flattening,
PACK empty-episodes semantics, sidecar round-trip, imdb_id optional,
empty show root, season folder with no videos, prober returning None.
test_walker.py moved verbatim (import path updated).

Full suite: 1214 passed / 10 skipped / 4 xfailed. The two remaining v1
dot_alfred quarantines from Phase 3 stay in place until Step 3.
2026-05-25 21:07:25 +02:00
francwa c22b2b78eb refactor(domain): Phase 3 — TVShow/Movie aggregates become TMDB-only
Filesystem-side concerns (file paths, tracks, quality, mode, added_at)
move to the releases/ domain added in Phase 1; the TMDB aggregates now
carry only identity + TMDB catalog facts.

Domain entities:
- TVShow: tmdb_id: TmdbId required (primary key), imdb_id: ImdbId | None
  optional, status: str = "unknown" added.
- Season: episode_count: int = 0 added (TMDB-cached); audio_tracks,
  subtitle_tracks, mode property removed.
- Episode: slimmed to identity + title. file_path/file_size/tracks
  removed. No longer inherits MediaWithTracks.
- Movie: tmdb_id required, imdb_id optional. file_path/file_size/quality/
  added_at/audio_tracks/subtitle_tracks removed. get_filename() now
  returns "Title.Year" — quality moves to MovieRelease.

Builders:
- TVShowBuilder requires tmdb_id: TmdbId; imdb_id/status optional.
- SeasonBuilder.set_episode_count(int) replaces set_audio_tracks /
  set_subtitle_tracks.

No-coercion contract: TVShow(tmdb_id=1396) raises — callers pass
TmdbId(1396). No ergonomic shim per the no-shims rule.

Cascade fixes:
- MediaOrganizer test fixtures updated to new Movie/TVShow shapes.
- Movie.get_filename() re-added (without Quality) so MediaOrganizer
  keeps working until Phase 4 rewires it through MovieRelease.

Quarantined (deleted in Phase 4 alongside v1 dot_alfred):
- tests/application/library/test_rescan.py — module-level skip.
- tests/infrastructure/persistence/dot_alfred/test_repository.py —
  module-level skip.
- tests/infrastructure/persistence/dot_alfred/test_serializer.py —
  module-level skip.

Suite: 1216 passed, 11 skipped (8 pre-existing + 3 Phase 3
quarantines), 4 xfailed. CHANGELOG updated under [Unreleased].
2026-05-25 19:54:35 +02:00
francwa 2f160644da feat(dot_alfred/v2): bump SCHEMA_VERSION to 2 — added_at on MovieRelease
Phase 3 prep: Movie aggregate is about to become TMDB-only (no
filesystem fields). added_at is a release-time observation, not a
TMDB-aggregate concern, so it moves to MovieRelease +
MovieReleaseSidecar.

- Add added_at: datetime (required) to MovieRelease with a
  type-check in __post_init__.
- Add added_at: datetime (required) to MovieReleaseSidecar.
- Bump SCHEMA_VERSION 1 → 2 with a version-history note.
- Bridge round-trips added_at via Pydantic mode="json" (datetime
  → ISO 8601 string).
- Tests: update MovieRelease fixtures, add a validator test, add
  an added_at round-trip test, switch hard-coded `1` assertions
  to SCHEMA_VERSION for future-proofing.

No v1 sidecars in the wild yet — no migration code needed.
2026-05-25 19:47:25 +02:00
francwa e65c1df229 feat(.alfred v2 — Phase 2): Pydantic sidecars, atomic repos, auto-heal index
Spec: specs/dot_alfred_v2.md (Phase 2).

New package alfred/infrastructure/persistence/dot_alfred/v2/:
  * sidecar_release.py / sidecar_root.py — Pydantic DTOs
    (extra="forbid", frozen=True) for per-item sidecars and the
    library-root index. schema_version enforced via model_validator.
  * serializer.py — read_yaml / atomic_write_yaml (.tmp + os.replace).
    SidecarSchemaError wraps YAML + Pydantic errors uniformly.
  * bridge.py — lossless domain <-> sidecar for SeriesRelease /
    MovieRelease; projection-only show_index_entry_from /
    movie_index_entry_from with multi-episode-file flattening.
  * repository.py — DotAlfredSeriesReleaseRepository /
    DotAlfredMovieReleaseRepository (log+skip on corruption),
    DotAlfredTVShowLibraryIndex / DotAlfredMovieLibraryIndex with
    silent auto-heal on missing/corrupt index reads. Writes never
    auto-heal (read paths handle that).
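
The ".tmp + os.replace" pattern used by atomic_write_yaml can be
sketched with the standard library alone. This version writes plain
text rather than YAML to stay dependency-free; atomic_write_text is a
hypothetical name, and the fsync call and the cleanup on failure are
assumptions, not something this commit states:

```python
import os
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    # Write to a sibling temp file, then swap it in with os.replace so a
    # reader sees either the old file or the new one, never a partial write.
    tmp = target.with_name(target.name + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # assumption: flush to disk before the swap
        os.replace(tmp, target)
    except BaseException:
        tmp.unlink(missing_ok=True)  # do not leave a stray .tmp behind
        raise
```

Keeping the temp file in the same directory as the target matters:
os.replace is only atomic when both paths are on the same filesystem.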

TMDB client extensions:
  * TmdbSeasonInfo / TmdbShowInfo DTOs + pure parse_tv_show_info.
  * TMDBClient.get_tv_show_info aggregates /tv/{id} +
    /tv/{id}/external_ids.

Domain change:
  * SubtitleTrack gains is_sdh: bool = False, populated from
    ffprobe's hearing_impaired disposition. Required for v2 sidecar
    parity (spec replaces v1's type: "sdh" with explicit flag).
    Default keeps every existing caller unchanged.

Tests: 37 new v2 integration tests on tmp_path (round-trips, atomic
writes, schema mismatch handling, anchor warnings, auto-heal paths)
plus 16 TMDB DTO tests. Full suite: 1240 -> 1277 passed.

Implementation notes filed in .claude/specs/dot_alfred_v2_notes.md
(strict=True trade-off, upsert signature deviation from spec, etc.).

Phases 3-5 (TVShow/Movie refactor to TMDB-only, rescan_show rewrite,
v1 deletion + wiring) are next.
2026-05-25 16:01:39 +02:00
francwa c0f6d01048 feat(releases): Phase 1 — new filesystem release domain + TmdbId VO
First step of specs/dot_alfred_v2.md. Introduces a separate bounded
context (alfred/domain/releases/) for the filesystem-side aggregates,
disjoint from TMDB identity which stays in tv_shows/ and movies/.
The link between the two worlds is TmdbId, used as the natural key
in the persistence layer (no domain-level reference).

New package alfred/domain/releases/:
- value_objects: EpisodeRange (covers SxxE01E02E03 multi-episode
  files via start/end inclusive range, with count/numbers/is_single
  helpers), ReleaseMode enum (PACK = N video files direct in the
  season folder, EPISODIC = N sub-folders).
- entities: TrackProfile, EpisodeRelease, SeasonRelease (with
  episode_count() summing each EpisodeRange.count()), SeriesRelease
  (tmdb_id primary anchor, optional imdb_id secondary), MovieRelease.
  All frozen dataclasses.
- builders: SeasonReleaseBuilder + SeriesReleaseBuilder mirroring
  the v1 TVShowBuilder pattern. Builders sort episodes by range
  start on emit and reject overlapping ranges (two files claiming
  the same TMDB slot). from_existing() seeds a builder from an
  existing frozen aggregate for round-trip edits.
- repositories: abstract ports (SeriesReleaseRepository,
  MovieReleaseRepository); concrete .alfred sidecar impls arrive
  in Phase 2.
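
A compact sketch of EpisodeRange and the builder's sort + overlap
check. The count/numbers/is_single names come from this message;
whether they are methods or properties, the validation rule, and the
sort_and_check_ranges helper are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeRange:
    start: int
    end: int  # inclusive

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError(f"end {self.end} precedes start {self.start}")

    def count(self) -> int:
        return self.end - self.start + 1

    def numbers(self) -> tuple[int, ...]:
        return tuple(range(self.start, self.end + 1))

    def is_single(self) -> bool:
        return self.start == self.end


def sort_and_check_ranges(ranges: list[EpisodeRange]) -> tuple[EpisodeRange, ...]:
    # Sort by range start, then reject two files claiming the same slot.
    ordered = sorted(ranges, key=lambda r: r.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start <= prev.end:
            raise ValueError(f"overlapping ranges: {prev} and {cur}")
    return tuple(ordered)
```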

New shared VO alfred/domain/shared/value_objects.py::TmdbId — positive
int, rejects bool/str/float, symmetric with the existing ImdbId VO.
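
The validation contract can be sketched as follows. The field name
value and the choice of TypeError vs ValueError are assumptions; only
the "positive int, rejects bool/str/float" rule is from this message:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmdbId:
    value: int

    def __post_init__(self) -> None:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(f"TmdbId expects int, got {type(self.value).__name__}")
        if self.value <= 0:
            raise ValueError(f"TmdbId must be positive, got {self.value}")
```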

73 unit tests cover VO validation, entity invariants, builder sort
+ overlap detection, and from_existing() round-trips.

v1 code paths are untouched at this stage; the new domain coexists
with the old TVShow aggregate until Phase 3 refactors it.
2026-05-25 15:19:23 +02:00
francwa de7030fa9c feat(library): add rescan_show orchestrator + walker (Step 4)
Step 4 of specs/dot_alfred.md — rebuild a TVShow aggregate from disk
by reusing the existing release pipeline (inspect_release) on every
video file in a show folder, then persist via the .alfred repository.

- alfred/application/library/walker.py — pure structural walk
  (season folders detected via \bS\d{1,2}\b regex, video files
  filtered against kb.video_extensions, no recursion).
- alfred/application/library/rescan.py — orchestrator that ingests
  each season folder, infers PACK vs EPISODIC from on-disk file
  count + parser output, and assembles via TVShowBuilder. Episode
  paths stored relative to show_root. Logs + skips corrupt input
  (no season parsed, mixed season numbers, unparseable episodes).
- Season now inherits MediaWithTracks: PACK seasons carry
  season-level audio_tracks / subtitle_tracks; EPISODIC seasons
  leave them empty (tracks live per-episode). SeasonBuilder gains
  set_audio_tracks / set_subtitle_tracks; bridge writes/reads them
  in the PACK branch via shared _synth_* helpers.
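
How the season-folder regex behaves on typical names, as a quick
sketch. The pattern is the one quoted above; is_season_folder is a
hypothetical helper, and the match is assumed to be case-sensitive
with no extra flags:

```python
import re

SEASON_FOLDER_RE = re.compile(r"\bS\d{1,2}\b")


def is_season_folder(name: str) -> bool:
    # The trailing \b means "S01E02" does not count: E is a word character.
    return SEASON_FOLDER_RE.search(name) is not None


print(is_season_folder("Foundation.S01.1080p.WEB"))  # True
print(is_season_folder("Foundation.S01E02.1080p"))   # False
```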

Out of scope, tracked as tech debt: adjacent .srt capture, multi-
episode (episode_end), TMDB-driven PACK detection (the current
heuristic '1 file == PACK' is a placeholder until ShowTracker lands).

18 new tests (11 walker + 7 rescan integration) on tmp_path with
the Foundation layout. Full suite: 1149 passed.
2026-05-24 15:22:18 +02:00
francwa 3622c95154 chore(lint): Lint the shit out of it 2026-05-24 15:21:58 +02:00
francwa c7c11180d9 feat(persistence): add DotAlfredTVShowRepository (filesystem-backed)
Step 3 of specs/dot_alfred.md. Concrete TVShowRepository
implementation reading and writing per-show .alfred YAML files under
a configurable library_root. Writes are atomic (.alfred.tmp +
os.replace), reads tolerate corrupted/wrong-schema sidecars (log +
skip), and the repo never invents a folder name — save(show)
requires the target folder to exist beforehand (raises
ShowFolderUnknown otherwise), matching the spec's
MediaOrganizer-then-sidecar split.

Cold folders without a sidecar are skipped by find_all and yield
None from find_by_imdb_id — the upcoming rescan_show tool (step 4)
will own the opt-in rebuild path.

A small bridge module translates between the rich domain TVShow
(AudioTrack/SubtitleTrack with full ffprobe minutiae) and the
compact sidecar shape (language-only audio, embedded-only subs with
type derived from is_forced). The bridge is intentionally lossy on
probe details the sidecar does not store, per the spec's
factual-only philosophy.

20 integration tests on tmp_path: round-trip save/find,
cold-folder/unknown-id returns, find_all skipping
(corrupted/schema-violating sidecars), delete/exists, atomic write
(no .alfred.tmp leftover), overwrite, and folder-name fallbacks
(get_folder_name guess + full-scan rescue when renamed).
2026-05-22 17:16:41 +02:00
francwa b0e275bd11 feat(persistence): add .alfred sidecar serializer (DTO ↔ dict)
Step 2 of the specs/dot_alfred.md plan. Pure-dict in/out
(serialize(sidecar) -> dict, deserialize(data) -> ShowSidecar);
YAML I/O lives in the repository layer (step 3) and is kept out
for trivial testability.

DTOs mirror the YAML schema field-for-field:
- ShowSidecar (root: imdb_id, tmdb_id, schema_version, seasons)
- SeasonSidecar (number, path, optional audio/subtitles, optional episodes)
- EpisodeSidecar (number, path, optional audio/subtitles)
- SubtitleEntry (language, source, type)

The sidecar acts as a scan cache: it stores only what is genuinely
costly to recompute — folder/file paths (skipping the FS walk) and
probed track metadata (skipping ffprobe). Release identifiers
(group, source, quality, codec) live in folder/file names and are
derived on demand by the parser; they are deliberately absent from
the schema and rejected as unknown keys on deserialize.

The serializer is strict on schema: unknown keys at any level raise
SidecarSchemaError, missing required fields raise clearly, and bool
cannot sneak in as a season/episode number. Optional fields
(tmdb_id, empty audio/subtitles/episodes) are omitted from the
output rather than emitted as null / [].
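
The strictness rules above, sketched on the episode level with plain
dicts instead of the real DTOs. The key set follows EpisodeSidecar as
listed in this message; the function names and the dict-based shape
are simplifications for illustration:

```python
class SidecarSchemaError(ValueError):
    """Raised when a sidecar dict violates the schema."""


_EPISODE_KEYS = {"number", "path", "audio", "subtitles"}


def deserialize_episode(data: dict) -> dict:
    unknown = set(data) - _EPISODE_KEYS
    if unknown:
        raise SidecarSchemaError(f"unknown keys: {sorted(unknown)}")
    for key in ("number", "path"):
        if key not in data:
            raise SidecarSchemaError(f"missing required field: {key}")
    number = data["number"]
    # bool is a subclass of int, so it has to be rejected explicitly.
    if isinstance(number, bool) or not isinstance(number, int):
        raise SidecarSchemaError("number must be an int")
    return {
        "number": number,
        "path": data["path"],
        "audio": tuple(data.get("audio", ())),
        "subtitles": tuple(data.get("subtitles", ())),
    }


def serialize_episode(episode: dict) -> dict:
    out = {"number": episode["number"], "path": episode["path"]}
    for key in ("audio", "subtitles"):
        if episode.get(key):  # omit empty optionals instead of emitting []
            out[key] = list(episode[key])
    return out
```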

Tests cover round-trip equivalence (DTO → dict → DTO and DTO → YAML
text → DTO), the Foundation S01 PACK case (real-world fixture with
mixed sub types — superset captured at season scope), and a
Breaking Bad S05 EPISODIC case. An on-disk tmp_path fixture
recreates the Foundation folder structure with placeholder files,
ready to be reused by the upcoming repository walk tests in step 3.
2026-05-22 16:56:56 +02:00
francwa 6c12c18a27 refactor(tv_shows): freeze aggregate, builder-only construction, drop ShowTracker fields
The TVShow aggregate is now fully immutable. TVShow, Season and Episode
are @dataclass(frozen=True), children stored as ordered tuples sorted
by number. All construction goes through TVShowBuilder / SeasonBuilder
(new module), which expose from_existing() to seed from a current
frozen aggregate and apply modifications.

ShowTracker-territory fields are stripped from the domain: ShowStatus,
CollectionStatus, expected_seasons/episodes, aired_episodes,
collection_status(), is_complete_series(), missing_episodes(),
is_ongoing(), is_ended(), Season.name, the aired<=expected validation,
and the TMDB status string mapping. These will reappear in a dedicated
ShowTracker layer (to be designed) combining the .alfred sidecar with
live TMDB data.

New SeasonMode enum (PACK / EPISODIC) computed at read time from the
season's structural shape — never stored, the YAML sidecar encodes the
mode via presence/absence of the episodes: block.

Test suite for the domain entirely rewritten to cover frozen invariants,
builder ordering, last-write-wins, from_existing round-trip, and
SeasonMode derivation. Full suite still green (1078 passed).
2026-05-22 16:09:37 +02:00
francwa 1427c8a54b docs(specs): add dot_alfred sidecar design doc
First entry in the new specs/ directory. Specifies the layout and
semantics of the per-show .alfred/ sidecar that will back the future
concrete TVShowRepository:

- One .alfred/ directory per show, containing show.yaml + one
  season_NN.yaml per season (zero-padded, season_00 for Specials).
- Per-episode entries store file size + mtime so cache lookups skip
  a full ffprobe rescan when nothing changed.
- Self-healing on drift (file missing/modified/new) without raising.
- Atomic writes via temp file + os.replace().
- Phased implementation plan (builder + freeze first, then
  serializer, then cache validator, then repo, then wiring).

No code yet — spec only, awaiting review before the implementation
phases. Companion entry in CHANGELOG (Added).
2026-05-21 18:05:55 +02:00
francwa 8491edac22 infra(gitignore): track specs/ + carve out private .claude/
The repo-level .gitignore had a blanket *.md rule with only
CHANGELOG.md exempted. Three adjustments:

- Allow specs/ to be tracked (design docs / RFCs live here, public).
- Restrict the README.md exception to the root (/README.md) so that
  per-directory README files (e.g. tests/fixtures/releases/README.md)
  stay ignored as before — no unintended scope creep.
- Explicitly ignore /.claude/, the private dev-docs sub-repo that
  lives inside the working tree but is versioned and pushed
  separately.

CHANGELOG: Internal entry.
2026-05-21 18:05:33 +02:00
francwa 02e478a157 refactor(domain): freeze Movie and Episode, switch track collections to tuple
Movie and Episode become @dataclass(frozen=True, eq=False), with
audio_tracks/subtitle_tracks held as tuple[...] instead of list[...].
Identity-based equality is preserved via the existing __eq__/__hash__.
__post_init__ coercion (imdb_id, title, season_number, episode_number)
uses object.__setattr__ to stay compatible with frozen.

The MediaWithTracks mixin contract is updated to tuple accordingly.

Callers projecting enrichment results (probe output, file metadata) now
rebuild via dataclasses.replace(...) — same pattern recently adopted for
ParsedRelease.
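
The frozen + object.__setattr__ + dataclasses.replace pattern, as a
simplified sketch. This Episode carries only a few illustrative
fields; the real entity has more, and the identity key used in
__eq__/__hash__ here is an assumption:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True, eq=False)
class Episode:
    season_number: int
    episode_number: int
    audio_tracks: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Plain assignment would raise on a frozen dataclass, so coercion
        # goes through object.__setattr__.
        object.__setattr__(self, "season_number", int(self.season_number))
        object.__setattr__(self, "episode_number", int(self.episode_number))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Episode):
            return NotImplemented
        return (self.season_number, self.episode_number) == (
            other.season_number,
            other.episode_number,
        )

    def __hash__(self) -> int:
        return hash((self.season_number, self.episode_number))


ep = Episode("1", 2)
enriched = replace(ep, audio_tracks=("eng",))  # new instance, ep is untouched
```

Because eq=False, the dataclass decorator leaves the hand-written
__eq__/__hash__ alone, so identity-based equality survives the freeze.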

Season and TVShow stay mutable for now: freezing the aggregate root
would cascade a full reconstruction on every add_episode, deferred.
2026-05-21 13:40:22 +02:00
francwa 3dc73a5214 feat(release): add fullwidth vertical bar ｜ (U+FF5C) to separators
CJK release names sometimes use the fullwidth vertical bar as a token
separator, as do occasional decorative YouTube-style uploads. Adding
the codepoint to separators.yaml lets the tokenizer split on it
instead of leaving the wide pipe glued onto an adjacent token.

The tokenizer in alfred/domain/release/parser/pipeline.py iterates
the separator list as plain strings (no regex), so a multi-byte
UTF-8 separator works without any code change.
2026-05-21 08:05:56 +02:00
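
To illustrate why 3dc73a5214 needed no code change, here is a sketch of a separator split done with plain string operations. The function body and the separator list are assumptions; only the "iterate separators as plain strings, no regex" idea comes from the commit message.

```python
def tokenize(name: str, separators: list[str]) -> list[str]:
    """Split a release name on every separator, using str.split only."""
    tokens = [name]
    for sep in separators:
        # Re-split every current token on this separator.
        tokens = [part for token in tokens for part in token.split(sep)]
    # Adjacent separators produce empty strings; drop them.
    return [token for token in tokens if token]


# Hypothetical separator list; "\uff5c" is the fullwidth vertical bar.
SEPARATORS = [".", " ", "_", "\uff5c"]
tokens = tokenize("Artist\uff5cLive.Session_2024", SEPARATORS)
```

Since `str.split` works on code points, a separator that takes several bytes in UTF-8 behaves exactly like an ASCII one.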
francwa 88f156b7a4 refactor(subtitles): rename SubtitleCandidate → SubtitleScanResult
The old name conflated 'might become a placed subtitle' with 'what a
scan pass produced'. The class is the output of a scan/identify pass —
language/format may still be None while classification is in progress,
confidence reflects classifier certainty, raw_tokens holds filename
fragments under analysis. SubtitleScanResult says that directly.

Pure rename + refreshed docstring; no behavior change. Touches the
domain entity, the matcher/identifier/utils services, the
manage_subtitles use case, the placer, the metadata store, the
shared-media cross-ref comment, and 7 test modules.
2026-05-21 08:05:46 +02:00
francwa 5107cb32c0 feat(release): InspectedResult.recommended_action centralizes exclusion decision
Add a derived 'recommended_action' property on InspectedResult that
collapses the orchestrator's go / wait / skip decision into one value:

- 'skip'      → no main_video, or media_type == 'other'
- 'ask_user'  → media_type == 'unknown', or road == 'path_of_pain'
- 'process'   → confident parse with a main video on disk

The ordering is part of the contract (skip > ask_user > process) —
documented in the property docstring.

Until now every consumer (workflows, the agent, the orchestrator
sketch) had to re-derive this from the road / media_type / main_video
triple, with subtle drift between sites. One place, one rule.

Exposed through the analyze_release tool so the LLM can route on it.
Spec YAML updated to describe the new field.

Suite: 1083 passed (+6 new tests in tests/application/test_inspect.py
covering the four branches and the precedence rules).
2026-05-21 07:54:17 +02:00
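
The precedence rule from 5107cb32c0 (skip > ask_user > process) can be sketched as below. The flattened `InspectedResult` is an assumption made for brevity: the real VO wraps a parsed release and a report rather than exposing `media_type` and `road` directly.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class InspectedResult:
    """Simplified stand-in with only the three inputs of the decision."""

    media_type: str
    road: str
    main_video: Optional[Path]

    @property
    def recommended_action(self) -> str:
        # Order matters: skip wins over ask_user, which wins over process.
        if self.main_video is None or self.media_type == "other":
            return "skip"
        if self.media_type == "unknown" or self.road == "path_of_pain":
            return "ask_user"
        return "process"


video = Path("/dl/Film.2020.1080p/film.mkv")
no_video = InspectedResult("unknown", "path_of_pain", None)
painful = InspectedResult("movie", "path_of_pain", video)
clean = InspectedResult("movie", "easy", video)
```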
francwa b7979c0f8b refactor(release): freeze ParsedRelease + enrich_from_probe returns new instance
ParsedRelease is now @dataclass(frozen=True). The enrichment passes that
used to patch fields in place now produce new instances:

- enrich_from_probe(parsed, info, kb) returns a new ParsedRelease via
  dataclasses.replace (no allocation when no field changed).
- inspect_release rebinds 'parsed' after detect_media_type (wrapped in
  MediaTypeToken — the strict isinstance check now also runs on
  replace) and after enrich_from_probe.

languages becomes a tuple[str, ...] so the VO is properly immutable.
Parser pipeline packs languages as a tuple in the assemble dict.

Callers updated: inspect_release, testing/recognize_folders_in_downloads.py.
Tests updated: 22 enrich_from_probe call sites rebound, language
assertions switched to tuple literals, test_release_fixtures normalizes
result['languages'] back to list for YAML-fixture comparison.

Suite: 1077 passed.
2026-05-21 07:51:49 +02:00
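
A sketch of the "return a new instance, or the same one when nothing changed" contract from b7979c0f8b. `fill_missing` is a hypothetical helper with a simpler signature than the real `enrich_from_probe(parsed, info, kb)`, and the dataclass is cut down to a few fields.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ParsedRelease:
    """Cut-down stand-in for the real value object."""

    title: str
    quality: Optional[str] = None
    codec: Optional[str] = None
    languages: tuple[str, ...] = ()


def fill_missing(parsed: ParsedRelease, probed: dict[str, str]) -> ParsedRelease:
    """Fill only the fields that are still None; parser values win."""
    # Keys of `probed` are assumed to be valid field names.
    updates = {
        name: value
        for name, value in probed.items()
        if getattr(parsed, name) is None
    }
    # No update means no allocation: hand back the original object.
    return replace(parsed, **updates) if updates else parsed


parsed = ParsedRelease(title="Film", codec="x264")
parsed = fill_missing(parsed, {"quality": "1080p", "codec": "x265"})
```

Callers have to rebind the result, as in the last line; ignoring the return value silently drops the enrichment.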
francwa 9f1ce94690 refactor(application): inject kb/prober into resolve_destination use cases
Remove the module-level _KB / _PROBER singletons from
alfred/application/filesystem/resolve_destination.py. The four
resolve_{season,episode,movie,series}_destination use cases now take
kb: ReleaseKnowledge and prober: MediaProber as required arguments,
matching the shape of inspect_release.

The singletons now live at the agent-tools frontier
(alfred/agent/tools/filesystem.py), where the LLM-facing wrappers
instantiate YamlReleaseKnowledge / FfprobeMediaProber once and thread
them through. The wrappers' Python signatures are unchanged — the
inspect-based JSON-schema generator in agent/registry.py still sees the
same LLM-passable params.

analyze_release drops the dirty 'from ... import _KB' indirection.

Tests inject their own stubs by keyword (prober=_StubProber(...)) via
thin convenience wrappers, replacing the prior
monkeypatch.setattr(rd, '_PROBER', ...) pattern.

testing/debug_release.py: instantiate YamlReleaseKnowledge() /
FfprobeMediaProber() inline at the two call sites.

Suite: 1077 passed.
2026-05-21 07:46:13 +02:00
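
The injection pattern from 9f1ce94690, sketched under heavy simplification. `destination_name` is a hypothetical function, the probe result is a plain dict, and the real use cases also take a `kb` argument. The point is only that a test passes a stub by keyword instead of monkeypatching a module-level singleton.

```python
from typing import Optional, Protocol


class MediaProber(Protocol):
    """Port: anything with a probe() method satisfies it."""

    def probe(self, video: str) -> Optional[dict]: ...


class StubProber:
    """Test double that returns a canned probe result."""

    def __init__(self, info: Optional[dict]) -> None:
        self._info = info

    def probe(self, video: str) -> Optional[dict]:
        return self._info


def destination_name(release_name: str, source_file: str, *, prober: MediaProber) -> str:
    """Append the probed quality when the probe found one."""
    info = prober.probe(source_file) or {}
    quality = info.get("quality")
    return f"{release_name}.{quality}" if quality else release_name


with_probe = destination_name("Film.2020", "/dl/film.mkv", prober=StubProber({"quality": "1080p"}))
without_probe = destination_name("Film.2020", "/dl/film.mkv", prober=StubProber(None))
```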
francwa 5e0ed11672 refactor(release): rename ParsePath enum to TokenizationRoute
ParsePath collided with pathlib.Path in mental models, and was one
letter from the parse_path attribute that stores its value — confusion
on confusion. Road (EASY/SHITTY/PATH_OF_PAIN) is the parser-confidence
axis; TokenizationRoute (DIRECT/SANITIZED/AI) is the tokenization-method
axis. They're orthogonal and the new name makes that obvious.

Field name parse_path stays — it's the right name for the attribute
that *holds* the route. String values ("direct", "sanitized", "ai")
stay too, so YAML fixtures and the analyze_release tool spec are
unchanged. Only the type symbol changes:

- value_objects.py: class rename + docstring spelling out orthogonality
  with Road.
- services.py: 3 call sites.
- scoring.py: docstring cross-reference updated.
- tests/domain/release/test_parser_v2_scoring.py: import + 3 call sites.
2026-05-21 07:39:42 +02:00
francwa 0246f85ef8 refactor(release): move codec mappings from code to YAML knowledge
The three module-level dicts in enrich_from_probe (ffprobe codec name
to scene token, channel count to layout) were exactly the kind of
domain lookup table CLAUDE.md says belongs in YAML, not in Python.
Move them to alfred/knowledge/release/probe_mappings.yaml, load
through a new ReleaseKnowledge.probe_mappings port field, and add a
kb parameter to enrich_from_probe so the consumer reads the maps via
the same injection pattern as everything else.

- New knowledge file: alfred/knowledge/release/probe_mappings.yaml
- New loader: load_probe_mappings() in infrastructure/knowledge/release.py
  (normalizes channel-count keys back to int).
- Port: ReleaseKnowledge gains probe_mappings: dict.
- Adapter: YamlReleaseKnowledge populates it at __init__.
- Consumer: enrich_from_probe(parsed, info, kb) reads the three sub-maps
  from kb.probe_mappings; unknown codecs still fall back to uppercase
  raw value, same behaviour as before.
- Call sites updated: inspect_release passes kb through; the testing
  script gets its kb wiring (it was already broken since the
  ReleaseKnowledge refactor); all 22 enrich_from_probe call sites in
  tests/application/test_enrich_from_probe.py pass _KB.
2026-05-21 07:37:42 +02:00
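
Two behaviours named in 0246f85ef8 can be sketched in a few lines: normalizing channel-count keys back to `int` after loading, and falling back to the uppercased raw value for unknown codecs. The function names and all mapping values below are illustrative assumptions, not the contents of `probe_mappings.yaml`.

```python
def normalize_channel_keys(raw: dict) -> dict[int, str]:
    """Coerce channel-count keys to int, whatever type the loader produced."""
    return {int(count): layout for count, layout in raw.items()}


def to_scene_token(codec_name: str, codec_map: dict[str, str]) -> str:
    """Map a probed codec name to a scene token; unknown codecs are uppercased."""
    return codec_map.get(codec_name, codec_name.upper())


# Hypothetical mapping values, for illustration only.
video_codecs = {"hevc": "x265", "h264": "x264"}
channel_layouts = normalize_channel_keys({"2": "2.0", "6": "5.1", 8: "7.1"})
```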
francwa e62dc90bd1 refactor(release): make tech_string a derived property
ParsedRelease.tech_string was a stored str field re-computed in two
places (assemble() at parse time, enrich_from_probe() after the probe).
The second site was a reactive fix (e79ca46) for filename builders that
saw a stale value. Turn it into an @property so it stays in sync with
quality/source/codec by construction.

- Drop the field from the dataclass + the key from assemble()'s dict.
- Drop tech_string="" from parse_release's malformed-name fallback.
- Drop the manual recomputation at the end of enrich_from_probe.
- Inject the property into asdict() result in the fixtures runner
  (same treatment as is_season_pack).
- Update tests that passed tech_string= to the constructor; rewrite the
  TestTechString case that mutated p.tech_string manually.
2026-05-21 07:33:53 +02:00
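
A sketch of the derived property from e62dc90bd1, using the joining rule stated in e79ca462b8 (quality, source and codec joined by dots, skipping None). The dataclass is reduced to the three relevant fields.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ParsedRelease:
    """Only the fields that feed tech_string are kept here."""

    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot go stale after enrichment.
        parts = (self.quality, self.source, self.codec)
        return ".".join(part for part in parts if part is not None)


parsed = ParsedRelease(quality="1080p", codec="x264")
enriched = replace(parsed, source="WEBRip")
```

`asdict()` only walks dataclass fields, which is why the fixtures runner has to inject the property by hand.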
francwa 688c37bbec docs(changelog): recap session 2026-05-20 tech-debt cleanup
Consolidate the five domain-purity refactors of the session under
[Unreleased]: RuleScopeLevel enum, FilePath VO post_init, Language
strict + from_raw, ParsedRelease.normalised → clean, ParsedRelease
enum strictness. Removes the duplicate min_movie_size_bytes entry
(now sits under its proper Removed section).
2026-05-20 23:57:06 +02:00
francwa 757e4045ee refactor(release): ParsedRelease.media_type & parse_path are strict enums
The fields were already typed as MediaTypeToken / ParsePath, but a
tolerant __post_init__ coerced raw strings into their enum form. With
MediaTypeToken(str, Enum) (and ParsePath idem), the coercion served no
purpose — callers that pass '.value' got back the enum anyway, and
callers that pass an unknown string got a ValidationError just like
they would now.

Strict mode: constructor rejects non-enum values directly. The two
in-tree builders (parse_release() and the parser pipeline) already
produce enum values; all .value sites have been removed. Drops the
unused _VALID_MEDIA_TYPES / _VALID_PARSE_PATHS lookup tables.
2026-05-20 23:52:30 +02:00
francwa c3767aacb6 refactor(release): rename ParsedRelease.normalised → clean
The field was called normalised but did not perform the normalisation
its name suggests (dots instead of spaces). In practice it holds
raw - site_tag - apostrophes, and is used only by season_folder_name()
via _strip_episode_from_normalized. Renamed to 'clean', which describes
what it actually contains; docstring corrected.
2026-05-20 23:50:05 +02:00
francwa 5bcf22b408 refactor(shared): Language VO is strict; from_raw() factory for un-normalized input
object.__setattr__ inside __post_init__ on a frozen dataclass is a
code smell — it bypasses the immutability guarantee to mutate fields
mid-construction. Split the responsibilities:

* Direct constructor is strict — rejects un-normalized input (uppercase
  iso, whitespace in aliases, etc.) so once a Language exists in the
  system, its fields are guaranteed canonical.
* Language.from_raw() factory handles arbitrary YAML/user input — it
  lowercases the iso, dedups/normalizes aliases, then constructs.

Only caller that built from raw data (LanguageRegistry loading YAML)
moves to from_raw(). Test fixtures already pass normalized data so
they keep using the direct constructor.
2026-05-20 23:48:30 +02:00
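
The strict-constructor-plus-factory split from 5bcf22b408 might look like the sketch below. The two fields, the exact checks and the use of `ValueError` are assumptions; the commit only says that un-normalized input is rejected and that `from_raw()` lowercases the iso and dedups/normalizes aliases.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Language:
    iso: str
    aliases: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Validate only: nothing is mutated during construction.
        for value in (self.iso, *self.aliases):
            if value != value.strip().lower():
                raise ValueError(f"un-normalized value: {value!r}")

    @classmethod
    def from_raw(cls, iso: str, aliases: Iterable[str] = ()) -> "Language":
        """Normalize arbitrary YAML or user input, then construct strictly."""
        cleaned = (alias.strip().lower() for alias in aliases if alias.strip())
        # dict.fromkeys dedups while keeping first-seen order.
        return cls(iso=iso.strip().lower(), aliases=tuple(dict.fromkeys(cleaned)))


french = Language.from_raw(" FR ", ["French", "french ", "Français"])
```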
francwa cfa9f54d9f refactor(shared): FilePath VO uses __post_init__ instead of custom __init__
Custom __init__ on a @dataclass(frozen=True) is a code smell — it
bypasses the generated dataclass __init__ and re-implements the
str/Path coercion + frozen-aware setattr by hand. Replaced with a
single __post_init__ that performs the same normalization. Same
public API (FilePath(str) and FilePath(Path) both work), same
behavior, no callers touched.
2026-05-20 23:47:03 +02:00
francwa f0aaf50c97 refactor(subtitles): RuleScope.level → RuleScopeLevel enum
Six possible levels (global, release_group, movie, show, season,
episode) were passed as free-form str, with the docstring comment as
the only documentation. Introduces RuleScopeLevel(str, Enum): still
serializable to YAML, but the fixed set is now enforced by typing.
to_dict() explicitly emits .value to stay safe for YAML writers.
2026-05-20 23:46:22 +02:00
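
A sketch of the enum introduced in f0aaf50c97. The six member values come from the commit message; the `RuleScope` shape (an `identifier` field next to `level`) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RuleScopeLevel(str, Enum):
    GLOBAL = "global"
    RELEASE_GROUP = "release_group"
    MOVIE = "movie"
    SHOW = "show"
    SEASON = "season"
    EPISODE = "episode"


@dataclass(frozen=True)
class RuleScope:
    level: RuleScopeLevel
    identifier: Optional[str] = None  # hypothetical second field

    def to_dict(self) -> dict:
        # Emit the plain string so a YAML writer never sees the enum type.
        return {"level": self.level.value, "identifier": self.identifier}


scope = RuleScope(RuleScopeLevel("season"), "show-42/s01")
serialized = scope.to_dict()
```

Looking a member up by value, as in `RuleScopeLevel("season")`, raises `ValueError` for anything outside the fixed set.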
francwa a09262b33f chore(settings): remove unused min_movie_size_bytes
The field and its validator had been orphaned since
MovieService.validate_movie_file was removed. Extension-based exclusion
(application/release/supported_media.py) plus PoP now cover the
'real movie vs sample' rule. If a size threshold is ever needed, it
will go in data/knowledge/, not in settings.
2026-05-20 23:41:41 +02:00
francwa 9c7cd66d2b Merge branch 'refactor/flatten-shared-media' 2026-05-20 23:35:52 +02:00
francwa 83dbed887b refactor(domain): flatten shared/media package into single module
Six small files (audio, video, subtitle, info, matching, tracks_mixin
+ __init__) collapsed into one ~250 LoC media.py module. Python treats
media.py and media/__init__.py interchangeably, so the 12 import sites
that read 'from alfred.domain.shared.media import ...' continue to work
without changes.

Reasoning: the whole bounded context fits on one screen; splitting into
sub-modules added more navigation friction than it saved. Tests stay
green (1077 passed).
2026-05-20 23:35:49 +02:00
francwa 0c9489e16b Merge branch 'feat/parser-phase-d' 2026-05-20 23:30:36 +02:00
francwa 621bb96995 fix(release/parser): pre-strip apostrophes so titles like Don't parse cleanly
Apostrophes are in the forbidden-chars list, which made any release
with a title like "Don't" or "L'avare" short-circuit to the AI
fallback (parse_path=ai, everything UNKNOWN). They are now stripped
up front from the name before the well-formed check and tokenize,
so the parse completes normally. The raw name is preserved on the
VO; only the title field loses its apostrophe.

parse_path becomes 'sanitized' when an apostrophe was stripped, to
surface that the parser cleaned something up.

Fixtures updated:
- shitty/honey_uhd_hdr/ — went from total UNKNOWN to a clean parse
  (title=Honey.Dont, year=2025, quality=2160p, source=WEBRip,
  codec=x265, group=Amen).
- path_of_pain/the_prodigy_full_chaos/ — went from total failure to
  partial success (title, year, source, codec extracted). Remaining
  gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked
  separately in tech debt.
2026-05-20 23:29:10 +02:00
francwa 448ef3b79c fix(release/parser): recognize Sxx-yy season range as tv_complete
`Der.Tatortreiniger.S01-06.GERMAN...` previously parsed as a movie
with 'S01-06' glued to the title. The parser now matches the
season-range form in _parse_season_episode (returning season=first,
episode=None), and the assemble step detects the range token to
promote media_type to 'tv_complete'.

The first season is exposed as `season` so `is_season_pack`
fires (season is not None and episode is None) — useful for routing
to a series root folder.

Fixture shitty/tatortreiniger_flat_multiseason/ updated:
- title: Der.Tatortreiniger.S01-06 → Der.Tatortreiniger
- season: null → 1
- media_type: movie → tv_complete
- is_season_pack: false → true
2026-05-20 23:26:40 +02:00
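
One regex-free way to recognise the `Sxx-yy` form from 448ef3b79c is sketched below. `parse_season_range` is a hypothetical helper, not the project's `_parse_season_episode`; it returns both bounds, whereas the commit keeps only the first season.

```python
from typing import Optional


def _is_number(text: str) -> bool:
    # isascii() guards against Unicode digits that int() would reject.
    return text.isascii() and text.isdigit()


def parse_season_range(token: str) -> Optional[tuple[int, int]]:
    """Return (first, last) for tokens shaped like 'S01-06', else None."""
    if not token or token[0] not in "sS":
        return None
    first, dash, last = token[1:].partition("-")
    if not dash or not _is_number(first) or not _is_number(last):
        return None
    return int(first), int(last)


season_range = parse_season_range("S01-06")
```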
francwa b1c7f35ffb fix(release/parser): drop pure-punctuation TITLE tokens at assembly
Releases using ' - ' as a separator (Vinyl - 1x01 - FHD) tokenize to
['Vinyl', '-', '1x01', '-', 'FHD'] — the standalone '-' tokens were
ending up in title_parts and leaked into the joined title
('Vinyl.-'). We can't add '-' to the separator list (it would break
codec-GROUP), so we filter at assembly: a TITLE token with no
alphanumeric characters carries no title content.

Side win: same logic eliminates the UTF-8 wide-pipe '|' from the
khruangbin_yt_wide_pipe fixture title.

Fixtures updated:
- shitty/vinyl_1x01_format/expected.yaml (title: Vinyl.- → Vinyl)
- path_of_pain/khruangbin_yt_wide_pipe/expected.yaml (| dropped)
2026-05-20 23:24:40 +02:00
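
The assembly-time filter from b1c7f35ffb, sketched as a standalone helper. `join_title` is a hypothetical name; the rule itself (a TITLE token with no alphanumeric character carries no title content) is the one stated in the commit.

```python
def join_title(title_tokens: list[str]) -> str:
    """Join TITLE tokens with dots, dropping pure-punctuation tokens."""
    kept = [
        token
        for token in title_tokens
        if any(char.isalnum() for char in token)
    ]
    return ".".join(kept)


title = join_title(["Vinyl", "-"])
```

Tokens such as `Spider-Man` survive because they contain at least one alphanumeric character; only tokens made entirely of punctuation are dropped.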
francwa 5bbdc9081f fix(release/parser): collapse chained multi-episode markers to full range
S14E09E10E11 previously parsed to episode=9, episode_end=10 — E11
was silently dropped. The parser now takes episodes[-1] as
episode_end so the full chain is captured (episode=9, episode_end=11).
Intermediate values stay implied.

Fixture shitty/archer_multi_episode/ updated from anti-regression of
the bug to anti-regression of the fix.
2026-05-20 23:23:08 +02:00
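
The `episodes[-1]` rule from 5bbdc9081f can be sketched with plain string operations. `parse_episode_chain` is a hypothetical helper and only handles the `SxxEyy[Ezz...]` shape.

```python
from typing import Optional


def parse_episode_chain(token: str) -> Optional[tuple[int, int, Optional[int]]]:
    """Return (season, episode, episode_end) for 'S14E09E10E11'-style tokens."""
    upper = token.upper()
    if not upper.startswith("S") or "E" not in upper:
        return None
    season_part, *episode_parts = upper[1:].split("E")
    parts = [season_part, *episode_parts]
    if not all(part.isascii() and part.isdigit() for part in parts):
        return None
    episodes = [int(part) for part in episode_parts]
    # The last marker closes the range; intermediate values stay implied.
    episode_end = episodes[-1] if len(episodes) > 1 else None
    return int(season_part), episodes[0], episode_end


chain = parse_episode_chain("S14E09E10E11")
```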
francwa 5d7b214af2 Merge branch 'refactor/language-port' 2026-05-20 23:20:18 +02:00
francwa 18267d0165 refactor(language): LanguageRepository port + SubtitleKnowledgeBase wired to it
Mirror the MediaProber / FilesystemScanner pattern for language lookup:

- New Protocol `LanguageRepository` in alfred.domain.shared.ports
  covering from_iso, from_any, all, __contains__, __len__ — the
  surface previously coupled to the concrete LanguageRegistry.
- SubtitleKnowledgeBase types its `language_registry` parameter
  against the Protocol; the concrete LanguageRegistry stays in
  infrastructure as the YAML-backed adapter and remains the default
  when no repository is injected.
- New unit tests in tests/infrastructure/test_language_registry.py
  cover the adapter surface (from_iso, from_any, membership,
  case-insensitivity, non-string inputs).

Behaviour is unchanged for existing callers. The split opens the
door to in-memory fakes in future tests without loading the full
ISO 639 YAML.
2026-05-20 23:18:25 +02:00
francwa 19fe8a519a Merge branch 'feat/release-inspect-orchestrator'
Inspection pipeline groundwork:
- MediaProber.probe() port extension (full media inspection on the port)
- inspect_release orchestrator + InspectedResult frozen VO
- enrich_from_probe now refreshes tech_string
- resolve_*_destination use cases consume inspect_release
- detect_media_type & enrich_from_probe moved to application/release
2026-05-20 09:31:22 +02:00
francwa a0d1846ff2 refactor(release): move detect_media_type & enrich_from_probe to application/release
Both helpers are inspection-pipeline pieces, not filesystem use cases —
they belong next to inspect_release, not next to move_media /
resolve_destination / list_folder.

The move also kills the lazy import that was hiding inside
_resolve_parsed: alfred.application.filesystem.resolve_destination
no longer triggers a cycle through alfred.application.filesystem
__init__ when loading inspect_release. Top-level import restored.

Call sites updated: inspect.py, test_detect_media_type.py,
test_enrich_from_probe.py, testing/recognize_folders_in_downloads.py.
Module docstrings + test-file docstrings updated to match the new
location.
2026-05-20 09:29:58 +02:00
francwa 0fb59a4581 feat(filesystem): wire inspect_release into resolve_destination
The four resolve_*_destination use cases now route through a private
_resolve_parsed helper that picks the right entry point:

  - source path provided AND it exists -> inspect_release(name, path)
    runs the full pipeline (parse + media-type refinement + probe
    + enrich), so missing tech tokens (quality, codec, ...) get
    filled by ffprobe and the refreshed tech_string lands in the
    destination folder / file names.

  - source path missing or absent       -> parse_release(name) only,
    same behavior as before. Back-compat: tests using fake /dl/*.mkv
    paths still pass unchanged.

resolve_episode_destination / resolve_movie_destination reuse their
existing source_file parameter as the inspection target. The two
folder-move use cases (season / series) gain a new OPTIONAL
source_path parameter — threaded through the agent tool wrappers
and documented in the YAML specs.

The lazy import inside _resolve_parsed avoids a circular import:
inspect_release imports detect_media_type / enrich_from_probe from
the same application.filesystem package whose __init__ re-exports
resolve_destination.

Three new tests in TestProbeEnrichmentWiring with a stub MediaProber
prove the wiring: movie picks up probe quality, season picks it up
via source_path, and a missing path correctly skips probe (back-compat
guard).
2026-05-20 09:26:30 +02:00
francwa e79ca462b8 fix(release): refresh tech_string after enrich_from_probe
enrich_from_probe fills None fields on ParsedRelease (quality, source,
codec, audio_*, languages) but left tech_string at its parser-time
value — so the filename builders (movie_folder_name, episode_filename,
…) saw stale tech tokens even after a successful probe.

Re-derive tech_string the same way the parser does — quality.source.codec
joined by dots, skipping None — at the end of enrich_from_probe. Token-
level values still win because enrich only fills None fields.

Four new tests in TestTechString cover: enrichment rebuilds it,
existing source survives, no-info input leaves it untouched, fully
empty parsed produces ''.
2026-05-20 09:26:09 +02:00
francwa 03aa844d7d feat(release): inspect_release orchestrator + InspectedResult VO
New application-layer entry point that composes the four inspection
layers in one call:

  1. parse_release(name, kb)              -> (ParsedRelease, ParseReport)
  2. detect_media_type(parsed, path, kb)  -> patch parsed.media_type
  3. find_main_video(path, kb)            -> Path | None (top-level scan)
  4. prober.probe(video) + enrich         -> when video exists and
                                             media_type not in
                                             {unknown, other}

Returns a frozen InspectedResult(parsed, report, source_path,
main_video, media_info, probe_used). kb and prober are injected — no
module-level singletons in inspect.py.

analyze_release tool now delegates to inspect_release; its output
gains two fields, confidence (0-100) and road (easy/shitty/path_of_pain),
surfaced from ParseReport so the LLM can route by confidence. Spec
updated to document them.

12 new tests covering happy paths, probe gating (no video, media_type
'other', probe failure), mutation contract (detect refining
parsed.media_type, enrich filling None fields), resilience
(nonexistent path), and frozen contract. Suite: 1058 passing.
2026-05-20 09:15:29 +02:00
francwa c303efea48 refactor(probe): consolidate full probe() into MediaProber port
Add probe(video) -> MediaInfo | None to the MediaProber Protocol and
implement it on FfprobeMediaProber. The standalone
alfred/infrastructure/filesystem/ffprobe.py module is removed; all
callers (analyze_release / probe_media tools, testing scripts) now go
through the adapter.

Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py
(patching subprocess.run at the adapter module level).

Unblocks the upcoming inspect_release orchestrator, which needs the
port — not a free function — to compose parse + main-video selection
+ probe in one shot.
2026-05-20 09:11:24 +02:00
francwa 5db350a1df Merge branch 'feat/release-parser-scoring' 2026-05-20 08:47:38 +02:00
francwa 12dc796ea2 docs(changelog): freeze confidence scoring + exclusion work block 2026-05-20 08:47:29 +02:00
francwa 9ddd85929e feat(release): pre-pipeline exclusion helpers
Add the application-layer helpers that decide which files are worth
parsing, sitting one notch above parse_release.

- is_supported_video(path, kb): extension-only check against
  kb.video_extensions. Lowercased suffix lookup. Directories and
  broken symlinks return False.
- find_main_video(folder, kb): top-level scan only (no recursion into
  subdirectories — releases that wrap their video in Sample/ are
  PATH_OF_PAIN territory). Lexicographically-first eligible file wins
  when several qualify (deterministic, no size-based ranking). A bare
  file as folder argument is supported for single-file releases.

No size threshold and no filename heuristics ('sample' / 'trailer'):
the parser's job is to extract structure, not to second-guess
non-standard release shapes. PoP catches the rest.

17 tests under tests/application/test_supported_media.py.
2026-05-20 01:34:32 +02:00
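
A sketch of the two exclusion helpers from 9ddd85929e. The real functions take a `kb` object; here a plain set of extensions stands in for `kb.video_extensions`, and its contents are an assumption. The demo builds a throwaway folder to show the top-level-only, lexicographically-first selection.

```python
import tempfile
from pathlib import Path
from typing import Optional

VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4", ".avi"})  # illustrative set


def is_supported_video(path: Path, extensions: frozenset = VIDEO_EXTENSIONS) -> bool:
    # is_file() follows symlinks, so directories and broken links give False.
    return path.is_file() and path.suffix.lower() in extensions


def find_main_video(folder: Path, extensions: frozenset = VIDEO_EXTENSIONS) -> Optional[Path]:
    """Top-level scan only; the lexicographically-first eligible file wins."""
    if folder.is_file():  # single-file release
        return folder if is_supported_video(folder, extensions) else None
    if not folder.is_dir():
        return None
    candidates = sorted(
        (p for p in folder.iterdir() if is_supported_video(p, extensions)),
        key=lambda p: p.name,
    )
    return candidates[0] if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("b.mkv", "a.MKV", "notes.txt"):
        (root / name).write_text("x")
    (root / "Sample").mkdir()
    (root / "Sample" / "0.mkv").write_text("x")  # never reached: no recursion
    found_name = find_main_video(root).name
    single_name = find_main_video(root / "b.mkv").name
    missing = find_main_video(root / "does-not-exist")
```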
francwa ed7680b58f docs(changelog): log parse-confidence scoring + ParseReport tuple 2026-05-20 01:21:47 +02:00
francwa b4c9efd13b feat(release): parse_release returns (ParsedRelease, ParseReport)
Wire the scoring foundations into the parser entry point. parse_release
now returns a tuple — the structural ParsedRelease and a diagnostic
ParseReport carrying confidence (0-100), road
(EASY / SHITTY / PATH_OF_PAIN), the residual UNKNOWN tokens, and the
list of critical fields that couldn't be filled.

EASY is decided structurally (a group schema matched), independently
of the score. SHITTY vs PATH_OF_PAIN is decided by score against the
60 cutoff from scoring.yaml. Malformed names (forbidden chars) emit a
zero-confidence PoP report and short-circuit to parse_path=AI as
before.

ParsePath stays as-is (DIRECT / SANITIZED / AI) — it records *how* we
tokenized, not how confident we are. The two dimensions are now
properly separated.

Call sites propagated:
- alfred/application/filesystem/resolve_destination.py (4 occurrences)
- alfred/agent/tools/filesystem.py
- tests/domain/test_release.py
- tests/domain/test_release_fixtures.py
- tests/application/test_detect_media_type.py

New tests/domain/release/test_parser_v2_scoring.py (22 cases) locks
ParseReport validation, compute_score arithmetic, decide_road
thresholding, the collector helpers, and the end-to-end tuple contract.
2026-05-20 01:21:30 +02:00
francwa 98c688f29b feat(release): foundations for parse-confidence scoring
Add the building blocks for Phase A scoring without yet wiring them
into parse_release. Nothing changes at runtime — parse_release still
returns a single ParsedRelease — but the pieces needed to upgrade it
in a follow-up commit are now in place.

- alfred/knowledge/release/scoring.yaml: weights / penalties /
  thresholds. Title and media_type are heavy (30 / 20), structural
  fields medium (year 15, season 10), tech fields light (5 each).
  Unknown-token penalty 5 capped at -30. SHITTY/PoP cutoff at 60.
- load_scoring() loader with safe defaults baked in: a missing or
  partial YAML only de-tunes, never breaks.
- ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge
  populates it from load_scoring().
- New parser/scoring.py module with Road enum (EASY / SHITTY /
  PATH_OF_PAIN, distinct from ParsePath which records the tokenization
  route), and pure functions: compute_score, decide_road,
  collect_unknown_tokens, collect_missing_critical.
- ParseReport frozen VO in value_objects.py — exported alongside
  ParsedRelease.
2026-05-20 01:21:17 +02:00
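
The arithmetic described in 98c688f29b and b4c9efd13b can be sketched as follows. The weights, the penalty cap and the 60 cutoff come from the commit messages; the list of "tech" fields, the function signatures, the 0-100 clamp and the choice of `>=` at the cutoff are assumptions.

```python
from enum import Enum


class Road(str, Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"


# Heavy / medium / light weights; the tech field names are assumed.
WEIGHTS = {
    "title": 30, "media_type": 20, "year": 15, "season": 10,
    "quality": 5, "source": 5, "codec": 5, "group": 5,
}
UNKNOWN_PENALTY = 5
MAX_PENALTY = 30
SHITTY_CUTOFF = 60


def compute_score(filled: set[str], unknown_tokens: int) -> int:
    earned = sum(weight for name, weight in WEIGHTS.items() if name in filled)
    penalty = min(unknown_tokens * UNKNOWN_PENALTY, MAX_PENALTY)
    return max(0, min(100, earned - penalty))


def decide_road(score: int, schema_matched: bool) -> Road:
    # EASY is structural: a matched group schema wins regardless of score.
    if schema_matched:
        return Road.EASY
    return Road.SHITTY if score >= SHITTY_CUTOFF else Road.PATH_OF_PAIN


decent = compute_score({"title", "media_type", "year", "quality", "codec"}, 1)
messy = compute_score({"title"}, 10)
```

With these numbers `decent` is 75 - 5 = 70 and `messy` is 30 - 30 = 0, so the first lands on SHITTY and the second on PATH_OF_PAIN when no schema matched.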
francwa fcd80763e2 Merge branch 'refactor/release-parser-v2' 2026-05-20 01:08:20 +02:00
francwa 629387591f docs(changelog): freeze release parser v2 work block (2026-05-20) 2026-05-20 01:08:17 +02:00
francwa 230a7ab88a docs(changelog): log SHITTY simplification + distributor split 2026-05-20 01:03:52 +02:00
francwa 3737f66851 refactor(release): simplify SHITTY to dict-driven token tagging
Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty: each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, the
leftmost contiguous UNKNOWN run becomes the title, done.

SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.

- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
  (deutschland franchise box, sleaford YT slug, super_mario bilingual,
  predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows xfail_reason; the parametrized suite wires the
  pytest.mark.xfail(strict=False) automatically
2026-05-20 01:03:25 +02:00
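
A sketch of the dict-driven pass described in 3737f66851: each token is looked up in a few buckets with first-match-wins, then the leftmost contiguous run of unknown tokens becomes the title. The bucket contents, the year test and the `tag_tokens` name are assumptions; the distributor and season/episode buckets are left out for brevity.

```python
# Illustrative buckets; dict order gives the first-match-wins priority.
BUCKETS = {
    "resolution": {"720p", "1080p", "2160p"},
    "source": {"webrip", "web-dl", "bluray"},
    "codec": {"x264", "x265"},
}


def tag_tokens(tokens: list[str]) -> list[tuple[str, str]]:
    roles = []
    for token in tokens:
        lowered = token.lower()
        role = next((name for name, values in BUCKETS.items() if lowered in values), None)
        if role is None and len(token) == 4 and token.isascii() and token.isdigit():
            role = "year"
        roles.append(role or "unknown")

    # Promote the leftmost contiguous unknown run to the title.
    start = next((i for i, role in enumerate(roles) if role == "unknown"), None)
    if start is not None:
        index = start
        while index < len(roles) and roles[index] == "unknown":
            roles[index] = "title"
            index += 1
    return list(zip(tokens, roles))


tagged = tag_tokens(["Some", "Movie", "2021", "1080p", "WEBRip", "x264", "extra"])
```

The trailing `extra` token stays unknown because it is not part of the leftmost run.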
francwa fd3bd1ad8c feat(release): distinguish streaming distributors from sources
Introduce a separate dimension for streaming-platform tags (NF, AMZN,
DSNP, HMAX, ATVP, …) so they stop polluting the encoding-source field.
WEB-DL is the source; the platform that released it is the distributor.

- new distributors.yaml knowledge file
- ReleaseKnowledge port exposes distributors set
- TokenRole.DISTRIBUTOR + ParsedRelease.distributor field
- removed NF/AMZN/DSNP/HMAX/ATVP from sources.yaml
- notre_planete fixture now records distributor: NF
2026-05-20 01:03:11 +02:00
francwa 7dc7f0c241 feat(release): v2 enricher pass for audio/video-meta/edition/language
The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.

Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
  body to be fully consumed. Tokens passed over between schema chunks
  remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
  a given role, skipping already-annotated tokens. Lets optional and
  mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
  and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
  LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
  kb.editions are matched first (longest-first ordering preserved from
  the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
  of a matched sequence with extra['sequence']=<canonical value> and
  trailing members with extra['sequence_member']='True' so assemble
  skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
  '.' separator splits the layout into two tokens. Strips a trailing
  '-GROUP' suffix on the second before joining.

Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
  bit_depth, hdr_format, edition. Each role-handler skips
  sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
  COLLECTION} + no season → tv_complete (mirrors legacy).

Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
  HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
  with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed,
  8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:26:05 +02:00
francwa 075a827b0e feat(release): wire v2 EASY path for known release groups
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup
  via the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
  *.yaml at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with
  optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:21:11 +02:00
francwa a2c917618f feat(release): scaffold v2 parser package (annotate-based pipeline)
New package alfred/domain/release/parser/ lays the foundation for the
release parser refactor (specs in memory). Exposes:

- Token: frozen VO carrying text + stream index + TokenRole + extra dict.
  with_role() returns a new instance (no mutation).
- TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/
  GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/
  EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families.
- pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix.
- pipeline.tokenize(): release name -> list[Token] (all UNKNOWN),
  string-ops split on kb.separators (no regex, per CLAUDE.md).
- pipeline.annotate(): documented stub. Walk order recorded in docstring
  (group right-to-left, then season/episode, year, tech, title).

Legacy parse_release in release.services remains the live implementation
until the annotate step lands. Scaffolding tests verify Token API,
site-tag stripping (prefix/suffix), and tokenize output shape.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:12:33 +02:00
338 changed files with 14642 additions and 6141 deletions
+6
@@ -74,5 +74,11 @@ docs/
# .md files (project-level Markdown is brol-y; allow-list the ones we track)
*.md
!CHANGELOG.md
!/README.md
!specs/
!specs/**/*.md
# Private dev docs (separate git repo inside; see .claude/CLAUDE.md)
/.claude/
#
+880
@@ -15,8 +15,872 @@ callers).
## [Unreleased]
### Changed
- **`filesystem` infra + application rewritten as 5 atomic free
functions.** On branch `unfuck`. Replaces the monolithic
`FileManager` class + scattered helpers with five small, pure ops in
`alfred/infrastructure/filesystem/`: `list_dir`, `create_dir`,
`link_file`, `move_file`, `move_dir`. Each takes `pathlib.Path`
arguments and raises typed exceptions from a dedicated hierarchy
(`FilesystemError` → `SourceNotFound` / `DestinationExists` /
`NotADirectory` / `NotAFile` / `PermissionDenied` / `CrossDevice` /
`FilesystemOSError`) — no more `{"status": "ok" | "error"}` dicts at
the infra boundary, no more `get_memory()` reads.
- **`filesystem` application: 5 use cases as free functions.** A
matching `<op>_use_case(path, …, roots: DirectoryRoots)` wraps each
infra op, guards inputs against escaping a new `DirectoryRoots` VO
(downloads / torrents / movies / tv_shows), catches infra exceptions,
and returns a frozen `<Op>Response` DTO. Roots are now injected, not
pulled from the global memory singleton.
- **Agent tool wrappers partially re-wired** to the new use cases.
`list_folder` now delegates to `list_dir_use_case`; `move_media`
to `move_file_use_case`; `move_to_destination` chains
`create_dir_use_case` + `move_file_use_case`; a new
`create_directory` tool wraps `create_dir_use_case`. Roots are
loaded once via a module-level `_load_directory_roots()` helper
that reads the persisted memory (no more per-call singleton
reads inside the use cases themselves).
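
The split described in the filesystem entries above (infra op raises typed exceptions, use case guards the roots and returns a frozen DTO) might look roughly like the sketch below. Only `FilesystemError`, `SourceNotFound`, `DestinationExists`, `move_file`, `move_file_use_case` and `DirectoryRoots` are names from this changelog; the `contains()` guard, the `MoveFileResponse` fields and the `"path_outside_roots"` error code are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


class FilesystemError(Exception):
    """Base class of the typed infra hierarchy."""


class SourceNotFound(FilesystemError):
    pass


class DestinationExists(FilesystemError):
    pass


def move_file(src: Path, dst: Path) -> None:
    """Infra op: raises typed exceptions instead of returning status dicts."""
    if not src.is_file():
        raise SourceNotFound(str(src))
    if dst.exists():
        raise DestinationExists(str(dst))
    src.rename(dst)  # same-device move only in this sketch


@dataclass(frozen=True)
class DirectoryRoots:
    downloads: Path
    torrents: Path
    movies: Path
    tv_shows: Path

    def contains(self, path: Path) -> bool:
        # Hypothetical guard: the path must resolve inside one of the roots.
        resolved = path.resolve()
        roots = (self.downloads, self.torrents, self.movies, self.tv_shows)
        return any(resolved.is_relative_to(root.resolve()) for root in roots)


@dataclass(frozen=True)
class MoveFileResponse:
    status: str  # "ok" or "error" (assumed field names)
    error: Optional[str] = None


def move_file_use_case(src: Path, dst: Path, roots: DirectoryRoots) -> MoveFileResponse:
    """Application layer: guard inputs, catch infra errors, return a DTO."""
    if not (roots.contains(src) and roots.contains(dst)):
        return MoveFileResponse("error", "path_outside_roots")
    try:
        move_file(src, dst)
    except FilesystemError as exc:
        return MoveFileResponse("error", type(exc).__name__)
    return MoveFileResponse("ok")
```

Because the roots are passed in, a test can point them at a temporary directory without touching any global memory singleton.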
### Removed
- `FileManager` / `MediaOrganizer` / `create_folder` / `move` from the
public API of `alfred.infrastructure.filesystem`. Their files remain
on disk renamed with an `_OLD` suffix (e.g. `file_manager_OLD.py`) so
the migration can finish on a follow-up commit without losing
reference material. They are no longer re-exported from `__init__`.
- `CreateSeedLinksUseCase` / `ListFolderUseCase` / `MoveMediaUseCase` /
`ManageSubtitlesUseCase` / `resolve_destination` from the public API
of `alfred.application.filesystem`. Same `_OLD` rename treatment.
This intentionally breaks current tool wrappers and tests downstream
— re-wiring is the next chunk of work on this branch.
- **Agent tools dropped during the refactor** (to be reintroduced
when the matching domain/application code lands):
`manage_subtitles`, `set_path_for_folder`, `create_seed_links`,
`resolve_season_destination`, `resolve_episode_destination`,
`resolve_movie_destination`, `resolve_series_destination`.
Their wrappers are removed from `alfred.agent.tools.filesystem`;
`alfred.agent.tools.__init__` now re-exports only what still
imports cleanly. `find_media_imdb_id` (already broken before this
branch — name no longer exported by `tools.api`) was also dropped
from the package re-exports.
### Added
- **`.alfred` v2 — Phase 4: v2-shaped `rescan_show` + new
`rescan_movie` + index anchor-warning + `tmdb_cache_ttl_days`
setting.** Fourth and final structural phase of
`specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`. The TV
+ movie rescan orchestrators now write v2 release aggregates
(`SeriesRelease` / `MovieRelease`) via the concrete v2
repositories; the library index keeps auto-healing from the new
sidecars on its next read (no TMDB call from rescan — that stays
Phase 5).
- **`rescan_show`** moves from `alfred/application/library/` to
`alfred/application/tv_shows/` (symmetry with the new
`alfred/application/movies/`). New signature:
`(show_root, *, tmdb_id: TmdbId, imdb_id: ImdbId | None = None,
series_repo, scanner, prober, kb) -> SeriesRelease`.
- **`rescan_movie`** (new — `alfred/application/movies/rescan.py`)
locates the main video via `find_video_file`, runs
`inspect_release` once, and writes the per-movie `.alfred`
sidecar. `added_at = datetime.now(UTC)` on every rescan (the
sidecar records reconciliation time, not filesystem mtime).
Raises `MovieRescanFailed` when no video is found in the folder.
- **PACK semantics in `rescan_show`**: a single-video + no-episode
season becomes `SeasonRelease(mode=PACK, folder=…, episodes=())`.
The slot map stays empty until the Phase 5 TMDB sync supplies
`episode_count` — no fabricated `EpisodeRange` lands in the
sidecar. *(Superseded by Phase 4b — see Fixed.)*
- **`Settings.tmdb_cache_ttl_days: int = 14`** — placeholder for the
Phase 5 TTL policy on library-index entries (`fetched_at + TTL`
drives refresh decisions).
- **Library-index anchor-mismatch warning** — both
`DotAlfredTVShowLibraryIndex` and `DotAlfredMovieLibraryIndex` now
cross-check each entry's `metadata.path` against the on-disk
folder layout right after a successful parse. Drift is logged as a
`WARNING` (one per missing folder, with `tmdb_id`); the heal path
stays silent by construction (it always synthesizes from real
folder names).
- **`.alfred` v2 — Phase 5: TMDB sync orchestrators.** Fifth phase
of `specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`.
Two new orchestrators refresh the library-root index's
TMDB-cached fields from on-disk truth + a single TMDB call:
- **`sync_show`** (`alfred/application/tv_shows/sync.py`) calls
`TMDBClient.get_tv_show_info`, loads the release via
`DotAlfredSeriesReleaseRepository.load_by_tmdb_id`, and upserts
the result into `DotAlfredTVShowLibraryIndex`. Honors
`Settings.tmdb_cache_ttl_days`; placeholder entries (auto-healed,
`status == "unknown"`) always refresh; `force=True` overrides
both gates. Raises `ShowNotFoundInLibrary` when neither index nor
sidecar carries `tmdb_id`. Indexed shows with a missing per-show
sidecar still get a fresh TMDB pass — slot map clears until
rescan repopulates it.
- **`sync_movie`** (`alfred/application/movies/sync.py`) is the
movie-side parallel. Placeholder signature is `name ==
metadata.path` (auto-heal copies the folder name into `name`;
the sidecar schema requires `name` non-empty so we can't use
`name == ""`). When the per-movie sidecar is gone but the
index entry remains, sync warns and returns the existing entry
unchanged (no upsert possible without a release).
- **`TmdbMovieInfo` DTO + `TMDBClient.get_movie_info`** — symmetric
to the existing `TmdbShowInfo` / `get_tv_show_info` pair. Carries
`tmdb_id`, `imdb_id`, `title`, and `release_year` (parsed from
TMDB's `release_date`).
- **`load_by_tmdb_id` on the v2 release repositories.** The series
repo returns `(SeriesRelease, show_folder_name)` so the sync
orchestrator can feed `DotAlfredTVShowLibraryIndex.upsert(...,
path=...)`; the movie repo returns `MovieRelease` alone (folder is
on `release.folder` already) and is provided as a semantic alias
of `find_by_tmdb_id` for symmetry.
- **`alfred/application/exceptions.py`** — new module for the two
shared `*NotFoundInLibrary` exceptions raised by the sync
orchestrators (`ShowNotFoundInLibrary`, `MovieNotFoundInLibrary`).
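
The refresh gates described for `sync_show` above (TTL, placeholder status, `force`) can be sketched as a small pure function. The function name `needs_refresh`, its parameters, and the inclusive `>=` comparison at the TTL boundary are assumptions; only the `14`-day default, the `"unknown"` placeholder status and the `force=True` override come from the entries above.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(
    *,
    fetched_at: datetime,
    status: str,
    now: datetime,
    ttl_days: int = 14,
    force: bool = False,
) -> bool:
    """Decide whether an index entry should be re-fetched from TMDB."""
    if force:
        return True  # force overrides both gates
    if status == "unknown":
        return True  # auto-healed placeholder entries always refresh
    # Otherwise refresh only once fetched_at + TTL has passed.
    return now >= fetched_at + timedelta(days=ttl_days)


now = datetime(2026, 5, 20, tzinfo=timezone.utc)
fresh = datetime(2026, 5, 10, tzinfo=timezone.utc)
stale = datetime(2026, 5, 1, tzinfo=timezone.utc)
```

Passing `now` explicitly keeps the decision deterministic in tests, in the same spirit as the injectable reference date mentioned for `parse_tv_show_info`.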
### Fixed
- **PACK vs EPISODIC classification (Phase 4b).** The Phase 4
walker + `rescan_show` logic classified seasons by parser output
(does the filename carry `Exx`?), but PACK vs EPISODIC is a
*structural* distinction:
- **PACK** = season folder with N flat `SxxEyy` videos.
- **EPISODIC** = season folder with N subfolders, each holding
one video.
The walker now descends two levels under `show_root` and
classifies per season folder. Mixed (flat + subfolders) is
malformed — warn and skip. `rescan_show` trusts the walker's
mode and stops conflating "single un-numbered video" with PACK
(that case is now skipped as malformed too). Tests rewritten
against the real model. Supersedes the PACK-semantics bullet
above in Added.
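
A minimal sketch of the structural classification described above, assuming a hard-coded `VIDEO_EXTENSIONS` set (the project reads extensions from `kb.video_extensions`) and a hypothetical `classify_season` helper that returns `None` for malformed layouts instead of logging a warning. The "single un-numbered video" check is left out because it needs parser output.

```python
from enum import Enum
from pathlib import Path
from typing import List, Optional

VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi"}  # assumption for this sketch


class ReleaseMode(Enum):
    PACK = "pack"
    EPISODIC = "episodic"


def _videos(folder: Path) -> List[Path]:
    """Video files directly inside `folder` (no recursion)."""
    return [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    ]


def classify_season(season_dir: Path) -> Optional[ReleaseMode]:
    """Classify a season folder by layout only, never by filename tokens."""
    flat_videos = _videos(season_dir)
    subfolders = [p for p in season_dir.iterdir() if p.is_dir()]
    if flat_videos and subfolders:
        return None  # mixed layout: malformed, caller should warn and skip
    if flat_videos:
        return ReleaseMode.PACK
    if subfolders and all(len(_videos(sub)) == 1 for sub in subfolders):
        return ReleaseMode.EPISODIC
    return None  # empty folder, or a subfolder without exactly one video
```

Non-video files such as `.nfo` do not affect the result, since only video files and directories are counted.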
### Removed
- **v1 dot_alfred stack and its abstract domain ports.** Deleted
`alfred/infrastructure/persistence/dot_alfred/{bridge,repository,
serializer,sidecar}.py`, plus the
`alfred/domain/{tv_shows,movies}/repositories.py` ABCs
(`TVShowRepository` / `MovieRepository`) — zero callers after
Phase 4. `dot_alfred/__init__.py` is rewritten as a v2-only
re-export (four concrete repositories + `ShowFolderUnknown`).
- **`alfred/application/library/` package** (rescan + walker moved
to `alfred/application/tv_shows/`).
- The two Phase 3 module-level test skips
(`test_repository.py`, `test_serializer.py`) are lifted by
deleting the quarantined files.
- **`MediaWithTracks` mixin + `track_lang_matches` helper** in
`alfred.domain.shared.media`. Parked in Phase 4 pending a
Phase 5 decision; zero callers across `alfred/` and `tests/`
after the v2 aggregates landed, so both go.
### Internal
- **Suite**: 1233 → 1277 passing; 10 → 8 skips (only LLM-not-running
skips remain — the Phase 3 quarantines are gone with their files).
- Phase 5 cleanup sweep returns zero hits for `MediaWithTracks`,
v1 dot_alfred symbols, v1 sidecar names, and `alfred.application.
library` — the v2 surface is the only one left.
### Changed
- **`.alfred` v2 — Phase 3: `TVShow` / `Movie` aggregates become
TMDB-only.** Third phase of `specs/dot_alfred_v2.md` on branch
`refactor/dot-alfred-v2`. Filesystem-side concerns (file paths,
tracks, quality, mode, `added_at`) move to the `releases/` domain
added in Phase 1; the TMDB aggregates now carry only identity +
TMDB catalog facts.
- **`TVShow`** — `tmdb_id: TmdbId` is now the **required primary
key**; `imdb_id: ImdbId | None` is the optional secondary anchor.
Added `status: str = "unknown"` (raw TMDB string, default matches
the v2 library-index auto-heal placeholder). `episode_count`
aggregates the TMDB-cached counts on each `Season` (was: sum of
materialized `Episode` objects).
- **`Season`** — added `episode_count: int = 0` (TMDB-cached,
authoritative). **Removed**: `audio_tracks`, `subtitle_tracks`,
and the `mode` property (release mode now lives only on
`SeasonRelease.mode` — single source of truth).
- **`Episode`** — slimmed to identity + title. **Removed**:
`file_path`, `file_size`, `audio_tracks`, `subtitle_tracks`. The
`MediaWithTracks` mixin is no longer in `Episode`'s MRO; on-disk
facts live on the matching `EpisodeRelease` keyed by
`(season_number, episode_number)`.
- **`Movie`** — `tmdb_id: TmdbId` required, `imdb_id` optional.
**Removed**: `file_path`, `file_size`, `quality`, `added_at`,
`audio_tracks`, `subtitle_tracks`. `get_filename()` now returns
`"Title.Year"` (quality lives on `MovieRelease` and is appended
by a release-aware caller — Phase 4 wires this through
`MediaOrganizer`).
- **`TVShowBuilder` / `SeasonBuilder`** — constructor requires
`tmdb_id: TmdbId`; `imdb_id` and `status` are optional.
`SeasonBuilder.set_episode_count(int)` replaces the old
`set_audio_tracks` / `set_subtitle_tracks` (tracks no longer
persisted on `Season`).
- **`MovieRelease` carries `added_at: datetime`** (required).
Bumped `dot_alfred/v2` `SCHEMA_VERSION` from `1` → `2` to add
`added_at: datetime` to `MovieReleaseSidecar`. Round-trip via
Pydantic `mode="json"` (datetime ↔ ISO 8601 string). No migration
code shipped — no v2.1 sidecars exist in the wild yet.
- **No-coercion `TmdbId` contract.** `TVShow(tmdb_id=1396)` now raises
— callers pass `TmdbId(1396)`. Same for `imdb_id: ImdbId | None`
on `TVShow`/`Movie`. Honest type contract, no ergonomic shim.
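
The no-coercion contract above could be sketched as follows. The `value` field name, the reduced `TVShow` shape and the choice of `TypeError` / `ValueError` are assumptions; the changelog only states that a raw int is rejected and that `TmdbId` is a positive int that rejects bool/str/float.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmdbId:
    value: int

    def __post_init__(self) -> None:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(f"TmdbId expects an int, got {type(self.value).__name__}")
        if self.value <= 0:
            raise ValueError("TmdbId must be positive")


@dataclass(frozen=True)
class TVShow:
    tmdb_id: TmdbId
    title: str

    def __post_init__(self) -> None:
        # No ergonomic shim: a raw int is an error, not something to wrap.
        if not isinstance(self.tmdb_id, TmdbId):
            raise TypeError("tmdb_id must be a TmdbId instance")


show = TVShow(tmdb_id=TmdbId(1396), title="Breaking Bad")
```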
### Removed
- `Season.mode` property (derive from `SeasonRelease.mode` instead).
- `Episode.file_path` / `file_size` / `audio_tracks` /
`subtitle_tracks`.
- `Movie.file_path` / `file_size` / `quality` / `added_at` /
`audio_tracks` / `subtitle_tracks`.
### Internal
- v1 dot_alfred package (`bridge.py`, `repository.py`,
`serializer.py`, `sidecar.py`), the abstract `TVShowRepository` /
`MovieRepository` ports typed against the pre-Phase-3 aggregates,
and `alfred/application/library/rescan.py` are **intentionally
left in tree as a known-red island**. Their tests
(`tests/infrastructure/persistence/dot_alfred/test_repository.py`,
`test_serializer.py`, `tests/application/library/test_rescan.py`)
are module-level skipped with a Phase 4 reference. Phase 4 rewrites
`rescan_show` / introduces `rescan_movie` on top of the v2
release repositories + library index, then deletes the v1 stack +
the abstract ports + the quarantined tests in one swing.
- Test suite: 1216 passed, 11 skipped (8 pre-existing + 3 Phase-3
quarantines), 4 xfailed. v2 round-trip tests now reference
`SCHEMA_VERSION` instead of hard-coded `1` for future-proofing.
### Added
- **`.alfred` v2 — Phase 2: new persistence package + TMDB client
extensions.** Second phase of `specs/dot_alfred_v2.md` on branch
`refactor/dot-alfred-v2`. The new
`alfred/infrastructure/persistence/dot_alfred/v2/` package ships
the full v2 sidecar stack while leaving v1 (and the existing
`TVShow` aggregate) untouched — Phase 3 is the cutover.
- **Pydantic DTOs** — `SeriesReleaseSidecar` /
`MovieReleaseSidecar` (per-item), `TVShowLibraryIndexSidecar` /
`MovieLibraryIndexSidecar` (library-root index). All built on a
common `_Strict` base (`extra="forbid"`, `frozen=True`) with a
`@model_validator` enforcing `schema_version == 1`.
- **Track entries** — `AudioTrackEntry` / `SubtitleEntry` (sidecar
cache shape, slimmed from the domain track types). `SubtitleEntry`
carries `is_forced` + `is_sdh` as explicit booleans (v1's
`type: "sdh"` overload is gone).
- **Serializer** — `read_yaml` / `atomic_write_yaml` helpers
centralize YAML I/O and atomic writes (`.tmp + os.replace`).
`SidecarSchemaError` wraps both YAML parse errors and Pydantic
validation errors for uniform catch-and-skip semantics.
- **Bridge** — lossless `domain ↔ sidecar` conversion for
`SeriesRelease` / `MovieRelease` (round-trippable, including
multi-episode ranges and `is_sdh` subtitles); one-way projection
for library-index entries (`show_index_entry_from`,
`movie_index_entry_from`) that flattens multi-episode files into
per-TMDB-slot maps in `seasons[*].episodes`.
- **Repositories** —
`DotAlfredSeriesReleaseRepository` /
`DotAlfredMovieReleaseRepository` walk `library_root/*/` with
log+skip on corruption; **`DotAlfredTVShowLibraryIndex`** /
**`DotAlfredMovieLibraryIndex`** auto-heal silently on missing or
corrupt index files by rebuilding from the per-item sidecars
(healed entries keep TMDB-cached fields as placeholders until the
next sync repopulates them). Writes are atomic and never auto-heal
(read paths handle that).
- **TMDB client extensions** — `TmdbSeasonInfo` / `TmdbShowInfo`
DTOs + `TMDBClient.get_tv_show_info(tmdb_id)` aggregating
`/tv/{id}` + `/tv/{id}/external_ids`. The parsing logic is a pure
function (`parse_tv_show_info`) testable without HTTP, with an
injectable reference date for deterministic `aired` flag tests.
- **`is_sdh` flag on `SubtitleTrack`.** Added to
`alfred/domain/shared/media.py::SubtitleTrack` to mirror ffprobe's
`hearing_impaired` disposition. Wired through the ffprobe layer
(`ffprobe_prober.py`) and the v2 sidecar bridge so SDH information
round-trips end-to-end. Defaults to `False` — backwards-compatible
for every existing caller.
- **37 v2 integration tests** on `tmp_path` covering round-trips
(domain ↔ sidecar ↔ YAML ↔ domain), atomic writes (no `.tmp`
leftovers), per-item log+skip on corruption / schema mismatch,
movie anchor-mismatch warning, full upsert / find / delete on both
library indexes, and the auto-heal path on missing / corrupt /
schema-mismatched index files. **16 TMDB DTO tests** for the new
`parse_tv_show_info` pure function.
- **`.alfred` v2 — Phase 1: new `releases/` domain.** First step of
`specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`. The
new `alfred/domain/releases/` package introduces a filesystem-only
bounded context separated from TMDB identity (the existing
`tv_shows` / `movies` domains). It hosts:
- **`EpisodeRange` VO** — covers single-episode files
(`EpisodeRange(E02, E02)`) and multi-episode files
(`EpisodeRange(E02, E04)` for `SxxE02E03E04.mkv`), with
`count()` / `numbers()` / `is_single()` helpers.
- **`ReleaseMode` enum** — `PACK` (N video files directly in the
season folder) vs `EPISODIC` (N sub-folders, one episode each);
classified by the walker, never re-derived.
- **Aggregates** — `TrackProfile`, `EpisodeRelease`,
`SeasonRelease` (with `episode_count()` summing each file's
range), `SeriesRelease`, `MovieRelease`. All frozen
dataclasses; mutation via `SeasonReleaseBuilder` /
`SeriesReleaseBuilder` (mirror the v1 `TVShowBuilder` pattern,
including `from_existing()` round-trip).
- **Abstract ports** — `SeriesReleaseRepository`,
`MovieReleaseRepository` (concrete `DotAlfred*` arrive in
Phase 2).
- **`TmdbId` VO** added to `alfred/domain/shared/value_objects.py`
(positive int, rejects bool/str/float — symmetry with `ImdbId`).
- 73 unit tests covering VO validation, entity invariants, builder
sort + overlap detection, and `from_existing()` round-trips. v1
code paths untouched at this stage; new domain coexists.
- **`rescan_show` orchestrator
(`alfred/application/library/rescan.py`).** Step 4 of the
`specs/dot_alfred.md` plan. Walks an Alfred-managed show folder,
runs the existing `inspect_release` pipeline on every video file it
finds, and assembles a frozen `TVShow` aggregate persisted via the
injected `TVShowRepository`. Reuses the release parser + ffprobe
path verbatim — no duplicated parse/probe logic at the library
layer. PACK vs EPISODIC inferred per season folder from the
on-disk file count + parser output: a single video whose name
carries no `Exx` token becomes a PACK season (tracks lifted to the
season-level `audio_tracks` / `subtitle_tracks`), anything else
becomes EPISODIC (one `Episode` per file). Episode paths are
stored relative to the show root for portability. Files that fail
to parse a season/episode number, or seasons with mixed numbers,
are logged and skipped — the orchestrator never raises. Embedded
subtitle tracks are captured from `ffprobe`; adjacent `.srt`
files, multi-episode entries (`S01E01E02`), and TMDB-driven PACK
detection are tracked as tech debt for a dedicated subtitles /
ShowTracker session. 7 integration tests on `tmp_path` with the
Foundation layout (S01 EPISODIC + S02 PACK) cover the round-trip
through the real `.alfred` repository.
- **Show tree walker (`alfred/application/library/walker.py`).**
Step 4a foundation. `walk_show(show_root, scanner, kb)` returns a
`ShowTree(show_root, season_folders=tuple[SeasonFolder, ...])`, a
pure structural snapshot with no parsing and no probing. Season folders
are detected by a `\bS\d{1,2}\b` token anywhere in the directory
name (release-style naming, no Plex `Season 01` / `Specials`
conventions). Video files are filtered against
`kb.video_extensions`; no recursion into sub-sub-folders. 11 unit
tests on `tmp_path` cover detection (case-insensitive, in-word
rejection), filtering (subs, NFO, sample files), and edge cases
(empty / missing show root).
- **Season-level audio/subtitle tracks
(`alfred/domain/tv_shows/entities.py`,
`alfred/domain/tv_shows/builders.py`).** `Season` now inherits
from `MediaWithTracks` and carries `audio_tracks` /
`subtitle_tracks` tuples (empty by default). Populated only in
PACK mode (the single release covering the whole season); empty in
EPISODIC mode where tracks live per-episode. `SeasonBuilder`
gains `set_audio_tracks()` / `set_subtitle_tracks()` and forwards
them through `from_existing()`. The bridge writes / reads them in
the PACK branch via shared `_synth_audio_tracks` /
`_synth_subtitle_tracks` helpers used for episodes too.
- **`DotAlfredTVShowRepository` — filesystem-backed implementation of
the `TVShowRepository` port
(`alfred/infrastructure/persistence/dot_alfred/repository.py`).**
Step 3 of the `specs/dot_alfred.md` plan. Reads and writes one
`.alfred` YAML file per show under a configurable `library_root`.
`save(show)` writes atomically (`.alfred.tmp` + `os.replace`) into a
folder that **must already exist** — the repository never invents a
folder name (the upstream `MediaOrganizer` is in charge of placing
files; the repo writes the sidecar next to them). `find_by_imdb_id` /
`find_all` walk `library_root/*/`, loading each readable sidecar;
folders without a sidecar return `None` / are skipped (no implicit
cold scan — that is the job of the upcoming `rescan_show` tool).
Corrupted YAML and schema violations are logged and skipped, never
raised, so a single bad folder does not break the rest of the
library. The repo keeps a tiny in-memory `imdb_id → folder_name`
index populated on every successful read/save, so subsequent saves
find the right destination without re-walking — useful when the show
folder name diverges from `show.get_folder_name()` (custom 1080p / 4K
variants). 20 integration tests on `tmp_path` cover the round-trip,
cold folder / unknown id returns, multi-show `find_all`, corrupted /
wrong-schema skipping, atomic write (no `.alfred.tmp` left behind),
overwrite, and folder-name fallbacks.
- **Sidecar ↔ TVShow bridge
(`alfred/infrastructure/persistence/dot_alfred/bridge.py`).**
`to_sidecar(show, folder_paths=...)` summarizes the rich domain
`AudioTrack` / `SubtitleTrack` to the sidecar's compact form (unique
audio languages in track order; subtitle entries derived from
`is_forced` and assumed `source="embedded"`). `from_sidecar(sidecar,
title=...)` reconstructs the domain `TVShow` with synthesized tracks
— one `AudioTrack` per language, one `SubtitleTrack` per entry, with
ffprobe-only fields (`codec`, `channels`, `channel_layout`) left as
`None`. The bridge is intentionally lossy on probe minutiae the
sidecar does not store; this is the documented trade-off from the
factual-only spec.
- **`.alfred` sidecar serializer
(`alfred/infrastructure/persistence/dot_alfred/`).** Implements step 2
of the `specs/dot_alfred.md` plan. Pure-dict in/out
(`serialize(sidecar) -> dict`, `deserialize(data) -> ShowSidecar`) —
YAML I/O lives in the repository layer (step 3) and is kept out for
trivial testability. Ships the DTOs that mirror the YAML schema
field-for-field (`ShowSidecar`, `SeasonSidecar`, `EpisodeSidecar`,
`SubtitleEntry`). The sidecar acts as a **scan cache**: it stores
only what is genuinely costly to recompute — folder/file paths
(skipping the FS walk) and probed track metadata (skipping ffprobe).
Release identifiers (group, source, quality, codec) live in folder
and file names and are derived on demand by the parser — they are
deliberately absent from the schema and rejected on deserialize. The
serializer is **strict on schema**: unknown keys at any level raise
`SidecarSchemaError`, missing required fields raise clearly, and
`bool` cannot sneak in as a season/episode number. Optional fields
(`tmdb_id`, empty `audio`/`subtitles`/`episodes`) are omitted from
the output rather than emitted as `null` / `[]`. Tests cover
round-trip equivalence (DTO → dict → DTO and DTO → YAML text → DTO),
the Foundation S01 PACK case (real-world fixture with mixed sub
types — superset captured at season scope), and a Breaking Bad S05
EPISODIC case. An on-disk `tmp_path` fixture recreates the Foundation
folder structure with placeholder files, ready to be reused by the
upcoming repository walk tests in step 3.
- **`TVShowBuilder` / `SeasonBuilder` — sole construction surface for the
TVShow aggregate** (`alfred/domain/tv_shows/builders.py`). The aggregate
is now fully frozen; building goes through a mutable scratchpad that
emits an immutable `TVShow` via `build()`. Both builders offer a
`from_existing()` classmethod to seed from a current frozen aggregate
and apply modifications. Episodes are emitted sorted by number within a
season, seasons sorted by number within the show.
- **`SeasonMode` enum** (`PACK` / `EPISODIC`) in
`alfred/domain/tv_shows/value_objects.py`. Computed at read time from
the season's structural shape (`Season.mode` property): a season with
no explicit episodes is `PACK` (a single release covering the whole
season), a season with episodes is `EPISODIC` (currently airing, one
release per episode). Never stored — the YAML sidecar encodes the
mode via the presence/absence of the `episodes:` block.
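
Several entries above rely on the same atomic-write pattern (`.tmp` + `os.replace`). A minimal sketch, assuming a hypothetical `atomic_write_text` helper that takes already-serialized text; the real `atomic_write_yaml` signature is not shown in this changelog, and the `fsync` call is an extra safety step added here, not something the entries mention.

```python
import os
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write `text` to `target` so readers never observe a partial file."""
    tmp = target.with_name(target.name + ".tmp")  # e.g. ".alfred" -> ".alfred.tmp"
    with open(tmp, "w", encoding="utf-8") as handle:
        handle.write(text)
        handle.flush()
        os.fsync(handle.fileno())  # assumption: flush to disk before the swap
    # os.replace overwrites the destination in one step, so the old sidecar
    # stays readable until the new one is fully in place.
    os.replace(tmp, target)
```

The temporary file lives next to the target so that `os.replace` stays on one filesystem; after a successful call no `.tmp` file is left behind.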
### Changed
- **TVShow aggregate is now frozen all the way down.** `TVShow`,
`Season` and `Episode` are all `@dataclass(frozen=True)`. Children
are stored as ordered tuples (`tuple[Season, ...]`,
`tuple[Episode, ...]`) sorted by their respective numbers, replacing
the previous mutable dicts. Lookup helpers `TVShow.get_season(n)` and
`Season.get_episode(n)` traverse the tuple lazily via `next()`. The
former `add_episode` / `add_season` mutation methods are gone — all
construction goes through `TVShowBuilder` / `SeasonBuilder`.
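
The frozen-aggregate-plus-builder shape described above might look roughly like this. Field names (`number`, `title`, `episodes`) are simplified assumptions; the real entities carry more fields. Only `SeasonBuilder`, `from_existing()`, `build()`, `get_episode(n)`, the sorted tuples and the `next()` lookup are taken from the entries in this changelog.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Episode:
    number: int
    title: str = ""


@dataclass(frozen=True)
class Season:
    number: int
    episodes: Tuple[Episode, ...] = ()

    def get_episode(self, n: int) -> Optional[Episode]:
        # Lazy traversal of the sorted tuple.
        return next((e for e in self.episodes if e.number == n), None)


class SeasonBuilder:
    """Mutable scratchpad that emits an immutable Season."""

    def __init__(self, number: int) -> None:
        self._number = number
        self._episodes: Dict[int, Episode] = {}  # keyed by number: last write wins

    @classmethod
    def from_existing(cls, season: Season) -> "SeasonBuilder":
        builder = cls(season.number)
        for episode in season.episodes:
            builder.add_episode(episode)
        return builder

    def add_episode(self, episode: Episode) -> "SeasonBuilder":
        self._episodes[episode.number] = episode
        return self

    def build(self) -> Season:
        ordered = tuple(sorted(self._episodes.values(), key=lambda e: e.number))
        return Season(self._number, ordered)


season = (
    SeasonBuilder(1)
    .add_episode(Episode(3))
    .add_episode(Episode(1))
    .add_episode(Episode(2, "old"))
    .add_episode(Episode(2, "new"))
    .build()
)
```

To change a frozen season, a caller seeds a builder with `from_existing()`, applies the modification, and calls `build()` again; the original instance is left untouched.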
### Removed
- **ShowTracker-territory fields stripped from the TVShow aggregate.**
The aggregate now models only what the `.alfred` sidecar stores
(filesystem-observable facts + immutable identity). Dropped from the
domain:
- `TVShow.status` (`ShowStatus`) and the `ShowStatus` enum entirely,
along with its TMDB string mapping (`from_string`).
- `TVShow.expected_seasons`, `Season.expected_episodes`,
`Season.aired_episodes`, `Season.name`.
- `TVShow.collection_status()`, `is_complete_series()`,
`missing_episodes()`, `is_ongoing()`, `is_ended()` and the
`CollectionStatus` enum.
- `Season.is_complete()`, `is_fully_aired()`, `missing_episodes()`
and the `aired ≤ expected` validation.
- `TVShow.add_episode()` / `TVShow.add_season()` /
`Season.add_episode()` — replaced by the builder API.
These concerns will reappear in a dedicated `ShowTracker` layer (to
be designed) that combines the `.alfred` sidecar with live TMDB data
to answer questions like "is this show complete?" or "are new
episodes out?". Keeping volatile/derived state out of the aggregate
matches the factual-only philosophy locked in `specs/dot_alfred.md`.
### Internal
- **Test suite rewritten for the new aggregate shape.**
`tests/domain/test_tv_shows.py` now covers frozen invariants, builder
ordering, last-write-wins on duplicates, `from_existing` round-trip,
and `SeasonMode` derivation. `tests/infrastructure/test_filesystem_extras.py`
helper simplified (no more `ShowStatus.ENDED` / `expected_seasons` on
test shows). 1078 tests still green.
- **Design doc for `.alfred/` sidecar persistence
(`specs/dot_alfred.md`).** First entry in the new `specs/` directory.
Specifies a per-show `.alfred/` directory holding a `show.yaml` and
one `season_NN.yaml` per season, used by the upcoming concrete
`TVShowRepository` to cache parse/probe results and avoid full
rescans on every library read. Covers schema, naming conventions,
cache invalidation strategy (size + mtime), self-healing on
drift, atomicity (`os.replace`), edge cases (legacy folders,
corrupted sidecars, manual file removal), and a phased
implementation plan. No code yet — spec only.
### Internal
- **`specs/` is now tracked.** The repo-level `.gitignore` had a
blanket `*.md` rule with only `CHANGELOG.md` allow-listed. Added
explicit exceptions for `/README.md` (root only — avoids
unintentionally exposing fixture READMEs) and `specs/**/*.md` so the
new design-doc directory ships with the project. Also added an
explicit `/.claude/` ignore line for the private dev-docs sub-repo
that sits inside the working tree but is versioned separately.
### Fixed
- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
range.** The parser previously captured `episode=9, episode_end=10`
and dropped E11+. It now returns `episode=first, episode_end=last`,
with intermediate values implied. Fixture
`shitty/archer_multi_episode/` updated from anti-regression-of-bug
to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI
fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
previously parsed with `parse_path="ai"` and everything UNKNOWN
because `'` is in the forbidden-chars list. Apostrophes are now
pre-stripped before the well-formed check, so the parse completes
normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
the title text loses its apostrophe. `parse_path` becomes
`sanitized` to surface the cleanup. Side win: PoP fixture
`the_prodigy_full_chaos/` also moves from total failure to a
partially-correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as
`tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
parsed as `media_type=movie` with `S01-06` glued onto the title.
The parser now recognizes the range, sets `season=first`,
`media_type=tv_complete`, and removes the marker from the title.
`is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
`｜`, …) carry no title content and are now filtered out. Side
effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
YouTube wide-pipe no longer leaks into the title.
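
As an illustration of the multi-episode fix above, here is a string-ops-only sketch (no regex, in line with the project's tokenizer rule) that collapses a chain such as `S14E09E10E11` to its first and last episode. The `parse_episode_chain` name and the tuple return shape are hypothetical, and returning the same number twice for a single-episode token is an assumption; the real parser fills `episode` / `episode_end` fields on its own result type.

```python
from typing import Optional, Tuple


def _is_ascii_number(part: str) -> bool:
    return part.isascii() and part.isdigit()


def parse_episode_chain(token: str) -> Optional[Tuple[int, int, int]]:
    """Return (season, first_episode, last_episode), or None if no match."""
    upper = token.upper()
    if not upper.startswith("S") or "E" not in upper:
        return None
    # "14E09E10E11" -> ["14", "09", "10", "11"]
    season_part, *episode_parts = upper[1:].split("E")
    if not _is_ascii_number(season_part):
        return None
    if not episode_parts or not all(_is_ascii_number(p) for p in episode_parts):
        return None
    episodes = [int(p) for p in episode_parts]
    # Intermediate values are implied by the range, so only the ends are kept.
    return int(season_part), episodes[0], episodes[-1]
```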
### Added
- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
token separator.** Added to `alfred/knowledge/release/separators.yaml`
so CJK release names (and the occasional decorative YouTube-style use)
tokenize cleanly instead of leaving the wide pipe glued onto an
adjacent token. The tokenizer in
`alfred/domain/release/parser/pipeline.py` already iterates the
separator list as plain strings (no regex), so a multi-byte UTF-8
separator works without any code change.
- **`InspectedResult.recommended_action` property** — derived hint that
collapses the orchestrator's go / wait / skip decision into a single
value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
exclusion logic that was previously dispersed across road /
media_type / main_video checks at each call site. Ordering is part of
the contract: ``skip`` (no main video, or media_type == ``"other"``)
wins over ``ask_user`` (media_type == ``"unknown"`` or road ==
``"path_of_pain"``) which wins over ``process``. Surfaced through the
``analyze_release`` tool so the LLM can route on it directly.
6 new tests in ``tests/application/test_inspect.py`` cover the four
branches and the precedence rules.
- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
— the surface previously coupled to the concrete `LanguageRegistry`.
Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
depends on the Protocol, infrastructure provides the YAML-backed
adapter. Tests in `tests/infrastructure/test_language_registry.py`.
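
The precedence rules stated for `InspectedResult.recommended_action` above can be written out as a plain function. This is a sketch only: the real implementation is a property on `InspectedResult`, and the `has_main_video` parameter name is an assumption standing in for the main-video check.

```python
def recommended_action(*, has_main_video: bool, media_type: str, road: str) -> str:
    """Collapse the go / wait / skip decision into one value."""
    # "skip" wins over everything else.
    if not has_main_video or media_type == "other":
        return "skip"
    # "ask_user" wins over "process".
    if media_type == "unknown" or road == "path_of_pain":
        return "ask_user"
    return "process"
```

Checking `skip` first is what makes the ordering part of the contract: a release with no main video is skipped even if it also sits on the `path_of_pain` road.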
### Changed
- **`Movie` and `Episode` are now frozen dataclasses.** Both entities
hold their track collections as `tuple[AudioTrack, ...]` and
`tuple[SubtitleTrack, ...]` instead of mutable lists, and are
`@dataclass(frozen=True, eq=False)` (identity-based equality
preserved via `__eq__`/`__hash__`). `__post_init__` coercion uses
`object.__setattr__` for the `imdb_id` / `title` /
`season_number` / `episode_number` normalizations. To project
enrichment results (probe output, file metadata) callers now rebuild
via `dataclasses.replace(...)`. Pattern aligned with the recent
`ParsedRelease` freeze. `MediaWithTracks` mixin contract updated to
`tuple` accordingly. `Season` and `TVShow` remain mutable for now —
freezing the aggregate root would cascade a full reconstruction on
every `add_episode`, deferred.
- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name
conflated "this might become a placed subtitle" with "this is what a
scan pass produced". The class is the output of a scan/identify pass
— language/format may still be `None`, confidence reflects how sure
the classifier is, and `raw_tokens` holds the filename fragments
under analysis. `SubtitleScanResult` says that directly. Pure rename
with a refreshed docstring in `alfred/domain/subtitles/entities.py`;
no behavior change. Touches the domain entity + `__init__` export,
the matcher / identifier / utils services, the manage_subtitles use
case, the placer, the metadata store, the shared-media cross-ref
comment, and the seven test modules that imported the type.
- **`ParsedRelease` is now frozen; enrichment passes return new
instances.** The VO was mutable so `detect_media_type` and
`enrich_from_probe` could patch fields in place — a code smell in a
value object whose identity *is* its content. `ParsedRelease` is now
`@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]`
instead of a `list[str]`. `enrich_from_probe` returns a new
`ParsedRelease` via `dataclasses.replace` (only allocates when at
least one field actually changed). `inspect_release` rebinds
`parsed` after both `detect_media_type` (wrapped in `MediaTypeToken`
to satisfy the strict isinstance check that now also runs on
replace) and `enrich_from_probe`. Parser pipeline now packs
`languages` as a tuple in the assemble dict. Callers updated:
`inspect_release`, `testing/recognize_folders_in_downloads.py`, and
the enrichment tests (22 call sites + language assertions switched
to tuple literals).
- **`resolve_destination` use cases take `kb` / `prober` as required
params; module-level singletons gone.** The four
`resolve_{season,episode,movie,series}_destination` use cases now
accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
arguments, matching the shape of `inspect_release`. The module-level
`_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
singletons that previously lived in
`alfred/application/filesystem/resolve_destination.py` are removed —
the application layer no longer reaches into infrastructure. The
singletons now live at the agent-tools frontier
(`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
instantiate them once and thread them through. `analyze_release` no
longer needs the dirty `from ... import _KB` indirection. Tests
inject their own stubs by keyword (`prober=_StubProber(...)`) instead
of monkeypatching a module attribute.
- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
collided with `pathlib.Path` in code-reading mental models, and was
one letter from `parse_path` (the field that holds the value) — making
it harder than it needed to be to spot the type vs the attribute.
``TokenizationRoute`` says what it actually captures (DIRECT /
SANITIZED / AI = how the name reached the tokenizer), and the class
docstring now spells out the orthogonality with ``Road`` (EASY /
SHITTY / PATH_OF_PAIN, which captures parser confidence on
``ParseReport``). The ``parse_path`` field name stays unchanged —
string values too — so YAML fixtures, the ``analyze_release`` tool
spec, and any external consumer are untouched.
- **`enrich_from_probe` codec mappings moved to YAML.** The three
hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
`_CHANNEL_MAP`) translating ffprobe output to scene tokens
(`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
`alfred/knowledge/release/probe_mappings.yaml` and are loaded into
`ReleaseKnowledge.probe_mappings` (new port field, populated by
`YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
parameter and reads the maps from there. Aligns with the CLAUDE.md
rule that lookup tables of domain knowledge belong in YAML, not in
Python — and opens the door to a future "learn new codec" pass.
Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
and all 22 sites in `tests/application/test_enrich_from_probe.py`.
- **`ParsedRelease.tech_string` is now a derived `@property`**
(`alfred/domain/release/value_objects.py`). It computes
`quality.source.codec` joined by dots on every access, so it stays in
sync with the underlying fields by construction. The stored field is
gone from the dataclass, the dict returned by `assemble()` no longer
carries the key, `parse_release`'s malformed-name fallback drops the
`tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
it after filling `quality`/`source`/`codec`. Closes the
parser/enrichment double-source-of-truth that `e79ca46` had to fix
reactively. The fixtures runner now injects `tech_string` alongside
`is_season_pack` since `asdict()` skips properties.
- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
valid levels (global, release_group, movie, show, season, episode)
was documented only in a docstring comment and validated nowhere.
`RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
serialization, `.value` access) while making the closed set explicit
to type-checkers and IDEs. `to_dict()` emits `.value` strings so
YAML output is unchanged.
- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
`__init__`.** Same public API (accepts `str | Path`), same behavior,
but the dataclass-generated `__init__` is no longer bypassed. One
less smell in the shared VOs.
- **`Language` VO is strict by default; `Language.from_raw()` factory
for normalization.** The previous `__post_init__` mutated `iso` and
`aliases` via `object.__setattr__` on a frozen dataclass — a code
smell hiding behind the dataclass facade. Split: the direct
constructor now rejects un-normalized input (uppercase iso,
whitespace in aliases, etc.), and `Language.from_raw()` handles
arbitrary YAML/user input. Only one caller (LanguageRegistry loading
the ISO YAML) needed migration.
- **`ParsedRelease.normalised` renamed to `clean`.** The field name
promised "dots instead of spaces" but in practice held
`raw - site_tag - apostrophes` — only used by `season_folder_name()`.
Renamed and docstring corrected.
- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
fields were already typed as `MediaTypeToken` / `ParsePath`, but a
tolerant `__post_init__` coerced raw strings. With both classes
being `(str, Enum)`, the coercion served no purpose. Strict
constructor; `.value` no longer passed at call sites; dropped the
unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.
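
The frozen `ParsedRelease` and derived `tech_string` entries above can be sketched as follows. This is a sketch under stated assumptions: `Release` is a hypothetical, trimmed-down stand-in for `ParsedRelease`, `enrich` is a simplified stand-in for `enrich_from_probe`, and skipping empty parts in `tech_string` is a guess rather than confirmed behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Release:
    """Hypothetical, trimmed-down stand-in for ParsedRelease."""

    title: str
    quality: str | None = None
    source: str | None = None
    codec: str | None = None
    languages: tuple[str, ...] = ()  # tuple, not list: the VO stays immutable

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot drift from the fields.
        # Dropping missing parts is an assumption of this sketch.
        parts = (self.quality, self.source, self.codec)
        return ".".join(p for p in parts if p)


def enrich(release: Release, probed: dict[str, str]) -> Release:
    """Fill only the fields that are still missing; return a new instance."""
    updates = {
        name: value
        for name, value in probed.items()
        if getattr(release, name) is None
    }
    # Skip the allocation when nothing actually changed.
    return replace(release, **updates) if updates else release


parsed = Release(title="Some.Movie", source="WEB-DL", languages=("fre",))
enriched = enrich(
    parsed, {"quality": "1080p", "codec": "x265", "source": "BluRay"}
)
```

Callers rebind the name (`parsed = enrich(parsed, ...)`) instead of relying on in-place mutation, and there is no stored `tech_string` left to refresh after enrichment.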
### Removed
- **`settings.min_movie_size_bytes`** — orphan Pydantic field +
validator. Its only consumer (`MovieService.validate_movie_file`)
had been removed during an earlier refactor. The "real movie vs
sample" rule now lives in extension-based exclusion
(`application/release/supported_media.py`) and PoP. If a size
threshold is ever needed, it'll go in a knowledge YAML, not in
`settings`.
### Internal
- **Flattened `alfred.domain.shared.media/` package into a single
`media.py` module.** The 6-file package (audio, video, subtitle,
info, matching, tracks_mixin + `__init__`) collapsed into one ~250
LoC module. All 12 import sites continue to resolve unchanged
(`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
since Python treats `media.py` and `media/__init__.py`
interchangeably for import paths. Easier to scan when the whole
bounded-context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the
`LanguageRepository` port** instead of the concrete `LanguageRegistry`
class. The default constructor still instantiates the concrete adapter
when no repository is injected — behaviour is unchanged for existing
callers. Opens the door to in-memory fakes in future tests without
loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from
`alfred.application.filesystem` to `alfred.application.release`**.
They are inspection-pipeline helpers — their natural home is next to
`inspect_release`, not next to the filesystem use cases. The move
also eliminates a circular-import workaround in
`resolve_destination.py`: `inspect_release` can now be imported at
module top instead of lazily inside `_resolve_parsed`. Public
surface is unchanged for callers that imported the helpers from
their full module paths (the only call sites — `inspect.py`, two
tests, one testing script — were updated in this commit).
### Added
- **`resolve_*_destination` use cases now consume `inspect_release`**.
`resolve_episode_destination` and `resolve_movie_destination` reuse
their existing `source_file` parameter as the inspection target;
`resolve_season_destination` and `resolve_series_destination` gain
a new **optional** `source_path` parameter (also threaded through
the tool wrappers and YAML specs). When the path exists, ffprobe
data fills tokens missing from the release name (e.g. quality) and
refreshes `tech_string`, so the destination folder / file names
end up more accurate. When the path is omitted or does not exist
(back-compat callers), the use cases fall back to parse-only, the same
behavior as before.
### Fixed
- **`enrich_from_probe` now refreshes `tech_string`** after filling
`quality` / `source` / `codec`. Previously the field stayed at its
parser-time value, so filename builders saw stale tech tokens even
after a successful probe. New `TestTechString` class in
`tests/application/test_enrich_from_probe.py` locks the behavior.
### Added
- **`inspect_release` orchestrator + `InspectedResult` VO**
(`alfred/application/release/inspect.py`). Single composition of the
four inspection layers: `parse_release` → `detect_media_type` (patches
`parsed.media_type`) → `find_main_video` (top-level scan) →
`prober.probe` + `enrich_from_probe` when a video exists and the
refined media type isn't in `{"unknown", "other"}`. Returns a frozen
`InspectedResult(parsed, report, source_path, main_video, media_info,
probe_used)` that downstream callers consume directly instead of
rebuilding the same chain. `kb` and `prober` are injected — no
module-level singletons. Never raises.
### Changed
- **`analyze_release` tool now delegates to `inspect_release`** — same
output shape, plus two new fields: `confidence` (0–100) and `road`
(`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
`ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
both fields so the LLM can route releases by confidence.
- **`MediaProber` port now covers full media probing**: added
`probe(video) -> MediaInfo | None` alongside the existing
`list_subtitle_streams`. `FfprobeMediaProber` (in
`alfred/infrastructure/probe/`) implements both methods and is now
the single adapter shelling out to `ffprobe`. The standalone
`alfred/infrastructure/filesystem/ffprobe.py` module was removed —
all callers (tools, testing scripts) instantiate
`FfprobeMediaProber` instead. Unblocks the upcoming
`inspect_release` orchestrator, which depends on the port.
### Removed
- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
`FfprobeMediaProber` adapter).
---
## [2026-05-20] — Release parser confidence scoring + exclusion
### Added
- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
`is_supported_video(path, kb)` (extension-only check against
`kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
scan, lexicographically-first eligible file, returns `None` when no
video qualifies; accepts a bare file as folder for single-file
releases). No size threshold, no filename heuristics —
PATH_OF_PAIN handles the exotic cases. Foundation for the future
`inspect_release` orchestrator.
- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
`alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
`(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
`"path_of_pain"`), the residual UNKNOWN tokens, and the missing
critical fields. EASY is decided structurally (a group schema
matched); SHITTY vs PATH_OF_PAIN is decided by score against a
YAML-configurable cutoff (default 60). Weights and penalties also
live in `scoring.yaml` — title 30, media_type 20, year 15, season
10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
-30. `Road` is a new enum, distinct from `ParsePath` (which records
the tokenization route, not the confidence tier). `ReleaseKnowledge`
port gains a `scoring: dict` field.
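
A worked sketch of the scoring arithmetic described above. The weights, the per-token penalty, the cap and the default cutoff are taken from the entry. Everything else is an assumption: that "tech" means the three fields `quality` / `source` / `codec`, that the score is clamped to 0–100, that the cutoff comparison is inclusive, and all function and constant names.

```python
from enum import Enum

# Weights as listed in the changelog entry.
# Assumption: "tech 5 each" covers quality, source and codec.
WEIGHTS = {
    "title": 30,
    "media_type": 20,
    "year": 15,
    "season": 10,
    "episode": 5,
    "quality": 5,
    "source": 5,
    "codec": 5,
}
UNKNOWN_PENALTY = 5   # per residual UNKNOWN token
MAX_PENALTY = 30      # the penalty is capped at -30
CUTOFF = 60           # default SHITTY vs PATH_OF_PAIN threshold


class Road(str, Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"


def score(found_fields: set[str], unknown_tokens: int) -> int:
    """Sum the weights of the fields found, subtract the capped penalty."""
    earned = sum(w for name, w in WEIGHTS.items() if name in found_fields)
    penalty = min(unknown_tokens * UNKNOWN_PENALTY, MAX_PENALTY)
    # Clamping to 0..100 is an assumption of this sketch.
    return max(0, min(100, earned - penalty))


def decide_road(schema_matched: bool, confidence: int) -> Road:
    """EASY is structural; the other two tiers depend on the score."""
    if schema_matched:
        return Road.EASY
    # Assumption: a score equal to the cutoff still counts as SHITTY.
    return Road.SHITTY if confidence >= CUTOFF else Road.PATH_OF_PAIN


# 30 + 20 + 15 + 5 + 5 + 5 = 80, minus 2 * 5 = 70
decent = score({"title", "media_type", "year", "quality", "source", "codec"}, 2)
# 30 + 15 = 45, minus min(8 * 5, 30) = 15
messy = score({"title", "year"}, 8)
```

Under these assumptions `decent` lands on the SHITTY road and `messy` on PATH_OF_PAIN, while any release whose group schema matched is EASY regardless of its score.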
### Changed
- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
ParseReport]` instead of returning a bare `ParsedRelease`. Call
sites updated in `application/filesystem/resolve_destination.py` and
`agent/tools/filesystem.py`. Tests updated accordingly.
---
## [2026-05-20] — Release parser v2 (EASY + SHITTY)
### Added
- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
new annotate-based pipeline (tokenize → annotate → assemble) drives
releases from known groups. Exposes `Token` (frozen VO with `index` +
`role` + `extra`), `TokenRole` enum (structural/technical/meta families),
and `GroupSchema` / `SchemaChunk` value objects.
- `pipeline.tokenize`: string-ops separator split (no regex), strips
a `[site.tag]` prefix/suffix first.
- `pipeline.annotate`: detects the trailing group right-to-left
(priority to `codec-GROUP` shape, fallback to any non-source dashed
token), looks up its `GroupSchema`, then walks tokens and schema
chunks in lockstep — optional chunks that don't match are skipped,
mandatory mismatches abort EASY and return `None` so the caller can
fall back to SHITTY.
- `pipeline.assemble`: folds annotated tokens into a
`ParsedRelease`-compatible dict.
- `parse_release` (in `release.services`) tries the v2 EASY path first
and falls through to the legacy SHITTY heuristic on `None`. Legacy
SHITTY/PATH OF PAIN behavior is unchanged.
- Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
rarbg}.yaml` declare the canonical chunk order per group, loaded via
new `ReleaseKnowledge.group_schema(name)` port method.
- Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
cover token VOs, site-tag stripping, group detection, schema-driven
annotation (movie, TV episode, season pack with optional source),
and field assembly.
- **Release parser v2 — enricher pass** completes the EASY pipeline.
The structural schema walk now tolerates non-positional tokens
between chunks (instead of aborting on leftover tokens), and a second
pass tags them with audio / video-meta / edition / language roles.
Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
(e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
matched before single tokens. Channel layouts like `5.1` and `7.1`
(split into two tokens by the `.` separator) are detected as
consecutive pairs. Sequence members carry an `extra["sequence_member"]`
marker so `assemble` extracts the canonical value only from the
primary token. KONTRAST releases with audio / HDR / edition / language
metadata now produce a fully populated `ParsedRelease`.
- **Streaming distributor as a separate dimension** from encoding source.
New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
port field, a `TokenRole.DISTRIBUTOR` annotation, and a
`ParsedRelease.distributor` field. `WEB-DL` stays the source; the
platform that produced the release is now recorded distinctly. The
five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
from `sources.yaml`.
- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
each documenting an expected `ParsedRelease` plus the future `routing`
(library / torrents / seed_hardlinks) for the upcoming `organize_media`
@@ -54,6 +918,22 @@ callers).
### Changed
- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
The legacy ~480-line heuristic block in `release/services.py` is gone;
`pipeline._annotate_shitty` does a single pass that looks each token
up in the kb buckets (resolutions / sources / codecs / distributors /
year / `SxxExx`) with first-match-wins semantics, and the leftmost
contiguous UNKNOWN run becomes the title. `annotate()` no longer
returns `None` — SHITTY is the always-on fallback when no group schema
matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
(`deutschland_franchise_box`, `sleaford_yt_slug`,
`super_mario_bilingual`, `predator_space_separators` — the last one
moved from `shitty/` → `path_of_pain/`) are now marked
`pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
that SHITTY intentionally won't handle. `ReleaseFixture` grows an
`xfail_reason` field; the parametrized suite wires the xfail mark
automatically.
- **`parse_release` tokenizer is now data-driven**: it splits on any character
listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
+3 -3
@@ -6,13 +6,13 @@ from collections.abc import AsyncGenerator
 from pathlib import Path
 from typing import Any
-from alfred.infrastructure.metadata import MetadataStore
-from alfred.infrastructure.persistence import get_memory
+from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
 from alfred.settings import settings
 from .prompt import PromptBuilder
 from .registry import Tool, make_tools
-from .workflows import WorkflowLoader
+from .workflows_TO_CHECK import WorkflowLoader
 logger = logging.getLogger(__name__)
+3 -3
@@ -3,12 +3,12 @@
 import json
 from typing import Any
-from alfred.infrastructure.persistence import get_memory
-from alfred.infrastructure.persistence.memory import MemoryRegistry
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
+from alfred.infrastructure.persistence_TO_CHECK.memory import MemoryRegistry
 from .expressions import build_expressions_context
 from .registry import Tool
-from .workflows import WorkflowLoader
+from .workflows_TO_CHECK import WorkflowLoader
 # Tools that are always available, regardless of workflow scope.
 # Kept small on purpose — the noyau is what the agent uses to either
+6 -6
@@ -6,8 +6,8 @@ from collections.abc import Callable
 from dataclasses import dataclass
 from typing import Any
-from .tools.spec import ToolSpec, ToolSpecError
-from .tools.spec_loader import load_tool_specs
+from .tools_TO_CHECK.spec import ToolSpec, ToolSpecError
+from .tools_TO_CHECK.spec_loader import load_tool_specs
 logger = logging.getLogger(__name__)
@@ -130,10 +130,10 @@ def make_tools(settings) -> dict[str, Tool]:
 Returns:
 Dictionary mapping tool names to Tool objects.
 """
-from .tools import api as api_tools # noqa: PLC0415
-from .tools import filesystem as fs_tools # noqa: PLC0415
-from .tools import language as lang_tools # noqa: PLC0415
-from .tools import workflow as wf_tools # noqa: PLC0415
+from .tools_TO_CHECK import api as api_tools # noqa: PLC0415
+from .tools_TO_CHECK import filesystem as fs_tools # noqa: PLC0415
+from .tools_TO_CHECK import language as lang_tools # noqa: PLC0415
+from .tools_TO_CHECK import workflow as wf_tools # noqa: PLC0415
 tool_functions = [
 fs_tools.set_path_for_folder,
-22
@@ -1,22 +0,0 @@
"""Tools module - filesystem and API tools for the agent."""
from .api import (
add_torrent_by_index,
add_torrent_to_qbittorrent,
find_media_imdb_id,
find_torrent,
get_torrent_by_index,
)
from .filesystem import list_folder, set_path_for_folder
from .language import set_language
__all__ = [
"set_path_for_folder",
"list_folder",
"find_media_imdb_id",
"find_torrent",
"get_torrent_by_index",
"add_torrent_to_qbittorrent",
"add_torrent_by_index",
"set_language",
]
+23
@@ -0,0 +1,23 @@
"""Tools module — agent-exposed wrappers.
Re-exports are intentionally minimal during the ``unfuck`` refactor.
Tool wiring (registry / specs / LLM-facing surface) is the last
chunk of work on this branch; until then, importers should reach
into the submodules directly (``alfred.agent.tools.filesystem``, …).
"""
from .api import (
add_torrent_by_index,
add_torrent_to_qbittorrent,
find_torrent,
get_torrent_by_index,
)
from .language import set_language
__all__ = [
"find_torrent",
"get_torrent_by_index",
"add_torrent_to_qbittorrent",
"add_torrent_by_index",
"set_language",
]
@@ -3,35 +3,47 @@
 import logging
 from typing import Any
-from alfred.application.movies import SearchMovieUseCase
-from alfred.application.torrents import AddTorrentUseCase, SearchTorrentsUseCase
-from alfred.infrastructure.api.knaben import knaben_client
-from alfred.infrastructure.api.qbittorrent import qbittorrent_client
-from alfred.infrastructure.api.tmdb import tmdb_client
-from alfred.infrastructure.persistence import get_memory
+from alfred.application.movies_TO_CHECK import SearchMovieUseCase
+from alfred.application.torrents_TO_CHECK import AddTorrentUseCase, SearchTorrentsUseCase
+from alfred.application.tv_shows_TO_CHECK import SearchShowUseCase
+from alfred.infrastructure.api_TO_CHECK.knaben import knaben_client
+from alfred.infrastructure.api_TO_CHECK.qbittorrent import qbittorrent_client
+from alfred.infrastructure.api_TO_CHECK.tmdb import tmdb_client
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
 logger = logging.getLogger(__name__)
-def find_media_imdb_id(media_title: str) -> dict[str, Any]:
-"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_media_imdb_id.yaml."""
+def search_movies(media_title: str) -> dict[str, Any]:
+"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_movies.yaml."""
 use_case = SearchMovieUseCase(tmdb_client)
 response = use_case.execute(media_title)
 result = response.to_dict()
 if result.get("status") == "ok":
 memory = get_memory()
-memory.stm.set_entity(
-"last_media_search",
-{
-"title": result.get("title"),
-"imdb_id": result.get("imdb_id"),
-"media_type": result.get("media_type"),
-"tmdb_id": result.get("tmdb_id"),
-},
+memory.stm.set_entity("last_movie_search", {"hits": result.get("hits", [])})
+memory.stm.set_topic("searching_movie")
+logger.debug(
+f"Stored movie search result in STM: {len(result.get('hits', []))} hits"
+)
+return result
+def search_shows(show_title: str) -> dict[str, Any]:
+"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_shows.yaml."""
+use_case = SearchShowUseCase(tmdb_client)
+response = use_case.execute(show_title)
+result = response.to_dict()
+if result.get("status") == "ok":
+memory = get_memory()
+memory.stm.set_entity("last_show_search", {"hits": result.get("hits", [])})
+memory.stm.set_topic("searching_show")
+logger.debug(
+f"Stored show search result in STM: {len(result.get('hits', []))} hits"
 )
-memory.stm.set_topic("searching_media")
-logger.debug(f"Stored media search result in STM: {result.get('title')}")
 return result
@@ -1,4 +1,20 @@
"""Filesystem tools for folder management.""" """Filesystem tools for folder management.
Thin wrappers around the 5 atomic filesystem use cases
(``alfred.application.filesystem``) plus a few self-contained tools
(``analyze_release``, ``probe_media``, ``learn``, …).
Tools removed during the ``unfuck`` filesystem refactor to be
rewired in a later step:
- ``manage_subtitles`` (depends on the rewritten subtitle services)
- ``set_path_for_folder`` (no replacement use case yet)
- ``create_seed_links`` (flow has changed: hard-link straight to
library, no copy back; will be re-introduced per-file when the
organize-release workflow lands)
- ``resolve_season_destination`` / ``resolve_episode_destination``
/ ``resolve_movie_destination`` / ``resolve_series_destination``
(their use cases moved to ``_OLD`` files pending a rewrite)
"""
from pathlib import Path from pathlib import Path
from typing import Any from typing import Any
@@ -7,120 +23,136 @@ import yaml
import alfred as _alfred_pkg import alfred as _alfred_pkg
from alfred.application.filesystem import ( from alfred.application.filesystem import (
CreateSeedLinksUseCase, DirectoryRoots,
ListFolderUseCase, create_dir_use_case,
ManageSubtitlesUseCase, list_dir_use_case,
MoveMediaUseCase, move_file_use_case,
SetFolderPathUseCase,
) )
from alfred.application.filesystem.detect_media_type import detect_media_type from alfred.infrastructure.knowledge_TO_CHECK.release_kb import YamlReleaseKnowledge
from alfred.application.filesystem.enrich_from_probe import enrich_from_probe from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
from alfred.application.filesystem.resolve_destination import ( from alfred.infrastructure.persistence_TO_CHECK import get_memory
resolve_episode_destination as _resolve_episode_destination, from alfred.infrastructure.probe_TO_CHECK import FfprobeMediaProber
)
from alfred.application.filesystem.resolve_destination import ( # Agent-tools frontier: this is the legitimate home for the singletons that
resolve_movie_destination as _resolve_movie_destination, # back every LLM-exposed wrapper. The use cases below take ``kb`` / ``prober``
) # as required params; tests inject their own stubs.
from alfred.application.filesystem.resolve_destination import ( _KB = YamlReleaseKnowledge()
resolve_season_destination as _resolve_season_destination, _PROBER = FfprobeMediaProber()
)
from alfred.application.filesystem.resolve_destination import (
resolve_series_destination as _resolve_series_destination,
)
from alfred.infrastructure.filesystem import FileManager, create_folder, move
from alfred.infrastructure.filesystem.ffprobe import probe
from alfred.infrastructure.filesystem.find_video import find_video_file
from alfred.infrastructure.metadata import MetadataStore
from alfred.infrastructure.persistence import get_memory
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge" _LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
class _RootsNotConfigured(Exception):
"""Raised when one of the 4 expected roots is missing from memory."""
def __init__(self, missing: list[str]):
super().__init__(f"Roots not configured: {missing}")
self.missing = missing
def _load_directory_roots() -> DirectoryRoots:
"""Build :class:`DirectoryRoots` from the persisted memory.
Reads:
- ``ltm.workspace.download`` → ``downloads``
- ``ltm.workspace.torrent`` → ``torrents``
- ``ltm.library_paths['movies']`` → ``movies``
- ``ltm.library_paths['tv_shows']`` → ``tv_shows``
Raises:
_RootsNotConfigured: if any of the four paths is unset.
"""
memory = get_memory()
downloads = memory.ltm.workspace.download
torrents = memory.ltm.workspace.torrent
movies = memory.ltm.library_paths.get("movies")
tv_shows = memory.ltm.library_paths.get("tv_shows")
missing: list[str] = []
if not downloads:
missing.append("downloads")
if not torrents:
missing.append("torrents")
if not movies:
missing.append("movies")
if not tv_shows:
missing.append("tv_shows")
if missing:
raise _RootsNotConfigured(missing)
return DirectoryRoots(
downloads=Path(downloads),
torrents=Path(torrents),
movies=Path(movies),
tv_shows=Path(tv_shows),
)
def _roots_error(exc: _RootsNotConfigured) -> dict[str, Any]:
return {
"status": "error",
"error": "roots_not_configured",
"message": (
f"Missing roots: {exc.missing}. "
"Configure them via /set_path before using filesystem tools."
),
}
# ---------------------------------------------------------------------------
# 5 atomic filesystem tools — thin wrappers over the use cases.
# ---------------------------------------------------------------------------
def list_folder(path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
try:
roots = _load_directory_roots()
except _RootsNotConfigured as e:
return _roots_error(e)
return list_dir_use_case(Path(path), roots).to_dict()
def create_directory(path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_directory.yaml."""
try:
roots = _load_directory_roots()
except _RootsNotConfigured as e:
return _roots_error(e)
return create_dir_use_case(Path(path), roots).to_dict()
def move_media(source: str, destination: str) -> dict[str, Any]: def move_media(source: str, destination: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml.""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml."""
file_manager = FileManager() try:
use_case = MoveMediaUseCase(file_manager) roots = _load_directory_roots()
return use_case.execute(source, destination).to_dict() except _RootsNotConfigured as e:
return _roots_error(e)
return move_file_use_case(Path(source), Path(destination), roots).to_dict()
def move_to_destination(source: str, destination: str) -> dict[str, Any]: def move_to_destination(source: str, destination: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml.""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml.
parent = str(Path(destination).parent)
result = create_folder(parent) Convenience tool that creates the destination's parent directory
if result["status"] != "ok": if missing, then moves the file. Saves the LLM from having to
return result chain ``create_directory`` + ``move_media`` explicitly.
return move(source, destination) """
try:
roots = _load_directory_roots()
except _RootsNotConfigured as e:
return _roots_error(e)
dst = Path(destination)
mkdir_resp = create_dir_use_case(dst.parent, roots)
if mkdir_resp.status != "ok":
return mkdir_resp.to_dict()
return move_file_use_case(Path(source), dst, roots).to_dict()
+# ---------------------------------------------------------------------------
+# Self-contained tools — not impacted by the filesystem refactor.
+# ---------------------------------------------------------------------------
-def resolve_season_destination(
-    release_name: str,
-    tmdb_title: str,
-    tmdb_year: int,
-    confirmed_folder: str | None = None,
-) -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_season_destination.yaml."""
-    return _resolve_season_destination(
-        release_name, tmdb_title, tmdb_year, confirmed_folder
-    ).to_dict()
-def resolve_episode_destination(
-    release_name: str,
-    source_file: str,
-    tmdb_title: str,
-    tmdb_year: int,
-    tmdb_episode_title: str | None = None,
-    confirmed_folder: str | None = None,
-) -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_episode_destination.yaml."""
-    return _resolve_episode_destination(
-        release_name,
-        source_file,
-        tmdb_title,
-        tmdb_year,
-        tmdb_episode_title,
-        confirmed_folder,
-    ).to_dict()
-def resolve_movie_destination(
-    release_name: str,
-    source_file: str,
-    tmdb_title: str,
-    tmdb_year: int,
-) -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_movie_destination.yaml."""
-    return _resolve_movie_destination(
-        release_name, source_file, tmdb_title, tmdb_year
-    ).to_dict()
-def resolve_series_destination(
-    release_name: str,
-    tmdb_title: str,
-    tmdb_year: int,
-    confirmed_folder: str | None = None,
-) -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_series_destination.yaml."""
-    return _resolve_series_destination(
-        release_name, tmdb_title, tmdb_year, confirmed_folder
-    ).to_dict()
-def create_seed_links(
-    library_file: str, original_download_folder: str
-) -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_seed_links.yaml."""
-    file_manager = FileManager()
-    use_case = CreateSeedLinksUseCase(file_manager)
-    return use_case.execute(library_file, original_download_folder).to_dict()
-def manage_subtitles(source_video: str, destination_video: str) -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/manage_subtitles.yaml."""
-    file_manager = FileManager()
-    use_case = ManageSubtitlesUseCase(file_manager)
-    return use_case.execute(source_video, destination_video).to_dict()
 def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, Any]:
@@ -180,32 +212,12 @@ def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, An
     }
-def set_path_for_folder(folder_name: str, path_value: str) -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_path_for_folder.yaml."""
-    file_manager = FileManager()
-    use_case = SetFolderPathUseCase(file_manager)
-    response = use_case.execute(folder_name, path_value)
-    return response.to_dict()
 def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
     """Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
-    from alfred.application.filesystem.resolve_destination import _KB  # noqa: PLC0415
-    from alfred.domain.release.services import parse_release  # noqa: PLC0415
-    path = Path(source_path)
-    parsed = parse_release(release_name, _KB)
-    parsed.media_type = detect_media_type(parsed, path, _KB)
-    probe_used = False
-    if parsed.media_type not in ("unknown", "other"):
-        video_file = find_video_file(path, _KB)
-        if video_file:
-            media_info = probe(video_file)
-            if media_info:
-                enrich_from_probe(parsed, media_info)
-                probe_used = True
+    from alfred.application.release_TO_CHECK import inspect_release  # noqa: PLC0415
+    result = inspect_release(release_name, Path(source_path), _KB, _PROBER)
+    parsed = result.parsed
     return {
         "status": "ok",
         "media_type": parsed.media_type,
@@ -227,7 +239,10 @@ def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
         "edition": parsed.edition,
         "site_tag": parsed.site_tag,
         "is_season_pack": parsed.is_season_pack,
-        "probe_used": probe_used,
+        "probe_used": result.probe_used,
+        "confidence": result.report.confidence,
+        "road": result.report.road,
+        "recommended_action": result.recommended_action,
     }
@@ -241,7 +256,7 @@ def probe_media(source_path: str) -> dict[str, Any]:
             "message": f"{source_path} does not exist",
         }
-    media_info = probe(path)
+    media_info = _PROBER.probe(path)
     if media_info is None:
         return {
             "status": "error",
@@ -285,14 +300,6 @@ def probe_media(source_path: str) -> dict[str, Any]:
     }
-def list_folder(folder_type: str, path: str = ".") -> dict[str, Any]:
-    """Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
-    file_manager = FileManager()
-    use_case = ListFolderUseCase(file_manager)
-    response = use_case.execute(folder_type, path)
-    return response.to_dict()
 def read_release_metadata(release_path: str) -> dict[str, Any]:
     """Thin tool wrapper — semantics live in alfred/agent/tools/specs/read_release_metadata.yaml."""
     path = Path(release_path)
@@ -3,7 +3,7 @@
 import logging
 from typing import Any
-from alfred.infrastructure.persistence import get_memory
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
 logger = logging.getLogger(__name__)
@@ -80,3 +80,6 @@ returns:
 site_tag: Source-site tag if present.
 is_season_pack: True when the folder contains a full season.
 probe_used: True when ffprobe successfully enriched the result.
+confidence: Parser confidence score, 0-100 (higher = more reliable).
+road: "Parser road: 'easy' (group schema matched), 'shitty' (heuristic but acceptable), or 'path_of_pain' (low confidence — ask the user before auto-routing)."
+recommended_action: "Orchestrator hint: 'process' (go straight to resolve_*_destination), 'ask_user' (media_type unknown or road=path_of_pain — confirm with the user first), or 'skip' (no main video, or media_type=other — nothing to organize)."
@@ -61,6 +61,17 @@ parameters:
 one.
 example: Oz.1997.1080p.WEBRip.x265-KONTRAST
+source_path:
+description: |
+Absolute path to the release folder on disk. Optional.
+why_needed: |
+When provided, the tool runs ffprobe on the main video inside the
+folder and uses the probe data to fill quality/codec tokens that
+may be missing from the release name. The enriched tech tokens
+end up in the destination folder name, so providing source_path
+gives more accurate names for releases with sparse metadata.
+example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
 returns:
 ok:
 description: Paths resolved unambiguously; ready to move.
@@ -56,6 +56,16 @@ parameters:
 Forces the use case to use this exact folder name and skip detection.
 example: The.Wire.2002.1080p.BluRay.x265-GROUP
+source_path:
+description: |
+Absolute path to the release folder on disk. Optional.
+why_needed: |
+When provided, the tool runs ffprobe on the main video inside the
+folder and uses probe data to fill quality/codec tokens that may
+be missing from the release name, producing a more accurate
+destination folder name.
+example: /downloads/The.Wire.S01-S05.1080p.BluRay.x265-GROUP
 returns:
 ok:
 description: Path resolved; ready to move the pack.
@@ -9,9 +9,9 @@ to reason over the full set.
 import logging
 from typing import Any
-from alfred.infrastructure.persistence import get_memory
-from ..workflows import WorkflowLoader
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
+from ..workflows_TO_CHECK import WorkflowLoader
 logger = logging.getLogger(__name__)
+1 -1
@@ -15,7 +15,7 @@ from alfred.agent.agent import Agent
 from alfred.agent.llm.deepseek import DeepSeekClient
 from alfred.agent.llm.exceptions import LLMAPIError, LLMConfigurationError
 from alfred.agent.llm.ollama import OllamaClient
-from alfred.infrastructure.persistence import get_memory, init_memory
+from alfred.infrastructure.persistence_TO_CHECK import get_memory, init_memory
 from alfred.settings import settings
 logging.basicConfig(
+26
@@ -0,0 +1,26 @@
+"""Application-layer exceptions shared across orchestrators.
+Kept in a dedicated module (rather than inside each orchestrator's
+file) because the sync flows for TV shows and movies raise structurally
+identical "not found in library" errors — pulling them out makes the
+shared semantics explicit and avoids cross-imports between the
+``tv_shows`` and ``movies`` packages.
+"""
+from __future__ import annotations
+class ShowNotFoundInLibrary(LookupError):
+    """Raised when no on-disk TV show carries the requested ``tmdb_id``.
+    The sync orchestrator raises this when both the library index and
+    the per-show release repository return ``None`` for a lookup —
+    there is nothing on disk to refresh TMDB facts against.
+    """
+class MovieNotFoundInLibrary(LookupError):
+    """Raised when no on-disk movie carries the requested ``tmdb_id``.
+    Symmetric to :class:`ShowNotFoundInLibrary` for the movies library.
+    """
+36 -41
@@ -1,47 +1,42 @@
-"""Filesystem use cases."""
-from .create_seed_links import CreateSeedLinksUseCase
+"""Filesystem application layer — 5 atomic use cases as free functions.
+Each use case:
+- accepts :class:`pathlib.Path` inputs plus a :class:`DirectoryRoots` VO,
+- guards inputs against escaping configured roots,
+- calls the matching infra op,
+- catches :class:`~alfred.infrastructure.filesystem.FilesystemError` and
+  returns a frozen DTO with a normalized error code.
+No global state, no ``get_memory()``. Roots are injected.
+"""
+from .create_dir import create_dir_use_case
+from .directory_roots import DirectoryRoots
 from .dto import (
-    CreateSeedLinksResponse,
-    ListFolderResponse,
-    ManageSubtitlesResponse,
-    MoveMediaResponse,
-    PlacedSubtitle,
-    SetFolderPathResponse,
+    CreateDirResponse,
+    LinkFileResponse,
+    ListDirResponse,
+    MoveDirResponse,
+    MoveFileResponse,
 )
-from .list_folder import ListFolderUseCase
-from .manage_subtitles import ManageSubtitlesUseCase
-from .move_media import MoveMediaUseCase
-from .resolve_destination import (
-    ResolvedEpisodeDestination,
-    ResolvedMovieDestination,
-    ResolvedSeasonDestination,
-    ResolvedSeriesDestination,
-    resolve_episode_destination,
-    resolve_movie_destination,
-    resolve_season_destination,
-    resolve_series_destination,
-)
-from .set_folder_path import SetFolderPathUseCase
+from .link_file import link_file_use_case
+from .list_dir import list_dir_use_case
+from .move_dir import move_dir_use_case
+from .move_file import move_file_use_case
 __all__ = [
-    "SetFolderPathUseCase",
-    "ListFolderUseCase",
-    "CreateSeedLinksUseCase",
-    "MoveMediaUseCase",
-    "ManageSubtitlesUseCase",
-    "ResolvedSeasonDestination",
-    "ResolvedEpisodeDestination",
-    "ResolvedMovieDestination",
-    "ResolvedSeriesDestination",
-    "resolve_season_destination",
-    "resolve_episode_destination",
-    "resolve_movie_destination",
-    "resolve_series_destination",
-    "SetFolderPathResponse",
-    "ListFolderResponse",
-    "CreateSeedLinksResponse",
-    "MoveMediaResponse",
-    "ManageSubtitlesResponse",
-    "PlacedSubtitle",
+    # use cases
+    "list_dir_use_case",
+    "create_dir_use_case",
+    "link_file_use_case",
+    "move_file_use_case",
+    "move_dir_use_case",
+    # VO
+    "DirectoryRoots",
+    # DTOs
+    "ListDirResponse",
+    "CreateDirResponse",
+    "LinkFileResponse",
+    "MoveFileResponse",
+    "MoveDirResponse",
 ]
+41
@@ -0,0 +1,41 @@
+"""Internal helpers: mapping infra exceptions → error codes.
+Kept private (``_errors``) — only the 5 use cases in this package use
+it. Centralizes the exception → code translation so every use case
+returns consistent error payloads.
+"""
+from __future__ import annotations
+from alfred.infrastructure.filesystem import (
+    CrossDevice,
+    DestinationExists,
+    FilesystemError,
+    FilesystemOSError,
+    NotADirectory,
+    NotAFile,
+    PermissionDenied,
+    SourceNotFound,
+)
+# Application-layer error codes (guard violations, not infra).
+PATH_NOT_ALLOWED = "path_not_allowed"
+def code_for(exc: FilesystemError) -> str:
+    """Return the snake-case error code for an infra exception."""
+    if isinstance(exc, SourceNotFound):
+        return "source_not_found"
+    if isinstance(exc, DestinationExists):
+        return "destination_exists"
+    if isinstance(exc, NotADirectory):
+        return "not_a_directory"
+    if isinstance(exc, NotAFile):
+        return "not_a_file"
+    if isinstance(exc, PermissionDenied):
+        return "permission_denied"
+    if isinstance(exc, CrossDevice):
+        return "cross_device"
+    if isinstance(exc, FilesystemOSError):
+        return "filesystem_os_error"
+    return "filesystem_error"
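The error codes returned by `code_for` appear to be the snake-cased exception class names. As a reviewer-side sanity check (not part of the commit), that convention can be verified mechanically. The helper `snake_code` below is hypothetical; it works on class names as plain strings, so it makes no assumption about the real exception hierarchy.

```python
import re


def snake_code(class_name: str) -> str:
    """Hypothetical helper: CamelCase class name -> snake_case code."""
    # Put an underscore before a capital that starts a lowercase run,
    # e.g. "FilesystemOSError" -> "FilesystemOS_Error".
    step1 = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", class_name)
    # Put an underscore at every remaining lower/digit -> upper boundary,
    # e.g. "FilesystemOS_Error" -> "Filesystem_OS_Error".
    step2 = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step1)
    return step2.lower()


# Class names and codes copied from the diff above.
EXPECTED = {
    "SourceNotFound": "source_not_found",
    "DestinationExists": "destination_exists",
    "NotADirectory": "not_a_directory",
    "NotAFile": "not_a_file",
    "PermissionDenied": "permission_denied",
    "CrossDevice": "cross_device",
    "FilesystemOSError": "filesystem_os_error",
    "FilesystemError": "filesystem_error",
}

# Names whose derived code differs from the one hard-coded in code_for.
MISMATCHES = {
    name: snake_code(name)
    for name, code in EXPECTED.items()
    if snake_code(name) != code
}
```

An empty `MISMATCHES` means the hand-written `isinstance` chain and the naming convention agree, which is the property the DTO docstring relies on.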
@@ -0,0 +1,33 @@
+"""create_dir use case — create a directory under one of the configured roots."""
+from __future__ import annotations
+from pathlib import Path
+from alfred.infrastructure.filesystem import FilesystemError, create_dir
+from ._errors import PATH_NOT_ALLOWED, code_for
+from .directory_roots import DirectoryRoots
+from .dto import CreateDirResponse
+def create_dir_use_case(path: Path, roots: DirectoryRoots) -> CreateDirResponse:
+    """Create directory ``path`` (and any missing parents) provided it
+    lives under one of the configured roots.
+    Idempotent on the infra side: re-running on an existing directory
+    returns ``status="ok"``.
+    """
+    if not roots.contains(path):
+        return CreateDirResponse(
+            status="error",
+            error=PATH_NOT_ALLOWED,
+            message=f"Path is outside configured roots: {path}",
+        )
+    try:
+        create_dir(path)
+    except FilesystemError as e:
+        return CreateDirResponse(status="error", error=code_for(e), message=str(e))
+    return CreateDirResponse(status="ok", path=path)
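The use case above follows a guard, call, translate shape: reject paths outside the roots, call the infra op, and turn typed exceptions into a DTO. A minimal self-contained sketch of that shape follows. Every name in it (`FilesystemErrorSketch`, `create_dir_infra`, `CreateDirResult`, `create_dir_sketch`) is a stand-in invented for illustration, and it assumes a single root instead of the real `DirectoryRoots` VO.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


class FilesystemErrorSketch(Exception):
    """Stand-in for the infra base exception (hypothetical)."""


@dataclass(frozen=True)
class CreateDirResult:
    """Stand-in for CreateDirResponse (hypothetical)."""

    status: str
    path: Path | None = None
    error: str | None = None
    message: str | None = None


def create_dir_infra(path: Path) -> None:
    """Stand-in infra op: raises a typed exception, never returns a dict."""
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError as e:
        raise FilesystemErrorSketch(str(e)) from e


def _inside(path: Path, root: Path) -> bool:
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        return False


def create_dir_sketch(path: Path, root: Path) -> CreateDirResult:
    # 1. Guard: refuse anything outside the configured root.
    if not _inside(path, root):
        return CreateDirResult(
            status="error",
            error="path_not_allowed",
            message=f"Path is outside configured roots: {path}",
        )
    # 2. Call the infra op; 3. translate typed exceptions into a DTO.
    try:
        create_dir_infra(path)
    except FilesystemErrorSketch as e:
        return CreateDirResult(status="error", error="filesystem_error", message=str(e))
    return CreateDirResult(status="ok", path=path)


def demo() -> list[tuple[str, str | None]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "movies"
        root.mkdir()
        target = root / "Oz (1997)" / "Season 01"
        first = create_dir_sketch(target, root)
        second = create_dir_sketch(target, root)  # idempotent re-run
        outside = create_dir_sketch(Path(tmp) / "elsewhere", root)
        return [
            (first.status, first.error),
            (second.status, second.error),
            (outside.status, outside.error),
        ]
```

Calling `demo()` exercises the three cases the docstring describes: a fresh create, an idempotent re-run, and a guard rejection.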
@@ -3,7 +3,7 @@
 import logging
 from alfred.infrastructure.filesystem import FileManager
-from alfred.infrastructure.persistence import get_memory
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
 from .dto import CreateSeedLinksResponse
@@ -0,0 +1,56 @@
+"""DirectoryRoots — VO carrying the configured filesystem roots.
+Replaces the ad-hoc ``get_memory().ltm.workspace.<x>`` lookups that were
+sprinkled across the filesystem use cases. By making roots an explicit
+input, use cases become pure (no global state read) and easy to test.
+The roots are read once at the tool wrapper boundary (where the agent
+config lives) and threaded through the use cases.
+"""
+from __future__ import annotations
+from dataclasses import dataclass
+from pathlib import Path
+@dataclass(frozen=True)
+class DirectoryRoots:
+    """Configured roots of Alfred's filesystem.
+    All paths must be absolute and existing directories — validation is
+    expected at the boundary that builds this VO.
+    Attributes:
+        downloads: where qBittorrent drops finished torrents.
+        torrents: where seeding hard-links live (mirrors downloads/).
+        movies: library root for movies.
+        tv_shows: library root for TV shows.
+    """
+    downloads: Path
+    torrents: Path
+    movies: Path
+    tv_shows: Path
+    def all(self) -> tuple[Path, ...]:
+        """Return every configured root, in declaration order."""
+        return (self.downloads, self.torrents, self.movies, self.tv_shows)
+    def contains(self, path: Path) -> bool:
+        """Return True if ``path`` is inside one of the configured roots.
+        Uses ``Path.resolve()`` to handle symlinks and ``..`` segments,
+        then ``relative_to`` for an exact within-root check.
+        """
+        try:
+            resolved = path.resolve()
+        except OSError:
+            return False
+        for root in self.all():
+            try:
+                resolved.relative_to(root.resolve())
+                return True
+            except (ValueError, OSError):
+                continue
+        return False
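To see why `contains` resolves before calling `relative_to`, here is a small sketch with a two-root stand-in. `RootsSketch` and `demo` are hypothetical names, and the directory layout is created in a temporary folder purely for the demonstration. It shows that `..` escapes and sibling directories sharing a name prefix are both rejected, which a naive string `startswith` check would get wrong.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RootsSketch:
    """Hypothetical two-root stand-in for DirectoryRoots."""

    downloads: Path
    movies: Path

    def contains(self, path: Path) -> bool:
        # Same logic as the method in the diff, reduced to two roots.
        try:
            resolved = path.resolve()
        except OSError:
            return False
        for root in (self.downloads, self.movies):
            try:
                resolved.relative_to(root.resolve())
                return True
            except (ValueError, OSError):
                continue
        return False


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "downloads").mkdir()
        (base / "movies").mkdir()
        roots = RootsSketch(base / "downloads", base / "movies")
        return {
            # A not-yet-existing file below a root still counts as inside.
            "inside": roots.contains(base / "movies" / "Oz (1997)" / "file.mkv"),
            # ".." is collapsed by resolve(), so this lands outside both roots.
            "dotdot_escape": roots.contains(base / "downloads" / ".." / "etc"),
            # "movies_backup" shares a string prefix with "movies" but is a sibling.
            "sibling_prefix": roots.contains(base / "movies_backup" / "x.mkv"),
        }
```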
+62 -164
@@ -1,19 +1,28 @@
-"""Filesystem application DTOs."""
+"""DTOs for the 5 atomic filesystem use cases.
+Each use case returns a small frozen dataclass tagged with a ``status``
+field. On error, ``error`` (machine-readable code) and ``message``
+(human-readable) are populated; on success, the relevant payload
+fields are.
+Error codes mirror the infrastructure exception types (lowercased,
+snake-cased) — e.g. ``SourceNotFound`` → ``"source_not_found"`` — plus
+the application-layer ``"path_not_allowed"`` for guard violations.
+"""
 from __future__ import annotations
-from dataclasses import dataclass
+from dataclasses import dataclass, field
+from pathlib import Path
-@dataclass
-class CopyMediaResponse:
-    """Response from copying a media file."""
-    status: str
-    source: str | None = None
-    destination: str | None = None
-    filename: str | None = None
-    size: int | None = None
+@dataclass(frozen=True)
+class ListDirResponse:
+    """Response from ``list_dir_use_case``."""
+    status: str  # "ok" | "error"
+    path: Path | None = None
+    entries: tuple[Path, ...] = ()
     error: str | None = None
     message: str | None = None
@@ -22,22 +31,33 @@ class CopyMediaResponse:
             return {"status": self.status, "error": self.error, "message": self.message}
         return {
             "status": self.status,
-            "source": self.source,
-            "destination": self.destination,
-            "filename": self.filename,
-            "size": self.size,
+            "path": str(self.path) if self.path else None,
+            "entries": [str(p) for p in self.entries],
         }
-@dataclass
-class MoveMediaResponse:
-    """Response from moving a media file."""
+@dataclass(frozen=True)
+class CreateDirResponse:
+    """Response from ``create_dir_use_case``."""
     status: str
-    source: str | None = None
-    destination: str | None = None
-    filename: str | None = None
-    size: int | None = None
+    path: Path | None = None
+    error: str | None = None
+    message: str | None = None
+    def to_dict(self) -> dict:
+        if self.error:
+            return {"status": self.status, "error": self.error, "message": self.message}
+        return {"status": self.status, "path": str(self.path) if self.path else None}
+@dataclass(frozen=True)
+class LinkFileResponse:
+    """Response from ``link_file_use_case``."""
+    status: str
+    source: Path | None = None
+    destination: Path | None = None
     error: str | None = None
     message: str | None = None
@@ -46,125 +66,18 @@ class MoveMediaResponse:
             return {"status": self.status, "error": self.error, "message": self.message}
         return {
             "status": self.status,
-            "source": self.source,
-            "destination": self.destination,
-            "filename": self.filename,
-            "size": self.size,
+            "source": str(self.source) if self.source else None,
+            "destination": str(self.destination) if self.destination else None,
         }
-@dataclass
-class SetFolderPathResponse:
-    """Response from setting a folder path."""
+@dataclass(frozen=True)
+class MoveFileResponse:
+    """Response from ``move_file_use_case``."""
     status: str
-    folder_name: str | None = None
-    path: str | None = None
-    error: str | None = None
-    message: str | None = None
-    def to_dict(self):
-        """Convert to dict for agent compatibility."""
-        result = {"status": self.status}
-        if self.error:
-            result["error"] = self.error
-            result["message"] = self.message
-        else:
-            if self.folder_name:
-                result["folder_name"] = self.folder_name
-            if self.path:
-                result["path"] = self.path
-        return result
-@dataclass
-class PlacedSubtitle:
-    """One subtitle file successfully placed."""
-    source: str
-    destination: str
-    filename: str
-    def to_dict(self) -> dict:
-        return {
-            "source": self.source,
-            "destination": self.destination,
-            "filename": self.filename,
-        }
-@dataclass
-class UnresolvedTrack:
-    """A subtitle track that needs agent clarification before placement."""
-    raw_tokens: list[str]
-    file_path: str | None = None
-    file_size_kb: float | None = None
-    reason: str = ""  # "unknown_language" | "low_confidence"
-    def to_dict(self) -> dict:
-        return {
-            "raw_tokens": self.raw_tokens,
-            "file_path": self.file_path,
-            "file_size_kb": self.file_size_kb,
-            "reason": self.reason,
-        }
-@dataclass
-class AvailableSubtitle:
-    """One subtitle track available on an embedded media item."""
-    language: str  # ISO 639-2 code
-    subtitle_type: str  # "standard" | "sdh" | "forced" | "unknown"
-    def to_dict(self) -> dict:
-        return {"language": self.language, "type": self.subtitle_type}
-@dataclass
-class ManageSubtitlesResponse:
-    """Response from the manage_subtitles use case."""
-    status: str  # "ok" | "needs_clarification" | "error"
-    video_path: str | None = None
-    placed: list[PlacedSubtitle] | None = None
-    skipped_count: int = 0
-    unresolved: list[UnresolvedTrack] | None = None
-    available: list[AvailableSubtitle] | None = None  # embedded tracks summary
-    error: str | None = None
-    message: str | None = None
-    def to_dict(self) -> dict:
-        if self.error:
-            return {"status": self.status, "error": self.error, "message": self.message}
-        result = {
-            "status": self.status,
-            "video_path": self.video_path,
-            "placed": [p.to_dict() for p in (self.placed or [])],
-            "placed_count": len(self.placed or []),
-            "skipped_count": self.skipped_count,
-        }
-        if self.unresolved:
-            result["unresolved"] = [u.to_dict() for u in self.unresolved]
-            result["unresolved_count"] = len(self.unresolved)
-        if self.available:
-            result["available"] = [a.to_dict() for a in self.available]
-        return result
-@dataclass
-class CreateSeedLinksResponse:
-    """Response from creating seed links for a torrent."""
-    status: str
-    torrent_subfolder: str | None = None
-    linked_file: str | None = None
-    copied_files: list[str] | None = None
-    copied_count: int = 0
-    skipped: list[str] | None = None
+    source: Path | None = None
+    destination: Path | None = None
     error: str | None = None
     message: str | None = None
@@ -173,41 +86,26 @@ class CreateSeedLinksResponse:
             return {"status": self.status, "error": self.error, "message": self.message}
         return {
             "status": self.status,
-            "torrent_subfolder": self.torrent_subfolder,
-            "linked_file": self.linked_file,
-            "copied_files": self.copied_files or [],
-            "copied_count": self.copied_count,
-            "skipped": self.skipped or [],
+            "source": str(self.source) if self.source else None,
+            "destination": str(self.destination) if self.destination else None,
         }
-@dataclass
-class ListFolderResponse:
-    """Response from listing a folder."""
+@dataclass(frozen=True)
+class MoveDirResponse:
+    """Response from ``move_dir_use_case``."""
     status: str
-    folder_type: str | None = None
-    path: str | None = None
-    entries: list[str] | None = None
-    count: int | None = None
+    source: Path | None = None
+    destination: Path | None = None
     error: str | None = None
     message: str | None = None
-    def to_dict(self):
-        """Convert to dict for agent compatibility."""
-        result = {"status": self.status}
+    def to_dict(self) -> dict:
         if self.error:
-            result["error"] = self.error
-            result["message"] = self.message
-        else:
-            if self.folder_type:
-                result["folder_type"] = self.folder_type
-            if self.path:
-                result["path"] = self.path
-            if self.entries is not None:
-                result["entries"] = self.entries
-            if self.count is not None:
-                result["count"] = self.count
-        return result
+            return {"status": self.status, "error": self.error, "message": self.message}
+        return {
+            "status": self.status,
+            "source": str(self.source) if self.source else None,
+            "destination": str(self.destination) if self.destination else None,
+        }
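The new DTOs share one shape: a frozen dataclass holding `Path` values internally, with `to_dict()` stringifying them at the agent boundary and collapsing to `status`/`error`/`message` on failure. The sketch below reproduces that shape with a hypothetical `MoveResultSketch` class (not a class from the commit) to show the two serialization branches and the effect of `frozen=True`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path


@dataclass(frozen=True)
class MoveResultSketch:
    """Hypothetical DTO mirroring the Move*/Link* response shape."""

    status: str
    source: Path | None = None
    destination: Path | None = None
    error: str | None = None
    message: str | None = None

    def to_dict(self) -> dict:
        # Error branch: payload fields are dropped entirely.
        if self.error:
            return {"status": self.status, "error": self.error, "message": self.message}
        # Success branch: Paths become plain strings for the agent.
        return {
            "status": self.status,
            "source": str(self.source) if self.source else None,
            "destination": str(self.destination) if self.destination else None,
        }


def is_frozen(resp: MoveResultSketch) -> bool:
    """Return True if assigning to a field is rejected."""
    try:
        resp.status = "error"  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False


ok = MoveResultSketch(status="ok", source=Path("a.mkv"), destination=Path("lib") / "a.mkv")
failed = MoveResultSketch(
    status="error", error="source_not_found", message="a.mkv does not exist"
)
```

Keeping `Path` objects inside the DTO and converting only in `to_dict()` means use cases can be composed in Python without re-parsing strings, while the tool layer still hands JSON-friendly values to the LLM.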
+188
@@ -0,0 +1,188 @@
"""Filesystem application DTOs."""
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class CopyMediaResponse:
"""Response from copying a media file."""
status: str
source: str | None = None
destination: str | None = None
filename: str | None = None
size: int | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"source": self.source,
"destination": self.destination,
"filename": self.filename,
"size": self.size,
}
@dataclass
class MoveMediaResponse:
"""Response from moving a media file."""
status: str
source: str | None = None
destination: str | None = None
filename: str | None = None
size: int | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"source": self.source,
"destination": self.destination,
"filename": self.filename,
"size": self.size,
}
@dataclass
class PlacedSubtitle:
"""One subtitle file successfully placed."""
source: str
destination: str
filename: str
def to_dict(self) -> dict:
return {
"source": self.source,
"destination": self.destination,
"filename": self.filename,
}
@dataclass
class UnresolvedTrack:
"""A subtitle track that needs agent clarification before placement."""
raw_tokens: list[str]
file_path: str | None = None
file_size_kb: float | None = None
reason: str = "" # "unknown_language" | "low_confidence"
def to_dict(self) -> dict:
return {
"raw_tokens": self.raw_tokens,
"file_path": self.file_path,
"file_size_kb": self.file_size_kb,
"reason": self.reason,
}
@dataclass
class AvailableSubtitle:
"""One subtitle track available on an embedded media item."""
language: str # ISO 639-2 code
subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
def to_dict(self) -> dict:
return {"language": self.language, "type": self.subtitle_type}
@dataclass
class ManageSubtitlesResponse:
"""Response from the manage_subtitles use case."""
status: str # "ok" | "needs_clarification" | "error"
video_path: str | None = None
placed: list[PlacedSubtitle] | None = None
skipped_count: int = 0
unresolved: list[UnresolvedTrack] | None = None
available: list[AvailableSubtitle] | None = None # embedded tracks summary
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
result = {
"status": self.status,
"video_path": self.video_path,
"placed": [p.to_dict() for p in (self.placed or [])],
"placed_count": len(self.placed or []),
"skipped_count": self.skipped_count,
}
if self.unresolved:
result["unresolved"] = [u.to_dict() for u in self.unresolved]
result["unresolved_count"] = len(self.unresolved)
if self.available:
result["available"] = [a.to_dict() for a in self.available]
return result
@dataclass
class CreateSeedLinksResponse:
"""Response from creating seed links for a torrent."""
status: str
torrent_subfolder: str | None = None
linked_file: str | None = None
copied_files: list[str] | None = None
copied_count: int = 0
skipped: list[str] | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"torrent_subfolder": self.torrent_subfolder,
"linked_file": self.linked_file,
"copied_files": self.copied_files or [],
"copied_count": self.copied_count,
"skipped": self.skipped or [],
}
@dataclass
class ListFolderResponse:
"""Response from listing a folder."""
status: str
folder_type: str | None = None # SHOULD BE A PROPERTY
path: str | None = None # NOT NONE - Should be path
entries: list[str] | None = None # NOT NONE - Empty list of path
count: int | None = None # USELESS
error: str | None = None
message: str | None = None
def to_dict(self):
"""Convert to dict for agent compatibility."""
result = {"status": self.status}
if self.error:
result["error"] = self.error
result["message"] = self.message
else:
if self.folder_type:
result["folder_type"] = self.folder_type
if self.path:
result["path"] = self.path
if self.entries is not None:
result["entries"] = self.entries
if self.count is not None:
result["count"] = self.count
return result
@@ -1,82 +0,0 @@
-"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
-from __future__ import annotations
-from alfred.domain.release.value_objects import ParsedRelease
-from alfred.domain.shared.media import MediaInfo
-# Map ffprobe codec names to scene-style codec tokens
-_VIDEO_CODEC_MAP = {
-    "hevc": "x265",
-    "h264": "x264",
-    "h265": "x265",
-    "av1": "AV1",
-    "vp9": "VP9",
-    "mpeg4": "XviD",
-}
-# Map ffprobe audio codec names to scene-style tokens
-_AUDIO_CODEC_MAP = {
-    "eac3": "EAC3",
-    "ac3": "AC3",
-    "dts": "DTS",
-    "truehd": "TrueHD",
-    "aac": "AAC",
-    "flac": "FLAC",
-    "opus": "OPUS",
-    "mp3": "MP3",
-    "pcm_s16l": "PCM",
-    "pcm_s24l": "PCM",
-}
-# Map channel count to standard layout string
-_CHANNEL_MAP = {
-    8: "7.1",
-    6: "5.1",
-    2: "2.0",
-    1: "1.0",
-}
-def enrich_from_probe(parsed: ParsedRelease, info: MediaInfo) -> None:
-    """
-    Fill None fields in parsed using data from ffprobe MediaInfo.
-    Only overwrites fields that are currently None — token-level values
-    from the release name always take priority.
-    Mutates parsed in place.
-    """
-    if parsed.quality is None and info.resolution:
-        parsed.quality = info.resolution
-    if parsed.codec is None and info.video_codec:
-        parsed.codec = _VIDEO_CODEC_MAP.get(
-            info.video_codec.lower(), info.video_codec.upper()
-        )
-    if parsed.bit_depth is None and info.video_codec:
-        # ffprobe exposes bit depth via pix_fmt — not in MediaInfo yet, skip for now
-        pass
-    # Audio — use the default track, fallback to first
-    default_track = next((t for t in info.audio_tracks if t.is_default), None)
-    track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
-    if track:
-        if parsed.audio_codec is None and track.codec:
-            parsed.audio_codec = _AUDIO_CODEC_MAP.get(
-                track.codec.lower(), track.codec.upper()
-            )
-        if parsed.audio_channels is None and track.channels:
-            parsed.audio_channels = _CHANNEL_MAP.get(
-                track.channels, f"{track.channels}ch"
-            )
-    # Languages — merge ffprobe languages with token-level ones
-    # "und" = undetermined, not useful
-    if info.audio_languages:
-        existing = set(parsed.languages)
-        for lang in info.audio_languages:
-            if lang.lower() != "und" and lang.upper() not in existing:
-                parsed.languages.append(lang)
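The removed module leaned on `dict.get` with a computed fallback so that unknown codecs and channel counts still produce a usable token. A minimal sketch of that lookup pattern, using a subset of the maps from the deleted file; the function names `codec_token` and `channel_layout` are hypothetical, chosen for illustration only.

```python
# Subset of the maps from the removed file.
VIDEO_CODEC_MAP = {"hevc": "x265", "h264": "x264", "av1": "AV1"}
CHANNEL_MAP = {8: "7.1", 6: "5.1", 2: "2.0", 1: "1.0"}


def codec_token(ffprobe_name: str) -> str:
    """Known codecs map to scene-style tokens; unknown ones are upper-cased."""
    return VIDEO_CODEC_MAP.get(ffprobe_name.lower(), ffprobe_name.upper())


def channel_layout(channels: int) -> str:
    """Known counts map to a layout string; unknown ones become '<n>ch'."""
    return CHANNEL_MAP.get(channels, f"{channels}ch")
```

For instance, `codec_token("HEVC")` yields `"x265"` and `channel_layout(3)` yields `"3ch"`, so a release with an unusual track layout still gets a deterministic token.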
@@ -0,0 +1,40 @@
+"""link_file use case — hard-link a file from one root to another."""
+from __future__ import annotations
+from pathlib import Path
+from alfred.infrastructure.filesystem import FilesystemError, link_file
+from ._errors import PATH_NOT_ALLOWED, code_for
+from .directory_roots import DirectoryRoots
+from .dto import LinkFileResponse
+def link_file_use_case(
+    src: Path, dst: Path, roots: DirectoryRoots
+) -> LinkFileResponse:
+    """Hard-link ``src`` to ``dst``. Both must be under configured roots.
+    The destination parent must already exist — the caller is expected
+    to have created it via ``create_dir_use_case`` if needed.
+    """
+    if not roots.contains(src):
+        return LinkFileResponse(
+            status="error",
+            error=PATH_NOT_ALLOWED,
+            message=f"Source is outside configured roots: {src}",
+        )
+    if not roots.contains(dst):
+        return LinkFileResponse(
+            status="error",
+            error=PATH_NOT_ALLOWED,
+            message=f"Destination is outside configured roots: {dst}",
+        )
+    try:
+        link_file(src, dst)
+    except FilesystemError as e:
+        return LinkFileResponse(status="error", error=code_for(e), message=str(e))
+    return LinkFileResponse(status="ok", source=src, destination=dst)
+34
@@ -0,0 +1,34 @@
+"""list_dir use case — list a directory after guarding it within roots."""
+from __future__ import annotations
+from pathlib import Path
+from alfred.infrastructure.filesystem import FilesystemError, list_dir
+from ._errors import PATH_NOT_ALLOWED, code_for
+from .directory_roots import DirectoryRoots
+from .dto import ListDirResponse
+def list_dir_use_case(path: Path, roots: DirectoryRoots) -> ListDirResponse:
+    """List the immediate children of ``path`` if it lives under one of
+    the configured roots.
+    Returns a :class:`ListDirResponse`. On guard failure, status is
+    ``"error"`` with ``error="path_not_allowed"``. On infra failure,
+    status is ``"error"`` with a code mapped from the raised exception.
+    """
+    if not roots.contains(path):
+        return ListDirResponse(
+            status="error",
+            error=PATH_NOT_ALLOWED,
+            message=f"Path is outside configured roots: {path}",
+        )
+    try:
+        entries = list_dir(path)
+    except FilesystemError as e:
+        return ListDirResponse(status="error", error=code_for(e), message=str(e))
+    return ListDirResponse(status="ok", path=path, entries=tuple(entries))
@@ -3,25 +3,25 @@
 import logging
 from pathlib import Path
-from alfred.domain.shared.value_objects import ImdbId
-from alfred.domain.subtitles.entities import SubtitleCandidate
-from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
-from alfred.domain.subtitles.services.matcher import SubtitleMatcher
-from alfred.domain.subtitles.services.pattern_detector import PatternDetector
-from alfred.application.subtitles.placer import (
+from alfred.application.subtitles_TO_CHECK.placer import (
     PlacedTrack,
     SubtitlePlacer,
     _build_dest_name,
 )
-from alfred.domain.subtitles.services.utils import available_subtitles
-from alfred.domain.subtitles.value_objects import ScanStrategy
+from alfred.domain.shared_TO_CHECK.value_objects import ImdbId
+from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
+from alfred.domain.subtitles_TO_CHECK.services.identifier import SubtitleIdentifier
+from alfred.domain.subtitles_TO_CHECK.services.matcher import SubtitleMatcher
+from alfred.domain.subtitles_TO_CHECK.services.pattern_detector import PatternDetector
+from alfred.domain.subtitles_TO_CHECK.services.utils import available_subtitles
+from alfred.domain.subtitles_TO_CHECK.value_objects import ScanStrategy
 from alfred.infrastructure.filesystem.scanner import PathlibFilesystemScanner
-from alfred.infrastructure.knowledge.subtitles.base import SubtitleKnowledgeBase
-from alfred.infrastructure.knowledge.subtitles.loader import KnowledgeLoader
-from alfred.infrastructure.persistence.context import get_memory
-from alfred.infrastructure.probe.ffprobe_prober import FfprobeMediaProber
-from alfred.infrastructure.subtitle.metadata_store import SubtitleMetadataStore
-from alfred.infrastructure.subtitle.rule_repository import RuleSetRepository
+from alfred.infrastructure.knowledge_TO_CHECK.subtitles.base import SubtitleKnowledgeBase
+from alfred.infrastructure.knowledge_TO_CHECK.subtitles.loader import KnowledgeLoader
+from alfred.infrastructure.persistence_TO_CHECK.context import get_memory
+from alfred.infrastructure.probe_TO_CHECK.ffprobe_prober import FfprobeMediaProber
+from alfred.infrastructure.subtitle_TO_CHECK.metadata_store import SubtitleMetadataStore
+from alfred.infrastructure.subtitle_TO_CHECK.rule_repository import RuleSetRepository
 from .dto import (
     AvailableSubtitle,
@@ -278,7 +278,7 @@ class ManageSubtitlesUseCase:
def _to_unresolved_dto( def _to_unresolved_dto(
track: SubtitleCandidate, min_confidence: float = 0.7 track: SubtitleScanResult, min_confidence: float = 0.7
) -> UnresolvedTrack: ) -> UnresolvedTrack:
reason = "unknown_language" if track.language is None else "low_confidence" reason = "unknown_language" if track.language is None else "low_confidence"
return UnresolvedTrack( return UnresolvedTrack(
@@ -291,10 +291,10 @@ def _to_unresolved_dto(
def _pair_placed_with_tracks( def _pair_placed_with_tracks(
placed: list[PlacedTrack], placed: list[PlacedTrack],
tracks: list[SubtitleCandidate], tracks: list[SubtitleScanResult],
) -> list[tuple[PlacedTrack, SubtitleCandidate]]: ) -> list[tuple[PlacedTrack, SubtitleScanResult]]:
""" """
Pair each PlacedTrack with its originating SubtitleCandidate by source path. Pair each PlacedTrack with its originating SubtitleScanResult by source path.
Falls back to positional matching if paths don't align. Falls back to positional matching if paths don't align.
""" """
track_by_path = {t.file_path: t for t in tracks if t.file_path} track_by_path = {t.file_path: t for t in tracks if t.file_path}
@@ -0,0 +1,36 @@
"""move_dir use case — move a directory tree between configured roots."""
from __future__ import annotations
from pathlib import Path
from alfred.infrastructure.filesystem import FilesystemError, move_dir
from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import MoveDirResponse
def move_dir_use_case(
src: Path, dst: Path, roots: DirectoryRoots
) -> MoveDirResponse:
"""Move directory ``src`` to ``dst``. Both must be under configured roots."""
if not roots.contains(src):
return MoveDirResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Source is outside configured roots: {src}",
)
if not roots.contains(dst):
return MoveDirResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Destination is outside configured roots: {dst}",
)
try:
move_dir(src, dst)
except FilesystemError as e:
return MoveDirResponse(status="error", error=code_for(e), message=str(e))
return MoveDirResponse(status="ok", source=src, destination=dst)
@@ -0,0 +1,36 @@
"""move_file use case — move a file between configured roots."""
from __future__ import annotations
from pathlib import Path
from alfred.infrastructure.filesystem import FilesystemError, move_file
from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import MoveFileResponse
def move_file_use_case(
src: Path, dst: Path, roots: DirectoryRoots
) -> MoveFileResponse:
"""Move file ``src`` to ``dst``. Both must be under configured roots."""
if not roots.contains(src):
return MoveFileResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Source is outside configured roots: {src}",
)
if not roots.contains(dst):
return MoveFileResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Destination is outside configured roots: {dst}",
)
try:
move_file(src, dst)
except FilesystemError as e:
return MoveFileResponse(status="error", error=code_for(e), message=str(e))
return MoveFileResponse(status="ok", source=src, destination=dst)
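Every use case above funnels infra exceptions through `code_for`, which lives in `_errors.py` and is not part of this excerpt. A plausible shape for such a mapper, under the assumption that it translates exception classes to short string codes, is sketched below. The exception names come from the commit message; the code strings and the MRO walk are hypothetical.

```python
class FilesystemError(Exception):
    """Base class, named after the infra-layer base exception."""


class SourceNotFound(FilesystemError):
    pass


class DestinationExists(FilesystemError):
    pass


class CrossDevice(FilesystemError):
    pass


# Hypothetical code strings; the real values are defined in _errors.py.
_CODES: dict[type, str] = {
    SourceNotFound: "source_not_found",
    DestinationExists: "destination_exists",
    CrossDevice: "cross_device",
}


def code_for_sketch(exc: FilesystemError) -> str:
    """Return the code of the closest mapped class in the exception's MRO."""
    for cls in type(exc).__mro__:
        if cls in _CODES:
            return _CODES[cls]
    # Unmapped subclasses and the bare base class share a generic code.
    return "filesystem_error"
```

Walking the MRO means a future subclass of `SourceNotFound` would inherit its parent's code without a new table entry.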
@@ -22,16 +22,35 @@ import logging
from dataclasses import dataclass from dataclasses import dataclass
from pathlib import Path from pathlib import Path
from alfred.application.release_TO_CHECK import inspect_release
from alfred.domain.release import parse_release from alfred.domain.release import parse_release
from alfred.domain.release.ports import ReleaseKnowledge from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge from alfred.domain.release.value_objects import ParsedRelease
from alfred.infrastructure.persistence import get_memory from alfred.domain.shared_TO_CHECK.ports import MediaProber
from alfred.infrastructure.persistence_TO_CHECK import get_memory
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
# Single module-level knowledge instance. YAML is loaded once at first import.
# Tests that need a custom KB can monkeypatch this attribute. def _resolve_parsed(
_KB: ReleaseKnowledge = YamlReleaseKnowledge() release_name: str,
source_path: str | None,
kb: ReleaseKnowledge,
prober: MediaProber,
) -> ParsedRelease:
"""Pick the right entry point depending on whether we have a path.
When ``source_path`` is provided and points to something that exists,
we run the full inspection pipeline so probe data can refresh tech
fields (which feed every filename builder). Otherwise we fall back
to a parse-only path, same behavior as before.
"""
if source_path:
path = Path(source_path)
if path.exists():
return inspect_release(release_name, path, kb, prober).parsed
parsed, _ = parse_release(release_name, kb)
return parsed
def _find_existing_tvshow_folders( def _find_existing_tvshow_folders(
@@ -236,13 +255,20 @@ def resolve_season_destination(
release_name: str, release_name: str,
tmdb_title: str, tmdb_title: str,
tmdb_year: int, tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
confirmed_folder: str | None = None, confirmed_folder: str | None = None,
source_path: str | None = None,
) -> ResolvedSeasonDestination: ) -> ResolvedSeasonDestination:
""" """
Compute destination paths for a season pack. Compute destination paths for a season pack.
Returns series_folder + season_folder. No file paths: the whole Returns series_folder + season_folder. No file paths: the whole
source folder is moved as-is into season_folder. source folder is moved as-is into season_folder.
When ``source_path`` points to the release on disk, the parser is
augmented with ffprobe data so tech tokens missing from the release
name (quality / codec) end up in the folder names.
""" """
tv_root = _get_tv_root() tv_root = _get_tv_root()
if not tv_root: if not tv_root:
@@ -252,8 +278,8 @@ def resolve_season_destination(
message="TV show library path is not configured.", message="TV show library path is not configured.",
) )
parsed = parse_release(release_name, _KB) parsed = _resolve_parsed(release_name, source_path, kb, prober)
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title) tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year) computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
resolved = _resolve_series_folder( resolved = _resolve_series_folder(
@@ -286,6 +312,8 @@ def resolve_episode_destination(
source_file: str, source_file: str,
tmdb_title: str, tmdb_title: str,
tmdb_year: int, tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
tmdb_episode_title: str | None = None, tmdb_episode_title: str | None = None,
confirmed_folder: str | None = None, confirmed_folder: str | None = None,
) -> ResolvedEpisodeDestination: ) -> ResolvedEpisodeDestination:
@@ -293,6 +321,8 @@ def resolve_episode_destination(
Compute destination paths for a single episode file. Compute destination paths for a single episode file.
Returns series_folder + season_folder + library_file (full path to .mkv). Returns series_folder + season_folder + library_file (full path to .mkv).
``source_file`` doubles as the inspection target: when it exists,
ffprobe enrichment refreshes tech tokens missing from the release name.
""" """
tv_root = _get_tv_root() tv_root = _get_tv_root()
if not tv_root: if not tv_root:
@@ -302,11 +332,11 @@ def resolve_episode_destination(
message="TV show library path is not configured.", message="TV show library path is not configured.",
) )
parsed = parse_release(release_name, _KB) parsed = _resolve_parsed(release_name, source_file, kb, prober)
ext = Path(source_file).suffix ext = Path(source_file).suffix
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title) tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
tmdb_episode_title_safe = ( tmdb_episode_title_safe = (
_KB.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None kb.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
) )
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year) computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
@@ -345,11 +375,15 @@ def resolve_movie_destination(
source_file: str, source_file: str,
tmdb_title: str, tmdb_title: str,
tmdb_year: int, tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
) -> ResolvedMovieDestination: ) -> ResolvedMovieDestination:
""" """
Compute destination paths for a movie file. Compute destination paths for a movie file.
Returns movie_folder + library_file (full path to .mkv). Returns movie_folder + library_file (full path to .mkv).
``source_file`` doubles as the inspection target: when it exists,
ffprobe enrichment refreshes tech tokens missing from the release name.
""" """
memory = get_memory() memory = get_memory()
movies_root = memory.ltm.library_paths.get("movie") movies_root = memory.ltm.library_paths.get("movie")
@@ -360,9 +394,9 @@ def resolve_movie_destination(
message="Movie library path is not configured.", message="Movie library path is not configured.",
) )
parsed = parse_release(release_name, _KB) parsed = _resolve_parsed(release_name, source_file, kb, prober)
ext = Path(source_file).suffix ext = Path(source_file).suffix
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title) tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year) folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext) filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
@@ -384,12 +418,18 @@ def resolve_series_destination(
release_name: str, release_name: str,
tmdb_title: str, tmdb_title: str,
tmdb_year: int, tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
confirmed_folder: str | None = None, confirmed_folder: str | None = None,
source_path: str | None = None,
) -> ResolvedSeriesDestination: ) -> ResolvedSeriesDestination:
""" """
Compute destination path for a complete multi-season series pack. Compute destination path for a complete multi-season series pack.
Returns only series_folder: the whole pack lands directly inside it. Returns only series_folder: the whole pack lands directly inside it.
When ``source_path`` points to the release on disk, ffprobe
enrichment refreshes tech tokens missing from the release name.
""" """
tv_root = _get_tv_root() tv_root = _get_tv_root()
if not tv_root: if not tv_root:
@@ -399,8 +439,8 @@ def resolve_series_destination(
message="TV show library path is not configured.", message="TV show library path is not configured.",
) )
parsed = parse_release(release_name, _KB) parsed = _resolve_parsed(release_name, source_path, kb, prober)
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title) tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year) computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
resolved = _resolve_series_folder( resolved = _resolve_series_folder(
@@ -1,50 +0,0 @@
"""Set folder path use case."""
import logging
from alfred.infrastructure.filesystem import FileManager
from .dto import SetFolderPathResponse
logger = logging.getLogger(__name__)
class SetFolderPathUseCase:
"""
Use case for setting a folder path in configuration.
This orchestrates the FileManager to set folder paths.
"""
def __init__(self, file_manager: FileManager):
"""
Initialize use case.
Args:
file_manager: FileManager instance
"""
self.file_manager = file_manager
def execute(self, folder_name: str, path_value: str) -> SetFolderPathResponse:
"""
Set a folder path in configuration.
Args:
folder_name: Name of folder to set (download, tvshow, movie, torrent)
path_value: Absolute path to the folder
Returns:
SetFolderPathResponse with success or error information
"""
result = self.file_manager.set_folder_path(folder_name, path_value)
if result.get("status") == "ok":
return SetFolderPathResponse(
status="ok",
folder_name=result.get("folder_name"),
path=result.get("path"),
)
else:
return SetFolderPathResponse(
status="error", error=result.get("error"), message=result.get("message")
)
@@ -1,44 +0,0 @@
"""Movie application DTOs."""
from dataclasses import dataclass
@dataclass
class SearchMovieResponse:
"""Response from searching for a movie."""
status: str
imdb_id: str | None = None
title: str | None = None
media_type: str | None = None
tmdb_id: int | None = None
overview: str | None = None
release_date: str | None = None
vote_average: float | None = None
error: str | None = None
message: str | None = None
def to_dict(self):
"""Convert to dict for agent compatibility."""
result = {"status": self.status}
if self.error:
result["error"] = self.error
result["message"] = self.message
else:
if self.imdb_id:
result["imdb_id"] = self.imdb_id
if self.title:
result["title"] = self.title
if self.media_type:
result["media_type"] = self.media_type
if self.tmdb_id:
result["tmdb_id"] = self.tmdb_id
if self.overview:
result["overview"] = self.overview
if self.release_date:
result["release_date"] = self.release_date
if self.vote_average:
result["vote_average"] = self.vote_average
return result
@@ -1,93 +0,0 @@
"""Search movie use case."""
import logging
from alfred.infrastructure.api.tmdb import (
TMDBAPIError,
TMDBClient,
TMDBConfigurationError,
TMDBNotFoundError,
)
from .dto import SearchMovieResponse
logger = logging.getLogger(__name__)
class SearchMovieUseCase:
"""
Use case for searching a movie and retrieving its IMDb ID.
This orchestrates the TMDB API client to find movie information.
"""
def __init__(self, tmdb_client: TMDBClient):
"""
Initialize use case.
Args:
tmdb_client: TMDB API client
"""
self.tmdb_client = tmdb_client
def execute(self, media_title: str) -> SearchMovieResponse:
"""
Search for a movie by title.
Args:
media_title: Title of the movie to search for
Returns:
SearchMovieResponse with movie information or error
"""
try:
# Use the TMDB client to search for media
result = self.tmdb_client.search_media(media_title)
# Check if IMDb ID was found
if result.imdb_id:
logger.info(f"IMDb ID found for '{media_title}': {result.imdb_id}")
return SearchMovieResponse(
status="ok",
imdb_id=result.imdb_id,
title=result.title,
media_type=result.media_type,
tmdb_id=result.tmdb_id,
overview=result.overview,
release_date=result.release_date,
vote_average=result.vote_average,
)
else:
logger.warning(f"No IMDb ID available for '{media_title}'")
return SearchMovieResponse(
status="ok",
title=result.title,
media_type=result.media_type,
tmdb_id=result.tmdb_id,
error="no_imdb_id",
message=f"No IMDb ID available for '{result.title}'",
)
except TMDBNotFoundError as e:
logger.info(f"Media not found: {e}")
return SearchMovieResponse(
status="error", error="not_found", message=str(e)
)
except TMDBConfigurationError as e:
logger.error(f"TMDB configuration error: {e}")
return SearchMovieResponse(
status="error", error="configuration_error", message=str(e)
)
except TMDBAPIError as e:
logger.error(f"TMDB API error: {e}")
return SearchMovieResponse(
status="error", error="api_error", message=str(e)
)
except ValueError as e:
logger.error(f"Validation error: {e}")
return SearchMovieResponse(
status="error", error="validation_failed", message=str(e)
)
@@ -1,9 +1,10 @@
"""Movie use cases.""" """Movie use cases."""
from .dto import SearchMovieResponse from .dto import MovieHit, SearchMovieResponse
from .search_movie import SearchMovieUseCase from .search_movie import SearchMovieUseCase
__all__ = [ __all__ = [
"SearchMovieUseCase", "MovieHit",
"SearchMovieResponse", "SearchMovieResponse",
"SearchMovieUseCase",
] ]
@@ -0,0 +1,40 @@
"""Movie application DTOs."""
from dataclasses import dataclass, field
@dataclass(frozen=True)
class MovieHit:
"""One movie hit, flattened for transport to the agent."""
tmdb_id: int
title: str
release_year: int | None = None
def to_dict(self) -> dict:
out: dict = {"tmdb_id": self.tmdb_id, "title": self.title}
if self.release_year is not None:
out["release_year"] = self.release_year
return out
@dataclass
class SearchMovieResponse:
"""Response from searching for a movie."""
status: str
hits: list[MovieHit] = field(default_factory=list)
error: str | None = None
message: str | None = None
def to_dict(self):
"""Convert to dict for agent compatibility."""
result: dict = {"status": self.status}
if self.error:
result["error"] = self.error
result["message"] = self.message
else:
result["hits"] = [h.to_dict() for h in self.hits]
return result
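To make the new transport shape concrete, the two DTOs can be exercised on their own. The class bodies below are copied from the diff; the sample values (`tmdb_id=1`, `"Example"`, and so on) are placeholders invented for the illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MovieHit:
    tmdb_id: int
    title: str
    release_year: int | None = None

    def to_dict(self) -> dict:
        out: dict = {"tmdb_id": self.tmdb_id, "title": self.title}
        if self.release_year is not None:
            out["release_year"] = self.release_year
        return out


@dataclass
class SearchMovieResponse:
    status: str
    hits: list[MovieHit] = field(default_factory=list)
    error: str | None = None
    message: str | None = None

    def to_dict(self):
        result: dict = {"status": self.status}
        if self.error:
            # Error responses carry error + message and omit "hits".
            result["error"] = self.error
            result["message"] = self.message
        else:
            result["hits"] = [h.to_dict() for h in self.hits]
        return result


# Placeholder values, for illustration only.
ok = SearchMovieResponse(status="ok", hits=[MovieHit(1, "Example", 2001)])
empty = SearchMovieResponse(status="ok")
failed = SearchMovieResponse(status="error", error="api_error", message="boom")
```

Note that a successful search with no matches still serializes a `"hits"` key (an empty list), so the agent can tell "no results" apart from an error by the `status` field alone.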
@@ -0,0 +1,60 @@
"""Search movie use case."""
import logging
from alfred.infrastructure.api_TO_CHECK.tmdb import (
TMDBAPIError,
TMDBClient,
TMDBConfigurationError,
)
from .dto import MovieHit, SearchMovieResponse
logger = logging.getLogger(__name__)
class SearchMovieUseCase:
"""List movies matching a free-text query via TMDB ``/search/movie``.
The use case is a thin orchestrator: it asks the client for hits,
flattens domain VOs into agent-friendly primitives, and wraps
errors. It deliberately does **not** look up ``imdb_id`` —
enrichment is the caller's job (via :meth:`TMDBClient.get_movie_info`
on a chosen ``tmdb_id``).
"""
def __init__(self, tmdb_client: TMDBClient):
self.tmdb_client = tmdb_client
def execute(self, media_title: str) -> SearchMovieResponse:
try:
results = self.tmdb_client.search_movies(media_title)
hits = [
MovieHit(
tmdb_id=r.tmdb_id.value,
title=str(r.title),
release_year=r.release_year.value if r.release_year else None,
)
for r in results
]
logger.info(f"search_movies({media_title!r}) → {len(hits)} hits")
return SearchMovieResponse(status="ok", hits=hits)
except TMDBConfigurationError as e:
logger.error(f"TMDB configuration error: {e}")
return SearchMovieResponse(
status="error", error="configuration_error", message=str(e)
)
except TMDBAPIError as e:
logger.error(f"TMDB API error: {e}")
return SearchMovieResponse(
status="error", error="api_error", message=str(e)
)
except ValueError as e:
logger.error(f"Validation error: {e}")
return SearchMovieResponse(
status="error", error="validation_failed", message=str(e)
)
@@ -0,0 +1,20 @@
"""Release application layer — orchestrators sitting between domain
parsing and infrastructure I/O.
Public surface:
- :func:`is_supported_video` / :func:`find_main_video` — pre-pipeline
filesystem helpers (extension-only filtering, top-level video pick).
- :func:`inspect_release` / :class:`InspectedResult` — full inspection
pipeline combining parse + filesystem refinement + probe enrichment.
"""
from .inspect import InspectedResult, inspect_release
from .supported_media import find_main_video, is_supported_video
__all__ = [
"InspectedResult",
"find_main_video",
"inspect_release",
"is_supported_video",
]
@@ -19,7 +19,7 @@ from __future__ import annotations
from pathlib import Path from pathlib import Path
from alfred.domain.release.ports import ReleaseKnowledge from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.release.value_objects import ParsedRelease from alfred.domain.release.value_objects import ParsedRelease
@@ -0,0 +1,74 @@
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
from __future__ import annotations
from dataclasses import replace
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.release.value_objects import ParsedRelease
from alfred.domain.shared_TO_CHECK.media import MediaInfo
def enrich_from_probe(
parsed: ParsedRelease, info: MediaInfo, kb: ReleaseKnowledge
) -> ParsedRelease:
"""
Return a new ParsedRelease with None fields filled from ffprobe MediaInfo.
Only overwrites fields that are currently None — token-level values
from the release name always take priority. ``ParsedRelease`` is
frozen; this returns a new instance via :func:`dataclasses.replace`.
Translation tables (ffprobe codec name → scene token, channel count
→ layout) live in ``kb.probe_mappings`` (loaded from
``alfred/knowledge/release/probe_mappings.yaml``). When ffprobe
reports a value with no mapping entry, the fallback is the uppercase
raw value so unknown codecs still surface in a predictable form.
"""
mappings = kb.probe_mappings
video_codec_map: dict[str, str] = mappings.get("video_codec", {})
audio_codec_map: dict[str, str] = mappings.get("audio_codec", {})
channel_map: dict[int, str] = mappings.get("audio_channels", {})
updates: dict[str, object] = {}
if parsed.quality is None and info.resolution:
updates["quality"] = info.resolution
if parsed.codec is None and info.video_codec:
updates["codec"] = video_codec_map.get(
info.video_codec.lower(), info.video_codec.upper()
)
# bit_depth: ffprobe exposes it via pix_fmt — not in MediaInfo yet, skip.
# Audio — use the default track, fallback to first
default_track = next((t for t in info.audio_tracks if t.is_default), None)
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
if track:
if parsed.audio_codec is None and track.codec:
updates["audio_codec"] = audio_codec_map.get(
track.codec.lower(), track.codec.upper()
)
if parsed.audio_channels is None and track.channels:
updates["audio_channels"] = channel_map.get(
track.channels, f"{track.channels}ch"
)
# Languages — merge ffprobe languages with token-level ones
# "und" = undetermined, not useful
if info.audio_languages:
existing_upper = {lang.upper() for lang in parsed.languages}
new_languages = list(parsed.languages)
for lang in info.audio_languages:
if lang.lower() != "und" and lang.upper() not in existing_upper:
new_languages.append(lang)
existing_upper.add(lang.upper())
if len(new_languages) != len(parsed.languages):
updates["languages"] = tuple(new_languages)
if not updates:
return parsed
return replace(parsed, **updates)
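The "fill only what is missing" rule can be shown with a reduced model. This is a sketch, not the real types: `SketchParsed` is a hypothetical two-field stand-in for `ParsedRelease`, and the `{"hevc": "x265"}` mapping is an invented example of what `kb.probe_mappings` might contain.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchParsed:
    """Hypothetical two-field stand-in for ParsedRelease."""

    quality: str | None = None
    codec: str | None = None


def fill_missing(
    parsed: SketchParsed,
    *,
    resolution: str | None,
    video_codec: str | None,
    codec_map: dict[str, str],
) -> SketchParsed:
    """Fill None fields from probe values; name-derived values are never overwritten."""
    updates: dict[str, object] = {}
    if parsed.quality is None and resolution:
        updates["quality"] = resolution
    if parsed.codec is None and video_codec:
        # Unknown codecs fall back to the uppercased raw value.
        updates["codec"] = codec_map.get(video_codec.lower(), video_codec.upper())
    # Frozen dataclass: build a new instance, or hand back the original untouched.
    return replace(parsed, **updates) if updates else parsed
```

With `quality="1080p"` already parsed from the name, a probe reporting `2160p` leaves it untouched while a missing `codec` is filled in; when nothing is missing, the very same instance comes back.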
@@ -0,0 +1,192 @@
"""Release inspection orchestrator — the canonical "look at this thing"
entry point.
``inspect_release`` is the single composition of the four layers we
care about for a freshly-arrived release:
1. **Parse the name** — :func:`alfred.domain.release.services.parse_release`
gives a ``ParsedRelease`` plus a ``ParseReport`` (confidence + road).
2. **Refine the media type** — :func:`detect_media_type` uses the
on-disk extension mix to override any token-level guess (e.g. a
bare ``.iso`` folder becomes ``"other"``). ``ParsedRelease`` is
frozen, so the refined value is rebound with
:func:`dataclasses.replace` instead of being patched in place.
3. **Pick the main video** — :func:`find_main_video` runs a top-level
scan over the source path. If nothing qualifies the result still
completes; downstream callers decide what to do with a videoless
release.
4. **Probe the video** — the injected :class:`MediaProber` fills in
missing technical fields via :func:`enrich_from_probe`. Skipped
when there is no main video or when ``media_type`` ended up in
``{"unknown", "other"}`` (the probe would tell us nothing useful).
The return type is :class:`InspectedResult`, a frozen VO that bundles
everything downstream callers need (``analyze_release`` tool,
``resolve_destination``, future workflow stages) without forcing them
to redo the same four calls.
Design notes:
- **Application layer.** This module touches both domain
(``parse_release``) and infrastructure (``MediaProber`` port). That
is exactly application's job — orchestrate.
- **Knowledge base is injected.** ``inspect_release`` takes ``kb`` and
``prober`` as parameters; no module-level singletons here. Callers
(the tool wrapper, tests) decide what to plug in.
- **No in-place mutation.** ``ParsedRelease`` is frozen, so the refined
``media_type`` and the fields filled by ``enrich_from_probe`` are
applied through :func:`dataclasses.replace`, each step producing a
new instance. The outer ``InspectedResult`` is frozen too, so the
*bundle* is immutable from the caller's perspective.
- **Never raises.** Filesystem / probe errors surface as ``None``
fields on the result, never as exceptions — same contract as the
underlying adapters.
"""
from __future__ import annotations
from dataclasses import dataclass, replace
from pathlib import Path
from alfred.application.release_TO_CHECK.detect_media_type import detect_media_type
from alfred.application.release_TO_CHECK.enrich_from_probe import enrich_from_probe
from alfred.application.release_TO_CHECK.supported_media import find_main_video
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.parser.services import parse_release
from alfred.domain.release.value_objects import (
MediaTypeToken,
ParsedRelease,
ParseReport,
)
from alfred.domain.shared_TO_CHECK.media import MediaInfo
from alfred.domain.shared_TO_CHECK.ports import MediaProber
# Media types for which a probe carries no useful information.
_NON_PROBABLE_MEDIA_TYPES = frozenset({"unknown", "other"})
# Media types for which there's nothing for the organizer to do.
# ``other`` covers things like games / ISOs / archives sitting on the
# downloads folder. ``unknown`` does NOT belong here — those need a
# user decision, not a skip.
_SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
# Roads that signal the parser couldn't reach a confident answer on its
# own. ``Road`` values are kept as strings on the report to avoid a
# cross-package import here.
_ASK_USER_ROADS = frozenset({"path_of_pain"})
@dataclass(frozen=True)
class InspectedResult:
"""The full picture of a release: parsed name + filesystem reality.
Bundles everything the downstream pipeline needs after a single
inspection pass:
- ``parsed`` — :class:`ParsedRelease`, with ``media_type`` already
refined by :func:`detect_media_type` and ``None`` tech fields
filled in by :func:`enrich_from_probe` when a probe ran.
- ``report`` — :class:`ParseReport` from the parser (confidence +
road, untouched by inspection).
- ``source_path`` — the path the inspector was pointed at (file or
folder), as supplied by the caller.
- ``main_video`` — the canonical video file inside ``source_path``,
or ``None`` if no eligible file was found.
- ``media_info`` — the :class:`MediaInfo` snapshot when a probe
succeeded; ``None`` when no video was probed (no main video, or
``media_type`` in ``{"unknown", "other"}``) or when ffprobe
failed.
- ``probe_used`` — ``True`` iff ``media_info`` is non-``None`` and
``enrich_from_probe`` actually ran. Explicit flag so callers
don't have to re-derive the condition.
- ``recommended_action`` — derived hint for the orchestrator (see
property docstring). Encodes the exclusion / clarification /
go-ahead decision in one place so downstream callers don't
re-implement the same checks.
"""
parsed: ParsedRelease
report: ParseReport
source_path: Path
main_video: Path | None
media_info: MediaInfo | None
probe_used: bool
@property
def recommended_action(self) -> str:
"""Return one of ``"skip"`` / ``"ask_user"`` / ``"process"``.
- ``"skip"`` — nothing to organize:
* the source has no main video file, **or**
* ``media_type`` is ``"other"`` (games / ISOs / archives).
- ``"ask_user"`` — a decision is required before any action:
* ``media_type`` is ``"unknown"`` (parser couldn't classify), **or**
* the parse landed on ``Road.PATH_OF_PAIN``
(low-confidence, malformed name, etc.).
- ``"process"`` — everything else: a confident parse with a
usable media type and a main video on disk. The orchestrator
can move straight to the planning step.
The check ordering matters: ``"skip"`` wins over ``"ask_user"``
because if there's no video to organize, no question to the
user can change that. ``"ask_user"`` then wins over
``"process"`` because a confident parse alone isn't enough if
the type or road still flag uncertainty.
"""
if self.main_video is None:
return "skip"
if self.parsed.media_type.value in _SKIPPABLE_MEDIA_TYPES:
return "skip"
if self.parsed.media_type.value == "unknown":
return "ask_user"
if self.report.road in _ASK_USER_ROADS:
return "ask_user"
return "process"
def inspect_release(
release_name: str,
source_path: Path,
kb: ReleaseKnowledge,
prober: MediaProber,
) -> InspectedResult:
"""Run the full inspection pipeline on ``release_name`` /
``source_path``.
See module docstring for the four-step flow. ``kb`` and ``prober``
are injected so the caller controls the knowledge base layering
and the probe adapter (real ffprobe in production, stubs in tests).
Never raises. A missing or unreadable ``source_path`` simply
results in ``main_video=None`` and ``media_info=None``.
"""
parsed, report = parse_release(release_name, kb)
# Step 2: refine media_type from the on-disk extension mix.
# detect_media_type tolerates non-existent paths (returns parsed.media_type
# untouched), so no need to guard here. ParsedRelease is frozen — use
# dataclasses.replace to rebind with the refined value.
refined_media_type = MediaTypeToken(detect_media_type(parsed, source_path, kb))
if refined_media_type != parsed.media_type:
parsed = replace(parsed, media_type=refined_media_type)
# Step 3: pick the canonical main video (top-level scan only).
main_video = find_main_video(source_path, kb)
# Step 4: probe + enrich, when it makes sense.
media_info: MediaInfo | None = None
probe_used = False
# Compare on ``.value``: the set holds plain strings, as in ``recommended_action``.
if main_video is not None and parsed.media_type.value not in _NON_PROBABLE_MEDIA_TYPES:
media_info = prober.probe(main_video)
if media_info is not None:
parsed = enrich_from_probe(parsed, media_info, kb)
probe_used = True
return InspectedResult(
parsed=parsed,
report=report,
source_path=source_path,
main_video=main_video,
media_info=media_info,
probe_used=probe_used,
)
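The precedence encoded in `recommended_action` (skip, then ask, then process) can be restated as a pure function over primitives. This is a sketch for reasoning about the ordering only; the `"movie"` and `"easy"` values used in the usage lines are hypothetical, since the full set of media types and road names is not shown in this excerpt.

```python
def recommend(has_main_video: bool, media_type: str, road: str) -> str:
    """Mirror the check order of InspectedResult.recommended_action."""
    # 1. Nothing to organize: no video on disk, or a non-media payload.
    if not has_main_video or media_type == "other":
        return "skip"
    # 2. A user decision is needed: unclassified type or a low-confidence road.
    if media_type == "unknown" or road == "path_of_pain":
        return "ask_user"
    # 3. Confident parse, usable type, video present.
    return "process"


# "movie" and "easy" are placeholder values for the illustration.
decisions = [
    recommend(False, "unknown", "path_of_pain"),
    recommend(True, "unknown", "easy"),
    recommend(True, "movie", "path_of_pain"),
    recommend(True, "movie", "easy"),
]
```

The first case shows why the ordering matters: a release with no main video is skipped even though both "ask" conditions also hold.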
@@ -0,0 +1,74 @@
"""Pre-pipeline exclusion — decide which files are worth parsing.
These helpers live one notch above the domain: they touch the
filesystem (``Path.iterdir``, ``Path.suffix``) but carry no parsing
logic of their own. The goal is to filter out non-video files and pick
the canonical "main video" from a release folder *before* anything
hits :func:`~alfred.domain.release.parse_release`.
Design notes (Phase A bis, 2026-05-20):
- **Extension is the sole eligibility criterion.** A file is supported
iff its suffix is in ``kb.video_extensions``. No size threshold, no
filename heuristics ("sample", "trailer", …). If a release packs a
bloated featurette or names its sample alphabetically before the
main feature, that's PATH_OF_PAIN territory — not this layer's job.
- **Top-level scan only.** ``find_main_video`` does not descend into
subdirectories. Releases that wrap the main video in ``Sample/`` or
similar are non-scene-standard and handled by the orchestrator
upstream.
- **Lexicographic tie-break.** When several candidates qualify
(legitimate for season packs), we return the first by alphabetical
order. Deterministic, no size-based ranking.
- **Direct ``Path`` I/O.** No ``FilesystemScanner`` port — this layer
is application, not domain. If isolation becomes necessary for
testing scale, we'll introduce a port then.
"""
from __future__ import annotations
from pathlib import Path
from alfred.domain.releases_TO_CHECK.ports.knowledge import ReleaseKnowledge
def is_supported_video(path: Path, kb: ReleaseKnowledge) -> bool:
"""Return True when ``path`` is a video file the parser should
consider.
The check is purely extension-based: ``path.suffix.lower()`` must
belong to ``kb.video_extensions``. ``path`` must also be a regular
file — directories and broken symlinks return False.
"""
if not path.is_file():
return False
return path.suffix.lower() in kb.video_extensions
def find_main_video(folder: Path, kb: ReleaseKnowledge) -> Path | None:
"""Return the canonical main video file inside ``folder``, or
``None`` if there isn't one.
Behavior:
- Top-level scan only — subdirectories are ignored.
- Eligibility is :func:`is_supported_video`.
- When several files qualify, the lexicographically first one wins.
- When ``folder`` itself is a video file, it is returned as-is
(single-file releases are valid).
- When ``folder`` doesn't exist or isn't a directory (and isn't a
video file either), returns ``None``.
"""
if folder.is_file():
return folder if is_supported_video(folder, kb) else None
if not folder.is_dir():
return None
candidates = sorted(
child for child in folder.iterdir() if is_supported_video(child, kb)
)
return candidates[0] if candidates else None
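A minimal sketch of the selection rules above, restated outside the `alfred` package so it runs on its own. `StubKB` is a hypothetical stand-in for `ReleaseKnowledge` (only `video_extensions` is assumed), and the file names are made up for illustration.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class StubKB:
    """Hypothetical stand-in for ReleaseKnowledge (only the field used here)."""

    video_extensions: frozenset[str] = frozenset({".mkv", ".mp4"})


def is_supported_video(path: Path, kb: StubKB) -> bool:
    # Extension is the sole criterion; directories and broken links fail is_file().
    return path.is_file() and path.suffix.lower() in kb.video_extensions


def find_main_video(folder: Path, kb: StubKB) -> Path | None:
    if folder.is_file():
        return folder if is_supported_video(folder, kb) else None
    if not folder.is_dir():
        return None
    # Top-level scan only; the first candidate in sorted order wins.
    candidates = sorted(c for c in folder.iterdir() if is_supported_video(c, kb))
    return candidates[0] if candidates else None


def demo() -> tuple[str | None, str | None, Path | None]:
    kb = StubKB()
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Show.S01E02.mkv").touch()
        (root / "Show.S01E01.MKV").touch()  # upper-case suffix still qualifies
        (root / "Show.nfo").touch()         # not a video extension
        (root / "Sample").mkdir()
        (root / "Sample" / "a.mkv").touch()  # nested, so never considered
        picked = find_main_video(root, kb)
        single = find_main_video(root / "Show.S01E02.mkv", kb)
        missing = find_main_video(root / "does-not-exist", kb)
        return (
            picked.name if picked else None,
            single.name if single else None,
            missing,
        )
```

Calling `demo()` exercises the three documented cases: a folder with several candidates, a single-file release, and a path that does not exist.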
@@ -5,13 +5,13 @@ import os
from dataclasses import dataclass
from pathlib import Path
-from alfred.domain.subtitles.entities import SubtitleCandidate
-from alfred.domain.subtitles.value_objects import SubtitleType
+from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
+from alfred.domain.subtitles_TO_CHECK.value_objects import SubtitleType
logger = logging.getLogger(__name__)
-def _build_dest_name(track: SubtitleCandidate, video_stem: str) -> str:
+def _build_dest_name(track: SubtitleScanResult, video_stem: str) -> str:
"""
Build the destination filename for a subtitle track.
@@ -41,7 +41,7 @@ class PlacedTrack:
@dataclass
class PlaceResult:
placed: list[PlacedTrack]
-skipped: list[tuple[SubtitleCandidate, str]] # (track, reason)
+skipped: list[tuple[SubtitleScanResult, str]] # (track, reason)
@property
def placed_count(self) -> int:
@@ -54,7 +54,7 @@ class PlaceResult:
class SubtitlePlacer:
"""
-Hard-links matched SubtitleCandidate files next to a destination video.
+Hard-links matched SubtitleScanResult files next to a destination video.
Uses the same hard-link strategy as FileManager.copy_file:
instant, no data duplication, qBittorrent keeps seeding.
@@ -64,11 +64,11 @@ class SubtitlePlacer:
def place(
self,
-tracks: list[SubtitleCandidate],
+tracks: list[SubtitleScanResult],
destination_video: Path,
) -> PlaceResult:
placed: list[PlacedTrack] = []
-skipped: list[tuple[SubtitleCandidate, str]] = []
+skipped: list[tuple[SubtitleScanResult, str]] = []
dest_dir = destination_video.parent
@@ -2,7 +2,7 @@
import logging
-from alfred.infrastructure.api.qbittorrent import (
+from alfred.infrastructure.api_TO_CHECK.qbittorrent import (
QBittorrentAPIError,
QBittorrentAuthError,
QBittorrentClient,
@@ -2,7 +2,7 @@
import logging
-from alfred.infrastructure.api.knaben import (
+from alfred.infrastructure.api_TO_CHECK.knaben import (
KnabenAPIError,
KnabenClient,
KnabenNotFoundError,
@@ -0,0 +1,21 @@
"""TV-show orchestrators — operate on the Alfred-managed TV library tree.
The TV library is a directory of show folders (one per TV show), each
holding season folders containing video files. Modules here walk this
tree and reconstruct on-disk :class:`SeriesRelease` aggregates by
reusing the existing release pipeline (``inspect_release``) rather
than duplicating its parse/probe logic.
"""
from .dto import SearchShowResponse, ShowHit
from .search_show import SearchShowUseCase
from .walker import SeasonFolder, ShowTree, walk_show
__all__ = [
"SearchShowResponse",
"SearchShowUseCase",
"SeasonFolder",
"ShowHit",
"ShowTree",
"walk_show",
]
@@ -0,0 +1,39 @@
"""TV show application DTOs."""
from dataclasses import dataclass, field
@dataclass(frozen=True)
class ShowHit:
"""One TV-show hit, flattened for transport to the agent."""
tmdb_id: int
name: str
first_air_year: int | None = None
def to_dict(self) -> dict:
out: dict = {"tmdb_id": self.tmdb_id, "name": self.name}
if self.first_air_year is not None:
out["first_air_year"] = self.first_air_year
return out
@dataclass
class SearchShowResponse:
"""Response from searching for a TV show."""
status: str
hits: list[ShowHit] = field(default_factory=list)
error: str | None = None
message: str | None = None
def to_dict(self):
result: dict = {"status": self.status}
if self.error:
result["error"] = self.error
result["message"] = self.message
else:
result["hits"] = [h.to_dict() for h in self.hits]
return result
@@ -0,0 +1,59 @@
"""Search TV show use case."""
import logging
from alfred.infrastructure.api_TO_CHECK.tmdb import (
TMDBAPIError,
TMDBClient,
TMDBConfigurationError,
)
from .dto import SearchShowResponse, ShowHit
logger = logging.getLogger(__name__)
class SearchShowUseCase:
"""List TV shows matching a free-text query via TMDB ``/search/tv``.
Symmetric to :class:`alfred.application.movies.SearchMovieUseCase`:
thin orchestrator, flattens domain VOs into agent-friendly
primitives, no ``imdb_id`` enrichment (caller follows up with
:meth:`TMDBClient.get_tv_show_info` on a chosen ``tmdb_id``).
"""
def __init__(self, tmdb_client: TMDBClient):
self.tmdb_client = tmdb_client
def execute(self, show_title: str) -> SearchShowResponse:
try:
results = self.tmdb_client.search_shows(show_title)
hits = [
ShowHit(
tmdb_id=r.tmdb_id.value,
name=r.name,
first_air_year=r.first_air_year,
)
for r in results
]
logger.info(f"search_shows({show_title!r}) → {len(hits)} hits")
return SearchShowResponse(status="ok", hits=hits)
except TMDBConfigurationError as e:
logger.error(f"TMDB configuration error: {e}")
return SearchShowResponse(
status="error", error="configuration_error", message=str(e)
)
except TMDBAPIError as e:
logger.error(f"TMDB API error: {e}")
return SearchShowResponse(
status="error", error="api_error", message=str(e)
)
except ValueError as e:
logger.error(f"Validation error: {e}")
return SearchShowResponse(
status="error", error="validation_failed", message=str(e)
)
@@ -0,0 +1,208 @@
"""Show tree walker — minimal filesystem traversal of a TV show folder.
The walker is intentionally dumb: it lists season folders, classifies
each one as PACK or EPISODIC by **inspecting its filesystem
structure**, and hands the orchestrator a flat list of video files
per season. It does not parse release names, run ffprobe, or
classify subtitle files. All of that intelligence lives in the
existing release pipeline (``inspect_release`` + downstream
services); the walker just hands the orchestrator the paths to feed
into that pipeline.
Folder convention
-----------------
Inside an Alfred-managed library, a show root looks like::
Foundation/
Foundation.S01.1080p.WEB-DL.x265-GROUP/ ← PACK season
Foundation.S01E01.1080p.WEB-DL.x265.mkv ← flat video
Foundation.S01E02.1080p.WEB-DL.x265.mkv
...
Foundation.S02/ ← EPISODIC season
Foundation.S02E01.1080p.WEB-DL.x265-GROUP/ ← episode subfolder
Foundation.S02E01.1080p.WEB-DL.x265-GROUP.mkv
Foundation.S02E02.1080p.WEB-DL.x265-OTHER/
Foundation.S02E02.1080p.WEB-DL.x265-OTHER.mkv
The walker recognizes a season folder by a ``Sxx`` token anywhere in
its name (case-insensitive). It does **not** care about Plex-style
names (``Season 01``, ``Specials``) — the Alfred library uses
release-style folder names only.
PACK vs EPISODIC is a **structural distinction**, not a naming one:
* **PACK** — season folder contains N flat video files. No
subfolders.
* **EPISODIC** — season folder contains N subfolders, each holding
exactly one video.
A season folder that mixes the two layouts (some flat videos AND
some subfolders) is malformed: the walker reports
``mode=None`` and an empty ``video_files`` tuple so the
orchestrator can warn and skip it.
"""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass
from pathlib import Path
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects import ReleaseMode
from alfred.domain.shared_TO_CHECK.ports import FilesystemScanner
_LOG = logging.getLogger(__name__)
# Matches any ``Sxx`` token (1-2 digits) bounded by non-alphanumerics.
# Examples that match: ``Foundation.S01.1080p``, ``S2.Pack``, ``BBC.s10.bluray``.
# Examples that don't: ``Sample``, ``Soundtrack``, ``2024.S0E1`` (the digit is
# followed by ``E``, so the trailing non-alphanumeric boundary fails).
_SEASON_TOKEN_RE = re.compile(r"(?<![A-Za-z0-9])s(\d{1,2})(?![A-Za-z0-9])", re.IGNORECASE)
@dataclass(frozen=True)
class SeasonFolder:
"""One season folder discovered inside a show root.
``mode`` is set by the walker from the FS structure:
* :attr:`ReleaseMode.PACK` — ``video_files`` lists the season
folder's flat videos.
* :attr:`ReleaseMode.EPISODIC` — ``video_files`` lists each
episode subfolder's single video.
* ``None`` — the folder is empty, malformed (mixed layout), or
otherwise unclassifiable. ``video_files`` is empty. The
orchestrator decides whether to warn/skip.
"""
season_dir: Path
mode: ReleaseMode | None
video_files: tuple[Path, ...]
@dataclass(frozen=True)
class ShowTree:
"""The full structural snapshot of a show on disk."""
show_root: Path
season_folders: tuple[SeasonFolder, ...]
def walk_show(
show_root: Path,
*,
scanner: FilesystemScanner,
kb: ReleaseKnowledge,
) -> ShowTree:
"""Walk ``show_root`` and return its structural tree.
The walker:
* lists direct children of ``show_root``,
* keeps the directories whose name contains a ``Sxx`` token,
* classifies each season folder as PACK / EPISODIC / unknown by
inspecting its direct children (videos vs subfolders),
* for EPISODIC, descends one extra level into each episode
subfolder to collect its single video,
* sorts season folders by name and video files by name within
each folder.
The walker never raises — empty / unreadable / malformed
directories surface as a ``SeasonFolder`` with ``mode=None`` and
an empty ``video_files`` tuple.
"""
video_exts = {ext.lower() for ext in kb.video_extensions}
season_folders: list[SeasonFolder] = []
for entry in scanner.scan_dir(show_root):
if not entry.is_dir or not _SEASON_TOKEN_RE.search(entry.name):
continue
season_folders.append(
_classify_season(entry.path, scanner=scanner, video_exts=video_exts)
)
return ShowTree(
show_root=show_root, season_folders=tuple(season_folders)
)
# --------------------------------------------------------------------------- #
# Season-folder classification #
# --------------------------------------------------------------------------- #
def _classify_season(
season_dir: Path,
*,
scanner: FilesystemScanner,
video_exts: set[str],
) -> SeasonFolder:
"""Inspect one season folder and decide PACK / EPISODIC / unknown.
Looks only at direct children. For EPISODIC, descends one extra
level into each subfolder to collect its single video. Mixed
layouts (flat videos + subfolders) are reported as ``mode=None``
so the orchestrator can skip them with a warning.
"""
flat_videos: list[Path] = []
subdirs: list[Path] = []
for child in scanner.scan_dir(season_dir):
if child.is_file and child.suffix.lower() in video_exts:
flat_videos.append(child.path)
elif child.is_dir:
subdirs.append(child.path)
# Anything else (non-video files like .nfo, .srt at the season
# root) is ignored — it doesn't affect classification.
has_flat = bool(flat_videos)
has_subdirs = bool(subdirs)
if has_flat and has_subdirs:
_LOG.warning(
"walker: season folder %s mixes flat videos and subfolders — "
"malformed layout, skipping",
season_dir,
)
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
if has_flat:
return SeasonFolder(
season_dir=season_dir,
mode=ReleaseMode.PACK,
video_files=tuple(sorted(flat_videos)),
)
if has_subdirs:
episode_videos: list[Path] = []
for sub in sorted(subdirs):
videos_in_sub = [
child.path
for child in scanner.scan_dir(sub)
if child.is_file and child.suffix.lower() in video_exts
]
if len(videos_in_sub) == 0:
_LOG.warning(
"walker: episode subfolder %s contains no video — skipping",
sub,
)
continue
if len(videos_in_sub) > 1:
_LOG.warning(
"walker: episode subfolder %s contains %d videos — "
"malformed, skipping season %s",
sub,
len(videos_in_sub),
season_dir,
)
return SeasonFolder(
season_dir=season_dir, mode=None, video_files=()
)
episode_videos.append(videos_in_sub[0])
return SeasonFolder(
season_dir=season_dir,
mode=ReleaseMode.EPISODIC,
video_files=tuple(episode_videos),
)
# No flat videos, no subdirs → empty season folder.
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
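The season-token pattern used by the walker can be checked in isolation. The sketch below copies the regular expression from the module and wraps it in `season_number`, a hypothetical helper added only for illustration (the walker itself only tests whether the pattern matches).

```python
from __future__ import annotations

import re

# Same pattern as the walker: "S" plus 1-2 digits, bounded by non-alphanumerics.
_SEASON_TOKEN_RE = re.compile(
    r"(?<![A-Za-z0-9])s(\d{1,2})(?![A-Za-z0-9])", re.IGNORECASE
)


def season_number(folder_name: str) -> int | None:
    """Hypothetical helper: return the season number, or None if no token."""
    match = _SEASON_TOKEN_RE.search(folder_name)
    return int(match.group(1)) if match else None


SEASON_EXAMPLES = {
    name: season_number(name)
    for name in (
        "Foundation.S01.1080p.WEB-DL.x265-GROUP",      # season folder
        "S2.Pack",
        "BBC.s10.bluray",
        "Sample",
        "2024.S0E1",
        "Foundation.S02E01.1080p.WEB-DL.x265-GROUP",  # episode subfolder
        "Season 01",                                   # Plex-style name
    )
}
```

One consequence worth noting: an episode subfolder such as `Foundation.S02E01...` does not match, because the digits are followed by `E`. Only names that carry a bare `Sxx` token are treated as season folders.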
@@ -1,104 +0,0 @@
"""Movie domain entities."""
from dataclasses import dataclass, field
from datetime import datetime
from ..shared.media import AudioTrack, MediaWithTracks, SubtitleTrack
from ..shared.value_objects import FilePath, FileSize, ImdbId
from .value_objects import MovieTitle, Quality, ReleaseYear
@dataclass(eq=False)
class Movie(MediaWithTracks):
"""
Movie aggregate root for the movies domain.
Carries file metadata (path, size) and the tracks discovered by the
ffprobe + subtitle scan pipeline. The track lists may be empty when the
movie is known but not yet scanned, or when no file is downloaded.
Track helpers follow the same "C+" contract as ``Episode``: pass a
``Language`` for cross-format matching, or a ``str`` for case-insensitive
direct comparison.
Equality is identity-based: two ``Movie`` instances are equal iff they
share the same ``imdb_id``, regardless of file/track contents. This is
the DDD aggregate invariant — the aggregate is identified by its root id.
"""
imdb_id: ImdbId
title: MovieTitle
release_year: ReleaseYear | None = None
quality: Quality = Quality.UNKNOWN
file_path: FilePath | None = None
file_size: FileSize | None = None
tmdb_id: int | None = None
added_at: datetime = field(default_factory=datetime.now)
audio_tracks: list[AudioTrack] = field(default_factory=list)
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
def __post_init__(self):
"""Validate movie entity."""
# Ensure ImdbId is actually an ImdbId instance
if not isinstance(self.imdb_id, ImdbId):
if isinstance(self.imdb_id, str):
self.imdb_id = ImdbId(self.imdb_id)
else:
raise ValueError(
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
)
# Ensure MovieTitle is actually a MovieTitle instance
if not isinstance(self.title, MovieTitle):
if isinstance(self.title, str):
self.title = MovieTitle(self.title)
else:
raise ValueError(
f"title must be MovieTitle or str, got {type(self.title)}"
)
def __eq__(self, other: object) -> bool:
if not isinstance(other, Movie):
return NotImplemented
return self.imdb_id == other.imdb_id
def __hash__(self) -> int:
return hash(self.imdb_id)
# Track helpers (has_audio_in / audio_languages / has_subtitles_in /
# has_forced_subs / subtitle_languages) come from MediaWithTracks.
def get_folder_name(self) -> str:
"""
Get the folder name for this movie.
Format: "Title (Year)"
Example: "Inception (2010)"
"""
if self.release_year:
return f"{self.title.value} ({self.release_year.value})"
return self.title.value
def get_filename(self) -> str:
"""
Get the suggested filename for this movie.
Format: "Title.Year.Quality.ext"
Example: "Inception.2010.1080p.mkv"
"""
parts = [self.title.normalized()]
if self.release_year:
parts.append(str(self.release_year.value))
if self.quality != Quality.UNKNOWN:
parts.append(self.quality.value)
# Extension will be added based on actual file
return ".".join(parts)
def __str__(self) -> str:
return f"{self.title.value} ({self.release_year.value if self.release_year else 'Unknown'})"
def __repr__(self) -> str:
return f"Movie(imdb_id={self.imdb_id}, title='{self.title.value}')"
@@ -1,73 +0,0 @@
"""Movie repository interfaces (abstract)."""
from abc import ABC, abstractmethod
from ..shared.value_objects import ImdbId
from .entities import Movie
class MovieRepository(ABC):
"""
Abstract repository for movie persistence.
This defines the interface that infrastructure implementations must follow.
"""
@abstractmethod
def save(self, movie: Movie) -> None:
"""
Save a movie to the repository.
Args:
movie: Movie entity to save
"""
pass
@abstractmethod
def find_by_imdb_id(self, imdb_id: ImdbId) -> Movie | None:
"""
Find a movie by its IMDb ID.
Args:
imdb_id: IMDb ID to search for
Returns:
Movie if found, None otherwise
"""
pass
@abstractmethod
def find_all(self) -> list[Movie]:
"""
Get all movies in the repository.
Returns:
List of all movies
"""
pass
@abstractmethod
def delete(self, imdb_id: ImdbId) -> bool:
"""
Delete a movie from the repository.
Args:
imdb_id: IMDb ID of the movie to delete
Returns:
True if deleted, False if not found
"""
pass
@abstractmethod
def exists(self, imdb_id: ImdbId) -> bool:
"""
Check if a movie exists in the repository.
Args:
imdb_id: IMDb ID to check
Returns:
True if exists, False otherwise
"""
pass
@@ -0,0 +1,91 @@
"""Movie domain entities."""
from dataclasses import dataclass
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
from .value_objects import MovieTitle, ReleaseYear
@dataclass(frozen=True, eq=False)
class Movie:
"""
Movie aggregate root for the movies domain.
TMDB-only aggregate: carries identity (``tmdb_id`` + optional
``imdb_id``) plus the catalog facts that come from TMDB (``title``,
``release_year``). Filesystem-side concerns (file path, quality,
tracks, ``added_at``) live on :class:`alfred.domain.releases.entities.
MovieRelease`, the per-movie release aggregate persisted alongside.
Frozen: rebuild via ``dataclasses.replace`` to project metadata
updates (e.g. a TMDB refresh) onto a new instance.
Equality is identity-based on ``tmdb_id``: two ``Movie`` instances
are equal iff they share the same primary key. ``imdb_id`` is a
secondary anchor and not part of the identity.
"""
tmdb_id: TmdbId
title: MovieTitle
imdb_id: ImdbId | None = None
release_year: ReleaseYear | None = None
def __post_init__(self) -> None:
if not isinstance(self.tmdb_id, TmdbId):
raise ValueError(
f"tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
)
if not isinstance(self.title, MovieTitle):
if isinstance(self.title, str):
object.__setattr__(self, "title", MovieTitle(self.title))
else:
raise ValueError(
f"title must be MovieTitle or str, got {type(self.title)}"
)
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
raise ValueError(
f"imdb_id must be ImdbId or None, got {type(self.imdb_id)}"
)
def __eq__(self, other: object) -> bool:
if not isinstance(other, Movie):
return NotImplemented
return self.tmdb_id == other.tmdb_id
def __hash__(self) -> int:
return hash(self.tmdb_id)
# WRONG
def get_folder_name(self) -> str:
"""
Get the folder name for this movie.
Format: "Title (Year)"
Example: "Inception (2010)"
"""
if self.release_year:
return f"{self.title.value} ({self.release_year.value})"
return self.title.value
# WRONG
def get_filename(self) -> str:
"""
Get the suggested base filename (without extension) for this movie.
Format: ``Title.Year`` (quality lives on
:class:`alfred.domain.releases.entities.MovieRelease` now and is
appended by the release-aware caller — typically the rescan /
organize flow, after Phase 4).
Example: ``Inception.2010``.
"""
parts = [self.title.normalized()]
if self.release_year:
parts.append(str(self.release_year.value))
return ".".join(parts)
def __str__(self) -> str:
return f"{self.title.value} ({self.release_year.value if self.release_year else 'Unknown'})"
def __repr__(self) -> str:
return f"Movie(tmdb_id={self.tmdb_id}, title='{self.title.value}')"
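A reduced sketch of the identity-based equality described in the docstring above, assuming a simplified `TmdbId` stand-in and plain `str`/`int` fields in place of the real value objects. It shows how `frozen=True, eq=False` combines with hand-written `__eq__`/`__hash__`, and how `dataclasses.replace` projects an update onto a new instance.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TmdbId:
    """Simplified stand-in for the shared TmdbId value object."""

    value: int


@dataclass(frozen=True, eq=False)
class Movie:
    tmdb_id: TmdbId
    title: str
    release_year: int | None = None

    # eq=False keeps the dataclass from generating field-wise equality,
    # so identity is decided by tmdb_id alone.
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Movie):
            return NotImplemented
        return self.tmdb_id == other.tmdb_id

    def __hash__(self) -> int:
        return hash(self.tmdb_id)


def is_frozen(movie: Movie) -> bool:
    """Return True when attribute assignment is rejected."""
    try:
        movie.title = "changed"  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False


# Placeholder id and titles, for illustration only.
original = Movie(TmdbId(1), "Working Title")
refreshed = replace(original, title="Final Title", release_year=2010)
```

Here `original == refreshed` holds even though the title and year differ, and both collapse to one entry in a `set`, which is the aggregate-identity behaviour the docstring describes.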
@@ -1,6 +1,6 @@
"""Movie domain exceptions."""
-from ..shared.exceptions import DomainException, NotFoundError
+from ..shared_TO_CHECK.exceptions import DomainException, NotFoundError
class MovieNotFound(NotFoundError):
@@ -3,8 +3,7 @@
from dataclasses import dataclass
from enum import Enum
-from ..shared.exceptions import ValidationError
-from ..shared.value_objects import to_dot_folder_name
+from ..shared_TO_CHECK.exceptions import ValidationError
class Quality(Enum):
@@ -56,18 +55,11 @@ class MovieTitle:
f"Movie title must be a string, got {type(self.value)}"
)
-if len(self.value) > 500:
+if len(self.value) > 150:
raise ValidationError(
-f"Movie title too long: {len(self.value)} characters (max 500)"
+f"Movie title too long: {len(self.value)} characters (max 150)"
)
-def normalized(self) -> str:
-"""
-Return normalized title for file system usage.
-Removes special characters and replaces spaces with dots.
-"""
-return to_dot_folder_name(self.value)
def __str__(self) -> str:
return self.value
@@ -93,10 +85,6 @@ class ReleaseYear:
f"Release year must be an integer, got {type(self.value)}"
)
-# Movies started around 1888, and we shouldn't have movies from the future
-if self.value < 1888 or self.value > 2100:
-raise ValidationError(f"Invalid release year: {self.value}")
def __str__(self) -> str:
return str(self.value)
@@ -1,6 +0,0 @@
"""Release domain — release name parsing and naming conventions."""
from .services import parse_release
from .value_objects import ParsedRelease
__all__ = ["ParsedRelease", "parse_release"]
@@ -1,52 +0,0 @@
"""ReleaseKnowledge port — the read-only query surface that
``parse_release`` and ``ParsedRelease`` need from the release knowledge
base, expressed as a structural Protocol so the domain never imports any
concrete loader.
The concrete YAML-backed implementation lives in
``alfred/infrastructure/knowledge/release_kb.py``. Tests can supply any
object that satisfies this shape (e.g. a simple dataclass).
"""
from __future__ import annotations
from typing import Protocol
class ReleaseKnowledge(Protocol):
"""Read-only snapshot of release-name parsing knowledge."""
# --- Token sets used by the tokenizer / matchers ---
resolutions: set[str]
sources: set[str]
codecs: set[str]
language_tokens: set[str]
forbidden_chars: set[str]
hdr_extra: set[str]
# --- Structured knowledge (loaded from YAML as dicts) ---
audio: dict
video_meta: dict
editions: dict
media_type_tokens: dict
# --- Tokenizer separators ---
separators: list[str]
# --- File-extension sets (used by application/infra modules that work
# directly with filesystem paths, e.g. media-type detection, video
# lookup). Domain parsing itself doesn't touch these. ---
video_extensions: set[str]
non_video_extensions: set[str]
subtitle_extensions: set[str]
metadata_extensions: set[str]
# --- Filesystem sanitization (Option B: pre-sanitize at parse time) ---
def sanitize_for_fs(self, text: str) -> str:
"""Strip filesystem-forbidden characters from ``text``."""
...
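The module docstring notes that tests can supply any object with this shape, for example a simple dataclass. A sketch of such a test double follows. Every value in it is illustrative rather than taken from the real YAML knowledge base, and `_FS_FORBIDDEN` is an assumption made for this example only.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: characters this sketch treats as filesystem-forbidden.
_FS_FORBIDDEN = set(':"/\\|?*')


@dataclass
class FakeReleaseKnowledge:
    """Hypothetical test double exposing the ReleaseKnowledge attributes."""

    resolutions: set[str] = field(default_factory=lambda: {"720p", "1080p", "2160p"})
    sources: set[str] = field(default_factory=lambda: {"bluray", "web-dl", "webrip"})
    codecs: set[str] = field(default_factory=lambda: {"x264", "x265"})
    language_tokens: set[str] = field(default_factory=set)
    forbidden_chars: set[str] = field(default_factory=lambda: {"@", "#", "!", "%"})
    hdr_extra: set[str] = field(default_factory=set)
    audio: dict = field(default_factory=dict)
    video_meta: dict = field(default_factory=dict)
    editions: dict = field(default_factory=dict)
    media_type_tokens: dict = field(default_factory=dict)
    separators: list[str] = field(
        default_factory=lambda: [".", " ", "[", "]", "(", ")", "_"]
    )
    video_extensions: set[str] = field(default_factory=lambda: {".mkv", ".mp4"})
    non_video_extensions: set[str] = field(default_factory=set)
    subtitle_extensions: set[str] = field(default_factory=lambda: {".srt"})
    metadata_extensions: set[str] = field(default_factory=lambda: {".nfo"})

    def sanitize_for_fs(self, text: str) -> str:
        # Drop each forbidden character; everything else passes through.
        return "".join(ch for ch in text if ch not in _FS_FORBIDDEN)


kb = FakeReleaseKnowledge()
other_kb = FakeReleaseKnowledge()
kb.codecs.add("av1")  # default_factory gives each instance its own set
```

Because the port is a structural `Protocol`, a class like this needs no inheritance: matching attribute names and the `sanitize_for_fs` signature are enough for a type checker to accept it.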
@@ -1,506 +0,0 @@
"""Release domain — parsing service."""
from __future__ import annotations
import re
from .ports import ReleaseKnowledge
from .value_objects import MediaTypeToken, ParsedRelease, ParsePath
def _tokenize(name: str, kb: ReleaseKnowledge) -> list[str]:
"""Split a release name on the configured separators, dropping empty tokens."""
pattern = "[" + re.escape("".join(kb.separators)) + "]+"
return [t for t in re.split(pattern, name) if t]
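A short standalone sketch of the tokenizer above, taking the separator list as a plain argument instead of a `ReleaseKnowledge` object. The separators are the ones quoted in the `parse_release` docstring; the release name is made up.

```python
from __future__ import annotations

import re


def tokenize(name: str, separators: list[str]) -> list[str]:
    # Build one character class from all separators, then split on runs of them.
    pattern = "[" + re.escape("".join(separators)) + "]+"
    return [tok for tok in re.split(pattern, name) if tok]


SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]
TOKENS = tokenize("The.Title (2010) [1080p]_x265-GROUP", SEPARATORS)
```

With this separator list the dash is not a separator, so `x265-GROUP` survives as a single token. The later `codec-GROUP` handling in `_extract_tech` relies on that.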
def parse_release(name: str, kb: ReleaseKnowledge) -> ParsedRelease:
"""
Parse a release name and return a ParsedRelease.
Flow:
1. Strip a leading/trailing [site.tag] if present (sets parse_path="sanitized").
2. Check the remainder for truly forbidden chars (anything not in the
configured separators list). If any remain → media_type="unknown",
parse_path="ai", and the LLM handles it.
3. Tokenize using the configured separators (".", " ", "[", "]", "(", ")", "_", ...)
and run token-level matchers (season/episode, tech, languages, audio,
video, edition, title, year).
"""
parse_path = ParsePath.DIRECT.value
# Always try to extract a bracket-enclosed site tag first.
clean, site_tag = _strip_site_tag(name)
if site_tag is not None:
parse_path = ParsePath.SANITIZED.value
if not _is_well_formed(clean, kb):
return ParsedRelease(
raw=name,
normalised=clean,
title=clean,
title_sanitized=kb.sanitize_for_fs(clean),
year=None,
season=None,
episode=None,
episode_end=None,
quality=None,
source=None,
codec=None,
group="UNKNOWN",
tech_string="",
media_type=MediaTypeToken.UNKNOWN.value,
site_tag=site_tag,
parse_path=ParsePath.AI.value,
)
name = clean
tokens = _tokenize(name, kb)
season, episode, episode_end = _extract_season_episode(tokens)
quality, source, codec, group, tech_tokens = _extract_tech(tokens, kb)
languages, lang_tokens = _extract_languages(tokens, kb)
audio_codec, audio_channels, audio_tokens = _extract_audio(tokens, kb)
bit_depth, hdr_format, video_tokens = _extract_video_meta(tokens, kb)
edition, edition_tokens = _extract_edition(tokens, kb)
title = _extract_title(
tokens,
tech_tokens | lang_tokens | audio_tokens | video_tokens | edition_tokens,
kb,
)
year = _extract_year(tokens, title)
media_type = _infer_media_type(
season, quality, source, codec, year, edition, tokens, kb
)
tech_parts = [p for p in [quality, source, codec] if p]
tech_string = ".".join(tech_parts)
return ParsedRelease(
raw=name,
normalised=name,
title=title,
title_sanitized=kb.sanitize_for_fs(title),
year=year,
season=season,
episode=episode,
episode_end=episode_end,
quality=quality,
source=source,
codec=codec,
group=group,
tech_string=tech_string,
media_type=media_type,
site_tag=site_tag,
parse_path=parse_path,
languages=languages,
audio_codec=audio_codec,
audio_channels=audio_channels,
bit_depth=bit_depth,
hdr_format=hdr_format,
edition=edition,
)
def _infer_media_type(
season: int | None,
quality: str | None,
source: str | None,
codec: str | None,
year: int | None,
edition: str | None,
tokens: list[str],
kb: ReleaseKnowledge,
) -> str:
"""
Infer media_type from token-level evidence only (no filesystem access).
- documentary : DOC token present
- concert : CONCERT token present
- tv_complete : INTEGRALE/COMPLETE token, no season
- tv_show : season token found
- movie : no season, at least one tech marker
- unknown : no conclusive evidence
"""
upper_tokens = {t.upper() for t in tokens}
doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}
if upper_tokens & doc_tokens:
return MediaTypeToken.DOCUMENTARY.value
if upper_tokens & concert_tokens:
return MediaTypeToken.CONCERT.value
if (
edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
or upper_tokens & integrale_tokens
) and season is None:
return MediaTypeToken.TV_COMPLETE.value
if season is not None:
return MediaTypeToken.TV_SHOW.value
if any([quality, source, codec, year]):
return MediaTypeToken.MOVIE.value
return MediaTypeToken.UNKNOWN.value
def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
"""Return True if name contains no forbidden characters per scene naming rules.
Characters listed as token separators (spaces, brackets, parens, …) are NOT
considered malforming — the tokenizer handles them. Only truly broken chars
like '@', '#', '!', '%' make a name malformed.
"""
tokenizable = set(kb.separators)
return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
def _strip_site_tag(name: str) -> tuple[str, str | None]:
"""
Strip a site watermark tag from the release name and return (clean_name, tag).
Handles two positions:
- Prefix: "[ OxTorrent.vc ] The.Title.S01..."
- Suffix: "The.Title.S01...-NTb[TGx]"
Anything between [...] is treated as a site tag.
Returns (original_name, None) if no tag found.
"""
s = name.strip()
if s.startswith("["):
close = s.find("]")
if close != -1:
tag = s[1:close].strip()
remainder = s[close + 1 :].strip()
if tag and remainder:
return remainder, tag
if s.endswith("]"):
open_bracket = s.rfind("[")
if open_bracket != -1:
tag = s[open_bracket + 1 : -1].strip()
remainder = s[:open_bracket].strip()
if tag and remainder:
return remainder, tag
return s, None
def _parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
"""
Parse a single token as a season/episode marker.
Handles:
- SxxExx / SxxExxExx / Sxx (canonical scene form)
- NxNN / NxNNxNN (alt form: 1x05, 12x07x08)
Returns (season, episode, episode_end) or None if not a season token.
"""
upper = tok.upper()
# SxxExx form
if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
season = int(upper[1:3])
rest = upper[3:]
if not rest:
return season, None, None
episodes: list[int] = []
while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
episodes.append(int(rest[1:3]))
rest = rest[3:]
if not episodes:
return None # malformed token like "S03XYZ"
return season, episodes[0], episodes[1] if len(episodes) >= 2 else None
# NxNN form — split on "X" (uppercased), all parts must be digits
if "X" in upper:
parts = upper.split("X")
if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
season = int(parts[0])
episode = int(parts[1])
episode_end = int(parts[2]) if len(parts) >= 3 else None
return season, episode, episode_end
return None
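To make the accepted token shapes concrete, here is a condensed restatement of the parser above as a standalone function, followed by a table of inputs. The function name and the sample tokens are chosen for illustration; the logic is intended to mirror the hunk, not to replace it.

```python
from __future__ import annotations


def parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
    """Return (season, episode, episode_end), or None for a non-season token."""
    upper = tok.upper()
    # Sxx / SxxExx / SxxExxExx: the season is read from the two chars after "S".
    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
        season = int(upper[1:3])
        rest = upper[3:]
        if not rest:
            return season, None, None
        episodes: list[int] = []
        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
            episodes.append(int(rest[1:3]))
            rest = rest[3:]
        if not episodes:
            return None  # e.g. "S03XYZ"
        return season, episodes[0], episodes[1] if len(episodes) >= 2 else None
    # NxNN / NxNNxNN: every "x"-separated part must be a non-empty digit run.
    if "X" in upper:
        parts = upper.split("X")
        if len(parts) >= 2 and all(p.isdigit() for p in parts):
            episode_end = int(parts[2]) if len(parts) >= 3 else None
            return int(parts[0]), int(parts[1]), episode_end
    return None


PARSED = {
    tok: parse_season_episode(tok)
    for tok in ("S01E02", "S01E01E02", "S03", "1x05", "12x07x08", "S03XYZ", "x265", "S2")
}
```

Two edge cases stand out in the table: `x265` is not mistaken for an `NxNN` marker because the part before the `x` is empty, and a single-digit `S2` is rejected because the canonical form expects two digits after the `S`.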
def _extract_season_episode(
tokens: list[str],
) -> tuple[int | None, int | None, int | None]:
for tok in tokens:
parsed = _parse_season_episode(tok)
if parsed is not None:
return parsed
return None, None, None
def _extract_tech(
tokens: list[str],
kb: ReleaseKnowledge,
) -> tuple[str | None, str | None, str | None, str, set[str]]:
"""
Extract quality, source, codec, group from tokens.
Returns (quality, source, codec, group, tech_token_set).
Group extraction strategy (in priority order):
1. Token where prefix is a known codec: x265-GROUP
2. Rightmost token with a dash that isn't a known source
"""
quality: str | None = None
source: str | None = None
codec: str | None = None
group = "UNKNOWN"
tech_tokens: set[str] = set()
for tok in tokens:
tl = tok.lower()
if tl in kb.resolutions:
quality = tok
tech_tokens.add(tok)
continue
if tl in kb.sources:
source = tok
tech_tokens.add(tok)
continue
if "-" in tok:
parts = tok.rsplit("-", 1)
# codec-GROUP (highest priority for group)
if parts[0].lower() in kb.codecs:
codec = parts[0]
group = parts[1] if parts[1] else "UNKNOWN"
tech_tokens.add(tok)
continue
# source with dash: Web-DL, WEB-DL, etc.
if parts[0].lower() in kb.sources or tok.lower().replace("-", "") in kb.sources:
source = tok
tech_tokens.add(tok)
continue
if tl in kb.codecs:
codec = tok
tech_tokens.add(tok)
# Fallback: rightmost token with a dash that isn't a known source
if group == "UNKNOWN":
for tok in reversed(tokens):
if "-" in tok:
parts = tok.rsplit("-", 1)
tl = tok.lower()
if tl in kb.sources or tok.lower().replace("-", "") in kb.sources:
continue
if parts[1]:
group = parts[1]
break
return quality, source, codec, group, tech_tokens
def _is_year_token(tok: str) -> bool:
"""Return True if tok is a 4-digit year between 1900 and 2099."""
return len(tok) == 4 and tok.isdigit() and 1900 <= int(tok) <= 2099
def _extract_title(
tokens: list[str], tech_tokens: set[str], kb: ReleaseKnowledge
) -> str:
"""Extract the title portion: everything before the first season/year/tech token."""
title_parts = []
known_tech = kb.resolutions | kb.sources | kb.codecs
for tok in tokens:
if _parse_season_episode(tok) is not None:
break
if _is_year_token(tok):
break
if tok in tech_tokens or tok.lower() in known_tech:
break
if "-" in tok and any(p.lower() in kb.codecs | kb.sources for p in tok.split("-")):
break
title_parts.append(tok)
return ".".join(title_parts) if title_parts else tokens[0]
def _extract_year(tokens: list[str], title: str) -> int | None:
"""Extract a 4-digit year from tokens (only after the title)."""
title_len = len(title.split("."))
for tok in tokens[title_len:]:
if _is_year_token(tok):
return int(tok)
return None
# ---------------------------------------------------------------------------
# Sequence matcher
# ---------------------------------------------------------------------------
def _match_sequences(
tokens: list[str],
sequences: list[dict],
key: str,
) -> tuple[str | None, set[str]]:
"""
Try to match multi-token sequences against consecutive tokens.
Returns (matched_value, set_of_matched_tokens) or (None, empty_set).
Sequences must be ordered most-specific first in the YAML.
"""
upper_tokens = [t.upper() for t in tokens]
for seq in sequences:
seq_upper = [s.upper() for s in seq["tokens"]]
n = len(seq_upper)
for i in range(len(upper_tokens) - n + 1):
if upper_tokens[i : i + n] == seq_upper:
matched = set(tokens[i : i + n])
return seq[key], matched
return None, set()
# ---------------------------------------------------------------------------
# Language extraction
# ---------------------------------------------------------------------------
def _extract_languages(
tokens: list[str], kb: ReleaseKnowledge
) -> tuple[list[str], set[str]]:
"""Extract language tokens. Returns (languages, matched_token_set)."""
languages = []
lang_tokens: set[str] = set()
for tok in tokens:
if tok.upper() in kb.language_tokens:
languages.append(tok.upper())
lang_tokens.add(tok)
return languages, lang_tokens
# ---------------------------------------------------------------------------
# Audio extraction
# ---------------------------------------------------------------------------
def _extract_audio(
tokens: list[str], kb: ReleaseKnowledge,
) -> tuple[str | None, str | None, set[str]]:
"""
Extract audio codec and channel layout.
Returns (audio_codec, audio_channels, matched_token_set).
Sequences are tried first (DTS.HD.MA, TrueHD.Atmos, …), then single tokens.
"""
audio_codec: str | None = None
audio_channels: str | None = None
audio_tokens: set[str] = set()
known_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
known_channels = set(kb.audio.get("channels", []))
# Try multi-token sequences first
matched_codec, matched_set = _match_sequences(
tokens, kb.audio.get("sequences", []), "codec"
)
if matched_codec:
audio_codec = matched_codec
audio_tokens |= matched_set
# Channel layouts like "5.1" or "7.1" are split into two tokens by normalize —
# detect them as consecutive pairs "X" + "Y" where "X.Y" is a known channel.
# The second token may have a "-GROUP" suffix (e.g. "1-KTH" → strip it).
for i in range(len(tokens) - 1):
second = tokens[i + 1].split("-")[0]
candidate = f"{tokens[i]}.{second}"
if candidate in known_channels and audio_channels is None:
audio_channels = candidate
audio_tokens.add(tokens[i])
audio_tokens.add(tokens[i + 1])
for tok in tokens:
if tok in audio_tokens:
continue
if tok.upper() in known_codecs and audio_codec is None:
audio_codec = tok
audio_tokens.add(tok)
elif tok in known_channels and audio_channels is None:
audio_channels = tok
audio_tokens.add(tok)
return audio_codec, audio_channels, audio_tokens
# ---------------------------------------------------------------------------
# Video metadata extraction (bit depth, HDR)
# ---------------------------------------------------------------------------
def _extract_video_meta(
tokens: list[str], kb: ReleaseKnowledge,
) -> tuple[str | None, str | None, set[str]]:
"""
Extract bit depth and HDR format.
Returns (bit_depth, hdr_format, matched_token_set).
"""
bit_depth: str | None = None
hdr_format: str | None = None
video_tokens: set[str] = set()
known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
known_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
# Try HDR sequences first
matched_hdr, matched_set = _match_sequences(
tokens, kb.video_meta.get("sequences", []), "hdr"
)
if matched_hdr:
hdr_format = matched_hdr
video_tokens |= matched_set
for tok in tokens:
if tok in video_tokens:
continue
if tok.upper() in known_hdr and hdr_format is None:
hdr_format = tok.upper()
video_tokens.add(tok)
elif tok.lower() in known_depth and bit_depth is None:
bit_depth = tok.lower()
video_tokens.add(tok)
return bit_depth, hdr_format, video_tokens
# ---------------------------------------------------------------------------
# Edition extraction
# ---------------------------------------------------------------------------
def _extract_edition(
tokens: list[str], kb: ReleaseKnowledge
) -> tuple[str | None, set[str]]:
"""
Extract release edition (UNRATED, EXTENDED, DIRECTORS.CUT, …).
Returns (edition, matched_token_set).
"""
known_tokens = {t.upper() for t in kb.editions.get("tokens", [])}
# Try multi-token sequences first
matched_edition, matched_set = _match_sequences(
tokens, kb.editions.get("sequences", []), "edition"
)
if matched_edition:
return matched_edition, matched_set
for tok in tokens:
if tok.upper() in known_tokens:
return tok.upper(), {tok}
return None, set()
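The sequence-first strategy shared by the audio, HDR and edition extractors above can be illustrated in isolation. The following is a minimal standalone sketch, not the module's code: the `SEQUENCES` table and the `match_sequences` name are made up for the example, and the real sequences come from the YAML knowledge base.

```python
from __future__ import annotations

# Hypothetical sequence table, ordered most-specific first
# (the same ordering rule the YAML files are expected to follow).
SEQUENCES = [
    {"tokens": ["DTS", "HD", "MA"], "codec": "DTS-HD.MA"},
    {"tokens": ["DTS", "HD"], "codec": "DTS-HD"},
]


def match_sequences(
    tokens: list[str], sequences: list[dict], key: str
) -> tuple[str | None, set[str]]:
    """Return the value of the first sequence found as a consecutive run.

    Comparison is case-insensitive; the returned set holds the original
    (un-normalized) tokens that were consumed by the match.
    """
    upper = [t.upper() for t in tokens]
    for seq in sequences:
        wanted = [s.upper() for s in seq["tokens"]]
        n = len(wanted)
        for i in range(len(upper) - n + 1):
            if upper[i : i + n] == wanted:
                return seq[key], set(tokens[i : i + n])
    return None, set()


tokens = ["Movie", "2019", "1080p", "BluRay", "dts", "hd", "ma", "x264-GRP"]
codec, used = match_sequences(tokens, SEQUENCES, "codec")
```

Because the table is ordered most-specific first, the three-token run is reported rather than its two-token prefix, and the consumed tokens can then be excluded from the single-token pass.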
@@ -0,0 +1,38 @@
"""Filesystem release aggregates — what the user owns on disk.
This bounded context is intentionally separated from
``alfred.domain.tv_shows`` / ``alfred.domain.movies`` (TMDB identity).
A :class:`SeriesRelease` describes the physical files on disk for one
show; a :class:`TVShow` describes the work as catalogued by TMDB. The
two are linked by :class:`~alfred.domain.shared.value_objects.TmdbId`
in the persistence layer, never by direct reference.
Not to be confused with ``alfred.domain.release`` (singular) which
parses release **names** (strings → tokens). The two packages may be
merged later; for now they coexist as separate concerns.
"""
from .builders import SeasonReleaseBuilder, SeriesReleaseBuilder
from .entities import (
EpisodeRelease,
MovieRelease,
SeasonRelease,
SeriesRelease,
TrackProfile,
)
from .repositories import MovieReleaseRepository, SeriesReleaseRepository
from .value_objects import EpisodeRange, ReleaseMode
__all__ = [
"EpisodeRange",
"EpisodeRelease",
"MovieRelease",
"MovieReleaseRepository",
"ReleaseMode",
"SeasonRelease",
"SeasonReleaseBuilder",
"SeriesRelease",
"SeriesReleaseBuilder",
"SeriesReleaseRepository",
"TrackProfile",
]
@@ -0,0 +1,243 @@
"""Builders for the filesystem release aggregates.
The aggregates are frozen — :class:`SeriesRelease`, :class:`SeasonRelease`,
and :class:`EpisodeRelease` are ``@dataclass(frozen=True)`` and offer no
mutation methods. All construction goes through these builders, which
assemble the aggregate piece by piece and emit a frozen instance via
``build()``.
Typical usage during a filesystem walk::
builder = SeriesReleaseBuilder(tmdb_id=TmdbId(84958), imdb_id=ImdbId("tt0804484"))
sb = builder.season_builder(SeasonNumber(1), folder="Show.S01", mode=ReleaseMode.PACK)
sb.add_episode(EpisodeRelease(
episodes=EpisodeRange(EpisodeNumber(1), EpisodeNumber(1)),
file_path=FilePath("Show.S01/Show.S01E01.mkv"),
tracks=TrackProfile(),
))
release = builder.build()
Builders are **single-use scratchpads**: they hold mutable state during
construction, then produce an immutable aggregate.
Invariants enforced at ``build()`` time:
* Seasons are emitted sorted by ``season_number``.
* Episodes within each season are emitted sorted by their
``EpisodeRange.start`` (so a season with ``E01-E03`` + ``E04`` is
emitted in that order).
* No two ``EpisodeRelease`` within a season may overlap (same TMDB
episode covered by two distinct files) — raises ``ValidationError``.
"""
from __future__ import annotations
from ..shared_TO_CHECK.exceptions import ValidationError
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
from ..tv_shows.value_objects import SeasonNumber
from .entities import (
EpisodeRelease,
SeasonRelease,
SeriesRelease,
)
from .value_objects import ReleaseMode
# ════════════════════════════════════════════════════════════════════════════
# MovieReleaseBuilder
# ════════════════════════════════════════════════════════════════════════════
# ...
# ════════════════════════════════════════════════════════════════════════════
# SeasonReleaseBuilder
# ════════════════════════════════════════════════════════════════════════════
class SeasonReleaseBuilder:
"""
Mutable scratchpad for a :class:`SeasonRelease`.
Episodes are appended in arbitrary order; ``build()`` sorts them by
their range start before emitting the frozen aggregate and verifies
there are no overlapping ranges.
"""
def __init__(
self,
season_number: SeasonNumber | int,
*,
folder: str,
mode: ReleaseMode,
) -> None:
if isinstance(season_number, int):
season_number = SeasonNumber(season_number)
self._season_number: SeasonNumber = season_number
self._folder: str = folder
self._mode: ReleaseMode = mode
self._episodes: list[EpisodeRelease] = []
@classmethod
def from_existing(cls, season: SeasonRelease) -> SeasonReleaseBuilder:
"""Seed a builder from an existing frozen :class:`SeasonRelease`."""
builder = cls(
season.season_number,
folder=season.folder,
mode=season.mode,
)
builder._episodes = list(season.episodes)
return builder
@property
def season_number(self) -> SeasonNumber:
return self._season_number
@property
def mode(self) -> ReleaseMode:
return self._mode
def set_folder(self, folder: str) -> SeasonReleaseBuilder:
self._folder = folder
return self
def set_mode(self, mode: ReleaseMode) -> SeasonReleaseBuilder:
self._mode = mode
return self
def add_episode(self, episode: EpisodeRelease) -> SeasonReleaseBuilder:
"""Append a physical-file :class:`EpisodeRelease` to this season."""
self._episodes.append(episode)
return self
def build(self) -> SeasonRelease:
"""Emit a frozen :class:`SeasonRelease` with episodes sorted.
Raises :class:`ValidationError` if any two episode ranges overlap
(same TMDB slot claimed by two distinct files).
"""
ordered = tuple(
sorted(self._episodes, key=lambda ep: ep.episodes.start.value)
)
# Overlap check — ranges are inclusive on both ends, sorted by start.
for prev, curr in zip(ordered, ordered[1:], strict=False):
if curr.episodes.start.value <= prev.episodes.end.value:
raise ValidationError(
f"SeasonRelease season {self._season_number}: overlapping "
f"episode ranges {prev.episodes} and {curr.episodes}"
)
return SeasonRelease(
season_number=self._season_number,
folder=self._folder,
mode=self._mode,
episodes=ordered,
)
# ════════════════════════════════════════════════════════════════════════════
# SeriesReleaseBuilder
# ════════════════════════════════════════════════════════════════════════════
class SeriesReleaseBuilder:
"""
Mutable scratchpad for the :class:`SeriesRelease` aggregate root.
Seasons are tracked via internal :class:`SeasonReleaseBuilder`
instances keyed by :class:`SeasonNumber`.
"""
def __init__(
self,
*,
tmdb_id: TmdbId | int,
imdb_id: ImdbId | str | None = None,
) -> None:
if isinstance(tmdb_id, int):
tmdb_id = TmdbId(tmdb_id)
if isinstance(imdb_id, str):
imdb_id = ImdbId(imdb_id)
self._tmdb_id: TmdbId = tmdb_id
self._imdb_id: ImdbId | None = imdb_id
self._season_builders: dict[SeasonNumber, SeasonReleaseBuilder] = {}
@classmethod
def from_existing(cls, release: SeriesRelease) -> SeriesReleaseBuilder:
"""Seed a builder from an existing frozen :class:`SeriesRelease`."""
builder = cls(
tmdb_id=release.tmdb_id,
imdb_id=release.imdb_id,
)
for season in release.seasons:
builder._season_builders[season.season_number] = (
SeasonReleaseBuilder.from_existing(season)
)
return builder
# ── Top-level mutators ─────────────────────────────────────────────────
def set_imdb_id(self, imdb_id: ImdbId | str | None) -> SeriesReleaseBuilder:
if isinstance(imdb_id, str):
imdb_id = ImdbId(imdb_id)
self._imdb_id = imdb_id
return self
# ── Content ────────────────────────────────────────────────────────────
def season_builder(
self,
season_number: SeasonNumber | int,
*,
folder: str | None = None,
mode: ReleaseMode | None = None,
) -> SeasonReleaseBuilder:
"""
Return (creating if needed) the :class:`SeasonReleaseBuilder` for a
season.
``folder`` and ``mode`` are required when the builder does not yet
exist for this season; subsequent calls may pass them to override.
"""
if isinstance(season_number, int):
season_number = SeasonNumber(season_number)
sb = self._season_builders.get(season_number)
if sb is None:
if folder is None or mode is None:
raise ValidationError(
f"season_builder({season_number}): folder and mode "
f"are required to create a new season builder"
)
sb = SeasonReleaseBuilder(season_number, folder=folder, mode=mode)
self._season_builders[season_number] = sb
else:
if folder is not None:
sb.set_folder(folder)
if mode is not None:
sb.set_mode(mode)
return sb
def add_season(self, season: SeasonRelease) -> SeriesReleaseBuilder:
"""
Attach a fully-built :class:`SeasonRelease`, replacing any existing
season with the same number.
"""
self._season_builders[season.season_number] = (
SeasonReleaseBuilder.from_existing(season)
)
return self
# ── Emit ───────────────────────────────────────────────────────────────
def build(self) -> SeriesRelease:
"""Emit a frozen :class:`SeriesRelease` with seasons sorted by number."""
ordered_seasons = tuple(
self._season_builders[n].build()
for n in sorted(self._season_builders, key=lambda x: x.value)
)
return SeriesRelease(
tmdb_id=self._tmdb_id,
imdb_id=self._imdb_id,
seasons=ordered_seasons,
)
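The overlap check in `SeasonReleaseBuilder.build()` only compares neighbouring ranges after sorting by start. A minimal sketch of why that is enough, using plain `(start, end)` tuples instead of `EpisodeRange` (the `find_overlap` helper is hypothetical and exists only for this illustration): if any two inclusive ranges overlap, the earlier one also overlaps its immediate successor in start order.

```python
from __future__ import annotations


def find_overlap(
    ranges: list[tuple[int, int]],
) -> tuple[tuple[int, int], tuple[int, int]] | None:
    """Return the first overlapping neighbour pair, or None.

    Ranges are inclusive on both ends. After sorting by start, a range
    that overlaps any later range necessarily overlaps the next one,
    so a single pass over adjacent pairs is sufficient.
    """
    ordered = sorted(ranges)  # tuples sort by start, then end
    for prev, curr in zip(ordered, ordered[1:]):
        if curr[0] <= prev[1]:
            return prev, curr
    return None


clean = find_overlap([(4, 4), (1, 3)])          # E01-E03 + E04
clash = find_overlap([(1, 3), (3, 3)])          # E03 claimed twice
nested = find_overlap([(1, 10), (2, 3), (5, 6)])
```

The builder raises `ValidationError` where this sketch returns the offending pair; the comparison itself is the same.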
@@ -0,0 +1,217 @@
"""Filesystem release aggregates.
The release domain models what the user owns on disk — one
:class:`SeriesRelease` per show, one :class:`MovieRelease` per movie.
TMDB identity (title, status, episode_count, …) lives in the
``tv_shows`` / ``movies`` domains and is linked via the
:class:`~alfred.domain.shared.value_objects.TmdbId` natural key.
All entities are frozen. Mutation goes through the builders in
:mod:`alfred.domain.releases.builders`.
"""
from __future__ import annotations
from dataclasses import dataclass
from datetime import datetime
from ..shared_TO_CHECK.exceptions import ValidationError
from ..shared_TO_CHECK.media import AudioTrack, SubtitleTrack
from ..shared_TO_CHECK.value_objects import FilePath, ImdbId, TmdbId
from ..tv_shows.value_objects import SeasonNumber
from .value_objects import EpisodeRange, ReleaseMode
__all__ = [
"EpisodeRelease",
"MovieRelease",
"SeasonRelease",
"SeriesRelease",
"TrackProfile",
]
@dataclass(frozen=True)
class TrackProfile:
"""
Audio + subtitle tracks of one physical file.
Tracks live per-file (not per-season): every ``EpisodeRelease`` and
``MovieRelease`` carries its own ``TrackProfile``. Season-level
aggregation is computed by the caller when needed.
"""
audio_tracks: tuple[AudioTrack, ...] = ()
subtitle_tracks: tuple[SubtitleTrack, ...] = ()
@dataclass(frozen=True)
class EpisodeRelease:
"""
One physical episode file (or multi-episode file) on disk.
:attr:`episodes` is an :class:`EpisodeRange` — a single ``.mkv``
that covers ``S01E02E03`` carries ``EpisodeRange(start=E02, end=E03)``
and is recorded once. The library index lists it under each covered
slot (``E02``, ``E03``) for symmetric lookups.
:attr:`file_path` is **relative to the show root** (e.g.
``"Show.S01/Show.S01E02.mkv"`` for PACK,
``"Show.S01/Show.S01E02-RG/Show.S01E02-RG.mkv"`` for EPISODIC).
The caller (repository) prepends the absolute show root when
needed.
"""
episodes: EpisodeRange
file_path: FilePath
tracks: TrackProfile = TrackProfile()
@dataclass(frozen=True)
class SeasonRelease:
"""
All physical files on disk for one season of a show.
The :attr:`mode` flag records the filesystem layout:
* :attr:`ReleaseMode.PACK` — the season folder contains N video
files directly. ``episodes`` lists each ``.mkv`` in the folder.
* :attr:`ReleaseMode.EPISODIC` — the season folder contains N
sub-folders, each with one episode. ``episodes`` lists each
``(subfolder, file)`` pair.
:attr:`folder` is the season folder name, relative to the show root.
Invariant: every ``EpisodeRelease.episodes`` range stays within
sane bounds (validated at construction). Cross-episode duplicate
detection (two files claiming the same TMDB slot) is the
builder's job, not the entity's.
"""
season_number: SeasonNumber
folder: str
mode: ReleaseMode
episodes: tuple[EpisodeRelease, ...] = ()
def __post_init__(self) -> None:
if not isinstance(self.season_number, SeasonNumber):
raise ValidationError(
f"SeasonRelease.season_number must be SeasonNumber, "
f"got {type(self.season_number)}"
)
if not isinstance(self.mode, ReleaseMode):
raise ValidationError(
f"SeasonRelease.mode must be ReleaseMode, got {type(self.mode)}"
)
if not isinstance(self.folder, str) or not self.folder:
raise ValidationError(
f"SeasonRelease.folder must be a non-empty string, "
f"got {self.folder!r}"
)
def episode_count(self) -> int:
"""
Total number of TMDB episode slots covered by all physical files.
Sums each :meth:`EpisodeRange.count` — a season with two files
``E01`` + ``E02-E03`` returns ``3`` (one slot from the first
file, two from the second).
Compared by the caller against the library index's TMDB
``episode_count`` to detect incomplete seasons.
"""
return sum(ep.episodes.count() for ep in self.episodes)
@dataclass(frozen=True)
class SeriesRelease:
"""
All physical seasons on disk for one show.
Anchored to TMDB by :attr:`tmdb_id` (primary key). :attr:`imdb_id`
is optional and stored as a secondary anchor — useful for the
occasional show without TMDB coverage, and for cross-checking
when both ids are known.
Seasons are exposed sorted by ``season_number`` (the builder
enforces this on emit). No duplicate ``season_number`` is
permitted across :attr:`seasons`.
"""
tmdb_id: TmdbId
imdb_id: ImdbId | None
seasons: tuple[SeasonRelease, ...] = ()
def __post_init__(self) -> None:
if not isinstance(self.tmdb_id, TmdbId):
raise ValidationError(
f"SeriesRelease.tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
)
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
raise ValidationError(
f"SeriesRelease.imdb_id must be ImdbId or None, "
f"got {type(self.imdb_id)}"
)
seen: set[int] = set()
for s in self.seasons:
if s.season_number.value in seen:
raise ValidationError(
f"SeriesRelease has duplicate season "
f"{s.season_number}"
)
seen.add(s.season_number.value)
def get_season(self, season_number: SeasonNumber) -> SeasonRelease | None:
"""Return the :class:`SeasonRelease` for ``season_number`` or ``None``."""
for s in self.seasons:
if s.season_number == season_number:
return s
return None
@dataclass(frozen=True)
class MovieRelease:
"""
A single physical movie file on disk.
Anchored to TMDB by :attr:`tmdb_id`; :attr:`imdb_id` optional
secondary anchor.
:attr:`folder` is the movie folder name relative to the
``movies/`` library root. :attr:`file_path` is the video file
name relative to the folder (movies are one folder, one file in
Alfred's layout — no sub-folders).
:attr:`added_at` is the UTC timestamp at which the release was
first observed in the library — set by the caller (organizer /
rescan) when the aggregate is built. Persisted by the v2 movie
sidecar; not derived from the filesystem (mtime drifts across
moves and hard-links).
"""
tmdb_id: TmdbId
imdb_id: ImdbId | None
folder: str
file_path: FilePath
added_at: datetime
tracks: TrackProfile = TrackProfile()
def __post_init__(self) -> None:
if not isinstance(self.tmdb_id, TmdbId):
raise ValidationError(
f"MovieRelease.tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
)
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
raise ValidationError(
f"MovieRelease.imdb_id must be ImdbId or None, "
f"got {type(self.imdb_id)}"
)
if not isinstance(self.folder, str) or not self.folder:
raise ValidationError(
f"MovieRelease.folder must be a non-empty string, "
f"got {self.folder!r}"
)
if not isinstance(self.added_at, datetime):
raise ValidationError(
f"MovieRelease.added_at must be datetime, "
f"got {type(self.added_at)}"
)
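The entities above combine two ideas: frozen dataclasses that validate themselves in `__post_init__`, and an `episode_count()` that sums slots rather than files. A reduced sketch of both follows. The `Range` and `Season` classes are stand-ins invented for this example, and the inclusive `count()` formula is an assumption inferred from the `E01` + `E02-E03` = 3 example in the docstring, not taken from `EpisodeRange` itself.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical stand-in for EpisodeRange (inclusive on both ends)."""

    start: int
    end: int

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError(f"end {self.end} precedes start {self.start}")

    def count(self) -> int:
        # Assumed formula: number of episode slots covered by one file.
        return self.end - self.start + 1


@dataclass(frozen=True)
class Season:
    """Hypothetical stand-in for SeasonRelease."""

    folder: str
    ranges: tuple[Range, ...] = ()

    def __post_init__(self) -> None:
        if not isinstance(self.folder, str) or not self.folder:
            raise ValueError(f"folder must be a non-empty string, got {self.folder!r}")

    def episode_count(self) -> int:
        # Slots, not files: a double-episode file contributes two.
        return sum(r.count() for r in self.ranges)


season = Season(folder="Show.S01", ranges=(Range(1, 1), Range(2, 3)))
```

Two files cover three slots here, which is the number a caller would compare against the TMDB episode count. Attribute assignment on `season` raises `FrozenInstanceError`, which is why changes are routed through a builder.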
@@ -0,0 +1,27 @@
"""Release parser v2 — annotate-based pipeline.
This package is the future home of ``parse_release``. It restructures the
parsing logic around a **tokenize → annotate → assemble** pipeline:
1. **tokenize**: split the release name into atomic tokens.
2. **annotate**: walk tokens left-to-right, assigning each one a
:class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.
The pipeline has three internal paths driven by the detected release group:
- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
declared in ``knowledge/release/release_groups/<group>.yaml``.
- **SHITTY**: unknown group, best-effort matching against the global
knowledge sets, with a 0-100 confidence score.
- **PATH OF PAIN**: score below threshold OR critical chunks missing —
signaled to the caller, who decides whether to involve the LLM/user.
"""
from __future__ import annotations
from .schema import GroupSchema, SchemaChunk
from .tokens import Token, TokenRole
__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]
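The tokenize → annotate → assemble shape described in the package docstring can be sketched end to end in a few lines. This is a deliberately reduced illustration under stated assumptions: `Role`, `Tok`, `RESOLUTIONS` and the three functions below are invented for the example, a single `.` separator is assumed, and none of the group detection, schema matching or enricher logic of the real pipeline is reproduced.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    UNKNOWN = "unknown"
    TITLE = "title"
    YEAR = "year"
    RESOLUTION = "resolution"


@dataclass(frozen=True)
class Tok:
    text: str
    index: int
    role: Role = Role.UNKNOWN


RESOLUTIONS = {"720p", "1080p", "2160p"}  # hypothetical knowledge set


def tokenize(name: str) -> list[Tok]:
    """Split on '.' and drop empty pieces; every token starts as UNKNOWN."""
    pieces = [p for p in name.split(".") if p]
    return [Tok(text=p, index=i) for i, p in enumerate(pieces)]


def annotate(tokens: list[Tok]) -> list[Tok]:
    """Assign a role to each token; the title is the leftmost unclassified run."""
    out: list[Tok] = []
    in_title = True
    for t in tokens:
        if len(t.text) == 4 and t.text.isdigit() and 1900 <= int(t.text) <= 2099:
            role = Role.YEAR
        elif t.text.lower() in RESOLUTIONS:
            role = Role.RESOLUTION
        elif in_title:
            role = Role.TITLE
        else:
            role = Role.UNKNOWN
        if role is not Role.TITLE:
            in_title = False  # first structural token closes the title
        out.append(Tok(t.text, t.index, role))
    return out


def assemble(tokens: list[Tok]) -> dict:
    """Fold annotated tokens into a flat result dict."""
    return {
        "title": ".".join(t.text for t in tokens if t.role is Role.TITLE),
        "year": next((int(t.text) for t in tokens if t.role is Role.YEAR), None),
        "resolution": next(
            (t.text for t in tokens if t.role is Role.RESOLUTION), None
        ),
    }


parsed = assemble(annotate(tokenize("Some.Movie.2019.1080p.WEB.x264-GRP")))
```

Tokens that no rule recognises (`WEB` and `x264-GRP` in this reduced version) stay UNKNOWN, which mirrors how the real annotate stage leaves unmatched tokens for later passes.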
@@ -0,0 +1,762 @@
"""Annotate-based pipeline.
Three stages:
1. :func:`tokenize` — release name → ``list[Token]`` (all UNKNOWN), plus
a separately-returned site tag (e.g. ``[YTS.MX]``) that is never
tokenized.
2. :func:`annotate` — promote each token's :class:`TokenRole` using the
injected knowledge base. Two sub-passes:
a. **Structural** (schema-driven, EASY only). Detects the group at
the right end, looks up its :class:`GroupSchema`, then matches
the schema's chunk sequence against the token stream. Between
two structural chunks, any number of unmatched tokens may
remain — they are left UNKNOWN for the enricher pass to handle.
b. **Enrichers** (non-positional). Walks UNKNOWN tokens and tags
audio / video-meta / edition / language roles. Multi-token
sequences (``DTS.HD.MA``, ``DV.HDR10``, ``DIRECTORS.CUT``) are
matched first, single tokens after.
3. :func:`assemble` — fold annotated tokens into a
:class:`~alfred.domain.release.value_objects.ParsedRelease`-compatible
dict.
The pipeline is **pure**: no I/O, no TMDB, no probe. All knowledge
arrives through ``kb: ReleaseKnowledge``.
"""
from __future__ import annotations
from ..ports.knowledge import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import MediaTypeToken
from .schema import GroupSchema
from .tokens import Token, TokenRole
# ---------------------------------------------------------------------------
# Stage 1 — tokenize
# ---------------------------------------------------------------------------
def strip_site_tag(name: str) -> tuple[str, str | None]:
"""Split off a ``[site.tag]`` prefix or suffix.
Returns ``(clean_name, tag)``. If no tag is found, returns
``(name.strip(), None)``.
"""
s = name.strip()
if s.startswith("["):
close = s.find("]")
if close != -1:
tag = s[1:close].strip()
remainder = s[close + 1 :].strip()
if tag and remainder:
return remainder, tag
if s.endswith("]"):
open_bracket = s.rfind("[")
if open_bracket != -1:
tag = s[open_bracket + 1 : -1].strip()
remainder = s[:open_bracket].strip()
if tag and remainder:
return remainder, tag
return s, None
def tokenize(name: str, kb: ReleaseKnowledge) -> tuple[list[Token], str | None]:
"""Split ``name`` into tokens after stripping any site tag.
Implemented with plain string operations: every separator configured
in ``kb.separators`` is replaced with a NUL character, then the buffer
is split on NUL and empty pieces are dropped. NUL cannot legally appear
in a release name, so it is a safe sentinel.
"""
clean, site_tag = strip_site_tag(name)
DELIM = "\x00"
buf = clean
for sep in kb.separators:
if sep != DELIM:
buf = buf.replace(sep, DELIM)
pieces = [p for p in buf.split(DELIM) if p]
tokens = [Token(text=p, index=i) for i, p in enumerate(pieces)]
return tokens, site_tag
# ---------------------------------------------------------------------------
# Helpers shared across passes
# ---------------------------------------------------------------------------
def _parse_season_episode(text: str) -> tuple[int, int | None, int | None] | None:
"""Parse a single token as ``SxxExx`` / ``SxxExxExx`` / ``Sxx`` /
``Sxx-yy`` (season range) / ``NxNN``.
Returns ``(season, episode, episode_end)`` or ``None`` if the token
is not a season/episode marker. For ``Sxx-yy``, returns the first
season with no episode info — the caller is expected to detect the
range form and promote ``media_type`` to ``tv_complete`` separately.
"""
upper = text.upper()
# SxxExx form (and Sxx, Sxx-yy)
if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
season = int(upper[1:3])
rest = upper[3:]
if not rest:
return season, None, None
# Sxx-yy season-range form: capture the first season, treat as a
# complete-series marker (no episode info).
if (
len(rest) == 3
and rest[0] == "-"
and rest[1:3].isdigit()
):
return season, None, None
episodes: list[int] = []
while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
episodes.append(int(rest[1:3]))
rest = rest[3:]
if not episodes:
return None
# For chained multi-episode markers (E09E10E11), the range is the
# first → last episode. Intermediate values are implied.
return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None
# NxNN form
if "X" in upper:
parts = upper.split("X")
if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
season = int(parts[0])
episode = int(parts[1])
episode_end = int(parts[2]) if len(parts) >= 3 else None
return season, episode, episode_end
return None
def _is_year(text: str) -> bool:
"""Return True if ``text`` is a 4-digit year in [1900, 2099]."""
return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099
def _split_codec_group(text: str, kb: ReleaseKnowledge) -> tuple[str, str] | None:
"""Split a ``codec-GROUP`` token into ``(codec, group)`` if it fits.
Returns ``None`` if the token doesn't match the ``codec-GROUP``
shape. Handles the empty-group case (``x265-``) as ``(codec, "")``.
"""
if "-" not in text:
return None
head, _, tail = text.rpartition("-")
if head.lower() in kb.codecs:
return head, tail
return None
def _match_role(text: str, role: TokenRole, kb: ReleaseKnowledge) -> TokenRole | None:
"""Return ``role`` if ``text`` matches it under ``kb``, else ``None``."""
lower = text.lower()
if role is TokenRole.YEAR:
return TokenRole.YEAR if _is_year(text) else None
if role is TokenRole.SEASON_EPISODE:
return (
TokenRole.SEASON_EPISODE
if _parse_season_episode(text) is not None
else None
)
if role is TokenRole.RESOLUTION:
return TokenRole.RESOLUTION if lower in kb.resolutions else None
if role is TokenRole.SOURCE:
return TokenRole.SOURCE if lower in kb.sources else None
if role is TokenRole.CODEC:
return TokenRole.CODEC if lower in kb.codecs else None
return None
# ---------------------------------------------------------------------------
# Stage 2a — group detection
# ---------------------------------------------------------------------------
def _detect_group(tokens: list[Token], kb: ReleaseKnowledge) -> tuple[str, int | None]:
"""Identify the release group by walking tokens right-to-left.
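Returns ``(group_name, token_index_carrying_group)``. The index is
``None`` when no token yields a group: there is no ``codec-GROUP``
token and no dashed token with a non-empty suffix (dashed sources such
as ``Web-DL`` are excluded). The name is then ``"UNKNOWN"``.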
"""
# Priority 1: codec-GROUP shape (clearest signal).
for tok in reversed(tokens):
split = _split_codec_group(tok.text, kb)
if split is not None:
_, group = split
return (group or "UNKNOWN"), tok.index
# Priority 2: rightmost dash, excluding dashed sources (Web-DL, etc.).
for tok in reversed(tokens):
if "-" not in tok.text:
continue
head, _, tail = tok.text.rpartition("-")
if (
head.lower() in kb.sources
or tok.text.lower().replace("-", "") in kb.sources
):
continue
if tail:
return tail, tok.index
return "UNKNOWN", None
# ---------------------------------------------------------------------------
# Stage 2b — structural annotation (schema-driven)
# ---------------------------------------------------------------------------
def _annotate_structural(
tokens: list[Token],
kb: ReleaseKnowledge,
schema: GroupSchema,
group_token_index: int,
) -> list[Token] | None:
"""Annotate structural tokens following a known group schema.
Walks the schema's chunks against the body (tokens up to the group
token). For each chunk, scans forward in the body for a matching
token — tokens passed over without match are left UNKNOWN (the
enricher pass will handle them).
Returns ``None`` if any mandatory chunk fails to find a match.
"""
result = list(tokens)
# The codec-GROUP token carries CODEC + GROUP. Split it now so the
# schema walk knows the codec is "pre-consumed" at the end.
group_token = result[group_token_index]
cg_split = _split_codec_group(group_token.text, kb)
codec_pre_consumed = False
if cg_split is not None:
codec, group = cg_split
result[group_token_index] = group_token.with_role(
TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
)
codec_pre_consumed = True
else:
head, _, tail = group_token.text.rpartition("-")
result[group_token_index] = group_token.with_role(
TokenRole.GROUP, group=tail or "UNKNOWN", prefix=head
)
body_end = group_token_index # exclusive
tok_idx = 0
chunk_idx = 0
# 1) TITLE — leftmost contiguous tokens up to the first structural
# boundary. Title is special because it can be multi-token.
while (
chunk_idx < len(schema.chunks)
and schema.chunks[chunk_idx].role is TokenRole.TITLE
):
title_end = _find_title_end(result, body_end, kb)
for i in range(tok_idx, title_end):
result[i] = result[i].with_role(TokenRole.TITLE)
tok_idx = title_end
chunk_idx += 1
# 2) Remaining structural chunks. For each, scan forward in the body
# for a matching token; tokens passed over remain UNKNOWN.
for chunk in schema.chunks[chunk_idx:]:
if chunk.role is TokenRole.GROUP:
continue
if chunk.role is TokenRole.CODEC and codec_pre_consumed:
continue
match_idx = _find_chunk(result, tok_idx, body_end, chunk.role, kb)
if match_idx is None:
if chunk.optional:
continue
return None
result[match_idx] = result[match_idx].with_role(chunk.role)
tok_idx = match_idx + 1
return result
def _find_title_end(
tokens: list[Token], body_end: int, kb: ReleaseKnowledge
) -> int:
"""Return the exclusive index where the title ends.
The title is the leftmost run of tokens whose text does not match
any structural role (year, season/episode, resolution, source,
codec). Enricher tokens (audio, HDR, language) are *not* boundaries
because they can appear in the middle of the structural sequence;
however, in canonical scene names they don't appear inside the title
itself, so this heuristic holds in practice.
"""
for i in range(body_end):
text = tokens[i].text
if _parse_season_episode(text) is not None:
return i
if _is_year(text):
return i
lower = text.lower()
if lower in kb.resolutions:
return i
if lower in kb.sources:
return i
if lower in kb.codecs:
return i
# codec-GROUP token (e.g. "x265-KONTRAST") or dashed source (Web-DL).
if "-" in text:
head, _, _ = text.rpartition("-")
if (
head.lower() in kb.codecs
or head.lower() in kb.sources
or text.lower().replace("-", "") in kb.sources
):
return i
return body_end
def _find_chunk(
tokens: list[Token],
start: int,
end: int,
role: TokenRole,
kb: ReleaseKnowledge,
) -> int | None:
"""Return the first index in ``[start, end)`` whose token matches ``role``.
Returns ``None`` if no token in the range matches. Tokens already
annotated (non-UNKNOWN) are skipped — they belong to another chunk.
"""
for i in range(start, end):
if tokens[i].role is not TokenRole.UNKNOWN:
continue
if _match_role(tokens[i].text, role, kb) is not None:
return i
return None
# ---------------------------------------------------------------------------
# Stage 2b' — SHITTY annotation (schema-less heuristic)
# ---------------------------------------------------------------------------
def _annotate_shitty(
tokens: list[Token],
kb: ReleaseKnowledge,
group_index: int | None,
) -> list[Token]:
"""Schema-less, dictionary-driven annotation.
SHITTY's job is narrow: for releases that *look* like scene names
but don't have a registered group schema, tag every token whose text
falls into a known YAML bucket (resolutions, codecs, sources, …).
Anything we can't classify stays UNKNOWN. The leftmost run of
UNKNOWN tokens becomes the title. Done.
Anything that requires more reasoning (parenthesized tech blocks,
bare-dashed title fragments, year-disguised slug suffixes, …) is
PATH OF PAIN territory and stays out of here on purpose.
"""
result = list(tokens)
# 1) Group token — split codec-GROUP or tag GROUP. Same logic as EASY.
if group_index is not None:
gt = result[group_index]
cg_split = _split_codec_group(gt.text, kb)
if cg_split is not None:
codec, group = cg_split
result[group_index] = gt.with_role(
TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
)
else:
_, _, tail = gt.text.rpartition("-")
result[group_index] = gt.with_role(
TokenRole.GROUP, group=tail or "UNKNOWN"
)
# 2) Enrichers (audio / video-meta / edition / language).
result = _annotate_enrichers(result, kb)
# 3) Single pass: tag each UNKNOWN token by looking it up in the kb
# buckets. First match wins per token, first occurrence wins per
# role (we don't overwrite an already-tagged role).
matchers: list[tuple[TokenRole, callable]] = [
(TokenRole.SEASON_EPISODE, lambda t: _parse_season_episode(t) is not None),
(TokenRole.YEAR, _is_year),
(TokenRole.RESOLUTION, lambda t: t.lower() in kb.resolutions),
(TokenRole.DISTRIBUTOR, lambda t: t.upper() in kb.distributors),
(TokenRole.SOURCE, lambda t: t.lower() in kb.sources),
(TokenRole.CODEC, lambda t: t.lower() in kb.codecs),
]
seen: set[TokenRole] = set()
for i, tok in enumerate(result):
if tok.role is not TokenRole.UNKNOWN:
continue
for role, matches in matchers:
if role in seen:
continue
if matches(tok.text):
result[i] = tok.with_role(role)
seen.add(role)
break
# 4) Title = leftmost contiguous UNKNOWN tokens.
for i, tok in enumerate(result):
if tok.role is not TokenRole.UNKNOWN:
break
result[i] = tok.with_role(TokenRole.TITLE)
return result
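The "first match wins per token, first occurrence wins per role" rule, followed by the leftmost-UNKNOWN title scan, can be illustrated with plain strings. This is a sketch under assumptions: roles are strings rather than `TokenRole` members, and the buckets and year rule are invented for the example.

```python
from typing import Callable

# Hypothetical buckets; the real ones come from the knowledge base.
RESOLUTIONS = {"720p", "1080p", "2160p"}
CODECS = {"x264", "x265"}


def is_year(text: str) -> bool:
    # Assumed year rule, for illustration only.
    return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099


MATCHERS: list[tuple[str, Callable[[str], bool]]] = [
    ("year", is_year),
    ("resolution", lambda t: t.lower() in RESOLUTIONS),
    ("codec", lambda t: t.lower() in CODECS),
]


def tag(texts: list[str]) -> list[str]:
    """Tag each token with at most one role; each role is used at most once."""
    seen: set[str] = set()
    roles: list[str] = []
    for text in texts:
        role = "unknown"
        for name, matches in MATCHERS:
            if name in seen:
                continue  # role already claimed by an earlier token
            if matches(text):
                role = name
                seen.add(name)
                break
        roles.append(role)
    # Title = leftmost contiguous run of unknown tokens.
    for i, role in enumerate(roles):
        if role != "unknown":
            break
        roles[i] = "title"
    return roles
```

In this sketch the first year-shaped token claims the year role, so a title that itself contains a year-like number (for example `Blade.Runner.2049.2017`) is cut short and the real year stays unknown. That is the kind of case the docstring above assigns to PATH OF PAIN.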
# ---------------------------------------------------------------------------
# Stage 2c — enricher pass (non-positional roles)
# ---------------------------------------------------------------------------
def _annotate_enrichers(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
"""Tag the remaining UNKNOWN tokens with non-positional roles.
Multi-token sequences are matched first (so ``DTS.HD.MA`` wins over
a single-token ``DTS``). For each sequence match, the first token
receives the role + ``extra["sequence"]`` (the canonical joined
value), and the trailing members are marked with the same role +
``extra["sequence_member"]="True"`` (a string) so :func:`assemble` extracts the
value only from the primary.
"""
result = list(tokens)
# Multi-token sequences first.
_apply_sequences(
result, kb.audio.get("sequences", []), "codec", TokenRole.AUDIO_CODEC
)
_apply_sequences(
result, kb.video_meta.get("sequences", []), "hdr", TokenRole.HDR
)
_apply_sequences(
result, kb.editions.get("sequences", []), "edition", TokenRole.EDITION
)
# Single tokens.
known_audio_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
known_audio_channels = set(kb.audio.get("channels", []))
known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
known_bit_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
known_editions = {t.upper() for t in kb.editions.get("tokens", [])}
# Channel layouts like "5.1" are tokenized as two tokens ("5", "1")
# because "." is a separator. Detect consecutive pairs whose joined
# value (without any trailing "-GROUP") is in the channel set.
_detect_channel_pairs(result, known_audio_channels)
for i, tok in enumerate(result):
if tok.role is not TokenRole.UNKNOWN:
continue
text = tok.text
upper = text.upper()
lower = text.lower()
if upper in known_audio_codecs:
result[i] = tok.with_role(TokenRole.AUDIO_CODEC)
continue
if text in known_audio_channels:
result[i] = tok.with_role(TokenRole.AUDIO_CHANNELS)
continue
if upper in known_hdr:
result[i] = tok.with_role(TokenRole.HDR)
continue
if lower in known_bit_depth:
result[i] = tok.with_role(TokenRole.BIT_DEPTH)
continue
if upper in known_editions:
result[i] = tok.with_role(TokenRole.EDITION)
continue
if upper in kb.language_tokens:
result[i] = tok.with_role(TokenRole.LANGUAGE)
continue
if upper in kb.distributors:
result[i] = tok.with_role(TokenRole.DISTRIBUTOR)
continue
return result
def _apply_sequences(
tokens: list[Token],
sequences: list[dict],
value_key: str,
role: TokenRole,
) -> None:
"""Mark every non-overlapping occurrence of each sequence in place.
Mutates ``tokens`` (replacing entries with new role-tagged Token
instances). Sequences in the YAML must be ordered most-specific
first: an earlier sequence claims its tokens, and later sequences
cannot match any span that overlaps them.
"""
if not sequences:
return
upper_texts = [t.text.upper() for t in tokens]
consumed: set[int] = set()
for seq in sequences:
seq_upper = [s.upper() for s in seq["tokens"]]
n = len(seq_upper)
for start in range(len(tokens) - n + 1):
if any(idx in consumed for idx in range(start, start + n)):
continue
if any(
tokens[start + k].role is not TokenRole.UNKNOWN for k in range(n)
):
continue
if upper_texts[start : start + n] == seq_upper:
tokens[start] = tokens[start].with_role(
role, sequence=seq[value_key]
)
for k in range(1, n):
tokens[start + k] = tokens[start + k].with_role(
role, sequence_member="True"
)
consumed.update(range(start, start + n))
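Why the YAML ordering matters can be seen in a stripped-down version of the matching loop. This sketch works on plain strings and returns a mapping instead of mutating tokens; the sequence entries and the `"codec"` value key are illustrative assumptions.

```python
def apply_sequences(texts: list[str], sequences: list[dict]) -> dict[int, str]:
    """Return {start_index: canonical_value} for every non-overlapping match."""
    upper = [t.upper() for t in texts]
    consumed: set[int] = set()
    found: dict[int, str] = {}
    for seq in sequences:
        pattern = [s.upper() for s in seq["tokens"]]
        n = len(pattern)
        for start in range(len(upper) - n + 1):
            span = range(start, start + n)
            if any(i in consumed for i in span):
                continue  # already claimed by an earlier sequence
            if upper[start:start + n] == pattern:
                found[start] = seq["codec"]
                consumed.update(span)
    return found


# Hypothetical entries, ordered most-specific first.
SEQUENCES = [
    {"tokens": ["DTS", "HD", "MA"], "codec": "DTS-HD.MA"},
    {"tokens": ["DTS", "HD"], "codec": "DTS-HD"},
]
```

Reversing the list would let the two-token entry claim `DTS` and `HD` first, after which the three-token entry can no longer match.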
def _detect_channel_pairs(
tokens: list[Token], known_channels: set[str]
) -> None:
"""Spot two consecutive numeric tokens that form a channel layout.
Example: ``["5", "1-KTH"]`` → joined ``"5.1"`` (after stripping the
``-GROUP`` suffix on the second). The second token may be the trailing
codec-GROUP token, in which case it's already tagged CODEC and is left
untouched; retagging it would corrupt its role. Only the first token
of the pair carries the channel value.
"""
for i in range(len(tokens) - 1):
first = tokens[i]
second = tokens[i + 1]
if first.role is not TokenRole.UNKNOWN:
continue
# Strip a "-GROUP" suffix on the second token before joining.
second_text = second.text.split("-")[0]
candidate = f"{first.text}.{second_text}"
if candidate not in known_channels:
continue
# Only tag the first token (carries the channel value). The
# second token may legitimately remain UNKNOWN (or be the
# codec-GROUP token, already tagged CODEC).
tokens[i] = first.with_role(
TokenRole.AUDIO_CHANNELS, sequence=candidate
)
if second.role is TokenRole.UNKNOWN:
tokens[i + 1] = second.with_role(
TokenRole.AUDIO_CHANNELS, sequence_member="True"
)
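The pair-joining step can be shown on plain strings. This is a minimal sketch that only reports where a layout was found; it does not model roles, and the channel set is an assumed example.

```python
def detect_channel_pairs(texts: list[str], known_channels: set[str]) -> dict[int, str]:
    """Return {index_of_first_token: joined_layout} for each matching pair."""
    hits: dict[int, str] = {}
    for i in range(len(texts) - 1):
        # Strip a "-GROUP" suffix on the second token before joining.
        second = texts[i + 1].split("-")[0]
        candidate = f"{texts[i]}.{second}"
        if candidate in known_channels:
            hits[i] = candidate
    return hits


# Hypothetical channel set; the real one is read from the audio knowledge.
CHANNELS = {"2.0", "5.1", "7.1"}
```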
# ---------------------------------------------------------------------------
# Stage 2 entry point
# ---------------------------------------------------------------------------
def annotate(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
"""Annotate token roles.
Dispatch:
* If a group is detected AND has a known schema, run the EASY
structural walk. If the schema walk aborts on a mandatory chunk
mismatch, fall through to SHITTY (the heuristic still does better
than giving up).
* Otherwise run SHITTY — schema-less, best-effort, never aborts.
The enricher pass runs in both cases. The pipeline always returns a
populated token list; downstream callers don't need to distinguish
EASY vs SHITTY at this layer (the parse_path is decided in the
service based on whether a schema matched).
"""
group_name, group_index = _detect_group(tokens, kb)
schema = kb.group_schema(group_name) if group_index is not None else None
if schema is not None and group_index is not None:
structural = _annotate_structural(tokens, kb, schema, group_index)
if structural is not None:
return _annotate_enrichers(structural, kb)
# SHITTY fallback — heuristic positional pass. ``_annotate_shitty``
# runs its own enricher pass internally (it has to, so the title
# scan can skip enricher-tagged tokens).
return _annotate_shitty(tokens, kb, group_index)
def has_known_schema(tokens: list[Token], kb: ReleaseKnowledge) -> bool:
"""Return True if ``tokens`` would take the EASY path in :func:`annotate`."""
group_name, group_index = _detect_group(tokens, kb)
if group_index is None:
return False
return kb.group_schema(group_name) is not None
# ---------------------------------------------------------------------------
# Stage 3 — assemble
# ---------------------------------------------------------------------------
def assemble(
annotated: list[Token],
site_tag: str | None,
raw_name: str,
kb: ReleaseKnowledge,
) -> dict:
"""Fold annotated tokens into a ``ParsedRelease``-compatible dict.
Returns a dict (not a ``ParsedRelease`` instance) so the caller can
layer in additional fields (``parse_path``, ``raw``, …) before
instantiation.
"""
# Pure-punctuation tokens (e.g. a stray "-" left by ` - ` separators in
# human-friendly release names) carry no title content and would leak
# into the joined title as ``"Show.-.Episode"``. Drop them here.
title_parts = [
t.text
for t in annotated
if t.role is TokenRole.TITLE and any(c.isalnum() for c in t.text)
]
title = ".".join(title_parts) if title_parts else (
annotated[0].text if annotated else raw_name
)
year: int | None = None
season: int | None = None
episode: int | None = None
episode_end: int | None = None
quality: str | None = None
source: str | None = None
codec: str | None = None
group = "UNKNOWN"
audio_codec: str | None = None
audio_channels: str | None = None
bit_depth: str | None = None
hdr_format: str | None = None
edition: str | None = None
distributor: str | None = None
languages: list[str] = []
is_season_range = False
for tok in annotated:
# Skip non-primary members of a multi-token sequence.
if tok.extra.get("sequence_member") == "True":
continue
role = tok.role
if role is TokenRole.YEAR:
year = int(tok.text)
elif role is TokenRole.SEASON_EPISODE:
parsed = _parse_season_episode(tok.text)
if parsed is not None:
season, episode, episode_end = parsed
# Detect Sxx-yy range form to flag it as a multi-season pack.
upper = tok.text.upper()
if (
len(upper) == 6
and upper[0] == "S"
and upper[1:3].isdigit()
and upper[3] == "-"
and upper[4:6].isdigit()
):
is_season_range = True
elif role is TokenRole.RESOLUTION:
quality = tok.text
elif role is TokenRole.SOURCE:
source = tok.text
elif role is TokenRole.CODEC:
codec = tok.extra.get("codec", tok.text)
if "group" in tok.extra:
group = tok.extra["group"] or "UNKNOWN"
elif role is TokenRole.GROUP:
group = tok.extra.get("group", tok.text) or "UNKNOWN"
elif role is TokenRole.AUDIO_CODEC:
if audio_codec is None:
audio_codec = tok.extra.get("sequence", tok.text)
elif role is TokenRole.AUDIO_CHANNELS:
if audio_channels is None:
audio_channels = tok.extra.get("sequence", tok.text)
elif role is TokenRole.BIT_DEPTH:
if bit_depth is None:
bit_depth = tok.text.lower()
elif role is TokenRole.HDR:
if hdr_format is None:
hdr_format = tok.extra.get("sequence", tok.text.upper())
elif role is TokenRole.EDITION:
if edition is None:
edition = tok.extra.get("sequence", tok.text.upper())
elif role is TokenRole.LANGUAGE:
languages.append(tok.text.upper())
elif role is TokenRole.DISTRIBUTOR:
if distributor is None:
distributor = tok.text.upper()
# Media type heuristic. Doc/concert/integrale tokens win over the
# generic tech-based fallback. We look across all tokens (not just
# annotated ones) because these markers may be tagged UNKNOWN by the
# structural pass — only the assemble step cares about them.
upper_tokens = {tok.text.upper() for tok in annotated}
doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}
if upper_tokens & doc_tokens:
media_type = MediaTypeToken.DOCUMENTARY
elif upper_tokens & concert_tokens:
media_type = MediaTypeToken.CONCERT
elif is_season_range:
media_type = MediaTypeToken.TV_COMPLETE
elif (
edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
or upper_tokens & integrale_tokens
) and season is None:
media_type = MediaTypeToken.TV_COMPLETE
elif season is not None:
media_type = MediaTypeToken.TV_SHOW
elif any((quality, source, codec, year)):
media_type = MediaTypeToken.MOVIE
else:
media_type = MediaTypeToken.UNKNOWN
return {
"title": title,
"title_sanitized": kb.sanitize_for_fs(title),
"year": year,
"season": season,
"episode": episode,
"episode_end": episode_end,
"quality": quality,
"source": source,
"codec": codec,
"group": group,
"media_type": media_type,
"site_tag": site_tag,
"languages": tuple(languages),
"audio_codec": audio_codec,
"audio_channels": audio_channels,
"bit_depth": bit_depth,
"hdr_format": hdr_format,
"edition": edition,
"distributor": distributor,
}
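Two of the smaller rules inside `assemble` are easy to check in isolation: dropping pure-punctuation tokens from the title, and recognising the `Sxx-yy` season-range form. The helper names below are hypothetical; the logic follows the conditions shown above.

```python
def join_title(parts: list[str]) -> str:
    """Join title tokens with dots, dropping tokens with no alphanumerics."""
    kept = [p for p in parts if any(c.isalnum() for c in p)]
    return ".".join(kept)


def is_season_range(text: str) -> bool:
    """Return True for the six-character ``Sxx-yy`` multi-season form."""
    upper = text.upper()
    return (
        len(upper) == 6
        and upper[0] == "S"
        and upper[1:3].isdigit()
        and upper[3] == "-"
        and upper[4:6].isdigit()
    )
```

Note that the range check is strict about width: a short form such as `S1-5` does not qualify.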
@@ -0,0 +1,47 @@
"""Group schema value objects.
A :class:`GroupSchema` describes the canonical chunk layout of releases
from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
contract: when a release ends in ``-<GROUP>`` and we know the group,
the annotator walks the schema instead of running the heuristic SHITTY
matchers.
Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
by an infrastructure adapter and surfaced via the
:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
"""
from __future__ import annotations
from dataclasses import dataclass
from .tokens import TokenRole
@dataclass(frozen=True)
class SchemaChunk:
"""One entry in a group's chunk order.
``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
is True for chunks that may be absent (e.g. ``year`` on TV releases,
``source`` on bare ELiTE TV releases).
"""
role: TokenRole
optional: bool = False
@dataclass(frozen=True)
class GroupSchema:
"""Schema for a known release group.
``chunks`` is the left-to-right canonical order. The annotator walks
the chunks in order and, for each one, scans forward through the
remaining tokens: an optional chunk with no matching token is skipped,
while a mandatory chunk with no match aborts the EASY path and falls
back to SHITTY.
"""
name: str
separator: str
chunks: tuple[SchemaChunk, ...]
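For illustration, a schema could be built by hand as follows. This is a sketch under assumptions: the `TokenRole` enum here is a cut-down stand-in, and the chunk order for the made-up `EXAMPLE` group is invented; real schemas are loaded from the group's YAML file.

```python
from dataclasses import FrozenInstanceError, dataclass
from enum import Enum


class TokenRole(Enum):
    # Hypothetical subset of the real enum, for illustration only.
    TITLE = "title"
    YEAR = "year"
    RESOLUTION = "resolution"
    SOURCE = "source"
    CODEC = "codec"
    GROUP = "group"


@dataclass(frozen=True)
class SchemaChunk:
    role: TokenRole
    optional: bool = False


@dataclass(frozen=True)
class GroupSchema:
    name: str
    separator: str
    chunks: tuple[SchemaChunk, ...]


# Assumed layout; not taken from any real group's YAML.
example = GroupSchema(
    name="EXAMPLE",
    separator=".",
    chunks=(
        SchemaChunk(TokenRole.TITLE),
        SchemaChunk(TokenRole.YEAR, optional=True),
        SchemaChunk(TokenRole.RESOLUTION),
        SchemaChunk(TokenRole.SOURCE, optional=True),
        SchemaChunk(TokenRole.CODEC),
        SchemaChunk(TokenRole.GROUP),
    ),
)

# Roles that must be found for the EASY path to succeed.
mandatory = [c.role for c in example.chunks if not c.optional]
```

Because both dataclasses are frozen and `chunks` is a tuple, a loaded schema can be shared safely between parses.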
@@ -0,0 +1,139 @@
"""Parse-confidence scoring.
``parse_release`` returns a :class:`ParseReport` alongside its
:class:`ParsedRelease`. The report carries:
- ``confidence``: integer 0–100 derived from which structural and
technical fields got populated, minus a penalty per UNKNOWN token
left in the annotated stream.
- ``road``: which of the three roads the parse took
(:class:`Road.EASY` / :class:`Road.SHITTY` / :class:`Road.PATH_OF_PAIN`).
- ``unknown_tokens``: textual residue, useful for diagnostics.
- ``missing_critical``: structural fields the score-tally found absent
(e.g. ``("year", "media_type")``) — the caller can use this to drive
PoP recovery (questions, LLM call).
All weights, penalties and thresholds come from the injected knowledge
base (``kb.scoring``), itself loaded from
``alfred/knowledge/release/scoring.yaml``. No magic numbers here.
The scoring functions are pure — they consume the annotated token list
and the resulting :class:`ParsedRelease` and return the report. They are
called by ``services.parse_release`` after ``assemble`` has run.
"""
from __future__ import annotations
from enum import Enum
from ..ports.knowledge import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import ParsedRelease
from .tokens import Token, TokenRole
class Road(str, Enum):
"""How the parser handled a given release name.
Distinct from :class:`~alfred.domain.release.value_objects.TokenizationRoute`,
which records the tokenization route (DIRECT / SANITIZED / AI). Road
is about confidence in the *result*, not the *method*.
"""
EASY = "easy" # group schema matched — structural annotation
SHITTY = "shitty" # no schema, dict-driven annotation, score ≥ threshold
PATH_OF_PAIN = "path_of_pain" # score below threshold, needs help
# Critical structural fields — their absence drives the
# ``missing_critical`` list in the report.
_CRITICAL_FIELDS: tuple[str, ...] = ("title", "media_type", "year")
def _is_tv_shaped(parsed: ParsedRelease) -> bool:
"""Season/episode weights only count for releases that *look* like TV."""
return parsed.season is not None
def compute_score(
parsed: ParsedRelease,
annotated: list[Token],
kb: ReleaseKnowledge,
) -> int:
"""Compute a 0–100 confidence score for the parse.
Each populated field contributes its weight from
``kb.scoring["weights"]``. Season/episode only count when the parse
looks like TV. ``group == "UNKNOWN"`` is treated as absent.
Then a penalty is subtracted per residual UNKNOWN token in
``annotated``, capped at ``penalties["max_unknown_penalty"]``.
Result is clamped to ``[0, 100]``.
"""
weights = kb.scoring["weights"]
penalties = kb.scoring["penalties"]
score = 0
if parsed.title:
score += weights.get("title", 0)
if parsed.media_type and parsed.media_type.value != "unknown":
score += weights.get("media_type", 0)
if parsed.year is not None:
score += weights.get("year", 0)
if _is_tv_shaped(parsed):
if parsed.season is not None:
score += weights.get("season", 0)
if parsed.episode is not None:
score += weights.get("episode", 0)
if parsed.quality:
score += weights.get("resolution", 0)
if parsed.source:
score += weights.get("source", 0)
if parsed.codec:
score += weights.get("codec", 0)
if parsed.group and parsed.group != "UNKNOWN":
score += weights.get("group", 0)
unknown_count = sum(1 for t in annotated if t.role is TokenRole.UNKNOWN)
raw_penalty = unknown_count * penalties.get("unknown_token", 0)
capped_penalty = min(raw_penalty, penalties.get("max_unknown_penalty", 0))
score -= capped_penalty
return max(0, min(100, score))
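The arithmetic can be traced with a small standalone version. The weights and penalties below are invented for the example (the real values live in `scoring.yaml`), and the function takes a set of populated field names instead of a `ParsedRelease`.

```python
# Hypothetical values, not the contents of scoring.yaml.
WEIGHTS = {
    "title": 30,
    "media_type": 20,
    "year": 15,
    "resolution": 10,
    "source": 10,
    "codec": 10,
    "group": 5,
}
PENALTIES = {"unknown_token": 5, "max_unknown_penalty": 20}


def score(populated: set[str], unknown_count: int) -> int:
    """Sum the weights of populated fields, subtract the capped penalty, clamp."""
    total = sum(WEIGHTS.get(field, 0) for field in populated)
    raw_penalty = unknown_count * PENALTIES["unknown_token"]
    capped = min(raw_penalty, PENALTIES["max_unknown_penalty"])
    return max(0, min(100, total - capped))
```

With these assumed numbers, a parse that found only a title and left ten tokens unknown loses 20 points rather than 50, because the penalty is capped.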
def collect_unknown_tokens(annotated: list[Token]) -> tuple[str, ...]:
"""Return the text of every token still tagged UNKNOWN."""
return tuple(t.text for t in annotated if t.role is TokenRole.UNKNOWN)
def collect_missing_critical(parsed: ParsedRelease) -> tuple[str, ...]:
"""Return the names of critical structural fields that are absent."""
missing: list[str] = []
if not parsed.title:
missing.append("title")
if not parsed.media_type or parsed.media_type.value == "unknown":
missing.append("media_type")
if parsed.year is None:
missing.append("year")
return tuple(missing)
def decide_road(
score: int,
has_schema: bool,
kb: ReleaseKnowledge,
) -> Road:
"""Pick the road the parse took.
EASY is decided structurally: if a known group schema matched, the
annotation walked the schema, and that's enough — the score does not
veto EASY. Otherwise the score decides between SHITTY and
PATH_OF_PAIN using ``kb.scoring["thresholds"]["shitty_min"]``.
"""
if has_schema:
return Road.EASY
threshold = kb.scoring["thresholds"].get("shitty_min", 60)
if score >= threshold:
return Road.SHITTY
return Road.PATH_OF_PAIN
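The decision rule is small enough to restate as a self-contained sketch. The threshold is passed as a plain argument here (an assumption made to avoid depending on `kb`); its default of 60 matches the fallback used above.

```python
from enum import Enum


class Road(str, Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"


def decide_road(score: int, has_schema: bool, shitty_min: int = 60) -> Road:
    """A matched schema always means EASY; otherwise the score decides."""
    if has_schema:
        return Road.EASY
    return Road.SHITTY if score >= shitty_min else Road.PATH_OF_PAIN
```

A low score never demotes a schema-matched parse, and the threshold comparison is inclusive.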
@@ -0,0 +1,120 @@
"""Release domain — parsing service.
Thin orchestrator over the annotate-based pipeline in
:mod:`alfred.domain.release.parser.pipeline`. Responsibilities:
* Strip a leading/trailing ``[site.tag]`` and decide ``parse_path``.
* Reject malformed names (forbidden characters) → ``parse_path=AI`` so
the LLM can clean them up.
* Otherwise call the v2 pipeline (tokenize → annotate → assemble) and
wrap the result in :class:`ParsedRelease`.
* Score the result and decide the road (EASY / SHITTY / PATH_OF_PAIN)
via :mod:`alfred.domain.release.parser.scoring`.
The public entry point is :func:`parse_release`, which returns
``(ParsedRelease, ParseReport)``. The report carries the confidence
score, the road, and diagnostic info for downstream callers.
"""
from __future__ import annotations
from alfred.domain.releases_TO_CHECK.parser import scoring as _scoring, pipeline as _v2
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import MediaTypeToken, ParsedRelease, ParseReport, TokenizationRoute
def parse_release(
name: str, kb: ReleaseKnowledge
) -> tuple[ParsedRelease, ParseReport]:
"""Parse a release name.
Returns a tuple ``(ParsedRelease, ParseReport)``. The structural VO
is unchanged from the previous single-return contract; the report
is new and carries the confidence score + road decision.
Flow:
1. Strip a leading/trailing ``[site.tag]`` if present (sets
``parse_path="sanitized"``).
2. If the remainder still contains truly forbidden chars (anything
not in the configured separators), short-circuit to
``media_type="unknown"`` / ``parse_path="ai"`` and emit a
PATH_OF_PAIN report — the LLM handles these.
3. Otherwise run the v2 pipeline: tokenize → annotate (EASY when a
group schema is known, SHITTY otherwise) → assemble → score.
"""
parse_path = TokenizationRoute.DIRECT
# Apostrophes inside titles ("Don't", "L'avare") are common and should
# not push the release through the AI fallback. Strip them up front so
# both strip_site_tag and tokenize see "Dont" / "Lavare", which is good
# enough for token-level matching. The raw name is preserved on the VO.
working_name = name
if "'" in working_name:
working_name = working_name.replace("'", "")
parse_path = TokenizationRoute.SANITIZED
clean, site_tag = _v2.strip_site_tag(working_name)
if site_tag is not None:
parse_path = TokenizationRoute.SANITIZED
if not _is_well_formed(clean, kb):
parsed = ParsedRelease(
raw=name,
clean=clean,
title=clean,
title_sanitized=kb.sanitize_for_fs(clean),
year=None,
season=None,
episode=None,
episode_end=None,
quality=None,
source=None,
codec=None,
group="UNKNOWN",
media_type=MediaTypeToken.UNKNOWN,
site_tag=site_tag,
parse_path=TokenizationRoute.AI,
)
report = ParseReport(
confidence=0,
road=_scoring.Road.PATH_OF_PAIN.value,
unknown_tokens=(clean,),
missing_critical=("title", "media_type", "year"),
)
return parsed, report
tokens, v2_tag = _v2.tokenize(working_name, kb)
annotated = _v2.annotate(tokens, kb)
fields = _v2.assemble(annotated, v2_tag, name, kb)
parsed = ParsedRelease(
raw=name,
clean=clean,
parse_path=parse_path,
**fields,
)
has_schema = _v2.has_known_schema(tokens, kb)
score = _scoring.compute_score(parsed, annotated, kb)
road = _scoring.decide_road(score, has_schema, kb)
report = ParseReport(
confidence=score,
road=road.value,
unknown_tokens=_scoring.collect_unknown_tokens(annotated),
missing_critical=_scoring.collect_missing_critical(parsed),
)
return parsed, report
def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
"""Return True if ``name`` contains no forbidden characters per scene
naming rules.
Characters listed as token separators (spaces, brackets, parens, …)
are NOT considered malforming — the tokenizer handles them. Only
truly broken chars like ``@``, ``#``, ``!``, ``%`` make a name
malformed.
"""
tokenizable = set(kb.separators)
return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
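The interaction between separators and forbidden characters can be checked with a standalone version of the same expression. The two character lists are assumptions chosen to show a character that appears in both; the real lists come from the knowledge base.

```python
def is_well_formed(name: str, separators: list[str], forbidden_chars: list[str]) -> bool:
    """A forbidden character only counts if the tokenizer cannot split on it."""
    tokenizable = set(separators)
    return not any(c in name for c in forbidden_chars if c not in tokenizable)


# Hypothetical configuration: brackets and spaces are listed as forbidden
# but are also separators, so they do not make a name malformed.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_", "-"]
FORBIDDEN = ["@", "#", "!", "%", "[", "]", " "]
```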
