Compare commits
73 Commits
9f10f4e0ad...unfuck
| Author | SHA1 | Date | |
|---|---|---|---|
| | 745dec39f5 | | |
| | 42fa6139ed | | |
| | 2df7843d8b | | |
| | 28304bb162 | | |
| | c62ae81275 | | |
| | cffafa2e60 | | |
| | b3abad4da4 | | |
| | 7ff2e6bc4e | | |
| | 8f31f880aa | | |
| | 1efe9a82c1 | | |
| | 0dc053881a | | |
| | 97dc799a26 | | |
| | fe9857aaed | | |
| | cc334a7951 | | |
| | 86222d95d1 | | |
| | 9e48c70b8a | | |
| | 7da0f887e7 | | |
| | c22b2b78eb | | |
| | 2f160644da | | |
| | e65c1df229 | | |
| | c0f6d01048 | | |
| | de7030fa9c | | |
| | 3622c95154 | | |
| | c7c11180d9 | | |
| | b0e275bd11 | | |
| | 6c12c18a27 | | |
| | 1427c8a54b | | |
| | 8491edac22 | | |
| | 02e478a157 | | |
| | 3dc73a5214 | | |
| | 88f156b7a4 | | |
| | 5107cb32c0 | | |
| | b7979c0f8b | | |
| | 9f1ce94690 | | |
| | 5e0ed11672 | | |
| | 0246f85ef8 | | |
| | e62dc90bd1 | | |
| | 688c37bbec | | |
| | 757e4045ee | | |
| | c3767aacb6 | | |
| | 5bcf22b408 | | |
| | cfa9f54d9f | | |
| | f0aaf50c97 | | |
| | a09262b33f | | |
| | 9c7cd66d2b | | |
| | 83dbed887b | | |
| | 0c9489e16b | | |
| | 621bb96995 | | |
| | 448ef3b79c | | |
| | b1c7f35ffb | | |
| | 5bbdc9081f | | |
| | 5d7b214af2 | | |
| | 18267d0165 | | |
| | 19fe8a519a | | |
| | a0d1846ff2 | | |
| | 0fb59a4581 | | |
| | e79ca462b8 | | |
| | 03aa844d7d | | |
| | c303efea48 | | |
| | 5db350a1df | | |
| | 12dc796ea2 | | |
| | 9ddd85929e | | |
| | ed7680b58f | | |
| | b4c9efd13b | | |
| | 98c688f29b | | |
| | fcd80763e2 | | |
| | 629387591f | | |
| | 230a7ab88a | | |
| | 3737f66851 | | |
| | fd3bd1ad8c | | |
| | 7dc7f0c241 | | |
| | 075a827b0e | | |
| | a2c917618f | | |
@@ -74,5 +74,11 @@ docs/
# .md files (project-level Markdown is brol-y; allow-list the ones we track)
*.md
!CHANGELOG.md
!/README.md
!specs/
!specs/**/*.md

# Private dev docs (separate git repo inside; see .claude/CLAUDE.md)
/.claude/

#
+880
@@ -15,8 +15,872 @@ callers).

## [Unreleased]

### Changed

- **`filesystem` infra + application rewritten as 5 atomic free
functions.** On branch `unfuck`. Replaces the monolithic
`FileManager` class + scattered helpers with five small, pure ops in
`alfred/infrastructure/filesystem/`: `list_dir`, `create_dir`,
`link_file`, `move_file`, `move_dir`. Each takes `pathlib.Path`
arguments and raises typed exceptions from a dedicated hierarchy
(`FilesystemError` → `SourceNotFound` / `DestinationExists` /
`NotADirectory` / `NotAFile` / `PermissionDenied` / `CrossDevice` /
`FilesystemOSError`); no more `{"status": "ok" | "error"}` dicts at
the infra boundary, no more `get_memory()` reads.
- **`filesystem` application: 5 use cases as free functions.** A
matching `<op>_use_case(path, …, roots: DirectoryRoots)` wraps each
infra op, guards inputs against escaping a new `DirectoryRoots` VO
(downloads / torrents / movies / tv_shows), catches infra exceptions,
and returns a frozen `<Op>Response` DTO. Roots are now injected, not
pulled from the global memory singleton.

- **Agent tool wrappers partially re-wired** to the new use cases.
`list_folder` now delegates to `list_dir_use_case`; `move_media`
to `move_file_use_case`; `move_to_destination` chains
`create_dir_use_case` + `move_file_use_case`; a new
`create_directory` tool wraps `create_dir_use_case`. Roots are
loaded once via a module-level `_load_directory_roots()` helper
that reads the persisted memory (no more per-call singleton
reads inside the use cases themselves).
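
The split between an exception-raising infra op and a DTO-returning use case described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: only `FilesystemError`, `SourceNotFound`, `DestinationExists`, `move_file`, `move_file_use_case` and `DirectoryRoots` are names from the changelog, while the `contains()` helper, the `MoveFileResponse` fields and the error strings are assumptions made for the example.

```python
from __future__ import annotations

import shutil
from dataclasses import dataclass
from pathlib import Path


class FilesystemError(Exception):
    """Root of the typed hierarchy named in the changelog."""


class SourceNotFound(FilesystemError):
    pass


class DestinationExists(FilesystemError):
    pass


def move_file(source: Path, destination: Path) -> Path:
    """Infra op: raises typed exceptions instead of returning a status dict."""
    if not source.exists():
        raise SourceNotFound(str(source))
    if destination.exists():
        raise DestinationExists(str(destination))
    shutil.move(str(source), str(destination))
    return destination


@dataclass(frozen=True)
class DirectoryRoots:
    # Assumed shape: the changelog only lists the four root names.
    downloads: Path
    torrents: Path
    movies: Path
    tv_shows: Path

    def contains(self, path: Path) -> bool:
        # Hypothetical helper: True when `path` resolves under one of the roots.
        resolved = path.resolve()
        roots = (self.downloads, self.torrents, self.movies, self.tv_shows)
        return any(resolved.is_relative_to(root.resolve()) for root in roots)


@dataclass(frozen=True)
class MoveFileResponse:
    # Assumed fields for the frozen `<Op>Response` DTO.
    ok: bool
    destination: Path | None = None
    error: str | None = None


def move_file_use_case(
    source: Path, destination: Path, roots: DirectoryRoots
) -> MoveFileResponse:
    """Application layer: guard the roots, then map infra exceptions to a DTO."""
    if not (roots.contains(source) and roots.contains(destination)):
        return MoveFileResponse(ok=False, error="path_outside_roots")
    try:
        moved = move_file(source, destination)
    except FilesystemError as exc:
        return MoveFileResponse(ok=False, error=type(exc).__name__)
    return MoveFileResponse(ok=True, destination=moved)
```

Because `roots` is a plain argument, a test can point the use case at a temporary directory without touching any global state.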

### Removed

- `FileManager` / `MediaOrganizer` / `create_folder` / `move` from the
public API of `alfred.infrastructure.filesystem`. Their files remain
on disk renamed with an `_OLD` suffix (e.g. `file_manager_OLD.py`) so
the migration can finish on a follow-up commit without losing
reference material. They are no longer re-exported from `__init__`.
- `CreateSeedLinksUseCase` / `ListFolderUseCase` / `MoveMediaUseCase` /
`ManageSubtitlesUseCase` / `resolve_destination` from the public API
of `alfred.application.filesystem`. Same `_OLD` rename treatment.
This intentionally breaks current tool wrappers and tests downstream;
re-wiring is the next chunk of work on this branch.
- **Agent tools dropped during the refactor** (to be reintroduced
when the matching domain/application code lands):
`manage_subtitles`, `set_path_for_folder`, `create_seed_links`,
`resolve_season_destination`, `resolve_episode_destination`,
`resolve_movie_destination`, `resolve_series_destination`.
Their wrappers are removed from `alfred.agent.tools.filesystem`;
`alfred.agent.tools.__init__` now re-exports only what still
imports cleanly. `find_media_imdb_id` (already broken before this
branch: name no longer exported by `tools.api`) was also dropped
from the package re-exports.

### Added

- **`.alfred` v2, Phase 4: v2-shaped `rescan_show` + new
`rescan_movie` + index anchor-warning + `tmdb_cache_ttl_days`
setting.** Fourth and final structural phase of
`specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`. The TV +
movie rescan orchestrators now write v2 release aggregates
(`SeriesRelease` / `MovieRelease`) via the concrete v2
repositories; the library index keeps auto-healing from the new
sidecars on its next read (no TMDB call from rescan; that stays
Phase 5).
- **`rescan_show`** moves from `alfred/application/library/` to
`alfred/application/tv_shows/` (symmetry with the new
`alfred/application/movies/`). New signature:
`(show_root, *, tmdb_id: TmdbId, imdb_id: ImdbId | None = None,
series_repo, scanner, prober, kb) -> SeriesRelease`.
- **`rescan_movie`** (new, `alfred/application/movies/rescan.py`)
locates the main video via `find_video_file`, runs
`inspect_release` once, and writes the per-movie `.alfred`
sidecar. `added_at = datetime.now(UTC)` on every rescan (the
sidecar records reconciliation time, not filesystem mtime).
Raises `MovieRescanFailed` when no video is found in the folder.
- **PACK semantics in `rescan_show`**: a single-video + no-episode
season becomes `SeasonRelease(mode=PACK, folder=…, episodes=())`.
The slot map stays empty until the Phase 5 TMDB sync supplies
`episode_count`, so no fabricated `EpisodeRange` lands in the
sidecar. *(Superseded by Phase 4b; see Fixed.)*
- **`Settings.tmdb_cache_ttl_days: int = 14`**: placeholder for the
Phase 5 TTL policy on library-index entries (`fetched_at + TTL`
drives refresh decisions).
- **Library-index anchor-mismatch warning**: both
`DotAlfredTVShowLibraryIndex` and `DotAlfredMovieLibraryIndex` now
cross-check each entry's `metadata.path` against the on-disk
folder layout right after a successful parse. Drift is logged as a
`WARNING` (one per missing folder, with `tmdb_id`); the heal path
stays silent by construction (it always synthesizes from real
folder names).
- **`.alfred` v2, Phase 5: TMDB sync orchestrators.** Fifth phase
of `specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`.
Two new orchestrators refresh the library-root index's
TMDB-cached fields from on-disk truth + a single TMDB call:
- **`sync_show`** (`alfred/application/tv_shows/sync.py`) calls
`TMDBClient.get_tv_show_info`, loads the release via
`DotAlfredSeriesReleaseRepository.load_by_tmdb_id`, and upserts
the result into `DotAlfredTVShowLibraryIndex`. Honors
`Settings.tmdb_cache_ttl_days`; placeholder entries (auto-healed,
`status == "unknown"`) always refresh; `force=True` overrides
both gates. Raises `ShowNotFoundInLibrary` when neither index nor
sidecar carries `tmdb_id`. Indexed shows with a missing per-show
sidecar still get a fresh TMDB pass; the slot map clears until
rescan repopulates it.
- **`sync_movie`** (`alfred/application/movies/sync.py`) is the
movie-side parallel. Placeholder signature is `name ==
metadata.path` (auto-heal copies the folder name into `name`;
the sidecar schema requires `name` non-empty so we can't use
`name == ""`). When the per-movie sidecar is gone but the
index entry remains, sync warns and returns the existing entry
unchanged (no upsert possible without a release).
- **`TmdbMovieInfo` DTO + `TMDBClient.get_movie_info`**: symmetric
to the existing `TmdbShowInfo` / `get_tv_show_info` pair. Carries
`tmdb_id`, `imdb_id`, `title`, and `release_year` (parsed from
TMDB's `release_date`).
- **`load_by_tmdb_id` on the v2 release repositories.** The series
repo returns `(SeriesRelease, show_folder_name)` so the sync
orchestrator can feed `DotAlfredTVShowLibraryIndex.upsert(...,
path=...)`; the movie repo returns `MovieRelease` alone (folder is
on `release.folder` already) and is provided as a semantic alias
of `find_by_tmdb_id` for symmetry.
- **`alfred/application/exceptions.py`**: new module for the two
shared `*NotFoundInLibrary` exceptions raised by the sync
orchestrators (`ShowNotFoundInLibrary`, `MovieNotFoundInLibrary`).
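
The refresh gates listed for `sync_show` (TTL, placeholder entries, `force=True`) can be read as one small predicate. The sketch below is an illustration under assumptions: `needs_refresh` and its parameters are hypothetical names, treating a missing `fetched_at` as stale is a guess, and the real orchestrator may compare at the boundary differently.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def needs_refresh(
    fetched_at: datetime | None,
    status: str,
    *,
    ttl_days: int = 14,
    now: datetime | None = None,
    force: bool = False,
) -> bool:
    """Return True when a show's TMDB-cached index entry should be refetched."""
    if force:
        return True  # force=True overrides both gates
    if status == "unknown" or fetched_at is None:
        return True  # auto-healed placeholder (or never fetched): always refresh
    now = now or datetime.now(timezone.utc)
    # fetched_at + TTL drives the decision; ">=" at the boundary is an assumption.
    return now >= fetched_at + timedelta(days=ttl_days)
```

Passing `now` explicitly keeps the predicate deterministic in tests, in the same spirit as the injectable reference date mentioned for `parse_tv_show_info`.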

### Fixed

- **PACK vs EPISODIC classification (Phase 4b).** The Phase 4
walker + `rescan_show` logic classified seasons by parser output
(does the filename carry `Exx`?), but PACK vs EPISODIC is a
*structural* distinction:
- **PACK** = season folder with N flat `SxxEyy` videos.
- **EPISODIC** = season folder with N subfolders, each holding
one video.

The walker now descends two levels under `show_root` and
classifies per season folder. Mixed (flat + subfolders) is
malformed: warn and skip. `rescan_show` trusts the walker's
mode and stops conflating "single un-numbered video" with PACK
(that case is now skipped as malformed too). Tests rewritten
against the real model. Supersedes the PACK-semantics bullet
above in Added.
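
The structural rule above can be illustrated with a layout-only classifier. This is a hedged sketch, not the walker's implementation: `classify_season_folder`, `_is_video` and the `VIDEO_EXTENSIONS` set (standing in for `kb.video_extensions`) are assumptions, a subfolder counts as an episode folder as soon as it holds at least one video, and the "single un-numbered video" check is left out because it needs the release parser.

```python
from __future__ import annotations

from enum import Enum
from pathlib import Path

# Assumption: stand-in for the knowledge base's video extension list.
VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi"}


class ReleaseMode(Enum):
    PACK = "pack"
    EPISODIC = "episodic"


def _is_video(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS


def classify_season_folder(season_dir: Path) -> ReleaseMode | None:
    """Classify by layout only; None means malformed (caller warns and skips)."""
    flat_videos = [p for p in season_dir.iterdir() if _is_video(p)]
    episode_dirs = [
        d
        for d in season_dir.iterdir()
        if d.is_dir() and any(_is_video(p) for p in d.iterdir())
    ]
    if flat_videos and episode_dirs:
        return None  # mixed flat files + subfolders
    if flat_videos:
        return ReleaseMode.PACK  # N flat videos directly in the season folder
    if episode_dirs:
        return ReleaseMode.EPISODIC  # N subfolders, each holding a video
    return None  # nothing usable in this folder
```

Note that the filename never enters the decision, which is the point of the fix: the mode comes from the directory shape, not from whether a name carries `Exx`.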

### Removed

- **v1 dot_alfred stack and its abstract domain ports.** Deleted
`alfred/infrastructure/persistence/dot_alfred/{bridge,repository,serializer,sidecar}.py`,
plus the
`alfred/domain/{tv_shows,movies}/repositories.py` ABCs
(`TVShowRepository` / `MovieRepository`); zero callers after
Phase 4. `dot_alfred/__init__.py` is rewritten as a v2-only
re-export (four concrete repositories + `ShowFolderUnknown`).
- **`alfred/application/library/` package** (rescan + walker moved
to `alfred/application/tv_shows/`).
- The two Phase 3 module-level test skips
(`test_repository.py`, `test_serializer.py`) are lifted by
deleting the quarantined files.
- **`MediaWithTracks` mixin + `track_lang_matches` helper** in
`alfred.domain.shared.media`. Parked in Phase 4 pending a
Phase 5 decision; zero callers across `alfred/` and `tests/`
after the v2 aggregates landed, so both go.

### Internal

- **Suite**: 1233 → 1277 passing; 10 → 8 skips (only LLM-not-running
skips remain; the Phase 3 quarantines are gone with their files).
- Phase 5 cleanup sweep returns zero hits for `MediaWithTracks`,
v1 dot_alfred symbols, v1 sidecar names, and
`alfred.application.library`; the v2 surface is the only one left.

### Changed

- **`.alfred` v2, Phase 3: `TVShow` / `Movie` aggregates become
TMDB-only.** Third phase of `specs/dot_alfred_v2.md` on branch
`refactor/dot-alfred-v2`. Filesystem-side concerns (file paths,
tracks, quality, mode, `added_at`) move to the `releases/` domain
added in Phase 1; the TMDB aggregates now carry only identity +
TMDB catalog facts.
- **`TVShow`**: `tmdb_id: TmdbId` is now the **required primary
key**; `imdb_id: ImdbId | None` is the optional secondary anchor.
Added `status: str = "unknown"` (raw TMDB string, default matches
the v2 library-index auto-heal placeholder). `episode_count`
aggregates the TMDB-cached counts on each `Season` (was: sum of
materialized `Episode` objects).
- **`Season`**: added `episode_count: int = 0` (TMDB-cached,
authoritative). **Removed**: `audio_tracks`, `subtitle_tracks`,
and the `mode` property (release mode now lives only on
`SeasonRelease.mode`, the single source of truth).
- **`Episode`**: slimmed to identity + title. **Removed**:
`file_path`, `file_size`, `audio_tracks`, `subtitle_tracks`. The
`MediaWithTracks` mixin is no longer in `Episode`'s MRO; on-disk
facts live on the matching `EpisodeRelease` keyed by
`(season_number, episode_number)`.
- **`Movie`**: `tmdb_id: TmdbId` required, `imdb_id` optional.
**Removed**: `file_path`, `file_size`, `quality`, `added_at`,
`audio_tracks`, `subtitle_tracks`. `get_filename()` now returns
`"Title.Year"` (quality lives on `MovieRelease` and is appended
by a release-aware caller; Phase 4 wires this through
`MediaOrganizer`).
- **`TVShowBuilder` / `SeasonBuilder`**: constructor requires
`tmdb_id: TmdbId`; `imdb_id` and `status` are optional.
`SeasonBuilder.set_episode_count(int)` replaces the old
`set_audio_tracks` / `set_subtitle_tracks` (tracks no longer
persisted on `Season`).
- **`MovieRelease` carries `added_at: datetime`** (required).
Bumped `dot_alfred/v2` `SCHEMA_VERSION` from `1` → `2` to add
`added_at: datetime` to `MovieReleaseSidecar`. Round-trip via
Pydantic `mode="json"` (datetime ↔ ISO 8601 string). No migration
code shipped; no v2.1 sidecars exist in the wild yet.
- **No-coercion `TmdbId` contract.** `TVShow(tmdb_id=1396)` now raises;
callers pass `TmdbId(1396)`. Same for `imdb_id: ImdbId | None`
on `TVShow`/`Movie`. Honest type contract, no ergonomic shim.
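
The no-coercion contract can be pictured with a small value object plus a constructor check. This is a sketch under assumptions: the `value` field name, the exception types and the reduced `TVShow` (identity and title only) are illustrative, not the project's actual definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmdbId:
    value: int  # assumed field name

    def __post_init__(self) -> None:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(f"TmdbId expects int, got {type(self.value).__name__}")
        if self.value <= 0:
            raise ValueError("TmdbId must be a positive integer")


@dataclass(frozen=True)
class TVShow:
    # Reduced stand-in for the real aggregate.
    tmdb_id: TmdbId
    title: str

    def __post_init__(self) -> None:
        # No ergonomic shim: a bare int is refused rather than wrapped.
        if not isinstance(self.tmdb_id, TmdbId):
            raise TypeError("tmdb_id must be a TmdbId instance")
```

With this shape, `TVShow(tmdb_id=TmdbId(1396), title="Breaking Bad")` is accepted while `TVShow(tmdb_id=1396, title="Breaking Bad")` raises.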

### Removed

- `Season.mode` property (derive from `SeasonRelease.mode` instead).
- `Episode.file_path` / `file_size` / `audio_tracks` /
`subtitle_tracks`.
- `Movie.file_path` / `file_size` / `quality` / `added_at` /
`audio_tracks` / `subtitle_tracks`.

### Internal

- v1 dot_alfred package (`bridge.py`, `repository.py`,
`serializer.py`, `sidecar.py`), the abstract `TVShowRepository` /
`MovieRepository` ports typed against the pre-Phase-3 aggregates,
and `alfred/application/library/rescan.py` are **intentionally
left in tree as a known-red island**. Their tests
(`tests/infrastructure/persistence/dot_alfred/test_repository.py`,
`test_serializer.py`, `tests/application/library/test_rescan.py`)
are module-level skipped with a Phase 4 reference. Phase 4 rewrites
`rescan_show` / introduces `rescan_movie` on top of the v2
release repositories + library index, then deletes the v1 stack +
the abstract ports + the quarantined tests in one swing.
- Test suite: 1216 passed, 11 skipped (8 pre-existing + 3 Phase-3
quarantines), 4 xfailed. v2 round-trip tests now reference
`SCHEMA_VERSION` instead of hard-coded `1` for future-proofing.

### Added

- **`.alfred` v2, Phase 2: new persistence package + TMDB client
extensions.** Second phase of `specs/dot_alfred_v2.md` on branch
`refactor/dot-alfred-v2`. The new
`alfred/infrastructure/persistence/dot_alfred/v2/` package ships
the full v2 sidecar stack while leaving v1 (and the existing
`TVShow` aggregate) untouched; Phase 3 is the cutover.
- **Pydantic DTOs**: `SeriesReleaseSidecar` /
`MovieReleaseSidecar` (per-item), `TVShowLibraryIndexSidecar` /
`MovieLibraryIndexSidecar` (library-root index). All built on a
common `_Strict` base (`extra="forbid"`, `frozen=True`) with a
`@model_validator` enforcing `schema_version == 1`.
- **Track entries**: `AudioTrackEntry` / `SubtitleEntry` (sidecar
cache shape, slimmed from the domain track types). `SubtitleEntry`
carries `is_forced` + `is_sdh` as explicit booleans (v1's
`type: "sdh"` overload is gone).
- **Serializer**: `read_yaml` / `atomic_write_yaml` helpers
centralize YAML I/O and atomic writes (`.tmp + os.replace`).
`SidecarSchemaError` wraps both YAML parse errors and Pydantic
validation errors for uniform catch-and-skip semantics.
- **Bridge**: lossless `domain ↔ sidecar` conversion for
`SeriesRelease` / `MovieRelease` (round-trippable, including
multi-episode ranges and `is_sdh` subtitles); one-way projection
for library-index entries (`show_index_entry_from`,
`movie_index_entry_from`) that flattens multi-episode files into
per-TMDB-slot maps in `seasons[*].episodes`.
- **Repositories**:
`DotAlfredSeriesReleaseRepository` /
`DotAlfredMovieReleaseRepository` walk `library_root/*/` with
log+skip on corruption; **`DotAlfredTVShowLibraryIndex`** /
**`DotAlfredMovieLibraryIndex`** auto-heal silently on missing or
corrupt index files by rebuilding from the per-item sidecars
(healed entries keep TMDB-cached fields as placeholders until the
next sync repopulates them). Writes are atomic and never auto-heal
(read paths handle that).
- **TMDB client extensions**: `TmdbSeasonInfo` / `TmdbShowInfo`
DTOs + `TMDBClient.get_tv_show_info(tmdb_id)` aggregating
`/tv/{id}` + `/tv/{id}/external_ids`. The parsing logic is a pure
function (`parse_tv_show_info`) testable without HTTP, with an
injectable reference date for deterministic `aired` flag tests.
- **`is_sdh` flag on `SubtitleTrack`.** Added to
`alfred/domain/shared/media.py::SubtitleTrack` to mirror ffprobe's
`hearing_impaired` disposition. Wired through the ffprobe layer
(`ffprobe_prober.py`) and the v2 sidecar bridge so SDH information
round-trips end-to-end. Defaults to `False`; backwards-compatible
for every existing caller.
- **37 v2 integration tests** on `tmp_path` covering round-trips
(domain ↔ sidecar ↔ YAML ↔ domain), atomic writes (no `.tmp`
leftovers), per-item log+skip on corruption / schema mismatch,
movie anchor-mismatch warning, full upsert / find / delete on both
library indexes, and the auto-heal path on missing / corrupt /
schema-mismatched index files. **16 TMDB DTO tests** for the new
`parse_tv_show_info` pure function.
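
The `.tmp + os.replace` pattern mentioned for the serializer can be sketched with the standard library alone. The helper below is hypothetical (`atomic_write_text` is not a name from the changelog) and writes plain text rather than YAML so the example has no third-party dependency; the real `atomic_write_yaml` may differ in details such as the `fsync` call.

```python
import os
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write to a sibling .tmp file, then swap it into place with os.replace."""
    tmp_path = path.with_name(path.name + ".tmp")
    try:
        with open(tmp_path, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before the rename
        # Atomic when source and target live on the same filesystem.
        os.replace(tmp_path, path)
    finally:
        # After a successful replace the .tmp no longer exists; on failure,
        # this removes the partial file so no leftovers stay behind.
        tmp_path.unlink(missing_ok=True)
```

A reader therefore sees either the old sidecar or the new one, never a half-written file, which is what the "no `.tmp` leftovers" tests check for.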

- **`.alfred` v2, Phase 1: new `releases/` domain.** First step of
`specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`. The
new `alfred/domain/releases/` package introduces a filesystem-only
bounded context separated from TMDB identity (the existing
`tv_shows` / `movies` domains). It hosts:
- **`EpisodeRange` VO**: covers single-episode files
(`EpisodeRange(E02, E02)`) and multi-episode files
(`EpisodeRange(E02, E04)` for `SxxE02E03E04.mkv`), with
`count()` / `numbers()` / `is_single()` helpers.
- **`ReleaseMode` enum**: `PACK` (N video files directly in the
season folder) vs `EPISODIC` (N sub-folders, one episode each);
classified by the walker, never re-derived.
- **Aggregates**: `TrackProfile`, `EpisodeRelease`,
`SeasonRelease` (with `episode_count()` summing each file's
range), `SeriesRelease`, `MovieRelease`. All frozen
dataclasses; mutation via `SeasonReleaseBuilder` /
`SeriesReleaseBuilder` (mirror the v1 `TVShowBuilder` pattern,
including `from_existing()` round-trip).
- **Abstract ports**: `SeriesReleaseRepository`,
`MovieReleaseRepository` (concrete `DotAlfred*` arrive in
Phase 2).
- **`TmdbId` VO** added to `alfred/domain/shared/value_objects.py`
(positive int, rejects bool/str/float; symmetry with `ImdbId`).
- 73 unit tests covering VO validation, entity invariants, builder
sort + overlap detection, and `from_existing()` round-trips. v1
code paths untouched at this stage; new domain coexists.
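
As an illustration of the `EpisodeRange` value object and its helpers, here is a minimal sketch. The `start` / `end` field names, the use of plain integers for episode numbers and the validation rule are assumptions; only `count()`, `numbers()` and `is_single()` come from the changelog.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeRange:
    start: int  # first episode covered by the file (assumed field name)
    end: int    # last episode covered by the file (assumed field name)

    def __post_init__(self) -> None:
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"invalid episode range: {self.start}-{self.end}")

    def count(self) -> int:
        return self.end - self.start + 1

    def numbers(self) -> tuple[int, ...]:
        return tuple(range(self.start, self.end + 1))

    def is_single(self) -> bool:
        return self.start == self.end


# A file named SxxE02E03E04.mkv would map to EpisodeRange(2, 4).
multi = EpisodeRange(2, 4)
single = EpisodeRange(2, 2)
```

A season-level `episode_count()` that sums each file's range then reduces to `sum(r.count() for r in ranges)`.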

- **`rescan_show` orchestrator
(`alfred/application/library/rescan.py`).** Step 4 of the
`specs/dot_alfred.md` plan. Walks an Alfred-managed show folder,
runs the existing `inspect_release` pipeline on every video file it
finds, and assembles a frozen `TVShow` aggregate persisted via the
injected `TVShowRepository`. Reuses the release parser + ffprobe
path verbatim; no duplicated parse/probe logic at the library
layer. PACK vs EPISODIC inferred per season folder from the
on-disk file count + parser output: a single video whose name
carries no `Exx` token becomes a PACK season (tracks lifted to the
season-level `audio_tracks` / `subtitle_tracks`), anything else
becomes EPISODIC (one `Episode` per file). Episode paths are
stored relative to the show root for portability. Files that fail
to parse a season/episode number, or seasons with mixed numbers,
are logged and skipped; the orchestrator never raises. Embedded
subtitle tracks are captured from `ffprobe`; adjacent `.srt`
files, multi-episode entries (`S01E01E02`), and TMDB-driven PACK
detection are tracked as tech debt for a dedicated subtitles /
ShowTracker session. 7 integration tests on `tmp_path` with the
Foundation layout (S01 EPISODIC + S02 PACK) cover the round-trip
through the real `.alfred` repository.
- **Show tree walker (`alfred/application/library/walker.py`).**
Step 4a foundation. `walk_show(show_root, scanner, kb)` returns a
`ShowTree(show_root, season_folders=tuple[SeasonFolder, ...])`:
pure structural snapshot, no parsing, no probing. Season folders
are detected by a `\bS\d{1,2}\b` token anywhere in the directory
name (release-style naming, no Plex `Season 01` / `Specials`
conventions). Video files are filtered against
`kb.video_extensions`; no recursion into sub-sub-folders. 11 unit
tests on `tmp_path` cover detection (case-insensitive, in-word
rejection), filtering (subs, NFO, sample files), and edge cases
(empty / missing show root).
- **Season-level audio/subtitle tracks
(`alfred/domain/tv_shows/entities.py`,
`alfred/domain/tv_shows/builders.py`).** `Season` now inherits
from `MediaWithTracks` and carries `audio_tracks` /
`subtitle_tracks` tuples (empty by default). Populated only in
PACK mode (the single release covering the whole season); empty in
EPISODIC mode where tracks live per-episode. `SeasonBuilder`
gains `set_audio_tracks()` / `set_subtitle_tracks()` and forwards
them through `from_existing()`. The bridge writes / reads them in
the PACK branch via shared `_synth_audio_tracks` /
`_synth_subtitle_tracks` helpers used for episodes too.
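
The season-folder detection rule from the walker entry (a `\bS\d{1,2}\b` token, case-insensitive, rejected inside words) is easy to try out in isolation. The `is_season_folder` helper and the sample folder names are illustrative assumptions; only the regular expression comes from the changelog.

```python
import re

# Pattern quoted in the walker entry; IGNORECASE covers lowercase "s01".
SEASON_TOKEN = re.compile(r"\bS\d{1,2}\b", re.IGNORECASE)


def is_season_folder(name: str) -> bool:
    """True when the directory name carries a standalone Sxx token."""
    return SEASON_TOKEN.search(name) is not None


samples = [
    "Foundation.S01.1080p.WEB",  # release-style season folder
    "foundation.s2.720p",        # lowercase, single digit
    "Season 01",                 # Plex-style naming
    "Specials",
    "BOSS12",                    # token buried inside a word
    "Show.S01E02.1080p",         # episode folder: no boundary after S01
]
matches = [name for name in samples if is_season_folder(name)]
```

Because `.` and space are non-word characters while letters and digits are word characters, the word boundaries accept dotted release names and reject tokens glued to other letters or digits.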

- **`DotAlfredTVShowRepository`: filesystem-backed implementation of
the `TVShowRepository` port
(`alfred/infrastructure/persistence/dot_alfred/repository.py`).**
Step 3 of the `specs/dot_alfred.md` plan. Reads and writes one
`.alfred` YAML file per show under a configurable `library_root`.
`save(show)` writes atomically (`.alfred.tmp` + `os.replace`) into a
folder that **must already exist**; the repository never invents a
folder name (the upstream `MediaOrganizer` is in charge of placing
files; the repo writes the sidecar next to them). `find_by_imdb_id` /
`find_all` walk `library_root/*/`, loading each readable sidecar;
folders without a sidecar return `None` / are skipped (no implicit
cold scan; that is the job of the upcoming `rescan_show` tool).
Corrupted YAML and schema violations are logged and skipped, never
raised, so a single bad folder does not break the rest of the
library. The repo keeps a tiny in-memory `imdb_id → folder_name`
index populated on every successful read/save, so subsequent saves
find the right destination without re-walking; useful when the show
folder name diverges from `show.get_folder_name()` (custom 1080p / 4K
variants). 20 integration tests on `tmp_path` cover the round-trip,
cold folder / unknown id returns, multi-show `find_all`, corrupted /
wrong-schema skipping, atomic write (no `.alfred.tmp` left behind),
overwrite, and folder-name fallbacks.
- **Sidecar ↔ TVShow bridge
(`alfred/infrastructure/persistence/dot_alfred/bridge.py`).**
`to_sidecar(show, folder_paths=...)` summarizes the rich domain
`AudioTrack` / `SubtitleTrack` to the sidecar's compact form (unique
audio languages in track order; subtitle entries derived from
`is_forced` and assumed `source="embedded"`). `from_sidecar(sidecar,
title=...)` reconstructs the domain `TVShow` with synthesized tracks:
one `AudioTrack` per language, one `SubtitleTrack` per entry, with
ffprobe-only fields (`codec`, `channels`, `channel_layout`) left as
`None`. The bridge is intentionally lossy on probe minutiae the
sidecar does not store; this is the documented trade-off from the
factual-only spec.
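
To make the lossy round-trip of the bridge concrete, here is a small sketch of the audio side. The reduced `AudioTrack` dataclass, its `language` field and the two helper names are assumptions for illustration; the changelog only states that unique languages are kept in track order and that `codec`, `channels` and `channel_layout` come back as `None`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AudioTrack:
    # Reduced stand-in for the domain track type.
    language: str
    codec: str | None = None
    channels: int | None = None
    channel_layout: str | None = None


def summarize_audio(tracks: list[AudioTrack]) -> list[str]:
    """Sidecar form: unique languages, first occurrence order preserved."""
    return list(dict.fromkeys(track.language for track in tracks))


def synth_audio_tracks(languages: list[str]) -> tuple[AudioTrack, ...]:
    """Domain form rebuilt from the sidecar: probe-only fields stay None."""
    return tuple(AudioTrack(language=lang) for lang in languages)


probed = [
    AudioTrack("eng", codec="eac3", channels=6, channel_layout="5.1"),
    AudioTrack("fre", codec="aac", channels=2, channel_layout="stereo"),
    AudioTrack("eng", codec="aac", channels=2, channel_layout="stereo"),
]
stored = summarize_audio(probed)
restored = synth_audio_tracks(stored)
```

`dict.fromkeys` keeps insertion order, so the duplicate `eng` commentary track collapses without reordering the languages.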

- **`.alfred` sidecar serializer
(`alfred/infrastructure/persistence/dot_alfred/`).** Implements step 2
of the `specs/dot_alfred.md` plan. Pure-dict in/out
(`serialize(sidecar) -> dict`, `deserialize(data) -> ShowSidecar`);
YAML I/O lives in the repository layer (step 3) and is kept out for
trivial testability. Ships the DTOs that mirror the YAML schema
field-for-field (`ShowSidecar`, `SeasonSidecar`, `EpisodeSidecar`,
`SubtitleEntry`). The sidecar acts as a **scan cache**: it stores
only what is genuinely costly to recompute, namely folder/file paths
(skipping the FS walk) and probed track metadata (skipping ffprobe).
Release identifiers (group, source, quality, codec) live in folder
and file names and are derived on demand by the parser; they are
deliberately absent from the schema and rejected on deserialize. The
serializer is **strict on schema**: unknown keys at any level raise
`SidecarSchemaError`, missing required fields raise clearly, and
`bool` cannot sneak in as a season/episode number. Optional fields
(`tmdb_id`, empty `audio`/`subtitles`/`episodes`) are omitted from
the output rather than emitted as `null` / `[]`. Tests cover
round-trip equivalence (DTO → dict → DTO and DTO → YAML text → DTO),
the Foundation S01 PACK case (real-world fixture with mixed sub
types; superset captured at season scope), and a Breaking Bad S05
EPISODIC case. An on-disk `tmp_path` fixture recreates the Foundation
folder structure with placeholder files, ready to be reused by the
upcoming repository walk tests in step 3.
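
The strictness rules listed for the serializer (unknown keys rejected, required fields enforced, `bool` refused as a number, empty optionals omitted) can be sketched for a single episode entry. Everything about the schema here is assumed for the example: the `number` / `path` / `audio` keys and the two function names are hypothetical, and only `SidecarSchemaError` and `EpisodeSidecar` are names from the changelog.

```python
from __future__ import annotations

from dataclasses import dataclass


class SidecarSchemaError(ValueError):
    """Raised on any schema violation."""


@dataclass(frozen=True)
class EpisodeSidecar:
    number: int
    path: str
    audio: tuple[str, ...] = ()


_REQUIRED = {"number", "path"}
_OPTIONAL = {"audio"}


def deserialize_episode(data: dict) -> EpisodeSidecar:
    unknown = set(data) - _REQUIRED - _OPTIONAL
    if unknown:
        raise SidecarSchemaError(f"unknown keys: {sorted(unknown)}")
    missing = _REQUIRED - set(data)
    if missing:
        raise SidecarSchemaError(f"missing required fields: {sorted(missing)}")
    number = data["number"]
    # bool is an int subclass, so True must not pass as episode 1.
    if isinstance(number, bool) or not isinstance(number, int):
        raise SidecarSchemaError("episode number must be an int")
    return EpisodeSidecar(number, str(data["path"]), tuple(data.get("audio", ())))


def serialize_episode(episode: EpisodeSidecar) -> dict:
    out: dict = {"number": episode.number, "path": episode.path}
    if episode.audio:  # empty optionals are omitted, not emitted as []
        out["audio"] = list(episode.audio)
    return out
```

Rejecting a key such as `quality` at this layer is what keeps release identifiers out of the cache: they stay derivable from file names instead of being stored twice.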

- **`TVShowBuilder` / `SeasonBuilder`: sole construction surface for the
TVShow aggregate** (`alfred/domain/tv_shows/builders.py`). The aggregate
is now fully frozen; building goes through a mutable scratchpad that
emits an immutable `TVShow` via `build()`. Both builders offer a
`from_existing()` classmethod to seed from a current frozen aggregate
and apply modifications. Episodes are emitted sorted by number within a
season, seasons sorted by number within the show.
- **`SeasonMode` enum** (`PACK` / `EPISODIC`) in
`alfred/domain/tv_shows/value_objects.py`. Computed at read time from
the season's structural shape (`Season.mode` property): a season with
no explicit episodes is `PACK` (a single release covering the whole
season), a season with episodes is `EPISODIC` (currently airing, one
release per episode). Never stored; the YAML sidecar encodes the
mode via the presence/absence of the `episodes:` block.

### Changed

- **TVShow aggregate is now frozen all the way down.** `TVShow`,
`Season` and `Episode` are all `@dataclass(frozen=True)`. Children
are stored as ordered tuples (`tuple[Season, ...]`,
`tuple[Episode, ...]`) sorted by their respective numbers, replacing
the previous mutable dicts. Lookup helpers `TVShow.get_season(n)` and
`Season.get_episode(n)` traverse the tuple lazily via `next()`. The
former `add_episode` / `add_season` mutation methods are gone; all
construction goes through `TVShowBuilder` / `SeasonBuilder`.
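
The frozen-aggregate-plus-builder arrangement can be sketched at season scale. This is an illustration, not the project's classes: the `title` field and the builder's `add_episode` method are assumed names (the entry only names `build()`, `from_existing()` and `get_episode(n)`), and last-write-wins on duplicate numbers follows the behavior the test-suite entry mentions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    number: int
    title: str


@dataclass(frozen=True)
class Season:
    number: int
    episodes: tuple[Episode, ...] = ()  # ordered by episode number

    def get_episode(self, n: int) -> Episode | None:
        # Lazy traversal: stops at the first match, None when absent.
        return next((e for e in self.episodes if e.number == n), None)


class SeasonBuilder:
    """Mutable scratchpad that emits an immutable Season."""

    def __init__(self, number: int) -> None:
        self._number = number
        self._episodes: dict[int, Episode] = {}

    @classmethod
    def from_existing(cls, season: Season) -> SeasonBuilder:
        builder = cls(season.number)
        for episode in season.episodes:
            builder.add_episode(episode)
        return builder

    def add_episode(self, episode: Episode) -> SeasonBuilder:
        self._episodes[episode.number] = episode  # last write wins
        return self

    def build(self) -> Season:
        ordered = tuple(self._episodes[n] for n in sorted(self._episodes))
        return Season(self._number, ordered)
```

Changing a season then means `SeasonBuilder.from_existing(season)`, one or more builder calls, and `build()`, which leaves the original instance untouched.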

### Removed

- **ShowTracker-territory fields stripped from the TVShow aggregate.**
The aggregate now models only what the `.alfred` sidecar stores
(filesystem-observable facts + immutable identity). Dropped from the
domain:
- `TVShow.status` (`ShowStatus`) and the `ShowStatus` enum entirely,
along with its TMDB string mapping (`from_string`).
- `TVShow.expected_seasons`, `Season.expected_episodes`,
`Season.aired_episodes`, `Season.name`.
- `TVShow.collection_status()`, `is_complete_series()`,
`missing_episodes()`, `is_ongoing()`, `is_ended()` and the
`CollectionStatus` enum.
- `Season.is_complete()`, `is_fully_aired()`, `missing_episodes()`
and the `aired ≤ expected` validation.
- `TVShow.add_episode()` / `TVShow.add_season()` /
`Season.add_episode()`, replaced by the builder API.

These concerns will reappear in a dedicated `ShowTracker` layer (to
be designed) that combines the `.alfred` sidecar with live TMDB data
to answer questions like "is this show complete?" or "are new
episodes out?". Keeping volatile/derived state out of the aggregate
matches the factual-only philosophy locked in `specs/dot_alfred.md`.

### Internal

- **Test suite rewritten for the new aggregate shape.**
`tests/domain/test_tv_shows.py` now covers frozen invariants, builder
ordering, last-write-wins on duplicates, `from_existing` round-trip,
and `SeasonMode` derivation. `tests/infrastructure/test_filesystem_extras.py`
helper simplified (no more `ShowStatus.ENDED` / `expected_seasons` on
test shows). 1078 tests still green.

- **Design doc for `.alfred/` sidecar persistence
(`specs/dot_alfred.md`).** First entry in the new `specs/` directory.
Specifies a per-show `.alfred/` directory holding a `show.yaml` and
one `season_NN.yaml` per season, used by the upcoming concrete
`TVShowRepository` to cache parse/probe results and avoid full
rescans on every library read. Covers schema, naming conventions,
cache invalidation strategy (size + mtime), self-healing on
drift, atomicity (`os.replace`), edge cases (legacy folders,
corrupted sidecars, manual file removal), and a phased
implementation plan. No code yet; spec only.

### Internal

- **`specs/` is now tracked.** The repo-level `.gitignore` had a
blanket `*.md` rule with only `CHANGELOG.md` allow-listed. Added
explicit exceptions for `/README.md` (root only, to avoid
unintentionally exposing fixture READMEs) and `specs/**/*.md` so the
new design-doc directory ships with the project. Also added an
explicit `/.claude/` ignore line for the private dev-docs sub-repo
that sits inside the working tree but is versioned separately.
|
||||
|
||||
### Fixed
|
||||
|
||||
- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
|
||||
range.** The parser previously captured `episode=9, episode_end=10`
|
||||
and dropped E11+. It now returns `episode=first, episode_end=last`,
|
||||
with intermediate values implied. Fixture
|
||||
`shitty/archer_multi_episode/` updated from anti-regression-of-bug
|
||||
to anti-regression-of-fix.
|
||||
- **Apostrophes in titles no longer push the release through the AI
|
||||
fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
|
||||
previously parsed with `parse_path="ai"` and everything UNKNOWN
|
||||
because `'` is in the forbidden-chars list. Apostrophes are now
|
||||
pre-stripped before the well-formed check, so the parse completes
|
||||
normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
|
||||
the title text loses its apostrophe. `parse_path` becomes
|
||||
`sanitized` to surface the cleanup. Side win: PoP fixture
|
||||
`the_prodigy_full_chaos/` also moves from total failure to a
|
||||
partially-correct parse (year, source, codec extracted).
|
||||
- **Season-range markers (`Sxx-yy`) are now recognized as
|
||||
`tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
|
||||
parsed as `media_type=movie` with `S01-06` glued onto the title.
|
||||
The parser now recognizes the range, sets `season=first`,
|
||||
`media_type=tv_complete`, and removes the marker from the title.
|
||||
`is_season_pack` flips to `true`.
|
||||
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
|
||||
with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
|
||||
produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
|
||||
`｜`, …) carry no title content and are now filtered out. Side
|
||||
effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
|
||||
YouTube wide-pipe no longer leaks into the title.
|
||||
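The multi-episode and season-range fixes above can be illustrated with a small standalone sketch. It uses `re` for brevity, which is an assumption made for this example only; the function names, the digit limits, and the choice of `None` for `episode_end` on a single episode are hypothetical and may differ from the real parser.

```python
import re
from typing import Optional

# Hypothetical patterns: "S" + season digits, then either a chain of
# "E" + episode digits, or a "-NN" season range.
_CHAIN = re.compile(r"^S(\d{1,2})((?:E\d{1,3})+)$", re.IGNORECASE)
_RANGE = re.compile(r"^S(\d{1,2})-(\d{1,2})$", re.IGNORECASE)


def parse_episode_chain(token: str) -> Optional[tuple[int, int, Optional[int]]]:
    """Return (season, episode, episode_end) or None if the token is no chain."""
    match = _CHAIN.match(token)
    if match is None:
        return None
    season = int(match.group(1))
    episodes = [int(n) for n in re.findall(r"E(\d{1,3})", match.group(2), re.IGNORECASE)]
    first, last = episodes[0], episodes[-1]
    # Intermediate episodes are implied by the first/last pair.
    return season, first, (last if len(episodes) > 1 else None)


def parse_season_range(token: str) -> Optional[tuple[int, int]]:
    """Return (first_season, last_season) for markers shaped like S01-06."""
    match = _RANGE.match(token)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

With these helpers, `S14E09E10E11` yields season 14 with episodes 9 through 11, and `S01-06` yields the season pair (1, 6).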
|
||||
### Added
|
||||
|
||||
- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
|
||||
token separator.** Added to `alfred/knowledge/release/separators.yaml`
|
||||
so CJK release names (and the occasional decorative YouTube-style use)
|
||||
tokenize cleanly instead of leaving the wide pipe glued onto an
|
||||
adjacent token. The tokenizer in
|
||||
`alfred/domain/release/parser/pipeline.py` already iterates the
|
||||
separator list as plain strings (no regex), so a multi-byte UTF-8
|
||||
separator works without any code change.
|
||||
|
||||
- **`InspectedResult.recommended_action` property** — derived hint that
|
||||
collapses the orchestrator's go / wait / skip decision into a single
|
||||
value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
|
||||
exclusion logic that was previously dispersed across road /
|
||||
media_type / main_video checks at each call site. Ordering is part of
|
||||
the contract: ``skip`` (no main video, or media_type == ``"other"``)
|
||||
wins over ``ask_user`` (media_type == ``"unknown"`` or road ==
|
||||
``"path_of_pain"``) which wins over ``process``. Surfaced through the
|
||||
``analyze_release`` tool so the LLM can route on it directly.
|
||||
6 new tests in ``tests/application/test_inspect.py`` cover the four
|
||||
branches and the precedence rules.
|
||||
- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
|
||||
Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
|
||||
— the surface previously coupled to the concrete `LanguageRegistry`.
|
||||
Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
|
||||
depends on the Protocol, infrastructure provides the YAML-backed
|
||||
adapter. Tests in `tests/infrastructure/test_language_registry.py`.
|
||||
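The precedence contract of `recommended_action` described above can be sketched as a derived property. The class below is a simplified stand-in with only the three fields the rule reads; its name and field types are assumptions, not the real `InspectedResult`.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class InspectedSketch:
    """Hypothetical, reduced stand-in for InspectedResult."""

    media_type: str
    road: str
    main_video: Optional[Path]

    @property
    def recommended_action(self) -> str:
        # 1. "skip" wins: nothing to process, or not a media release at all.
        if self.main_video is None or self.media_type == "other":
            return "skip"
        # 2. "ask_user" comes next: the parser is unsure what this is.
        if self.media_type == "unknown" or self.road == "path_of_pain":
            return "ask_user"
        # 3. Everything else can be processed directly.
        return "process"
```

Keeping the checks in one property means a release that is both `unknown` and missing its main video resolves to `skip`, which matches the ordering stated in the entry.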
|
||||
### Changed
|
||||
|
||||
- **`Movie` and `Episode` are now frozen dataclasses.** Both entities
|
||||
hold their track collections as `tuple[AudioTrack, ...]` and
|
||||
`tuple[SubtitleTrack, ...]` instead of mutable lists, and are
|
||||
`@dataclass(frozen=True, eq=False)` (identity-based equality
|
||||
preserved via `__eq__`/`__hash__`). `__post_init__` coercion uses
|
||||
`object.__setattr__` for the `imdb_id` / `title` /
|
||||
`season_number` / `episode_number` normalizations. To project
|
||||
enrichment results (probe output, file metadata) callers now rebuild
|
||||
via `dataclasses.replace(...)`. Pattern aligned with the recent
|
||||
`ParsedRelease` freeze. `MediaWithTracks` mixin contract updated to
|
||||
`tuple` accordingly. `Season` and `TVShow` remain mutable for now —
|
||||
freezing the aggregate root would cascade a full reconstruction on
|
||||
every `add_episode`, deferred.
|
||||
- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name
|
||||
conflated "this might become a placed subtitle" with "this is what a
|
||||
scan pass produced". The class is the output of a scan/identify pass
|
||||
— language/format may still be `None`, confidence reflects how sure
|
||||
the classifier is, and `raw_tokens` holds the filename fragments
|
||||
under analysis. `SubtitleScanResult` says that directly. Pure rename
|
||||
with a refreshed docstring in `alfred/domain/subtitles/entities.py`;
|
||||
no behavior change. Touches the domain entity + `__init__` export,
|
||||
the matcher / identifier / utils services, the manage_subtitles use
|
||||
case, the placer, the metadata store, the shared-media cross-ref
|
||||
comment, and the seven test modules that imported the type.
|
||||
|
||||
- **`ParsedRelease` is now frozen; enrichment passes return new
|
||||
instances.** The VO was mutable so `detect_media_type` and
|
||||
`enrich_from_probe` could patch fields in place — a code smell in a
|
||||
value object whose identity *is* its content. `ParsedRelease` is now
|
||||
`@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]`
|
||||
instead of a `list[str]`. `enrich_from_probe` returns a new
|
||||
`ParsedRelease` via `dataclasses.replace` (only allocates when at
|
||||
least one field actually changed). `inspect_release` rebinds
|
||||
`parsed` after both `detect_media_type` (wrapped in `MediaTypeToken`
|
||||
to satisfy the strict isinstance check that now also runs on
|
||||
replace) and `enrich_from_probe`. Parser pipeline now packs
|
||||
`languages` as a tuple in the assemble dict. Callers updated:
|
||||
`inspect_release`, `testing/recognize_folders_in_downloads.py`, and
|
||||
the enrichment tests (22 call sites + language assertions switched
|
||||
to tuple literals).
|
||||
- **`resolve_destination` use cases take `kb` / `prober` as required
|
||||
params; module-level singletons gone.** The four
|
||||
`resolve_{season,episode,movie,series}_destination` use cases now
|
||||
accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
|
||||
arguments, matching the shape of `inspect_release`. The module-level
|
||||
`_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
|
||||
singletons that previously lived in
|
||||
`alfred/application/filesystem/resolve_destination.py` are removed —
|
||||
the application layer no longer reaches into infrastructure. The
|
||||
singletons now live at the agent-tools frontier
|
||||
(`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
|
||||
instantiate them once and thread them through. `analyze_release` no
|
||||
longer needs the dirty `from ... import _KB` indirection. Tests
|
||||
inject their own stubs by keyword (`prober=_StubProber(...)`) instead
|
||||
of monkeypatching a module attribute.
|
||||
- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
|
||||
collided with `pathlib.Path` in code-reading mental models, and was
|
||||
one letter from `parse_path` (the field that holds the value) — making
|
||||
it harder than it needed to be to spot the type vs the attribute.
|
||||
``TokenizationRoute`` says what it actually captures (DIRECT /
|
||||
SANITIZED / AI = how the name reached the tokenizer), and the class
|
||||
docstring now spells out the orthogonality with ``Road`` (EASY /
|
||||
SHITTY / PATH_OF_PAIN, which captures parser confidence on
|
||||
``ParseReport``). The ``parse_path`` field name stays unchanged —
|
||||
string values too — so YAML fixtures, the ``analyze_release`` tool
|
||||
spec, and any external consumer are untouched.
|
||||
- **`enrich_from_probe` codec mappings moved to YAML.** The three
|
||||
hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
|
||||
`_CHANNEL_MAP`) translating ffprobe output to scene tokens
|
||||
(`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
|
||||
`alfred/knowledge/release/probe_mappings.yaml` and are loaded into
|
||||
`ReleaseKnowledge.probe_mappings` (new port field, populated by
|
||||
`YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
|
||||
parameter and reads the maps from there. Aligns with the CLAUDE.md
|
||||
rule that lookup tables of domain knowledge belong in YAML, not in
|
||||
Python — and opens the door to a future "learn new codec" pass.
|
||||
Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
|
||||
and all 22 sites in `tests/application/test_enrich_from_probe.py`.
|
||||
- **`ParsedRelease.tech_string` is now a derived `@property`**
|
||||
(`alfred/domain/release/value_objects.py`). It computes
|
||||
`quality.source.codec` joined by dots on every access, so it stays in
|
||||
sync with the underlying fields by construction. The stored field is
|
||||
gone from the dataclass, the dict returned by `assemble()` no longer
|
||||
carries the key, `parse_release`'s malformed-name fallback drops the
|
||||
`tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
|
||||
it after filling `quality`/`source`/`codec`. Closes the
|
||||
parser/enrichment double-source-of-truth that `e79ca46` had to fix
|
||||
reactively. The fixtures runner now injects `tech_string` alongside
|
||||
`is_season_pack` since `asdict()` skips properties.
|
||||
- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
|
||||
valid levels (global, release_group, movie, show, season, episode)
|
||||
was documented only in a docstring comment and validated nowhere.
|
||||
`RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
|
||||
serialization, `.value` access) while making the closed set explicit
|
||||
to type-checkers and IDEs. `to_dict()` emits `.value` strings so
|
||||
YAML output is unchanged.
|
||||
- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
|
||||
`__init__`.** Same public API (accepts `str | Path`), same behavior,
|
||||
but the dataclass-generated `__init__` is no longer bypassed. One
|
||||
less smell in the shared VOs.
|
||||
- **`Language` VO is strict by default; `Language.from_raw()` factory
|
||||
for normalization.** The previous `__post_init__` mutated `iso` and
|
||||
`aliases` via `object.__setattr__` on a frozen dataclass — a code
|
||||
smell hiding behind the dataclass facade. Split: the direct
|
||||
constructor now rejects un-normalized input (uppercase iso,
|
||||
whitespace in aliases, etc.), and `Language.from_raw()` handles
|
||||
arbitrary YAML/user input. Only one caller (LanguageRegistry loading
|
||||
the ISO YAML) needed migration.
|
||||
- **`ParsedRelease.normalised` renamed to `clean`.** The field name
|
||||
promised "dots instead of spaces" but in practice held
|
||||
`raw - site_tag - apostrophes` — only used by `season_folder_name()`.
|
||||
Renamed and docstring corrected.
|
||||
- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
|
||||
fields were already typed as `MediaTypeToken` / `ParsePath`, but a
|
||||
tolerant `__post_init__` coerced raw strings. With both classes
|
||||
being `(str, Enum)`, the coercion served no purpose. Strict
|
||||
constructor; `.value` no longer passed at call sites; dropped the
|
||||
unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.
|
||||
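Two of the patterns in this section, the frozen value object rebuilt with `dataclasses.replace` and the derived `tech_string` property, can be shown together in a short sketch. `ReleaseSketch` is a hypothetical, reduced stand-in for `ParsedRelease`; skipping missing parts when joining is an assumption made for the example.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ReleaseSketch:
    """Hypothetical, reduced stand-in for ParsedRelease."""

    title: str
    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot drift from the fields.
        # Assumption: missing parts are skipped rather than rendered.
        parts = (self.quality, self.source, self.codec)
        return ".".join(part for part in parts if part)


parsed = ReleaseSketch(title="Honey.Dont", source="WEBRip", codec="x265")
# Enrichment returns a new instance; `parsed` itself is left untouched.
enriched = replace(parsed, quality="2160p")
```

Because `asdict()` only walks dataclass fields, the property is absent from its output, which is why the entry mentions the fixtures runner injecting `tech_string` separately.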
|
||||
### Removed
|
||||
|
||||
- **`settings.min_movie_size_bytes`** — orphan Pydantic field +
|
||||
validator. Its only consumer (`MovieService.validate_movie_file`)
|
||||
had been removed during an earlier refactor. The "real movie vs
|
||||
sample" rule now lives in extension-based exclusion
|
||||
(`application/release/supported_media.py`) and PoP. If a size
|
||||
threshold is ever needed, it'll go in a knowledge YAML, not in
|
||||
`settings`.
|
||||
|
||||
### Internal
|
||||
|
||||
- **Flattened `alfred.domain.shared.media/` package into a single
|
||||
`media.py` module.** The 6-file package (audio, video, subtitle,
|
||||
info, matching, tracks_mixin + `__init__`) collapsed into one ~250
|
||||
LoC module. All 12 import sites continue to resolve unchanged
|
||||
(`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
|
||||
since Python treats `media.py` and `media/__init__.py`
|
||||
interchangeably for import paths. Easier to scan when the whole
|
||||
bounded-context fits on one screen.
|
||||
- **`SubtitleKnowledgeBase` types `language_registry` against the
|
||||
`LanguageRepository` port** instead of the concrete `LanguageRegistry`
|
||||
class. The default constructor still instantiates the concrete adapter
|
||||
when no repository is injected — behaviour is unchanged for existing
|
||||
callers. Opens the door to in-memory fakes in future tests without
|
||||
loading the full ISO 639 YAML.
|
||||
- **Moved `detect_media_type` and `enrich_from_probe` from
|
||||
`alfred.application.filesystem` to `alfred.application.release`**.
|
||||
They are inspection-pipeline helpers — their natural home is next to
|
||||
`inspect_release`, not next to the filesystem use cases. The move
|
||||
also eliminates a circular-import workaround in
|
||||
`resolve_destination.py`: `inspect_release` can now be imported at
|
||||
module top instead of lazily inside `_resolve_parsed`. Public
|
||||
surface is unchanged for callers that imported the helpers from
|
||||
their full module paths (the only call sites — `inspect.py`, two
|
||||
tests, one testing script — were updated in this commit).
|
||||
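As an illustration of typing against the `LanguageRepository` port, the sketch below pairs a structural Protocol with an in-memory fake. Only the five member names come from the changelog; every signature, the simplified `Language` class, and the `InMemoryLanguages` and `count_known` helpers are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Language:
    """Simplified stand-in for the real Language value object."""

    iso: str
    aliases: tuple[str, ...] = ()


class LanguageRepository(Protocol):
    """Structural port: any object with these members satisfies it."""

    def from_iso(self, code: str) -> Optional[Language]: ...
    def from_any(self, raw: str) -> Optional[Language]: ...
    def all(self) -> Iterable[Language]: ...
    def __contains__(self, raw: str) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguages:
    """Hypothetical test fake; no YAML loading involved."""

    def __init__(self, languages: Iterable[Language]) -> None:
        self._by_iso = {lang.iso: lang for lang in languages}

    def from_iso(self, code: str) -> Optional[Language]:
        return self._by_iso.get(code)

    def from_any(self, raw: str) -> Optional[Language]:
        key = raw.strip().lower()
        if key in self._by_iso:
            return self._by_iso[key]
        for lang in self._by_iso.values():
            if key in lang.aliases:
                return lang
        return None

    def all(self) -> Iterable[Language]:
        return list(self._by_iso.values())

    def __contains__(self, raw: str) -> bool:
        return self.from_any(raw) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


def count_known(repo: LanguageRepository, tokens: Iterable[str]) -> int:
    """Domain-side code depends on the Protocol, not on a concrete class."""
    return sum(1 for token in tokens if token in repo)
```

The fake never inherits from the Protocol; it satisfies it by shape, which is what lets tests swap it in without touching the YAML-backed adapter.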
|
||||
### Added
|
||||
|
||||
- **`resolve_*_destination` use cases now consume `inspect_release`**.
|
||||
`resolve_episode_destination` and `resolve_movie_destination` reuse
|
||||
their existing `source_file` parameter as the inspection target;
|
||||
`resolve_season_destination` and `resolve_series_destination` gain
|
||||
a new **optional** `source_path` parameter (also threaded through
|
||||
the tool wrappers and YAML specs). When the path exists, ffprobe
|
||||
data fills tokens missing from the release name (e.g. quality) and
|
||||
refreshes `tech_string`, so the destination folder / file names
|
||||
end up more accurate. When the path is omitted or does not exist (back-compat
|
||||
callers), the use cases fall back to parse-only — same behavior as
|
||||
before.
|
||||
|
||||
### Fixed
|
||||
|
||||
- **`enrich_from_probe` now refreshes `tech_string`** after filling
|
||||
`quality` / `source` / `codec`. Previously the field stayed at its
|
||||
parser-time value, so filename builders saw stale tech tokens even
|
||||
after a successful probe. New `TestTechString` class in
|
||||
`tests/application/test_enrich_from_probe.py` locks the behavior.
|
||||
|
||||
### Added
|
||||
|
||||
- **`inspect_release` orchestrator + `InspectedResult` VO**
|
||||
(`alfred/application/release/inspect.py`). Single composition of the
|
||||
four inspection layers: `parse_release` → `detect_media_type` (patches
|
||||
`parsed.media_type`) → `find_main_video` (top-level scan) →
|
||||
`prober.probe` + `enrich_from_probe` when a video exists and the
|
||||
refined media type isn't in `{"unknown", "other"}`. Returns a frozen
|
||||
`InspectedResult(parsed, report, source_path, main_video, media_info,
|
||||
probe_used)` that downstream callers consume directly instead of
|
||||
rebuilding the same chain. `kb` and `prober` are injected — no
|
||||
module-level singletons. Never raises.
|
||||
|
||||
### Changed
|
||||
|
||||
- **`analyze_release` tool now delegates to `inspect_release`** — same
|
||||
output shape, plus two new fields: `confidence` (0–100) and `road`
|
||||
(`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
|
||||
`ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
|
||||
both fields so the LLM can route releases by confidence.
|
||||
|
||||
- **`MediaProber` port now covers full media probing**: added
|
||||
`probe(video) -> MediaInfo | None` alongside the existing
|
||||
`list_subtitle_streams`. `FfprobeMediaProber` (in
|
||||
`alfred/infrastructure/probe/`) implements both methods and is now
|
||||
the single adapter shelling out to `ffprobe`. The standalone
|
||||
`alfred/infrastructure/filesystem/ffprobe.py` module was removed —
|
||||
all callers (tools, testing scripts) instantiate
|
||||
`FfprobeMediaProber` instead. Unblocks the upcoming
|
||||
`inspect_release` orchestrator, which depends on the port.
|
||||
|
||||
### Removed
|
||||
|
||||
- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
|
||||
`FfprobeMediaProber` adapter).
|
||||
|
||||
---
|
||||
|
||||
## [2026-05-20] — Release parser confidence scoring + exclusion
|
||||
|
||||
### Added
|
||||
|
||||
- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
|
||||
`is_supported_video(path, kb)` (extension-only check against
|
||||
`kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
|
||||
scan, lexicographically-first eligible file, returns `None` when no
|
||||
video qualifies; accepts a bare file as folder for single-file
|
||||
releases). No size threshold, no filename heuristics —
|
||||
PATH_OF_PAIN handles the exotic cases. Foundation for the future
|
||||
`inspect_release` orchestrator.
|
||||
|
||||
- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
|
||||
`alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
|
||||
`(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
|
||||
carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
|
||||
`"path_of_pain"`), the residual UNKNOWN tokens, and the missing
|
||||
critical fields. EASY is decided structurally (a group schema
|
||||
matched); SHITTY vs PATH_OF_PAIN is decided by score against a
|
||||
YAML-configurable cutoff (default 60). Weights and penalties also
|
||||
live in `scoring.yaml` — title 30, media_type 20, year 15, season
|
||||
10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
|
||||
-30. `Road` is a new enum, distinct from `ParsePath` (which records
|
||||
the tokenization route, not the confidence tier). `ReleaseKnowledge`
|
||||
port gains a `scoring: dict` field.
|
||||
|
||||
### Changed
|
||||
|
||||
- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
|
||||
ParseReport]` instead of returning a bare `ParsedRelease`. Call
|
||||
sites updated in `application/filesystem/resolve_destination.py` and
|
||||
`agent/tools/filesystem.py`. Tests updated accordingly.
|
||||
|
||||
---
|
||||
|
||||
## [2026-05-20] — Release parser v2 (EASY + SHITTY)
|
||||
|
||||
### Added
|
||||
|
||||
- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
|
||||
new annotate-based pipeline (tokenize → annotate → assemble) drives
|
||||
releases from known groups. Exposes `Token` (frozen VO with `index` +
|
||||
`role` + `extra`), `TokenRole` enum (structural/technical/meta families),
|
||||
and `GroupSchema` / `SchemaChunk` value objects.
|
||||
- `pipeline.tokenize`: string-ops separator split (no regex), strips
|
||||
a `[site.tag]` prefix/suffix first.
|
||||
- `pipeline.annotate`: detects the trailing group right-to-left
|
||||
(priority to `codec-GROUP` shape, fallback to any non-source dashed
|
||||
token), looks up its `GroupSchema`, then walks tokens and schema
|
||||
chunks in lockstep — optional chunks that don't match are skipped,
|
||||
mandatory mismatches abort EASY and return `None` so the caller can
|
||||
fall back to SHITTY.
|
||||
- `pipeline.assemble`: folds annotated tokens into a
|
||||
`ParsedRelease`-compatible dict.
|
||||
- `parse_release` (in `release.services`) tries the v2 EASY path first
|
||||
and falls through to the legacy SHITTY heuristic on `None`. Legacy
|
||||
SHITTY/PATH OF PAIN behavior is unchanged.
|
||||
- Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
|
||||
rarbg}.yaml` declare the canonical chunk order per group, loaded via
|
||||
new `ReleaseKnowledge.group_schema(name)` port method.
|
||||
- Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
|
||||
cover token VOs, site-tag stripping, group detection, schema-driven
|
||||
annotation (movie, TV episode, season pack with optional source),
|
||||
and field assembly.
|
||||
|
||||
- **Release parser v2 — enricher pass** completes the EASY pipeline.
|
||||
The structural schema walk now tolerates non-positional tokens
|
||||
between chunks (instead of aborting on leftover tokens), and a second
|
||||
pass tags them with audio / video-meta / edition / language roles.
|
||||
Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
|
||||
(e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
|
||||
matched before single tokens. Channel layouts like `5.1` and `7.1`
|
||||
(split into two tokens by the `.` separator) are detected as
|
||||
consecutive pairs. Sequence members carry an `extra["sequence_member"]`
|
||||
marker so `assemble` extracts the canonical value only from the
|
||||
primary token. KONTRAST releases with audio / HDR / edition / language
|
||||
metadata now produce a fully populated `ParsedRelease`.
|
||||
|
||||
- **Streaming distributor as a separate dimension** from encoding source.
|
||||
New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
|
||||
ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
|
||||
port field, a `TokenRole.DISTRIBUTOR` annotation, and a
|
||||
`ParsedRelease.distributor` field. `WEB-DL` stays the source; the
|
||||
platform that produced the release is now recorded distinctly. The
|
||||
five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
|
||||
from `sources.yaml`.
|
||||
|
||||
- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
|
||||
each documenting an expected `ParsedRelease` plus the future `routing`
|
||||
(library / torrents / seed_hardlinks) for the upcoming `organize_media`
|
||||
@@ -54,6 +918,22 @@ callers).
|
||||
|
||||
### Changed
|
||||
|
||||
- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
|
||||
The legacy ~480-line heuristic block in `release/services.py` is gone;
|
||||
`pipeline._annotate_shitty` does a single pass that looks each token
|
||||
up in the kb buckets (resolutions / sources / codecs / distributors /
|
||||
year / `SxxExx`) with first-match-wins semantics, and the leftmost
|
||||
contiguous UNKNOWN run becomes the title. `annotate()` no longer
|
||||
returns `None` — SHITTY is the always-on fallback when no group schema
|
||||
matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
|
||||
(`deutschland_franchise_box`, `sleaford_yt_slug`,
|
||||
`super_mario_bilingual`, `predator_space_separators` — the last one
|
||||
moved from `shitty/` → `path_of_pain/`) are now marked
|
||||
`pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
|
||||
that SHITTY intentionally won't handle. `ReleaseFixture` grows an
|
||||
`xfail_reason` field; the parametrized suite wires the xfail mark
|
||||
automatically.
|
||||
|
||||
- **`parse_release` tokenizer is now data-driven**: it splits on any character
|
||||
listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
|
||||
This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
|
||||
|
||||
@@ -6,13 +6,13 @@ from collections.abc import AsyncGenerator
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.metadata import MetadataStore
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
from alfred.settings import settings
|
||||
|
||||
from .prompt import PromptBuilder
|
||||
from .registry import Tool, make_tools
|
||||
from .workflows import WorkflowLoader
|
||||
from .workflows_TO_CHECK import WorkflowLoader
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@@ -3,12 +3,12 @@
|
||||
import json
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.persistence.memory import MemoryRegistry
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
from alfred.infrastructure.persistence_TO_CHECK.memory import MemoryRegistry
|
||||
|
||||
from .expressions import build_expressions_context
|
||||
from .registry import Tool
|
||||
from .workflows import WorkflowLoader
|
||||
from .workflows_TO_CHECK import WorkflowLoader
|
||||
|
||||
# Tools that are always available, regardless of workflow scope.
|
||||
# Kept small on purpose — the core is what the agent uses to either
|
||||
|
||||
@@ -6,8 +6,8 @@ from collections.abc import Callable
|
||||
from dataclasses import dataclass
|
||||
from typing import Any
|
||||
|
||||
from .tools.spec import ToolSpec, ToolSpecError
|
||||
from .tools.spec_loader import load_tool_specs
|
||||
from .tools_TO_CHECK.spec import ToolSpec, ToolSpecError
|
||||
from .tools_TO_CHECK.spec_loader import load_tool_specs
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -130,10 +130,10 @@ def make_tools(settings) -> dict[str, Tool]:
|
||||
Returns:
|
||||
Dictionary mapping tool names to Tool objects.
|
||||
"""
|
||||
from .tools import api as api_tools # noqa: PLC0415
|
||||
from .tools import filesystem as fs_tools # noqa: PLC0415
|
||||
from .tools import language as lang_tools # noqa: PLC0415
|
||||
from .tools import workflow as wf_tools # noqa: PLC0415
|
||||
from .tools_TO_CHECK import api as api_tools # noqa: PLC0415
|
||||
from .tools_TO_CHECK import filesystem as fs_tools # noqa: PLC0415
|
||||
from .tools_TO_CHECK import language as lang_tools # noqa: PLC0415
|
||||
from .tools_TO_CHECK import workflow as wf_tools # noqa: PLC0415
|
||||
|
||||
tool_functions = [
|
||||
fs_tools.set_path_for_folder,
|
||||
|
||||
@@ -1,22 +0,0 @@
|
||||
"""Tools module - filesystem and API tools for the agent."""
|
||||
|
||||
from .api import (
|
||||
add_torrent_by_index,
|
||||
add_torrent_to_qbittorrent,
|
||||
find_media_imdb_id,
|
||||
find_torrent,
|
||||
get_torrent_by_index,
|
||||
)
|
||||
from .filesystem import list_folder, set_path_for_folder
|
||||
from .language import set_language
|
||||
|
||||
__all__ = [
|
||||
"set_path_for_folder",
|
||||
"list_folder",
|
||||
"find_media_imdb_id",
|
||||
"find_torrent",
|
||||
"get_torrent_by_index",
|
||||
"add_torrent_to_qbittorrent",
|
||||
"add_torrent_by_index",
|
||||
"set_language",
|
||||
]
|
||||
@@ -0,0 +1,23 @@
|
||||
"""Tools module — agent-exposed wrappers.
|
||||
|
||||
Re-exports are intentionally minimal during the ``unfuck`` refactor.
|
||||
Tool wiring (registry / specs / LLM-facing surface) is the last
|
||||
chunk of work on this branch; until then, importers should reach
|
||||
into the submodules directly (``alfred.agent.tools.filesystem``, …).
|
||||
"""
|
||||
|
||||
from .api import (
|
||||
add_torrent_by_index,
|
||||
add_torrent_to_qbittorrent,
|
||||
find_torrent,
|
||||
get_torrent_by_index,
|
||||
)
|
||||
from .language import set_language
|
||||
|
||||
__all__ = [
|
||||
"find_torrent",
|
||||
"get_torrent_by_index",
|
||||
"add_torrent_to_qbittorrent",
|
||||
"add_torrent_by_index",
|
||||
"set_language",
|
||||
]
|
||||
@@ -3,35 +3,47 @@
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from alfred.application.movies import SearchMovieUseCase
|
||||
from alfred.application.torrents import AddTorrentUseCase, SearchTorrentsUseCase
|
||||
from alfred.infrastructure.api.knaben import knaben_client
|
||||
from alfred.infrastructure.api.qbittorrent import qbittorrent_client
|
||||
from alfred.infrastructure.api.tmdb import tmdb_client
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.application.movies_TO_CHECK import SearchMovieUseCase
|
||||
from alfred.application.torrents_TO_CHECK import AddTorrentUseCase, SearchTorrentsUseCase
|
||||
from alfred.application.tv_shows_TO_CHECK import SearchShowUseCase
|
||||
from alfred.infrastructure.api_TO_CHECK.knaben import knaben_client
|
||||
from alfred.infrastructure.api_TO_CHECK.qbittorrent import qbittorrent_client
|
||||
from alfred.infrastructure.api_TO_CHECK.tmdb import tmdb_client
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def find_media_imdb_id(media_title: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_media_imdb_id.yaml."""
|
||||
def search_movies(media_title: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_movies.yaml."""
|
||||
use_case = SearchMovieUseCase(tmdb_client)
|
||||
response = use_case.execute(media_title)
|
||||
result = response.to_dict()
|
||||
|
||||
if result.get("status") == "ok":
|
||||
memory = get_memory()
|
||||
memory.stm.set_entity(
|
||||
"last_media_search",
|
||||
{
|
||||
"title": result.get("title"),
|
||||
"imdb_id": result.get("imdb_id"),
|
||||
"media_type": result.get("media_type"),
|
||||
"tmdb_id": result.get("tmdb_id"),
|
||||
},
|
||||
memory.stm.set_entity("last_movie_search", {"hits": result.get("hits", [])})
|
||||
memory.stm.set_topic("searching_movie")
|
||||
logger.debug(
|
||||
f"Stored movie search result in STM: {len(result.get('hits', []))} hits"
|
||||
)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def search_shows(show_title: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_shows.yaml."""
|
||||
use_case = SearchShowUseCase(tmdb_client)
|
||||
response = use_case.execute(show_title)
|
||||
result = response.to_dict()
|
||||
|
||||
if result.get("status") == "ok":
|
||||
memory = get_memory()
|
||||
memory.stm.set_entity("last_show_search", {"hits": result.get("hits", [])})
|
||||
memory.stm.set_topic("searching_show")
|
||||
logger.debug(
|
||||
f"Stored show search result in STM: {len(result.get('hits', []))} hits"
|
||||
)
|
||||
memory.stm.set_topic("searching_media")
|
||||
logger.debug(f"Stored media search result in STM: {result.get('title')}")
|
||||
|
||||
return result
|
||||
|
||||
@@ -1,4 +1,20 @@
|
||||
"""Filesystem tools for folder management."""
|
||||
"""Filesystem tools for folder management.
|
||||
|
||||
Thin wrappers around the 5 atomic filesystem use cases
|
||||
(``alfred.application.filesystem``) plus a few self-contained tools
|
||||
(``analyze_release``, ``probe_media``, ``learn``, …).
|
||||
|
||||
Tools removed during the ``unfuck`` filesystem refactor — to be
|
||||
rewired in a later step:
|
||||
- ``manage_subtitles`` (depends on the rewritten subtitle services)
|
||||
- ``set_path_for_folder`` (no replacement use case yet)
|
||||
- ``create_seed_links`` (flow has changed: hard-link straight to
|
||||
library, no copy back; will be re-introduced per-file when the
|
||||
organize-release workflow lands)
|
||||
- ``resolve_season_destination`` / ``resolve_episode_destination``
|
||||
/ ``resolve_movie_destination`` / ``resolve_series_destination``
|
||||
(their use cases moved to ``_OLD`` files pending a rewrite)
|
||||
"""
|
||||
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
@@ -7,120 +23,136 @@ import yaml
|
||||
|
||||
import alfred as _alfred_pkg
|
||||
from alfred.application.filesystem import (
|
||||
CreateSeedLinksUseCase,
|
||||
ListFolderUseCase,
|
||||
ManageSubtitlesUseCase,
|
||||
MoveMediaUseCase,
|
||||
SetFolderPathUseCase,
|
||||
DirectoryRoots,
|
||||
create_dir_use_case,
|
||||
list_dir_use_case,
|
||||
move_file_use_case,
|
||||
)
|
||||
from alfred.application.filesystem.detect_media_type import detect_media_type
|
||||
from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_episode_destination as _resolve_episode_destination,
|
||||
)
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_movie_destination as _resolve_movie_destination,
|
||||
)
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_season_destination as _resolve_season_destination,
|
||||
)
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_series_destination as _resolve_series_destination,
|
||||
)
|
||||
from alfred.infrastructure.filesystem import FileManager, create_folder, move
|
||||
from alfred.infrastructure.filesystem.ffprobe import probe
|
||||
from alfred.infrastructure.filesystem.find_video import find_video_file
|
||||
from alfred.infrastructure.metadata import MetadataStore
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.knowledge_TO_CHECK.release_kb import YamlReleaseKnowledge
|
||||
from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
from alfred.infrastructure.probe_TO_CHECK import FfprobeMediaProber
|
||||
|
||||
# Agent-tools frontier: this is the legitimate home for the singletons that
|
||||
# back every LLM-exposed wrapper. The use cases below take ``kb`` / ``prober``
|
||||
# as required params; tests inject their own stubs.
|
||||
_KB = YamlReleaseKnowledge()
|
||||
_PROBER = FfprobeMediaProber()
|
||||
|
||||
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
|
||||
|
||||
|
||||
class _RootsNotConfigured(Exception):
|
||||
"""Raised when one of the 4 expected roots is missing from memory."""
|
||||
|
||||
def __init__(self, missing: list[str]):
|
||||
super().__init__(f"Roots not configured: {missing}")
|
||||
self.missing = missing
|
||||
|
||||
|
||||
def _load_directory_roots() -> DirectoryRoots:
|
||||
"""Build :class:`DirectoryRoots` from the persisted memory.
|
||||
|
||||
Reads:
|
||||
- ``ltm.workspace.download`` → ``downloads``
|
||||
- ``ltm.workspace.torrent`` → ``torrents``
|
||||
- ``ltm.library_paths['movies']`` → ``movies``
|
||||
- ``ltm.library_paths['tv_shows']`` → ``tv_shows``
|
||||
|
||||
Raises:
|
||||
_RootsNotConfigured: if any of the four paths is unset.
|
||||
"""
|
||||
memory = get_memory()
|
||||
downloads = memory.ltm.workspace.download
|
||||
torrents = memory.ltm.workspace.torrent
|
||||
movies = memory.ltm.library_paths.get("movies")
|
||||
tv_shows = memory.ltm.library_paths.get("tv_shows")
|
||||
|
||||
missing: list[str] = []
|
||||
if not downloads:
|
||||
missing.append("downloads")
|
||||
if not torrents:
|
||||
missing.append("torrents")
|
||||
if not movies:
|
||||
missing.append("movies")
|
||||
if not tv_shows:
|
||||
missing.append("tv_shows")
|
||||
if missing:
|
||||
raise _RootsNotConfigured(missing)
|
||||
|
||||
return DirectoryRoots(
|
||||
downloads=Path(downloads),
|
||||
torrents=Path(torrents),
|
||||
movies=Path(movies),
|
||||
tv_shows=Path(tv_shows),
|
||||
)
|
||||
|
||||
|
||||
def _roots_error(exc: _RootsNotConfigured) -> dict[str, Any]:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "roots_not_configured",
|
||||
"message": (
|
||||
f"Missing roots: {exc.missing}. "
|
||||
"Configure them via /set_path before using filesystem tools."
|
||||
),
|
||||
}
|
||||
|
||||
|
||||
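Review note: the root-loading helper above collects every missing root before failing, so the caller sees the complete list in one error instead of fixing paths one at a time. A minimal sketch of that pattern, assuming a plain mapping in place of the real memory object (`require_roots` and `RootsNotConfigured` are hypothetical stand-ins, not names from this diff):

```python
from typing import Mapping, Optional


class RootsNotConfigured(Exception):
    """Stand-in exception that remembers which roots were missing."""

    def __init__(self, missing: list[str]):
        super().__init__(f"Roots not configured: {missing}")
        self.missing = missing


def require_roots(raw: Mapping[str, Optional[str]]) -> dict[str, str]:
    """Return the four roots, or raise once with every missing name."""
    expected = ("downloads", "torrents", "movies", "tv_shows")
    # Empty strings and None both count as "not configured".
    missing = [name for name in expected if not raw.get(name)]
    if missing:
        raise RootsNotConfigured(missing)
    return {name: str(raw[name]) for name in expected}
```

Reporting all gaps at once keeps the error payload useful to an LLM caller, which can then ask the user for every path in a single turn.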
# ---------------------------------------------------------------------------
|
||||
# 5 atomic filesystem tools — thin wrappers over the use cases.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def list_folder(path: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
|
||||
try:
|
||||
roots = _load_directory_roots()
|
||||
except _RootsNotConfigured as e:
|
||||
return _roots_error(e)
|
||||
return list_dir_use_case(Path(path), roots).to_dict()
|
||||
|
||||
|
||||
def create_directory(path: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_directory.yaml."""
|
||||
try:
|
||||
roots = _load_directory_roots()
|
||||
except _RootsNotConfigured as e:
|
||||
return _roots_error(e)
|
||||
return create_dir_use_case(Path(path), roots).to_dict()
|
||||
|
||||
|
||||
def move_media(source: str, destination: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml."""
|
||||
file_manager = FileManager()
|
||||
use_case = MoveMediaUseCase(file_manager)
|
||||
return use_case.execute(source, destination).to_dict()
|
||||
try:
|
||||
roots = _load_directory_roots()
|
||||
except _RootsNotConfigured as e:
|
||||
return _roots_error(e)
|
||||
return move_file_use_case(Path(source), Path(destination), roots).to_dict()
|
||||
|
||||
|
||||
def move_to_destination(source: str, destination: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml."""
|
||||
parent = str(Path(destination).parent)
|
||||
result = create_folder(parent)
|
||||
if result["status"] != "ok":
|
||||
return result
|
||||
return move(source, destination)
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml.
|
||||
|
||||
Convenience tool that creates the destination's parent directory
|
||||
if missing, then moves the file. Saves the LLM from having to
|
||||
chain ``create_directory`` + ``move_media`` explicitly.
|
||||
"""
|
||||
try:
|
||||
roots = _load_directory_roots()
|
||||
except _RootsNotConfigured as e:
|
||||
return _roots_error(e)
|
||||
|
||||
dst = Path(destination)
|
||||
mkdir_resp = create_dir_use_case(dst.parent, roots)
|
||||
if mkdir_resp.status != "ok":
|
||||
return mkdir_resp.to_dict()
|
||||
return move_file_use_case(Path(source), dst, roots).to_dict()
|
||||
|
||||
|
||||
def resolve_season_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
confirmed_folder: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_season_destination.yaml."""
|
||||
return _resolve_season_destination(
|
||||
release_name, tmdb_title, tmdb_year, confirmed_folder
|
||||
).to_dict()
|
||||
|
||||
|
||||
def resolve_episode_destination(
|
||||
release_name: str,
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
tmdb_episode_title: str | None = None,
|
||||
confirmed_folder: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_episode_destination.yaml."""
|
||||
return _resolve_episode_destination(
|
||||
release_name,
|
||||
source_file,
|
||||
tmdb_title,
|
||||
tmdb_year,
|
||||
tmdb_episode_title,
|
||||
confirmed_folder,
|
||||
).to_dict()
|
||||
|
||||
|
||||
def resolve_movie_destination(
|
||||
release_name: str,
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_movie_destination.yaml."""
|
||||
return _resolve_movie_destination(
|
||||
release_name, source_file, tmdb_title, tmdb_year
|
||||
).to_dict()
|
||||
|
||||
|
||||
def resolve_series_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
confirmed_folder: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_series_destination.yaml."""
|
||||
return _resolve_series_destination(
|
||||
release_name, tmdb_title, tmdb_year, confirmed_folder
|
||||
).to_dict()
|
||||
|
||||
|
||||
def create_seed_links(
|
||||
library_file: str, original_download_folder: str
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_seed_links.yaml."""
|
||||
file_manager = FileManager()
|
||||
use_case = CreateSeedLinksUseCase(file_manager)
|
||||
return use_case.execute(library_file, original_download_folder).to_dict()
|
||||
|
||||
|
||||
def manage_subtitles(source_video: str, destination_video: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/manage_subtitles.yaml."""
|
||||
file_manager = FileManager()
|
||||
use_case = ManageSubtitlesUseCase(file_manager)
|
||||
return use_case.execute(source_video, destination_video).to_dict()
|
||||
# ---------------------------------------------------------------------------
|
||||
# Self-contained tools — not impacted by the filesystem refactor.
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, Any]:
|
||||
@@ -180,32 +212,12 @@ def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, An
|
||||
}
|
||||
|
||||
|
||||
def set_path_for_folder(folder_name: str, path_value: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_path_for_folder.yaml."""
|
||||
file_manager = FileManager()
|
||||
use_case = SetFolderPathUseCase(file_manager)
|
||||
response = use_case.execute(folder_name, path_value)
|
||||
return response.to_dict()
|
||||
|
||||
|
||||
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
|
||||
from alfred.application.filesystem.resolve_destination import _KB # noqa: PLC0415
|
||||
from alfred.domain.release.services import parse_release # noqa: PLC0415
|
||||
|
||||
path = Path(source_path)
|
||||
parsed = parse_release(release_name, _KB)
|
||||
parsed.media_type = detect_media_type(parsed, path, _KB)
|
||||
|
||||
probe_used = False
|
||||
if parsed.media_type not in ("unknown", "other"):
|
||||
video_file = find_video_file(path, _KB)
|
||||
if video_file:
|
||||
media_info = probe(video_file)
|
||||
if media_info:
|
||||
enrich_from_probe(parsed, media_info)
|
||||
probe_used = True
|
||||
from alfred.application.release_TO_CHECK import inspect_release # noqa: PLC0415
|
||||
|
||||
result = inspect_release(release_name, Path(source_path), _KB, _PROBER)
|
||||
parsed = result.parsed
|
||||
return {
|
||||
"status": "ok",
|
||||
"media_type": parsed.media_type,
|
||||
@@ -227,7 +239,10 @@ def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
||||
"edition": parsed.edition,
|
||||
"site_tag": parsed.site_tag,
|
||||
"is_season_pack": parsed.is_season_pack,
|
||||
"probe_used": probe_used,
|
||||
"probe_used": result.probe_used,
|
||||
"confidence": result.report.confidence,
|
||||
"road": result.report.road,
|
||||
"recommended_action": result.recommended_action,
|
||||
}
|
||||
|
||||
|
||||
@@ -241,7 +256,7 @@ def probe_media(source_path: str) -> dict[str, Any]:
|
||||
"message": f"{source_path} does not exist",
|
||||
}
|
||||
|
||||
media_info = probe(path)
|
||||
media_info = _PROBER.probe(path)
|
||||
if media_info is None:
|
||||
return {
|
||||
"status": "error",
|
||||
@@ -285,14 +300,6 @@ def probe_media(source_path: str) -> dict[str, Any]:
|
||||
}
|
||||
|
||||
|
||||
def list_folder(folder_type: str, path: str = ".") -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
|
||||
file_manager = FileManager()
|
||||
use_case = ListFolderUseCase(file_manager)
|
||||
response = use_case.execute(folder_type, path)
|
||||
return response.to_dict()
|
||||
|
||||
|
||||
def read_release_metadata(release_path: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/read_release_metadata.yaml."""
|
||||
path = Path(release_path)
|
||||
@@ -3,7 +3,7 @@
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
+3
@@ -80,3 +80,6 @@ returns:
|
||||
site_tag: Source-site tag if present.
|
||||
is_season_pack: True when the folder contains a full season.
|
||||
probe_used: True when ffprobe successfully enriched the result.
|
||||
confidence: Parser confidence score, 0–100 (higher = more reliable).
|
||||
road: "Parser road: 'easy' (group schema matched), 'shitty' (heuristic but acceptable), or 'path_of_pain' (low confidence — ask the user before auto-routing)."
|
||||
recommended_action: "Orchestrator hint: 'process' (go straight to resolve_*_destination), 'ask_user' (media_type unknown or road=path_of_pain — confirm with the user first), or 'skip' (no main video, or media_type=other — nothing to organize)."
|
||||
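Review note: the three `recommended_action` values documented above lend themselves to a small dispatch on the caller side. The sketch below is only an illustration; `next_step` and the step names it returns are hypothetical, and only the `process` / `ask_user` / `skip` values come from the spec.

```python
from typing import Any


def next_step(analysis: dict[str, Any]) -> str:
    """Pick a follow-up step from an analyze_release-style result dict.

    The returned step names are invented for this example.
    """
    if analysis.get("status") != "ok":
        return "report_error"
    action = analysis.get("recommended_action")
    if action == "process":
        return "resolve_destination"
    if action == "skip":
        return "nothing_to_do"
    # "ask_user" and any unexpected value both fall back to a confirmation.
    return "confirm_with_user"
```

Treating unknown values like `ask_user` is a conservative default: a new hint added to the spec later would prompt the user instead of auto-routing files.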
+11
@@ -61,6 +61,17 @@ parameters:
|
||||
one.
|
||||
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
source_path:
|
||||
description: |
|
||||
Absolute path to the release folder on disk. Optional.
|
||||
why_needed: |
|
||||
When provided, the tool runs ffprobe on the main video inside the
|
||||
folder and uses the probe data to fill quality/codec tokens that
|
||||
may be missing from the release name. The enriched tech tokens
|
||||
end up in the destination folder name, so providing source_path
|
||||
gives more accurate names for releases with sparse metadata.
|
||||
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Paths resolved unambiguously; ready to move.
|
||||
+10
@@ -56,6 +56,16 @@ parameters:
|
||||
Forces the use case to use this exact folder name and skip detection.
|
||||
example: The.Wire.2002.1080p.BluRay.x265-GROUP
|
||||
|
||||
source_path:
|
||||
description: |
|
||||
Absolute path to the release folder on disk. Optional.
|
||||
why_needed: |
|
||||
When provided, the tool runs ffprobe on the main video inside the
|
||||
folder and uses probe data to fill quality/codec tokens that may
|
||||
be missing from the release name, producing a more accurate
|
||||
destination folder name.
|
||||
example: /downloads/The.Wire.S01-S05.1080p.BluRay.x265-GROUP
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Path resolved; ready to move the pack.
|
||||
@@ -9,9 +9,9 @@ to reason over the full set.
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
|
||||
from ..workflows import WorkflowLoader
|
||||
from ..workflows_TO_CHECK import WorkflowLoader
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
+1
-1
@@ -15,7 +15,7 @@ from alfred.agent.agent import Agent
|
||||
from alfred.agent.llm.deepseek import DeepSeekClient
|
||||
from alfred.agent.llm.exceptions import LLMAPIError, LLMConfigurationError
|
||||
from alfred.agent.llm.ollama import OllamaClient
|
||||
from alfred.infrastructure.persistence import get_memory, init_memory
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory, init_memory
|
||||
from alfred.settings import settings
|
||||
|
||||
logging.basicConfig(
|
||||
|
||||
@@ -0,0 +1,26 @@
|
||||
"""Application-layer exceptions shared across orchestrators.
|
||||
|
||||
Kept in a dedicated module (rather than inside each orchestrator's
|
||||
file) because the sync flows for TV shows and movies raise structurally
|
||||
identical "not found in library" errors — pulling them out makes the
|
||||
shared semantics explicit and avoids cross-imports between the
|
||||
``tv_shows`` and ``movies`` packages.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
|
||||
class ShowNotFoundInLibrary(LookupError):
|
||||
"""Raised when no on-disk TV show carries the requested ``tmdb_id``.
|
||||
|
||||
The sync orchestrator raises this when both the library index and
|
||||
the per-show release repository return ``None`` for a lookup —
|
||||
there is nothing on disk to refresh TMDB facts against.
|
||||
"""
|
||||
|
||||
|
||||
class MovieNotFoundInLibrary(LookupError):
|
||||
"""Raised when no on-disk movie carries the requested ``tmdb_id``.
|
||||
|
||||
Symmetric to :class:`ShowNotFoundInLibrary` for the movies library.
|
||||
"""
|
||||
@@ -1,47 +1,42 @@
|
||||
"""Filesystem use cases."""
|
||||
"""Filesystem application layer — 5 atomic use cases as free functions.
|
||||
|
||||
from .create_seed_links import CreateSeedLinksUseCase
|
||||
Each use case:
|
||||
- accepts :class:`pathlib.Path` inputs plus a :class:`DirectoryRoots` VO,
|
||||
- guards inputs against escaping configured roots,
|
||||
- calls the matching infra op,
|
||||
- catches :class:`~alfred.infrastructure.filesystem.FilesystemError` and
|
||||
returns a frozen DTO with a normalized error code.
|
||||
|
||||
No global state, no ``get_memory()``. Roots are injected.
|
||||
"""
|
||||
|
||||
from .create_dir import create_dir_use_case
|
||||
from .directory_roots import DirectoryRoots
|
||||
from .dto import (
|
||||
CreateSeedLinksResponse,
|
||||
ListFolderResponse,
|
||||
ManageSubtitlesResponse,
|
||||
MoveMediaResponse,
|
||||
PlacedSubtitle,
|
||||
SetFolderPathResponse,
|
||||
CreateDirResponse,
|
||||
LinkFileResponse,
|
||||
ListDirResponse,
|
||||
MoveDirResponse,
|
||||
MoveFileResponse,
|
||||
)
|
||||
from .list_folder import ListFolderUseCase
|
||||
from .manage_subtitles import ManageSubtitlesUseCase
|
||||
from .move_media import MoveMediaUseCase
|
||||
from .resolve_destination import (
|
||||
ResolvedEpisodeDestination,
|
||||
ResolvedMovieDestination,
|
||||
ResolvedSeasonDestination,
|
||||
ResolvedSeriesDestination,
|
||||
resolve_episode_destination,
|
||||
resolve_movie_destination,
|
||||
resolve_season_destination,
|
||||
resolve_series_destination,
|
||||
)
|
||||
from .set_folder_path import SetFolderPathUseCase
|
||||
from .link_file import link_file_use_case
|
||||
from .list_dir import list_dir_use_case
|
||||
from .move_dir import move_dir_use_case
|
||||
from .move_file import move_file_use_case
|
||||
|
||||
__all__ = [
|
||||
"SetFolderPathUseCase",
|
||||
"ListFolderUseCase",
|
||||
"CreateSeedLinksUseCase",
|
||||
"MoveMediaUseCase",
|
||||
"ManageSubtitlesUseCase",
|
||||
"ResolvedSeasonDestination",
|
||||
"ResolvedEpisodeDestination",
|
||||
"ResolvedMovieDestination",
|
||||
"ResolvedSeriesDestination",
|
||||
"resolve_season_destination",
|
||||
"resolve_episode_destination",
|
||||
"resolve_movie_destination",
|
||||
"resolve_series_destination",
|
||||
"SetFolderPathResponse",
|
||||
"ListFolderResponse",
|
||||
"CreateSeedLinksResponse",
|
||||
"MoveMediaResponse",
|
||||
"ManageSubtitlesResponse",
|
||||
"PlacedSubtitle",
|
||||
# use cases
|
||||
"list_dir_use_case",
|
||||
"create_dir_use_case",
|
||||
"link_file_use_case",
|
||||
"move_file_use_case",
|
||||
"move_dir_use_case",
|
||||
# VO
|
||||
"DirectoryRoots",
|
||||
# DTOs
|
||||
"ListDirResponse",
|
||||
"CreateDirResponse",
|
||||
"LinkFileResponse",
|
||||
"MoveFileResponse",
|
||||
"MoveDirResponse",
|
||||
]
|
||||
|
||||
@@ -0,0 +1,41 @@
|
||||
"""Internal helpers: mapping infra exceptions → error codes.
|
||||
|
||||
Kept private (``_errors``) — only the 5 use cases in this package use
|
||||
it. Centralizes the exception → code translation so every use case
|
||||
returns consistent error payloads.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from alfred.infrastructure.filesystem import (
|
||||
CrossDevice,
|
||||
DestinationExists,
|
||||
FilesystemError,
|
||||
FilesystemOSError,
|
||||
NotADirectory,
|
||||
NotAFile,
|
||||
PermissionDenied,
|
||||
SourceNotFound,
|
||||
)
|
||||
|
||||
# Application-layer error codes (guard violations, not infra).
|
||||
PATH_NOT_ALLOWED = "path_not_allowed"
|
||||
|
||||
|
||||
def code_for(exc: FilesystemError) -> str:
|
||||
"""Return the snake-case error code for an infra exception."""
|
||||
if isinstance(exc, SourceNotFound):
|
||||
return "source_not_found"
|
||||
if isinstance(exc, DestinationExists):
|
||||
return "destination_exists"
|
||||
if isinstance(exc, NotADirectory):
|
||||
return "not_a_directory"
|
||||
if isinstance(exc, NotAFile):
|
||||
return "not_a_file"
|
||||
if isinstance(exc, PermissionDenied):
|
||||
return "permission_denied"
|
||||
if isinstance(exc, CrossDevice):
|
||||
return "cross_device"
|
||||
if isinstance(exc, FilesystemOSError):
|
||||
return "filesystem_os_error"
|
||||
return "filesystem_error"
|
||||
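Review note: the `isinstance` chain in `code_for` can also be written as a lookup table, which keeps the exception-to-code pairs in one place. The sketch below uses two stand-in exception classes and assumes each specific exception derives directly from the base class; the real hierarchy in `alfred.infrastructure.filesystem` is not shown in this diff, and if some exceptions subclass others the table order would matter.

```python
class FilesystemError(Exception):
    """Stand-in for the infra base exception (hierarchy is assumed)."""


class SourceNotFound(FilesystemError):
    """Stand-in for a specific infra failure."""


class DestinationExists(FilesystemError):
    """Stand-in for a specific infra failure."""


# Ordered pairs: the first matching type wins, so list subclasses first.
_CODES: tuple[tuple[type, str], ...] = (
    (SourceNotFound, "source_not_found"),
    (DestinationExists, "destination_exists"),
)


def code_for(exc: FilesystemError) -> str:
    """Return the snake-case code for exc, or a generic fallback."""
    for exc_type, code in _CODES:
        if isinstance(exc, exc_type):
            return code
    return "filesystem_error"
```

Either form works; the explicit chain in the diff is arguably easier to grep, while the table makes adding a code a one-line change.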
@@ -0,0 +1,33 @@
|
||||
"""create_dir use case — create a directory under one of the configured roots."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.infrastructure.filesystem import FilesystemError, create_dir
|
||||
|
||||
from ._errors import PATH_NOT_ALLOWED, code_for
|
||||
from .directory_roots import DirectoryRoots
|
||||
from .dto import CreateDirResponse
|
||||
|
||||
|
||||
def create_dir_use_case(path: Path, roots: DirectoryRoots) -> CreateDirResponse:
|
||||
"""Create directory ``path`` (and any missing parents) provided it
|
||||
lives under one of the configured roots.
|
||||
|
||||
Idempotent on the infra side: re-running on an existing directory
|
||||
returns ``status="ok"``.
|
||||
"""
|
||||
if not roots.contains(path):
|
||||
return CreateDirResponse(
|
||||
status="error",
|
||||
error=PATH_NOT_ALLOWED,
|
||||
message=f"Path is outside configured roots: {path}",
|
||||
)
|
||||
|
||||
try:
|
||||
create_dir(path)
|
||||
except FilesystemError as e:
|
||||
return CreateDirResponse(status="error", error=code_for(e), message=str(e))
|
||||
|
||||
return CreateDirResponse(status="ok", path=path)
|
||||
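Review note: the use case above follows a guard, call, translate sequence. A self-contained sketch of the same flow, assuming `Path.mkdir` in place of the infra `create_dir` and a single root in place of `DirectoryRoots` (`Response`, `is_under`, and `create_dir_guarded` are hypothetical names):

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class Response:
    status: str
    path: Optional[Path] = None
    error: Optional[str] = None


def is_under(path: Path, root: Path) -> bool:
    """True when the resolved path sits inside the resolved root."""
    try:
        path.resolve().relative_to(root.resolve())
        return True
    except ValueError:
        return False


def create_dir_guarded(path: Path, root: Path) -> Response:
    # 1. Guard: refuse anything outside the allowed root.
    if not is_under(path, root):
        return Response(status="error", error="path_not_allowed")
    # 2. Call the filesystem; exist_ok makes a re-run harmless.
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError:
        # 3. Translate the failure into a code instead of raising.
        return Response(status="error", error="filesystem_os_error")
    return Response(status="ok", path=path)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = create_dir_guarded(root / "a" / "b", root)
    second = create_dir_guarded(root / "a" / "b", root)  # idempotent re-run
    outside = create_dir_guarded(root.parent / "not-created-by-sketch", root)
```

Because the guard runs before any filesystem call, the rejected path is never touched.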
+1
-1
@@ -3,7 +3,7 @@
|
||||
import logging
|
||||
|
||||
from alfred.infrastructure.filesystem import FileManager
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
|
||||
from .dto import CreateSeedLinksResponse
|
||||
|
||||
@@ -0,0 +1,56 @@
|
||||
"""DirectoryRoots — VO carrying the configured filesystem roots.
|
||||
|
||||
Replaces the ad-hoc ``get_memory().ltm.workspace.<x>`` lookups that were
|
||||
sprinkled across the filesystem use cases. By making roots an explicit
|
||||
input, use cases become pure (no global state read) and easy to test.
|
||||
|
||||
The roots are read once at the tool wrapper boundary (where the agent
|
||||
config lives) and threaded through the use cases.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class DirectoryRoots:
|
||||
"""Configured roots of Alfred's filesystem.
|
||||
|
||||
All paths must be absolute and existing directories — validation is
|
||||
expected at the boundary that builds this VO.
|
||||
|
||||
Attributes:
|
||||
downloads: where qBittorrent drops finished torrents.
|
||||
torrents: where seeding hard-links live (mirrors downloads/).
|
||||
movies: library root for movies.
|
||||
tv_shows: library root for TV shows.
|
||||
"""
|
||||
|
||||
downloads: Path
|
||||
torrents: Path
|
||||
movies: Path
|
||||
tv_shows: Path
|
||||
|
||||
def all(self) -> tuple[Path, ...]:
|
||||
"""Return every configured root, in declaration order."""
|
||||
return (self.downloads, self.torrents, self.movies, self.tv_shows)
|
||||
|
||||
def contains(self, path: Path) -> bool:
|
||||
"""Return True if ``path`` is inside one of the configured roots.
|
||||
|
||||
Uses ``Path.resolve()`` to handle symlinks and ``..`` segments,
|
||||
then ``relative_to`` for an exact within-root check.
|
||||
"""
|
||||
try:
|
||||
resolved = path.resolve()
|
||||
except OSError:
|
||||
return False
|
||||
for root in self.all():
|
||||
try:
|
||||
resolved.relative_to(root.resolve())
|
||||
return True
|
||||
except (ValueError, OSError):
|
||||
continue
|
||||
return False
|
||||
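Review note: resolving before calling `relative_to` is what makes the containment check robust against `..` segments and against sibling directories that merely share a string prefix. The sketch below reproduces the logic of `contains` as a free function over a tuple of roots (the function form and the sample paths are assumptions made for the example):

```python
import tempfile
from pathlib import Path


def contains(path: Path, roots: tuple[Path, ...]) -> bool:
    """True if the resolved path is inside any resolved root."""
    try:
        resolved = path.resolve()
    except OSError:
        return False
    for root in roots:
        try:
            resolved.relative_to(root.resolve())
            return True
        except (ValueError, OSError):
            continue
    return False


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    movies = base / "movies"
    movies.mkdir()
    roots = (movies,)

    # A not-yet-existing file under the root still counts as inside.
    inside = contains(movies / "Heat (1995)" / "Heat.mkv", roots)
    # ".." is collapsed by resolve(), so the escape is detected.
    escaped = contains(movies / ".." / "secrets.txt", roots)
    # "movies2" shares a string prefix with "movies" but is a sibling.
    sibling = contains(base / "movies2" / "file.mkv", roots)
```

A plain `str.startswith` comparison would wrongly accept the `movies2` case, which is one reason to prefer `relative_to`.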
@@ -1,19 +1,28 @@
|
||||
"""Filesystem application DTOs."""
|
||||
"""DTOs for the 5 atomic filesystem use cases.
|
||||
|
||||
Each use case returns a small frozen dataclass tagged with a ``status``
|
||||
field. On error, ``error`` (machine-readable code) and ``message``
|
||||
(human-readable) are populated; on success, the relevant payload
|
||||
fields are.
|
||||
|
||||
Error codes mirror the infrastructure exception types (lowercased,
|
||||
snake-cased) — e.g. ``SourceNotFound`` → ``"source_not_found"`` — plus
|
||||
the application-layer ``"path_not_allowed"`` for guard violations.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
@dataclass
|
||||
class CopyMediaResponse:
|
||||
"""Response from copying a media file."""
|
||||
@dataclass(frozen=True)
|
||||
class ListDirResponse:
|
||||
"""Response from ``list_dir_use_case``."""
|
||||
|
||||
status: str
|
||||
source: str | None = None
|
||||
destination: str | None = None
|
||||
filename: str | None = None
|
||||
size: int | None = None
|
||||
status: str # "ok" | "error"
|
||||
path: Path | None = None
|
||||
entries: tuple[Path, ...] = ()
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
@@ -22,22 +31,33 @@ class CopyMediaResponse:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {
|
||||
"status": self.status,
|
||||
"source": self.source,
|
||||
"destination": self.destination,
|
||||
"filename": self.filename,
|
||||
"size": self.size,
|
||||
"path": str(self.path) if self.path else None,
|
||||
"entries": [str(p) for p in self.entries],
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class MoveMediaResponse:
|
||||
"""Response from moving a media file."""
|
||||
@dataclass(frozen=True)
|
||||
class CreateDirResponse:
|
||||
"""Response from ``create_dir_use_case``."""
|
||||
|
||||
status: str
|
||||
source: str | None = None
|
||||
destination: str | None = None
|
||||
filename: str | None = None
|
||||
size: int | None = None
|
||||
path: Path | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
if self.error:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {"status": self.status, "path": str(self.path) if self.path else None}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class LinkFileResponse:
|
||||
"""Response from ``link_file_use_case``."""
|
||||
|
||||
status: str
|
||||
source: Path | None = None
|
||||
destination: Path | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
@@ -46,125 +66,18 @@ class MoveMediaResponse:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {
|
||||
"status": self.status,
|
||||
"source": self.source,
|
||||
"destination": self.destination,
|
||||
"filename": self.filename,
|
||||
"size": self.size,
|
||||
"source": str(self.source) if self.source else None,
|
||||
"destination": str(self.destination) if self.destination else None,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class SetFolderPathResponse:
|
||||
"""Response from setting a folder path."""
|
||||
@dataclass(frozen=True)
|
||||
class MoveFileResponse:
|
||||
"""Response from ``move_file_use_case``."""
|
||||
|
||||
status: str
|
||||
folder_name: str | None = None
|
||||
path: str | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self):
|
||||
"""Convert to dict for agent compatibility."""
|
||||
result = {"status": self.status}
|
||||
|
||||
if self.error:
|
||||
result["error"] = self.error
|
||||
result["message"] = self.message
|
||||
else:
|
||||
if self.folder_name:
|
||||
result["folder_name"] = self.folder_name
|
||||
if self.path:
|
||||
result["path"] = self.path
|
||||
|
||||
return result
|
||||
|
||||
|
||||
@dataclass
|
||||
class PlacedSubtitle:
|
||||
"""One subtitle file successfully placed."""
|
||||
|
||||
source: str
|
||||
destination: str
|
||||
filename: str
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"source": self.source,
|
||||
"destination": self.destination,
|
||||
"filename": self.filename,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class UnresolvedTrack:
|
||||
"""A subtitle track that needs agent clarification before placement."""
|
||||
|
||||
raw_tokens: list[str]
|
||||
file_path: str | None = None
|
||||
file_size_kb: float | None = None
|
||||
reason: str = "" # "unknown_language" | "low_confidence"
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"raw_tokens": self.raw_tokens,
|
||||
"file_path": self.file_path,
|
||||
"file_size_kb": self.file_size_kb,
|
||||
"reason": self.reason,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class AvailableSubtitle:
|
||||
"""One subtitle track available on an embedded media item."""
|
||||
|
||||
language: str # ISO 639-2 code
|
||||
subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {"language": self.language, "type": self.subtitle_type}
|
||||
|
||||
|
||||
@dataclass
|
||||
class ManageSubtitlesResponse:
|
||||
"""Response from the manage_subtitles use case."""
|
||||
|
||||
status: str # "ok" | "needs_clarification" | "error"
|
||||
video_path: str | None = None
|
||||
placed: list[PlacedSubtitle] | None = None
|
||||
skipped_count: int = 0
|
||||
unresolved: list[UnresolvedTrack] | None = None
|
||||
available: list[AvailableSubtitle] | None = None # embedded tracks summary
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
if self.error:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
result = {
|
||||
"status": self.status,
|
||||
"video_path": self.video_path,
|
||||
"placed": [p.to_dict() for p in (self.placed or [])],
|
||||
"placed_count": len(self.placed or []),
|
||||
"skipped_count": self.skipped_count,
|
||||
}
|
||||
if self.unresolved:
|
||||
result["unresolved"] = [u.to_dict() for u in self.unresolved]
|
||||
result["unresolved_count"] = len(self.unresolved)
|
||||
if self.available:
|
||||
result["available"] = [a.to_dict() for a in self.available]
|
||||
return result
|
||||
|
||||
|
||||
@dataclass
|
||||
class CreateSeedLinksResponse:
|
||||
"""Response from creating seed links for a torrent."""
|
||||
|
||||
status: str
|
||||
torrent_subfolder: str | None = None
|
||||
linked_file: str | None = None
|
||||
copied_files: list[str] | None = None
|
||||
copied_count: int = 0
|
||||
skipped: list[str] | None = None
|
||||
source: Path | None = None
|
||||
destination: Path | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
@@ -173,41 +86,26 @@ class CreateSeedLinksResponse:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {
|
||||
"status": self.status,
|
||||
"torrent_subfolder": self.torrent_subfolder,
|
||||
"linked_file": self.linked_file,
|
||||
"copied_files": self.copied_files or [],
|
||||
"copied_count": self.copied_count,
|
||||
"skipped": self.skipped or [],
|
||||
"source": str(self.source) if self.source else None,
|
||||
"destination": str(self.destination) if self.destination else None,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class ListFolderResponse:
|
||||
"""Response from listing a folder."""
|
||||
@dataclass(frozen=True)
|
||||
class MoveDirResponse:
|
||||
"""Response from ``move_dir_use_case``."""
|
||||
|
||||
status: str
|
||||
folder_type: str | None = None
|
||||
path: str | None = None
|
||||
entries: list[str] | None = None
|
||||
count: int | None = None
|
||||
source: Path | None = None
|
||||
destination: Path | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self):
|
||||
"""Convert to dict for agent compatibility."""
|
||||
result = {"status": self.status}
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
if self.error:
|
||||
result["error"] = self.error
|
||||
result["message"] = self.message
|
||||
else:
|
||||
if self.folder_type:
|
||||
result["folder_type"] = self.folder_type
|
||||
if self.path:
|
||||
result["path"] = self.path
|
||||
if self.entries is not None:
|
||||
result["entries"] = self.entries
|
||||
if self.count is not None:
|
||||
result["count"] = self.count
|
||||
|
||||
return result
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {
|
||||
"status": self.status,
|
||||
"source": str(self.source) if self.source else None,
|
||||
"destination": str(self.destination) if self.destination else None,
|
||||
}
|
||||
|
||||
@@ -0,0 +1,188 @@
|
||||
"""Filesystem application DTOs."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
|
||||
@dataclass
|
||||
class CopyMediaResponse:
|
||||
"""Response from copying a media file."""
|
||||
|
||||
status: str
|
||||
source: str | None = None
|
||||
destination: str | None = None
|
||||
filename: str | None = None
|
||||
size: int | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
if self.error:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {
|
||||
"status": self.status,
|
||||
"source": self.source,
|
||||
"destination": self.destination,
|
||||
"filename": self.filename,
|
||||
"size": self.size,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class MoveMediaResponse:
|
||||
"""Response from moving a media file."""
|
||||
|
||||
status: str
|
||||
source: str | None = None
|
||||
destination: str | None = None
|
||||
filename: str | None = None
|
||||
size: int | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
if self.error:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {
|
||||
"status": self.status,
|
||||
"source": self.source,
|
||||
"destination": self.destination,
|
||||
"filename": self.filename,
|
||||
"size": self.size,
|
||||
}
|
||||
|
||||
|
||||
|
||||
@dataclass
|
||||
class PlacedSubtitle:
|
||||
"""One subtitle file successfully placed."""
|
||||
|
||||
source: str
|
||||
destination: str
|
||||
filename: str
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"source": self.source,
|
||||
"destination": self.destination,
|
||||
"filename": self.filename,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class UnresolvedTrack:
|
||||
"""A subtitle track that needs agent clarification before placement."""
|
||||
|
||||
raw_tokens: list[str]
|
||||
file_path: str | None = None
|
||||
file_size_kb: float | None = None
|
||||
reason: str = "" # "unknown_language" | "low_confidence"
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
"raw_tokens": self.raw_tokens,
|
||||
"file_path": self.file_path,
|
||||
"file_size_kb": self.file_size_kb,
|
||||
"reason": self.reason,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class AvailableSubtitle:
|
||||
"""One subtitle track available on an embedded media item."""
|
||||
|
||||
language: str # ISO 639-2 code
|
||||
subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {"language": self.language, "type": self.subtitle_type}
|
||||
|
||||
|
||||
@dataclass
|
||||
class ManageSubtitlesResponse:
|
||||
"""Response from the manage_subtitles use case."""
|
||||
|
||||
status: str # "ok" | "needs_clarification" | "error"
|
||||
video_path: str | None = None
|
||||
placed: list[PlacedSubtitle] | None = None
|
||||
skipped_count: int = 0
|
||||
unresolved: list[UnresolvedTrack] | None = None
|
||||
available: list[AvailableSubtitle] | None = None # embedded tracks summary
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
if self.error:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
result = {
|
||||
"status": self.status,
|
||||
"video_path": self.video_path,
|
||||
"placed": [p.to_dict() for p in (self.placed or [])],
|
||||
"placed_count": len(self.placed or []),
|
||||
"skipped_count": self.skipped_count,
|
||||
}
|
||||
if self.unresolved:
|
||||
result["unresolved"] = [u.to_dict() for u in self.unresolved]
|
||||
result["unresolved_count"] = len(self.unresolved)
|
||||
if self.available:
|
||||
result["available"] = [a.to_dict() for a in self.available]
|
||||
return result
|
||||
|
||||
|
||||
@dataclass
|
||||
class CreateSeedLinksResponse:
|
||||
"""Response from creating seed links for a torrent."""
|
||||
|
||||
status: str
|
||||
torrent_subfolder: str | None = None
|
||||
linked_file: str | None = None
|
||||
copied_files: list[str] | None = None
|
||||
copied_count: int = 0
|
||||
skipped: list[str] | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
if self.error:
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
return {
|
||||
"status": self.status,
|
||||
"torrent_subfolder": self.torrent_subfolder,
|
||||
"linked_file": self.linked_file,
|
||||
"copied_files": self.copied_files or [],
|
||||
"copied_count": self.copied_count,
|
||||
"skipped": self.skipped or [],
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class ListFolderResponse:
|
||||
"""Response from listing a folder."""
|
||||
|
||||
status: str
|
||||
folder_type: str | None = None # TODO: should be a property
|
||||
path: str | None = None # TODO: should be a non-optional Path
|
||||
entries: list[str] | None = None # TODO: should never be None; default to an empty list of paths
|
||||
count: int | None = None # TODO: redundant; remove
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self):
|
||||
"""Convert to dict for agent compatibility."""
|
||||
result = {"status": self.status}
|
||||
|
||||
if self.error:
|
||||
result["error"] = self.error
|
||||
result["message"] = self.message
|
||||
else:
|
||||
if self.folder_type:
|
||||
result["folder_type"] = self.folder_type
|
||||
if self.path:
|
||||
result["path"] = self.path
|
||||
if self.entries is not None:
|
||||
result["entries"] = self.entries
|
||||
if self.count is not None:
|
||||
result["count"] = self.count
|
||||
|
||||
return result
|
||||
@@ -1,82 +0,0 @@
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""

from __future__ import annotations

from alfred.domain.release.value_objects import ParsedRelease
from alfred.domain.shared.media import MediaInfo

# Map ffprobe codec names to scene-style codec tokens
_VIDEO_CODEC_MAP = {
    "hevc": "x265",
    "h264": "x264",
    "h265": "x265",
    "av1": "AV1",
    "vp9": "VP9",
    "mpeg4": "XviD",
}

# Map ffprobe audio codec names to scene-style tokens
_AUDIO_CODEC_MAP = {
    "eac3": "EAC3",
    "ac3": "AC3",
    "dts": "DTS",
    "truehd": "TrueHD",
    "aac": "AAC",
    "flac": "FLAC",
    "opus": "OPUS",
    "mp3": "MP3",
    "pcm_s16l": "PCM",
    "pcm_s24l": "PCM",
}

# Map channel count to standard layout string
_CHANNEL_MAP = {
    8: "7.1",
    6: "5.1",
    2: "2.0",
    1: "1.0",
}


def enrich_from_probe(parsed: ParsedRelease, info: MediaInfo) -> None:
    """
    Fill None fields in parsed using data from ffprobe MediaInfo.

    Only overwrites fields that are currently None — token-level values
    from the release name always take priority.
    Mutates parsed in place.
    """
    if parsed.quality is None and info.resolution:
        parsed.quality = info.resolution

    if parsed.codec is None and info.video_codec:
        parsed.codec = _VIDEO_CODEC_MAP.get(
            info.video_codec.lower(), info.video_codec.upper()
        )

    if parsed.bit_depth is None and info.video_codec:
        # ffprobe exposes bit depth via pix_fmt — not in MediaInfo yet, skip for now
        pass

    # Audio — use the default track, fallback to first
    default_track = next((t for t in info.audio_tracks if t.is_default), None)
    track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)

    if track:
        if parsed.audio_codec is None and track.codec:
            parsed.audio_codec = _AUDIO_CODEC_MAP.get(
                track.codec.lower(), track.codec.upper()
            )

        if parsed.audio_channels is None and track.channels:
            parsed.audio_channels = _CHANNEL_MAP.get(
                track.channels, f"{track.channels}ch"
            )

    # Languages — merge ffprobe languages with token-level ones
    # "und" = undetermined, not useful
    if info.audio_languages:
        existing = set(parsed.languages)
        for lang in info.audio_languages:
            if lang.lower() != "und" and lang.upper() not in existing:
                parsed.languages.append(lang)
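The "only fill fields that are still None" rule from the removed `enrich_from_probe` docstring can be shown in isolation. This is a minimal sketch, not the project's code: `Parsed` and `Probe` are hypothetical stand-ins for `ParsedRelease` and `MediaInfo`, reduced to two fields each, and the codec map is a two-entry excerpt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parsed:
    """Hypothetical stand-in for ParsedRelease (two fields only)."""

    quality: Optional[str] = None
    codec: Optional[str] = None


@dataclass
class Probe:
    """Hypothetical stand-in for MediaInfo (two fields only)."""

    resolution: Optional[str] = None
    video_codec: Optional[str] = None


# Excerpt of the ffprobe-name -> scene-token mapping shown in the diff.
VIDEO_CODEC_MAP = {"hevc": "x265", "h264": "x264"}


def enrich(parsed: Parsed, info: Probe) -> None:
    """Fill only the fields that are still None; mutate parsed in place."""
    if parsed.quality is None and info.resolution:
        parsed.quality = info.resolution
    if parsed.codec is None and info.video_codec:
        # Unknown codecs fall back to the upper-cased ffprobe name.
        parsed.codec = VIDEO_CODEC_MAP.get(
            info.video_codec.lower(), info.video_codec.upper()
        )
```

A value parsed from the release name is never overwritten: calling `enrich` on `Parsed(quality="1080p")` with a probe reporting `"2160p"` leaves `quality` unchanged, while the missing `codec` is filled from the probe.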
@@ -0,0 +1,40 @@
"""link_file use case — hard-link a file from one root to another."""

from __future__ import annotations

from pathlib import Path

from alfred.infrastructure.filesystem import FilesystemError, link_file

from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import LinkFileResponse


def link_file_use_case(
    src: Path, dst: Path, roots: DirectoryRoots
) -> LinkFileResponse:
    """Hard-link ``src`` to ``dst``. Both must be under configured roots.

    The destination parent must already exist — the caller is expected
    to have created it via ``create_dir_use_case`` if needed.
    """
    if not roots.contains(src):
        return LinkFileResponse(
            status="error",
            error=PATH_NOT_ALLOWED,
            message=f"Source is outside configured roots: {src}",
        )
    if not roots.contains(dst):
        return LinkFileResponse(
            status="error",
            error=PATH_NOT_ALLOWED,
            message=f"Destination is outside configured roots: {dst}",
        )

    try:
        link_file(src, dst)
    except FilesystemError as e:
        return LinkFileResponse(status="error", error=code_for(e), message=str(e))

    return LinkFileResponse(status="ok", source=src, destination=dst)
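The guard used by `link_file_use_case` (reject any path outside the configured roots before touching the filesystem) might look roughly like the sketch below. `DirectoryRoots` itself is not shown in this part of the diff, so `Roots` and `guard` are hypothetical names and the containment check is an assumption about how such a guard could work, not the project's implementation. `Path.is_relative_to` requires Python 3.9 or later.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Roots:
    """Hypothetical stand-in for DirectoryRoots; the real class may differ."""

    roots: tuple

    def contains(self, path: Path) -> bool:
        # Resolve first so ".." segments cannot escape a root lexically.
        resolved = path.resolve()
        return any(resolved.is_relative_to(root.resolve()) for root in self.roots)


def guard(src: Path, dst: Path, roots: Roots):
    """Return an error dict for the first path outside the roots, else None."""
    for label, path in (("Source", src), ("Destination", dst)):
        if not roots.contains(path):
            return {
                "status": "error",
                "error": "path_not_allowed",
                "message": f"{label} is outside configured roots: {path}",
            }
    return None
```

Checking the source before the destination mirrors the order of the two `roots.contains` calls in the use case, so the caller always sees the first offending path.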
@@ -0,0 +1,34 @@
"""list_dir use case — list a directory after guarding it within roots."""

from __future__ import annotations

from pathlib import Path

from alfred.infrastructure.filesystem import FilesystemError, list_dir

from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import ListDirResponse


def list_dir_use_case(path: Path, roots: DirectoryRoots) -> ListDirResponse:
    """List the immediate children of ``path`` if it lives under one of
    the configured roots.

    Returns a :class:`ListDirResponse`. On guard failure, status is
    ``"error"`` with ``error="path_not_allowed"``. On infra failure,
    status is ``"error"`` with a code mapped from the raised exception.
    """
    if not roots.contains(path):
        return ListDirResponse(
            status="error",
            error=PATH_NOT_ALLOWED,
            message=f"Path is outside configured roots: {path}",
        )

    try:
        entries = list_dir(path)
    except FilesystemError as e:
        return ListDirResponse(status="error", error=code_for(e), message=str(e))

    return ListDirResponse(status="ok", path=path, entries=tuple(entries))
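The docstring above says infra failures are reported "with a code mapped from the raised exception". The `_errors` module is not visible in this part of the diff, so the following is only a guess at one way such a mapping could be written. The exception subclasses, the code strings other than the idea of a string code, and the fallback value are all hypothetical.

```python
class FilesystemError(Exception):
    """Stand-in base class; the real one lives in alfred.infrastructure.filesystem."""


class NotFound(FilesystemError):
    """Hypothetical subclass, used only for this illustration."""


class AlreadyExists(FilesystemError):
    """Hypothetical subclass, used only for this illustration."""


# Hypothetical exception-class -> error-code table.
_CODES = {NotFound: "not_found", AlreadyExists: "already_exists"}


def code_for(exc: FilesystemError) -> str:
    """Return the code of the nearest mapped class in the exception's MRO."""
    for cls in type(exc).__mro__:
        if cls in _CODES:
            return _CODES[cls]
    # Hypothetical generic fallback for unmapped FilesystemError types.
    return "filesystem_error"
```

Walking the MRO means a more specific subclass added later inherits its parent's code until it gets its own entry in the table.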
+18
-18
@@ -3,25 +3,25 @@
|
||||
import logging
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.domain.shared.value_objects import ImdbId
|
||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
||||
from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
|
||||
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
|
||||
from alfred.domain.subtitles.services.pattern_detector import PatternDetector
|
||||
from alfred.application.subtitles.placer import (
|
||||
from alfred.application.subtitles_TO_CHECK.placer import (
|
||||
PlacedTrack,
|
||||
SubtitlePlacer,
|
||||
_build_dest_name,
|
||||
)
|
||||
from alfred.domain.subtitles.services.utils import available_subtitles
|
||||
from alfred.domain.subtitles.value_objects import ScanStrategy
|
||||
from alfred.domain.shared_TO_CHECK.value_objects import ImdbId
|
||||
from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
|
||||
from alfred.domain.subtitles_TO_CHECK.services.identifier import SubtitleIdentifier
|
||||
from alfred.domain.subtitles_TO_CHECK.services.matcher import SubtitleMatcher
|
||||
from alfred.domain.subtitles_TO_CHECK.services.pattern_detector import PatternDetector
|
||||
from alfred.domain.subtitles_TO_CHECK.services.utils import available_subtitles
|
||||
from alfred.domain.subtitles_TO_CHECK.value_objects import ScanStrategy
|
||||
from alfred.infrastructure.filesystem.scanner import PathlibFilesystemScanner
|
||||
from alfred.infrastructure.knowledge.subtitles.base import SubtitleKnowledgeBase
|
||||
from alfred.infrastructure.knowledge.subtitles.loader import KnowledgeLoader
|
||||
from alfred.infrastructure.persistence.context import get_memory
|
||||
from alfred.infrastructure.probe.ffprobe_prober import FfprobeMediaProber
|
||||
from alfred.infrastructure.subtitle.metadata_store import SubtitleMetadataStore
|
||||
from alfred.infrastructure.subtitle.rule_repository import RuleSetRepository
|
||||
from alfred.infrastructure.knowledge_TO_CHECK.subtitles.base import SubtitleKnowledgeBase
|
||||
from alfred.infrastructure.knowledge_TO_CHECK.subtitles.loader import KnowledgeLoader
|
||||
from alfred.infrastructure.persistence_TO_CHECK.context import get_memory
|
||||
from alfred.infrastructure.probe_TO_CHECK.ffprobe_prober import FfprobeMediaProber
|
||||
from alfred.infrastructure.subtitle_TO_CHECK.metadata_store import SubtitleMetadataStore
|
||||
from alfred.infrastructure.subtitle_TO_CHECK.rule_repository import RuleSetRepository
|
||||
|
||||
from .dto import (
|
||||
AvailableSubtitle,
|
||||
@@ -278,7 +278,7 @@ class ManageSubtitlesUseCase:
|
||||
|
||||
|
||||
def _to_unresolved_dto(
|
||||
track: SubtitleCandidate, min_confidence: float = 0.7
|
||||
track: SubtitleScanResult, min_confidence: float = 0.7
|
||||
) -> UnresolvedTrack:
|
||||
reason = "unknown_language" if track.language is None else "low_confidence"
|
||||
return UnresolvedTrack(
|
||||
@@ -291,10 +291,10 @@ def _to_unresolved_dto(
|
||||
|
||||
def _pair_placed_with_tracks(
|
||||
placed: list[PlacedTrack],
|
||||
tracks: list[SubtitleCandidate],
|
||||
) -> list[tuple[PlacedTrack, SubtitleCandidate]]:
|
||||
tracks: list[SubtitleScanResult],
|
||||
) -> list[tuple[PlacedTrack, SubtitleScanResult]]:
|
||||
"""
|
||||
Pair each PlacedTrack with its originating SubtitleCandidate by source path.
|
||||
Pair each PlacedTrack with its originating SubtitleScanResult by source path.
|
||||
Falls back to positional matching if paths don't align.
|
||||
"""
|
||||
track_by_path = {t.file_path: t for t in tracks if t.file_path}
|
||||
@@ -0,0 +1,36 @@
"""move_dir use case — move a directory tree between configured roots."""

from __future__ import annotations

from pathlib import Path

from alfred.infrastructure.filesystem import FilesystemError, move_dir

from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import MoveDirResponse


def move_dir_use_case(
    src: Path, dst: Path, roots: DirectoryRoots
) -> MoveDirResponse:
    """Move directory ``src`` to ``dst``. Both must be under configured roots."""
    if not roots.contains(src):
        return MoveDirResponse(
            status="error",
            error=PATH_NOT_ALLOWED,
            message=f"Source is outside configured roots: {src}",
        )
    if not roots.contains(dst):
        return MoveDirResponse(
            status="error",
            error=PATH_NOT_ALLOWED,
            message=f"Destination is outside configured roots: {dst}",
        )

    try:
        move_dir(src, dst)
    except FilesystemError as e:
        return MoveDirResponse(status="error", error=code_for(e), message=str(e))

    return MoveDirResponse(status="ok", source=src, destination=dst)
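For reference, the two dict shapes a caller of `move_dir_use_case` can receive are sketched below. The class is re-typed from the `MoveDirResponse` lines in the `dto` hunk earlier in this diff, with `Optional[...]` swapped in for the `X | None` annotations; treat it as an illustration of the response shape rather than the canonical definition.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class MoveDirResponse:
    """Re-typed from the dto hunk of this diff, for illustration only."""

    status: str
    source: Optional[Path] = None
    destination: Optional[Path] = None
    error: Optional[str] = None
    message: Optional[str] = None

    def to_dict(self) -> dict:
        # Error responses carry only status, error and message.
        if self.error:
            return {"status": self.status, "error": self.error, "message": self.message}
        # Success responses expose the paths as plain strings.
        return {
            "status": self.status,
            "source": str(self.source) if self.source else None,
            "destination": str(self.destination) if self.destination else None,
        }


ok = MoveDirResponse(
    status="ok", source=Path("downloads/show"), destination=Path("library/show")
)
failed = MoveDirResponse(
    status="error",
    error="path_not_allowed",
    message="Source is outside configured roots: /etc",
)
```

`ok.to_dict()` contains `status`, `source` and `destination`; `failed.to_dict()` contains `status`, `error` and `message` and omits the path keys entirely.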
@@ -0,0 +1,36 @@
"""move_file use case — move a file between configured roots."""

from __future__ import annotations

from pathlib import Path

from alfred.infrastructure.filesystem import FilesystemError, move_file

from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import MoveFileResponse


def move_file_use_case(
    src: Path, dst: Path, roots: DirectoryRoots
) -> MoveFileResponse:
    """Move file ``src`` to ``dst``. Both must be under configured roots."""
    if not roots.contains(src):
        return MoveFileResponse(
            status="error",
            error=PATH_NOT_ALLOWED,
            message=f"Source is outside configured roots: {src}",
        )
    if not roots.contains(dst):
        return MoveFileResponse(
            status="error",
            error=PATH_NOT_ALLOWED,
            message=f"Destination is outside configured roots: {dst}",
        )

    try:
        move_file(src, dst)
    except FilesystemError as e:
        return MoveFileResponse(status="error", error=code_for(e), message=str(e))

    return MoveFileResponse(status="ok", source=src, destination=dst)
+55
-15
@@ -22,16 +22,35 @@ import logging
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.application.release_TO_CHECK import inspect_release
|
||||
from alfred.domain.release import parse_release
|
||||
from alfred.domain.release.ports import ReleaseKnowledge
|
||||
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||
from alfred.domain.release.value_objects import ParsedRelease
|
||||
from alfred.domain.shared_TO_CHECK.ports import MediaProber
|
||||
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Single module-level knowledge instance. YAML is loaded once at first import.
|
||||
# Tests that need a custom KB can monkeypatch this attribute.
|
||||
_KB: ReleaseKnowledge = YamlReleaseKnowledge()
|
||||
|
||||
def _resolve_parsed(
|
||||
release_name: str,
|
||||
source_path: str | None,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
) -> ParsedRelease:
|
||||
"""Pick the right entry point depending on whether we have a path.
|
||||
|
||||
When ``source_path`` is provided and points to something that exists,
|
||||
we run the full inspection pipeline so probe data can refresh tech
|
||||
fields (which feed every filename builder). Otherwise we fall back
|
||||
to a parse-only path — same behavior as before.
|
||||
"""
|
||||
if source_path:
|
||||
path = Path(source_path)
|
||||
if path.exists():
|
||||
return inspect_release(release_name, path, kb, prober).parsed
|
||||
parsed, _ = parse_release(release_name, kb)
|
||||
return parsed
|
||||
|
||||
|
||||
def _find_existing_tvshow_folders(
|
||||
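The branching in `_resolve_parsed` (full inspection when the path exists on disk, parse-only otherwise) can be exercised without the real parser or prober. In this sketch the two callables are hypothetical substitutes for `inspect_release` and `parse_release`, and plain dicts stand in for `ParsedRelease`; only the control flow is taken from the hunk above.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_parsed(
    release_name: str,
    source_path: Optional[str],
    inspect: Callable[[str, Path], dict],
    parse: Callable[[str], dict],
) -> dict:
    """Use the probe-backed path only when source_path exists on disk."""
    if source_path:
        path = Path(source_path)
        if path.exists():
            # Full pipeline: probe data may refresh the tech fields.
            return inspect(release_name, path)
    # No path, or a path that does not exist: fall back to parse-only.
    return parse(release_name)
```

Passing the collaborators in as arguments matches the direction of the diff, which replaces the module-level `_KB` with explicit `kb` and `prober` parameters and so lets tests supply fakes without monkeypatching.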
@@ -236,13 +255,20 @@ def resolve_season_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
confirmed_folder: str | None = None,
|
||||
source_path: str | None = None,
|
||||
) -> ResolvedSeasonDestination:
|
||||
"""
|
||||
Compute destination paths for a season pack.
|
||||
|
||||
Returns series_folder + season_folder. No file paths — the whole
|
||||
source folder is moved as-is into season_folder.
|
||||
|
||||
When ``source_path`` points to the release on disk, the parser is
|
||||
augmented with ffprobe data so tech tokens missing from the release
|
||||
name (quality / codec) end up in the folder names.
|
||||
"""
|
||||
tv_root = _get_tv_root()
|
||||
if not tv_root:
|
||||
@@ -252,8 +278,8 @@ def resolve_season_destination(
|
||||
message="TV show library path is not configured.",
|
||||
)
|
||||
|
||||
parsed = parse_release(release_name, _KB)
|
||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
||||
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||
|
||||
resolved = _resolve_series_folder(
|
||||
@@ -286,6 +312,8 @@ def resolve_episode_destination(
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
tmdb_episode_title: str | None = None,
|
||||
confirmed_folder: str | None = None,
|
||||
) -> ResolvedEpisodeDestination:
|
||||
@@ -293,6 +321,8 @@ def resolve_episode_destination(
|
||||
Compute destination paths for a single episode file.
|
||||
|
||||
Returns series_folder + season_folder + library_file (full path to .mkv).
|
||||
``source_file`` doubles as the inspection target — when it exists,
|
||||
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||
"""
|
||||
tv_root = _get_tv_root()
|
||||
if not tv_root:
|
||||
@@ -302,11 +332,11 @@ def resolve_episode_destination(
|
||||
message="TV show library path is not configured.",
|
||||
)
|
||||
|
||||
parsed = parse_release(release_name, _KB)
|
||||
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||
ext = Path(source_file).suffix
|
||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
tmdb_episode_title_safe = (
|
||||
_KB.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
|
||||
kb.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
|
||||
)
|
||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||
|
||||
@@ -345,11 +375,15 @@ def resolve_movie_destination(
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
) -> ResolvedMovieDestination:
|
||||
"""
|
||||
Compute destination paths for a movie file.
|
||||
|
||||
Returns movie_folder + library_file (full path to .mkv).
|
||||
``source_file`` doubles as the inspection target — when it exists,
|
||||
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||
"""
|
||||
memory = get_memory()
|
||||
movies_root = memory.ltm.library_paths.get("movie")
|
||||
@@ -360,9 +394,9 @@ def resolve_movie_destination(
|
||||
message="Movie library path is not configured.",
|
||||
)
|
||||
|
||||
parsed = parse_release(release_name, _KB)
|
||||
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||
ext = Path(source_file).suffix
|
||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
|
||||
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
|
||||
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
|
||||
@@ -384,12 +418,18 @@ def resolve_series_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
confirmed_folder: str | None = None,
|
||||
source_path: str | None = None,
|
||||
) -> ResolvedSeriesDestination:
|
||||
"""
|
||||
Compute destination path for a complete multi-season series pack.
|
||||
|
||||
Returns only series_folder — the whole pack lands directly inside it.
|
||||
|
||||
When ``source_path`` points to the release on disk, ffprobe
|
||||
enrichment refreshes tech tokens missing from the release name.
|
||||
"""
|
||||
tv_root = _get_tv_root()
|
||||
if not tv_root:
|
||||
@@ -399,8 +439,8 @@ def resolve_series_destination(
|
||||
message="TV show library path is not configured.",
|
||||
)
|
||||
|
||||
parsed = parse_release(release_name, _KB)
|
||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
||||
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||
|
||||
resolved = _resolve_series_folder(
|
||||
@@ -1,50 +0,0 @@
|
||||
"""Set folder path use case."""
|
||||
|
||||
import logging
|
||||
|
||||
from alfred.infrastructure.filesystem import FileManager
|
||||
|
||||
from .dto import SetFolderPathResponse
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SetFolderPathUseCase:
|
||||
"""
|
||||
Use case for setting a folder path in configuration.
|
||||
|
||||
This orchestrates the FileManager to set folder paths.
|
||||
"""
|
||||
|
||||
def __init__(self, file_manager: FileManager):
|
||||
"""
|
||||
Initialize use case.
|
||||
|
||||
Args:
|
||||
file_manager: FileManager instance
|
||||
"""
|
||||
self.file_manager = file_manager
|
||||
|
||||
def execute(self, folder_name: str, path_value: str) -> SetFolderPathResponse:
|
||||
"""
|
||||
Set a folder path in configuration.
|
||||
|
||||
Args:
|
||||
folder_name: Name of folder to set (download, tvshow, movie, torrent)
|
||||
path_value: Absolute path to the folder
|
||||
|
||||
Returns:
|
||||
SetFolderPathResponse with success or error information
|
||||
"""
|
||||
result = self.file_manager.set_folder_path(folder_name, path_value)
|
||||
|
||||
if result.get("status") == "ok":
|
||||
return SetFolderPathResponse(
|
||||
status="ok",
|
||||
folder_name=result.get("folder_name"),
|
||||
path=result.get("path"),
|
||||
)
|
||||
else:
|
||||
return SetFolderPathResponse(
|
||||
status="error", error=result.get("error"), message=result.get("message")
|
||||
)
|
||||
@@ -1,44 +0,0 @@
|
||||
"""Movie application DTOs."""
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
|
||||
@dataclass
|
||||
class SearchMovieResponse:
|
||||
"""Response from searching for a movie."""
|
||||
|
||||
status: str
|
||||
imdb_id: str | None = None
|
||||
title: str | None = None
|
||||
media_type: str | None = None
|
||||
tmdb_id: int | None = None
|
||||
overview: str | None = None
|
||||
release_date: str | None = None
|
||||
vote_average: float | None = None
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self):
|
||||
"""Convert to dict for agent compatibility."""
|
||||
result = {"status": self.status}
|
||||
|
||||
if self.error:
|
||||
result["error"] = self.error
|
||||
result["message"] = self.message
|
||||
else:
|
||||
if self.imdb_id:
|
||||
result["imdb_id"] = self.imdb_id
|
||||
if self.title:
|
||||
result["title"] = self.title
|
||||
if self.media_type:
|
||||
result["media_type"] = self.media_type
|
||||
if self.tmdb_id:
|
||||
result["tmdb_id"] = self.tmdb_id
|
||||
if self.overview:
|
||||
result["overview"] = self.overview
|
||||
if self.release_date:
|
||||
result["release_date"] = self.release_date
|
||||
if self.vote_average:
|
||||
result["vote_average"] = self.vote_average
|
||||
|
||||
return result
|
||||
@@ -1,93 +0,0 @@
|
||||
"""Search movie use case."""
|
||||
|
||||
import logging
|
||||
|
||||
from alfred.infrastructure.api.tmdb import (
|
||||
TMDBAPIError,
|
||||
TMDBClient,
|
||||
TMDBConfigurationError,
|
||||
TMDBNotFoundError,
|
||||
)
|
||||
|
||||
from .dto import SearchMovieResponse
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SearchMovieUseCase:
|
||||
"""
|
||||
Use case for searching a movie and retrieving its IMDb ID.
|
||||
|
||||
This orchestrates the TMDB API client to find movie information.
|
||||
"""
|
||||
|
||||
def __init__(self, tmdb_client: TMDBClient):
|
||||
"""
|
||||
Initialize use case.
|
||||
|
||||
Args:
|
||||
tmdb_client: TMDB API client
|
||||
"""
|
||||
self.tmdb_client = tmdb_client
|
||||
|
||||
def execute(self, media_title: str) -> SearchMovieResponse:
|
||||
"""
|
||||
Search for a movie by title.
|
||||
|
||||
Args:
|
||||
media_title: Title of the movie to search for
|
||||
|
||||
Returns:
|
||||
SearchMovieResponse with movie information or error
|
||||
"""
|
||||
try:
|
||||
# Use the TMDB client to search for media
|
||||
result = self.tmdb_client.search_media(media_title)
|
||||
|
||||
# Check if IMDb ID was found
|
||||
if result.imdb_id:
|
||||
logger.info(f"IMDb ID found for '{media_title}': {result.imdb_id}")
|
||||
return SearchMovieResponse(
|
||||
status="ok",
|
||||
imdb_id=result.imdb_id,
|
||||
title=result.title,
|
||||
media_type=result.media_type,
|
||||
tmdb_id=result.tmdb_id,
|
||||
overview=result.overview,
|
||||
release_date=result.release_date,
|
||||
vote_average=result.vote_average,
|
||||
)
|
||||
else:
|
||||
logger.warning(f"No IMDb ID available for '{media_title}'")
|
||||
return SearchMovieResponse(
|
||||
status="ok",
|
||||
title=result.title,
|
||||
media_type=result.media_type,
|
||||
tmdb_id=result.tmdb_id,
|
||||
error="no_imdb_id",
|
||||
message=f"No IMDb ID available for '{result.title}'",
|
||||
)
|
||||
|
||||
except TMDBNotFoundError as e:
|
||||
logger.info(f"Media not found: {e}")
|
||||
return SearchMovieResponse(
|
||||
status="error", error="not_found", message=str(e)
|
||||
)
|
||||
|
||||
except TMDBConfigurationError as e:
|
||||
logger.error(f"TMDB configuration error: {e}")
|
||||
return SearchMovieResponse(
|
||||
status="error", error="configuration_error", message=str(e)
|
||||
)
|
||||
|
||||
except TMDBAPIError as e:
|
||||
logger.error(f"TMDB API error: {e}")
|
||||
return SearchMovieResponse(
|
||||
status="error", error="api_error", message=str(e)
|
||||
)
|
||||
|
||||
except ValueError as e:
|
||||
logger.error(f"Validation error: {e}")
|
||||
return SearchMovieResponse(
|
||||
status="error", error="validation_failed", message=str(e)
|
||||
)
|
||||
+3
-2
@@ -1,9 +1,10 @@
|
||||
"""Movie use cases."""
|
||||
|
||||
from .dto import SearchMovieResponse
|
||||
from .dto import MovieHit, SearchMovieResponse
|
||||
from .search_movie import SearchMovieUseCase
|
||||
|
||||
__all__ = [
|
||||
"SearchMovieUseCase",
|
||||
"MovieHit",
|
||||
"SearchMovieResponse",
|
||||
"SearchMovieUseCase",
|
||||
]
|
||||
@@ -0,0 +1,40 @@
|
||||
"""Movie application DTOs."""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class MovieHit:
|
||||
"""One movie hit, flattened for transport to the agent."""
|
||||
|
||||
tmdb_id: int
|
||||
title: str
|
||||
release_year: int | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
out: dict = {"tmdb_id": self.tmdb_id, "title": self.title}
|
||||
if self.release_year is not None:
|
||||
out["release_year"] = self.release_year
|
||||
return out
|
||||
|
||||
|
||||
@dataclass
|
||||
class SearchMovieResponse:
|
||||
"""Response from searching for a movie."""
|
||||
|
||||
status: str
|
||||
hits: list[MovieHit] = field(default_factory=list)
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self):
|
||||
"""Convert to dict for agent compatibility."""
|
||||
result: dict = {"status": self.status}
|
||||
|
||||
if self.error:
|
||||
result["error"] = self.error
|
||||
result["message"] = self.message
|
||||
else:
|
||||
result["hits"] = [h.to_dict() for h in self.hits]
|
||||
|
||||
return result
|
||||
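The response shape produced by the two DTOs above can be sketched in isolation. This is a minimal standalone copy of the classes as they appear in this hunk, not an import of the real `dto` module; the sample titles and IDs are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class MovieHit:
    tmdb_id: int
    title: str
    release_year: int | None = None

    def to_dict(self) -> dict:
        out: dict = {"tmdb_id": self.tmdb_id, "title": self.title}
        # release_year is omitted entirely when unknown, not emitted as null.
        if self.release_year is not None:
            out["release_year"] = self.release_year
        return out


@dataclass
class SearchMovieResponse:
    status: str
    hits: list[MovieHit] = field(default_factory=list)
    error: str | None = None
    message: str | None = None

    def to_dict(self) -> dict:
        result: dict = {"status": self.status}
        if self.error:
            # Error payloads carry no "hits" key at all.
            result["error"] = self.error
            result["message"] = self.message
        else:
            result["hits"] = [h.to_dict() for h in self.hits]
        return result


# Illustrative data only.
ok = SearchMovieResponse(
    status="ok",
    hits=[MovieHit(603, "The Matrix", 1999), MovieHit(7, "Untitled")],
)
err = SearchMovieResponse(status="error", error="api_error", message="timeout")
```

An agent consuming `to_dict()` therefore sees either a `hits` list or an `error`/`message` pair, never both.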
@@ -0,0 +1,60 @@
|
||||
"""Search movie use case."""
|
||||
|
||||
import logging
|
||||
|
||||
from alfred.infrastructure.api_TO_CHECK.tmdb import (
|
||||
TMDBAPIError,
|
||||
TMDBClient,
|
||||
TMDBConfigurationError,
|
||||
)
|
||||
|
||||
from .dto import MovieHit, SearchMovieResponse
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SearchMovieUseCase:
|
||||
"""List movies matching a free-text query via TMDB ``/search/movie``.
|
||||
|
||||
The use case is a thin orchestrator: it asks the client for hits,
|
||||
flattens domain VOs into agent-friendly primitives, and wraps
|
||||
errors. It deliberately does **not** look up ``imdb_id`` —
|
||||
enrichment is the caller's job (via :meth:`TMDBClient.get_movie_info`
|
||||
on a chosen ``tmdb_id``).
|
||||
"""
|
||||
|
||||
def __init__(self, tmdb_client: TMDBClient):
|
||||
self.tmdb_client = tmdb_client
|
||||
|
||||
def execute(self, media_title: str) -> SearchMovieResponse:
|
||||
try:
|
||||
results = self.tmdb_client.search_movies(media_title)
|
||||
|
||||
hits = [
|
||||
MovieHit(
|
||||
tmdb_id=r.tmdb_id.value,
|
||||
title=str(r.title),
|
||||
release_year=r.release_year.value if r.release_year else None,
|
||||
)
|
||||
for r in results
|
||||
]
|
||||
logger.info(f"search_movies({media_title!r}) → {len(hits)} hits")
|
||||
return SearchMovieResponse(status="ok", hits=hits)
|
||||
|
||||
except TMDBConfigurationError as e:
|
||||
logger.error(f"TMDB configuration error: {e}")
|
||||
return SearchMovieResponse(
|
||||
status="error", error="configuration_error", message=str(e)
|
||||
)
|
||||
|
||||
except TMDBAPIError as e:
|
||||
logger.error(f"TMDB API error: {e}")
|
||||
return SearchMovieResponse(
|
||||
status="error", error="api_error", message=str(e)
|
||||
)
|
||||
|
||||
except ValueError as e:
|
||||
logger.error(f"Validation error: {e}")
|
||||
return SearchMovieResponse(
|
||||
status="error", error="validation_failed", message=str(e)
|
||||
)
|
||||
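How the use case flattens value objects and wraps client errors can be sketched with a fake client. Everything below is a hypothetical stand-in: `FakeTMDBClient`, the local `TMDBAPIError`, and the `SimpleNamespace` rows only mimic the attribute shapes used in `execute` (`tmdb_id.value`, `title`, `release_year.value`); the real classes live in the TMDB infrastructure package and may differ.

```python
from __future__ import annotations

from types import SimpleNamespace


class TMDBAPIError(Exception):
    """Local stand-in for the real client error (assumption)."""


class FakeTMDBClient:
    """Hypothetical test double exposing only search_movies()."""

    def __init__(self, rows=None, fail: bool = False):
        self._rows = rows or []
        self._fail = fail

    def search_movies(self, query: str):
        if self._fail:
            raise TMDBAPIError("upstream unavailable")
        return self._rows


def search(client, media_title: str) -> dict:
    """Mirror of the execute() flow, returning plain dicts."""
    try:
        results = client.search_movies(media_title)
        hits = [
            {
                "tmdb_id": r.tmdb_id.value,
                "title": str(r.title),
                "release_year": r.release_year.value if r.release_year else None,
            }
            for r in results
        ]
        return {"status": "ok", "hits": hits}
    except TMDBAPIError as e:
        # Errors become data; nothing propagates to the caller.
        return {"status": "error", "error": "api_error", "message": str(e)}


row = SimpleNamespace(
    tmdb_id=SimpleNamespace(value=603),
    title="The Matrix",
    release_year=SimpleNamespace(value=1999),
)
row_no_year = SimpleNamespace(
    tmdb_id=SimpleNamespace(value=7), title="Untitled", release_year=None
)

ok = search(FakeTMDBClient([row, row_no_year]), "matrix")
failed = search(FakeTMDBClient(fail=True), "matrix")
```

The same pattern applies to the configuration and validation branches; only the `error` code changes.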
@@ -0,0 +1,20 @@
|
||||
"""Release application layer — orchestrators sitting between domain
|
||||
parsing and infrastructure I/O.
|
||||
|
||||
Public surface:
|
||||
|
||||
- :func:`is_supported_video` / :func:`find_main_video` — pre-pipeline
|
||||
filesystem helpers (extension-only filtering, top-level video pick).
|
||||
- :func:`inspect_release` / :class:`InspectedResult` — full inspection
|
||||
pipeline combining parse + filesystem refinement + probe enrichment.
|
||||
"""
|
||||
|
||||
from .inspect import InspectedResult, inspect_release
|
||||
from .supported_media import find_main_video, is_supported_video
|
||||
|
||||
__all__ = [
|
||||
"InspectedResult",
|
||||
"find_main_video",
|
||||
"inspect_release",
|
||||
"is_supported_video",
|
||||
]
|
||||
+1
-1
@@ -19,7 +19,7 @@ from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.domain.release.ports import ReleaseKnowledge
|
||||
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||
from alfred.domain.release.value_objects import ParsedRelease
|
||||
|
||||
|
||||
@@ -0,0 +1,74 @@
|
||||
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import replace
|
||||
|
||||
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||
from alfred.domain.release.value_objects import ParsedRelease
|
||||
from alfred.domain.shared_TO_CHECK.media import MediaInfo
|
||||
|
||||
|
||||
def enrich_from_probe(
|
||||
parsed: ParsedRelease, info: MediaInfo, kb: ReleaseKnowledge
|
||||
) -> ParsedRelease:
|
||||
"""
|
||||
Return a new ParsedRelease with None fields filled from ffprobe MediaInfo.
|
||||
|
||||
Only overwrites fields that are currently None — token-level values
|
||||
from the release name always take priority. ``ParsedRelease`` is
|
||||
frozen; this returns a new instance via :func:`dataclasses.replace`.
|
||||
|
||||
Translation tables (ffprobe codec name → scene token, channel count
|
||||
→ layout) live in ``kb.probe_mappings`` (loaded from
|
||||
``alfred/knowledge/release/probe_mappings.yaml``). When ffprobe
|
||||
reports a value with no mapping entry, the fallback is the uppercase
|
||||
raw value so unknown codecs still surface in a predictable form.
|
||||
"""
|
||||
mappings = kb.probe_mappings
|
||||
video_codec_map: dict[str, str] = mappings.get("video_codec", {})
|
||||
audio_codec_map: dict[str, str] = mappings.get("audio_codec", {})
|
||||
channel_map: dict[int, str] = mappings.get("audio_channels", {})
|
||||
|
||||
updates: dict[str, object] = {}
|
||||
|
||||
if parsed.quality is None and info.resolution:
|
||||
updates["quality"] = info.resolution
|
||||
|
||||
if parsed.codec is None and info.video_codec:
|
||||
updates["codec"] = video_codec_map.get(
|
||||
info.video_codec.lower(), info.video_codec.upper()
|
||||
)
|
||||
|
||||
# bit_depth: ffprobe exposes it via pix_fmt — not in MediaInfo yet, skip.
|
||||
|
||||
# Audio — use the default track, fallback to first
|
||||
default_track = next((t for t in info.audio_tracks if t.is_default), None)
|
||||
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
|
||||
|
||||
if track:
|
||||
if parsed.audio_codec is None and track.codec:
|
||||
updates["audio_codec"] = audio_codec_map.get(
|
||||
track.codec.lower(), track.codec.upper()
|
||||
)
|
||||
|
||||
if parsed.audio_channels is None and track.channels:
|
||||
updates["audio_channels"] = channel_map.get(
|
||||
track.channels, f"{track.channels}ch"
|
||||
)
|
||||
|
||||
# Languages — merge ffprobe languages with token-level ones
|
||||
# "und" = undetermined, not useful
|
||||
if info.audio_languages:
|
||||
existing_upper = {lang.upper() for lang in parsed.languages}
|
||||
new_languages = list(parsed.languages)
|
||||
for lang in info.audio_languages:
|
||||
if lang.lower() != "und" and lang.upper() not in existing_upper:
|
||||
new_languages.append(lang)
|
||||
existing_upper.add(lang.upper())
|
||||
if len(new_languages) != len(parsed.languages):
|
||||
updates["languages"] = tuple(new_languages)
|
||||
|
||||
if not updates:
|
||||
return parsed
|
||||
return replace(parsed, **updates)
|
||||
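The fill-only-when-`None` rule and the mapping fallbacks can be sketched on a reduced model. `Parsed` and `fill_missing` are hypothetical simplifications of `ParsedRelease` and `enrich_from_probe`, and the mapping entries are invented; the real tables are loaded from the knowledge base.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Parsed:
    """Hypothetical, reduced stand-in for ParsedRelease."""

    codec: str | None = None
    audio_channels: str | None = None
    languages: tuple[str, ...] = ()


def fill_missing(parsed, *, video_codec, channels, audio_languages, mappings):
    updates: dict[str, object] = {}
    if parsed.codec is None and video_codec:
        # Unknown codecs fall back to the uppercase raw value.
        updates["codec"] = mappings.get("video_codec", {}).get(
            video_codec.lower(), video_codec.upper()
        )
    if parsed.audio_channels is None and channels:
        # Unknown channel counts fall back to "<n>ch".
        updates["audio_channels"] = mappings.get("audio_channels", {}).get(
            channels, f"{channels}ch"
        )
    seen = {lang.upper() for lang in parsed.languages}
    merged = list(parsed.languages)
    for lang in audio_languages:
        # "und" (undetermined) is dropped; duplicates are case-insensitive.
        if lang.lower() != "und" and lang.upper() not in seen:
            merged.append(lang)
            seen.add(lang.upper())
    if len(merged) != len(parsed.languages):
        updates["languages"] = tuple(merged)
    # A frozen dataclass is never mutated: return a new instance or the original.
    return replace(parsed, **updates) if updates else parsed


# Invented mapping entries, for illustration only.
mappings = {"video_codec": {"hevc": "x265"}, "audio_channels": {6: "5.1"}}

parsed = Parsed(codec=None, audio_channels="7.1", languages=("FRENCH",))
enriched = fill_missing(
    parsed,
    video_codec="HEVC",
    channels=6,
    audio_languages=["eng", "und", "french"],
    mappings=mappings,
)
unknown = fill_missing(
    Parsed(), video_codec="vvc", channels=3, audio_languages=[], mappings=mappings
)
complete = Parsed(codec="x264", audio_channels="2.0")
untouched = fill_missing(
    complete, video_codec="hevc", channels=2, audio_languages=["und"], mappings=mappings
)
```

In this sketch the token-level `audio_channels="7.1"` survives even though the probe reports six channels, which is the priority rule the docstring describes.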
@@ -0,0 +1,192 @@
|
||||
"""Release inspection orchestrator — the canonical "look at this thing"
|
||||
entry point.
|
||||
|
||||
``inspect_release`` is the single composition of the four layers we
|
||||
care about for a freshly-arrived release:
|
||||
|
||||
1. **Parse the name** — :func:`alfred.domain.release.services.parse_release`
|
||||
gives a ``ParsedRelease`` plus a ``ParseReport`` (confidence + road).
|
||||
2. **Pick the main video** — :func:`find_main_video` runs a top-level
|
||||
scan over the source path. If nothing qualifies the result still
|
||||
completes; downstream callers decide what to do with a videoless
|
||||
release.
|
||||
3. **Refine the media type** — :func:`detect_media_type` uses the
|
||||
on-disk extension mix to override any token-level guess (e.g. a
|
||||
bare ``.iso`` folder becomes ``"other"``). ``ParsedRelease`` is
frozen, so the refined value is rebound onto a new instance via
:func:`dataclasses.replace`.
|
||||
4. **Probe the video** — the injected :class:`MediaProber` fills in
|
||||
missing technical fields via :func:`enrich_from_probe`. Skipped
|
||||
when there is no main video or when ``media_type`` ended up in
|
||||
``{"unknown", "other"}`` (the probe would tell us nothing useful).
|
||||
|
||||
The return type is :class:`InspectedResult`, a frozen VO that bundles
|
||||
everything downstream callers need (``analyze_release`` tool,
|
||||
``resolve_destination``, future workflow stages) without forcing them
|
||||
to redo the same four calls.
|
||||
|
||||
Design notes:
|
||||
|
||||
- **Application layer.** This module touches both domain
|
||||
(``parse_release``) and infrastructure (``MediaProber`` port). That
|
||||
is exactly application's job — orchestrate.
|
||||
- **Knowledge base is injected.** ``inspect_release`` takes ``kb`` and
|
||||
``prober`` as parameters; no module-level singletons here. Callers
|
||||
(the tool wrapper, tests) decide what to plug in.
|
||||
- **No in-place mutation.** ``ParsedRelease`` is frozen, so the refined
``media_type`` and the fields filled by ``enrich_from_probe`` are
applied through :func:`dataclasses.replace`, which returns a new
instance each time. The outer ``InspectedResult`` is frozen too, so
the *bundle* is immutable from the caller's perspective.
|
||||
- **Never raises.** Filesystem / probe errors surface as ``None``
|
||||
fields on the result, never as exceptions — same contract as the
|
||||
underlying adapters.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, replace
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.application.release_TO_CHECK.detect_media_type import detect_media_type
|
||||
from alfred.application.release_TO_CHECK.enrich_from_probe import enrich_from_probe
|
||||
from alfred.application.release_TO_CHECK.supported_media import find_main_video
|
||||
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||
from alfred.domain.releases_TO_CHECK.parser.services import parse_release
|
||||
from alfred.domain.release.value_objects import (
|
||||
MediaTypeToken,
|
||||
ParsedRelease,
|
||||
ParseReport,
|
||||
)
|
||||
from alfred.domain.shared_TO_CHECK.media import MediaInfo
|
||||
from alfred.domain.shared_TO_CHECK.ports import MediaProber
|
||||
|
||||
# Media types for which a probe carries no useful information.
|
||||
_NON_PROBABLE_MEDIA_TYPES = frozenset({"unknown", "other"})
|
||||
|
||||
# Media types for which there's nothing for the organizer to do.
|
||||
# ``other`` covers things like games / ISOs / archives sitting in the
|
||||
# downloads folder. ``unknown`` does NOT belong here — those need a
|
||||
# user decision, not a skip.
|
||||
_SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
|
||||
|
||||
# Roads that signal the parser couldn't reach a confident answer on its
|
||||
# own. ``Road`` values are kept as strings on the report to avoid a
|
||||
# cross-package import here.
|
||||
_ASK_USER_ROADS = frozenset({"path_of_pain"})
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class InspectedResult:
|
||||
"""The full picture of a release: parsed name + filesystem reality.
|
||||
|
||||
Bundles everything the downstream pipeline needs after a single
|
||||
inspection pass:
|
||||
|
||||
- ``parsed`` — :class:`ParsedRelease`, with ``media_type`` already
|
||||
refined by :func:`detect_media_type` and ``None`` tech fields
|
||||
filled in by :func:`enrich_from_probe` when a probe ran.
|
||||
- ``report`` — :class:`ParseReport` from the parser (confidence +
|
||||
road, untouched by inspection).
|
||||
- ``source_path`` — the path the inspector was pointed at (file or
|
||||
folder), as supplied by the caller.
|
||||
- ``main_video`` — the canonical video file inside ``source_path``,
|
||||
or ``None`` if no eligible file was found.
|
||||
- ``media_info`` — the :class:`MediaInfo` snapshot when a probe
|
||||
succeeded; ``None`` when no video was probed (no main video, or
|
||||
``media_type`` in ``{"unknown", "other"}``) or when ffprobe
|
||||
failed.
|
||||
- ``probe_used`` — ``True`` iff ``media_info`` is non-``None`` and
|
||||
``enrich_from_probe`` actually ran. Explicit flag so callers
|
||||
don't have to re-derive the condition.
|
||||
- ``recommended_action`` — derived hint for the orchestrator (see
|
||||
property docstring). Encodes the exclusion / clarification /
|
||||
go-ahead decision in one place so downstream callers don't
|
||||
re-implement the same checks.
|
||||
"""
|
||||
|
||||
parsed: ParsedRelease
|
||||
report: ParseReport
|
||||
source_path: Path
|
||||
main_video: Path | None
|
||||
media_info: MediaInfo | None
|
||||
probe_used: bool
|
||||
|
||||
@property
|
||||
def recommended_action(self) -> str:
|
||||
"""Return one of ``"skip"`` / ``"ask_user"`` / ``"process"``.
|
||||
|
||||
- ``"skip"`` — nothing to organize:
|
||||
* the source has no main video file, **or**
|
||||
* ``media_type`` is ``"other"`` (games / ISOs / archives).
|
||||
- ``"ask_user"`` — a decision is required before any action:
|
||||
* ``media_type`` is ``"unknown"`` (parser couldn't classify), **or**
|
||||
* the parse landed on ``Road.PATH_OF_PAIN``
|
||||
(low-confidence, malformed name, etc.).
|
||||
- ``"process"`` — everything else: a confident parse with a
|
||||
usable media type and a main video on disk. The orchestrator
|
||||
can move straight to the planning step.
|
||||
|
||||
The check ordering matters: ``"skip"`` wins over ``"ask_user"``
|
||||
because if there's no video to organize, no question to the
|
||||
user can change that. ``"ask_user"`` then wins over
|
||||
``"process"`` because a confident parse alone isn't enough if
|
||||
the type or road still flag uncertainty.
|
||||
"""
|
||||
if self.main_video is None:
|
||||
return "skip"
|
||||
if self.parsed.media_type.value in _SKIPPABLE_MEDIA_TYPES:
|
||||
return "skip"
|
||||
if self.parsed.media_type.value == "unknown":
|
||||
return "ask_user"
|
||||
if self.report.road in _ASK_USER_ROADS:
|
||||
return "ask_user"
|
||||
return "process"
|
||||
|
||||
|
||||
def inspect_release(
|
||||
release_name: str,
|
||||
source_path: Path,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
) -> InspectedResult:
|
||||
"""Run the full inspection pipeline on ``release_name`` /
|
||||
``source_path``.
|
||||
|
||||
See module docstring for the four-step flow. ``kb`` and ``prober``
|
||||
are injected so the caller controls the knowledge base layering
|
||||
and the probe adapter (real ffprobe in production, stubs in tests).
|
||||
|
||||
Never raises. A missing or unreadable ``source_path`` simply
|
||||
results in ``main_video=None`` and ``media_info=None``.
|
||||
"""
|
||||
parsed, report = parse_release(release_name, kb)
|
||||
|
||||
# Step 2: refine media_type from the on-disk extension mix.
|
||||
# detect_media_type tolerates non-existent paths (returns parsed.media_type
|
||||
# untouched), so no need to guard here. ParsedRelease is frozen — use
|
||||
# dataclasses.replace to rebind with the refined value.
|
||||
refined_media_type = MediaTypeToken(detect_media_type(parsed, source_path, kb))
|
||||
if refined_media_type != parsed.media_type:
|
||||
parsed = replace(parsed, media_type=refined_media_type)
|
||||
|
||||
# Step 3: pick the canonical main video (top-level scan only).
|
||||
main_video = find_main_video(source_path, kb)
|
||||
|
||||
# Step 4: probe + enrich, when it makes sense.
|
||||
media_info: MediaInfo | None = None
|
||||
probe_used = False
|
||||
if main_video is not None and parsed.media_type.value not in _NON_PROBABLE_MEDIA_TYPES:
|
||||
media_info = prober.probe(main_video)
|
||||
if media_info is not None:
|
||||
parsed = enrich_from_probe(parsed, media_info, kb)
|
||||
probe_used = True
|
||||
|
||||
return InspectedResult(
|
||||
parsed=parsed,
|
||||
report=report,
|
||||
source_path=source_path,
|
||||
main_video=main_video,
|
||||
media_info=media_info,
|
||||
probe_used=probe_used,
|
||||
)
|
||||
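The precedence encoded in `recommended_action` can be sketched as a pure function over three inputs. This is a simplified illustration, not the property itself: it takes plain strings instead of `ParsedRelease` / `ParseReport`, and `"some_confident_road"` is a made-up road name used only to exercise the default branch.

```python
SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
ASK_USER_ROADS = frozenset({"path_of_pain"})


def recommended_action(has_main_video: bool, media_type: str, road: str) -> str:
    # 1. Nothing to organize: "skip" wins over everything else.
    if not has_main_video:
        return "skip"
    if media_type in SKIPPABLE_MEDIA_TYPES:
        return "skip"
    # 2. Uncertainty: "ask_user" wins over "process".
    if media_type == "unknown":
        return "ask_user"
    if road in ASK_USER_ROADS:
        return "ask_user"
    # 3. Confident parse, usable type, video on disk.
    return "process"


# "some_confident_road" is a hypothetical value for illustration.
no_video = recommended_action(False, "unknown", "path_of_pain")
archive = recommended_action(True, "other", "some_confident_road")
unclassified = recommended_action(True, "unknown", "some_confident_road")
painful = recommended_action(True, "movie", "path_of_pain")
clean = recommended_action(True, "movie", "some_confident_road")
```

Note the first case: an unknown type on the painful road still yields `"skip"` when there is no video, because no user answer could change that outcome.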
@@ -0,0 +1,74 @@
|
||||
"""Pre-pipeline exclusion — decide which files are worth parsing.
|
||||
|
||||
These helpers live one notch above the domain: they touch the
|
||||
filesystem (``Path.iterdir``, ``Path.suffix``) but carry no parsing
|
||||
logic of their own. The goal is to filter out non-video files and pick
|
||||
the canonical "main video" from a release folder *before* anything
|
||||
hits :func:`~alfred.domain.release.parse_release`.
|
||||
|
||||
Design notes (Phase A bis, 2026-05-20):
|
||||
|
||||
- **Extension is the sole eligibility criterion.** A file is supported
|
||||
iff its suffix is in ``kb.video_extensions``. No size threshold, no
|
||||
filename heuristics ("sample", "trailer", …). If a release packs a
|
||||
bloated featurette or names its sample alphabetically before the
|
||||
main feature, that's PATH_OF_PAIN territory — not this layer's job.
|
||||
|
||||
- **Top-level scan only.** ``find_main_video`` does not descend into
|
||||
subdirectories. Releases that wrap the main video in ``Sample/`` or
|
||||
similar are non-scene-standard and handled by the orchestrator
|
||||
upstream.
|
||||
|
||||
- **Lexicographic tie-break.** When several candidates qualify
|
||||
(legitimate for season packs), we return the first by alphabetical
|
||||
order. Deterministic, no size-based ranking.
|
||||
|
||||
- **Direct ``Path`` I/O.** No ``FilesystemScanner`` port — this layer
|
||||
is application, not domain. If isolation becomes necessary for
|
||||
testing at scale, we'll introduce a port then.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.domain.releases_TO_CHECK.ports.knowledge import ReleaseKnowledge
|
||||
|
||||
|
||||
def is_supported_video(path: Path, kb: ReleaseKnowledge) -> bool:
|
||||
"""Return True when ``path`` is a video file the parser should
|
||||
consider.
|
||||
|
||||
The check is purely extension-based: ``path.suffix.lower()`` must
|
||||
belong to ``kb.video_extensions``. ``path`` must also be a regular
|
||||
file — directories and broken symlinks return False.
|
||||
"""
|
||||
if not path.is_file():
|
||||
return False
|
||||
return path.suffix.lower() in kb.video_extensions
|
||||
|
||||
|
||||
def find_main_video(folder: Path, kb: ReleaseKnowledge) -> Path | None:
|
||||
"""Return the canonical main video file inside ``folder``, or
|
||||
``None`` if there isn't one.
|
||||
|
||||
Behavior:
|
||||
|
||||
- Top-level scan only — subdirectories are ignored.
|
||||
- Eligibility is :func:`is_supported_video`.
|
||||
- When several files qualify, the lexicographically first one wins.
|
||||
- When ``folder`` itself is a video file, it is returned as-is
|
||||
(single-file releases are valid).
|
||||
- When ``folder`` doesn't exist or isn't a directory (and isn't a
|
||||
video file either), returns ``None``.
|
||||
"""
|
||||
if folder.is_file():
|
||||
return folder if is_supported_video(folder, kb) else None
|
||||
|
||||
if not folder.is_dir():
|
||||
return None
|
||||
|
||||
candidates = sorted(
|
||||
child for child in folder.iterdir() if is_supported_video(child, kb)
|
||||
)
|
||||
return candidates[0] if candidates else None
|
||||
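The selection rules of `find_main_video` can be exercised against a throwaway directory. The sketch below reimplements the two helpers with a module-level `VIDEO_EXTENSIONS` set, which is an assumption standing in for `kb.video_extensions`; file names are invented.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumption: stands in for kb.video_extensions.
VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4"})


def is_supported_video(path: Path) -> bool:
    # Regular file + known extension (case-insensitive); nothing else.
    return path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS


def find_main_video(folder: Path) -> Path | None:
    if folder.is_file():
        return folder if is_supported_video(folder) else None
    if not folder.is_dir():
        return None
    # Top-level only; lexicographically first candidate wins.
    candidates = sorted(c for c in folder.iterdir() if is_supported_video(c))
    return candidates[0] if candidates else None


def demo() -> tuple[str | None, str | None, Path | None]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Show.S01E02.mkv").touch()
        (root / "Show.S01E01.MKV").touch()
        (root / "Show.nfo").touch()
        (root / "Sample").mkdir()
        (root / "Sample" / "aaa.mkv").touch()  # nested: never considered

        from_folder = find_main_video(root)
        from_file = find_main_video(root / "Show.S01E02.mkv")
        missing = find_main_video(root / "does-not-exist")
        return (
            from_folder.name if from_folder else None,
            from_file.name if from_file else None,
            missing,
        )


result = demo()
```

The nested `Sample/aaa.mkv` would sort first alphabetically, but it is ignored because the scan never descends below the top level.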
+7
-7
@@ -5,13 +5,13 @@ import os
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
||||
from alfred.domain.subtitles.value_objects import SubtitleType
|
||||
from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
|
||||
from alfred.domain.subtitles_TO_CHECK.value_objects import SubtitleType
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _build_dest_name(track: SubtitleCandidate, video_stem: str) -> str:
|
||||
def _build_dest_name(track: SubtitleScanResult, video_stem: str) -> str:
|
||||
"""
|
||||
Build the destination filename for a subtitle track.
|
||||
|
||||
@@ -41,7 +41,7 @@ class PlacedTrack:
|
||||
@dataclass
|
||||
class PlaceResult:
|
||||
placed: list[PlacedTrack]
|
||||
skipped: list[tuple[SubtitleCandidate, str]] # (track, reason)
|
||||
skipped: list[tuple[SubtitleScanResult, str]] # (track, reason)
|
||||
|
||||
@property
|
||||
def placed_count(self) -> int:
|
||||
@@ -54,7 +54,7 @@ class PlaceResult:
|
||||
|
||||
class SubtitlePlacer:
|
||||
"""
|
||||
Hard-links matched SubtitleCandidate files next to a destination video.
|
||||
Hard-links matched SubtitleScanResult files next to a destination video.
|
||||
|
||||
Uses the same hard-link strategy as FileManager.copy_file:
|
||||
instant, no data duplication, qBittorrent keeps seeding.
|
||||
@@ -64,11 +64,11 @@ class SubtitlePlacer:
|
||||
|
||||
def place(
|
||||
self,
|
||||
tracks: list[SubtitleCandidate],
|
||||
tracks: list[SubtitleScanResult],
|
||||
destination_video: Path,
|
||||
) -> PlaceResult:
|
||||
placed: list[PlacedTrack] = []
|
||||
skipped: list[tuple[SubtitleCandidate, str]] = []
|
||||
skipped: list[tuple[SubtitleScanResult, str]] = []
|
||||
|
||||
dest_dir = destination_video.parent
|
||||
|
||||
+1
-1
@@ -2,7 +2,7 @@
|
||||
|
||||
import logging
|
||||
|
||||
from alfred.infrastructure.api.qbittorrent import (
|
||||
from alfred.infrastructure.api_TO_CHECK.qbittorrent import (
|
||||
QBittorrentAPIError,
|
||||
QBittorrentAuthError,
|
||||
QBittorrentClient,
|
||||
+1
-1
@@ -2,7 +2,7 @@
|
||||
|
||||
import logging
|
||||
|
||||
from alfred.infrastructure.api.knaben import (
|
||||
from alfred.infrastructure.api_TO_CHECK.knaben import (
|
||||
KnabenAPIError,
|
||||
KnabenClient,
|
||||
KnabenNotFoundError,
|
||||
@@ -0,0 +1,21 @@
|
||||
"""TV-show orchestrators — operate on the Alfred-managed TV library tree.
|
||||
|
||||
The TV library is a directory of show folders (one per TV show), each
|
||||
holding season folders containing video files. Modules here walk this
|
||||
tree and reconstruct on-disk :class:`SeriesRelease` aggregates by
|
||||
reusing the existing release pipeline (``inspect_release``) rather
|
||||
than duplicating its parse/probe logic.
|
||||
"""
|
||||
|
||||
from .dto import SearchShowResponse, ShowHit
|
||||
from .search_show import SearchShowUseCase
|
||||
from .walker import SeasonFolder, ShowTree, walk_show
|
||||
|
||||
__all__ = [
|
||||
"SearchShowResponse",
|
||||
"SearchShowUseCase",
|
||||
"SeasonFolder",
|
||||
"ShowHit",
|
||||
"ShowTree",
|
||||
"walk_show",
|
||||
]
|
||||
@@ -0,0 +1,39 @@
|
||||
"""TV show application DTOs."""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ShowHit:
|
||||
"""One TV-show hit, flattened for transport to the agent."""
|
||||
|
||||
tmdb_id: int
|
||||
name: str
|
||||
first_air_year: int | None = None
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
out: dict = {"tmdb_id": self.tmdb_id, "name": self.name}
|
||||
if self.first_air_year is not None:
|
||||
out["first_air_year"] = self.first_air_year
|
||||
return out
|
||||
|
||||
|
||||
@dataclass
|
||||
class SearchShowResponse:
|
||||
"""Response from searching for a TV show."""
|
||||
|
||||
status: str
|
||||
hits: list[ShowHit] = field(default_factory=list)
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def to_dict(self):
|
||||
result: dict = {"status": self.status}
|
||||
|
||||
if self.error:
|
||||
result["error"] = self.error
|
||||
result["message"] = self.message
|
||||
else:
|
||||
result["hits"] = [h.to_dict() for h in self.hits]
|
||||
|
||||
return result
|
||||
@@ -0,0 +1,59 @@
|
||||
"""Search TV show use case."""
|
||||
|
||||
import logging
|
||||
|
||||
from alfred.infrastructure.api_TO_CHECK.tmdb import (
|
||||
TMDBAPIError,
|
||||
TMDBClient,
|
||||
TMDBConfigurationError,
|
||||
)
|
||||
|
||||
from .dto import SearchShowResponse, ShowHit
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SearchShowUseCase:
|
||||
"""List TV shows matching a free-text query via TMDB ``/search/tv``.
|
||||
|
||||
Symmetric to :class:`alfred.application.movies.SearchMovieUseCase`:
|
||||
thin orchestrator, flattens domain VOs into agent-friendly
|
||||
primitives, no ``imdb_id`` enrichment (caller follows up with
|
||||
:meth:`TMDBClient.get_tv_show_info` on a chosen ``tmdb_id``).
|
||||
"""
|
||||
|
||||
def __init__(self, tmdb_client: TMDBClient):
|
||||
self.tmdb_client = tmdb_client
|
||||
|
||||
def execute(self, show_title: str) -> SearchShowResponse:
|
||||
try:
|
||||
results = self.tmdb_client.search_shows(show_title)
|
||||
|
||||
hits = [
|
||||
ShowHit(
|
||||
tmdb_id=r.tmdb_id.value,
|
||||
name=r.name,
|
||||
first_air_year=r.first_air_year,
|
||||
)
|
||||
for r in results
|
||||
]
|
||||
logger.info(f"search_shows({show_title!r}) → {len(hits)} hits")
|
||||
return SearchShowResponse(status="ok", hits=hits)
|
||||
|
||||
except TMDBConfigurationError as e:
|
||||
logger.error(f"TMDB configuration error: {e}")
|
||||
return SearchShowResponse(
|
||||
status="error", error="configuration_error", message=str(e)
|
||||
)
|
||||
|
||||
except TMDBAPIError as e:
|
||||
logger.error(f"TMDB API error: {e}")
|
||||
return SearchShowResponse(
|
||||
status="error", error="api_error", message=str(e)
|
||||
)
|
||||
|
||||
except ValueError as e:
|
||||
logger.error(f"Validation error: {e}")
|
||||
return SearchShowResponse(
|
||||
status="error", error="validation_failed", message=str(e)
|
||||
)
|
||||
@@ -0,0 +1,208 @@
|
||||
"""Show tree walker — minimal filesystem traversal of a TV show folder.
|
||||
|
||||
The walker is intentionally dumb: it lists season folders, classifies
|
||||
each one as PACK or EPISODIC by **inspecting its filesystem
|
||||
structure**, and hands the orchestrator a flat list of video files
|
||||
per season. It does not parse release names, run ffprobe, or
|
||||
classify subtitle files. All of that intelligence lives in the
|
||||
existing release pipeline (``inspect_release`` + downstream
|
||||
services); the walker just hands the orchestrator the paths to feed
|
||||
into that pipeline.
|
||||
|
||||
Folder convention
|
||||
-----------------
|
||||
|
||||
Inside an Alfred-managed library, a show root looks like::
|
||||
|
||||
Foundation/
|
||||
Foundation.S01.1080p.WEB-DL.x265-GROUP/ ← PACK season
|
||||
Foundation.S01E01.1080p.WEB-DL.x265.mkv ← flat video
|
||||
Foundation.S01E02.1080p.WEB-DL.x265.mkv
|
||||
...
|
||||
Foundation.S02/ ← EPISODIC season
|
||||
Foundation.S02E01.1080p.WEB-DL.x265-GROUP/ ← episode subfolder
|
||||
Foundation.S02E01.1080p.WEB-DL.x265-GROUP.mkv
|
||||
Foundation.S02E02.1080p.WEB-DL.x265-OTHER/
|
||||
Foundation.S02E02.1080p.WEB-DL.x265-OTHER.mkv
|
||||
|
||||
The walker recognizes a season folder by a ``Sxx`` token anywhere in
|
||||
its name (case-insensitive). It does **not** care about Plex-style
|
||||
names (``Season 01``, ``Specials``) — the Alfred library uses
|
||||
release-style folder names only.
|
||||
|
||||
PACK vs EPISODIC is a **structural distinction**, not a naming one:
|
||||
|
||||
* **PACK** — season folder contains N flat video files. No
|
||||
subfolders.
|
||||
* **EPISODIC** — season folder contains N subfolders, each holding
|
||||
exactly one video.
|
||||
|
||||
A season folder that mixes the two layouts (some flat videos AND
|
||||
some subfolders) is malformed: the walker reports
|
||||
``mode=None`` and an empty ``video_files`` tuple so the
|
||||
orchestrator can warn and skip it.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||
from alfred.domain.releases_TO_CHECK.value_objects import ReleaseMode
|
||||
from alfred.domain.shared_TO_CHECK.ports import FilesystemScanner
|
||||
|
||||
_LOG = logging.getLogger(__name__)
|
||||
|
||||
# Matches any ``Sxx`` token (1-2 digits) bounded by non-alphanumerics.
|
||||
# Examples that match: ``Foundation.S01.1080p`` , ``S2.Pack`` , ``BBC.s10.bluray``.
|
||||
# Examples that don't: ``Sample`` , ``Soundtrack`` , ``2024.S0E1`` (no S+digits boundary).
|
||||
_SEASON_TOKEN_RE = re.compile(r"(?<![A-Za-z0-9])s(\d{1,2})(?![A-Za-z0-9])", re.IGNORECASE)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SeasonFolder:
|
||||
"""One season folder discovered inside a show root.
|
||||
|
||||
``mode`` is set by the walker from the FS structure:
|
||||
|
||||
* :attr:`ReleaseMode.PACK` — ``video_files`` lists the season
|
||||
folder's flat videos.
|
||||
* :attr:`ReleaseMode.EPISODIC` — ``video_files`` lists each
|
||||
episode subfolder's single video.
|
||||
* ``None`` — the folder is empty, malformed (mixed layout), or
|
||||
otherwise unclassifiable. ``video_files`` is empty. The
|
||||
orchestrator decides whether to warn/skip.
|
||||
"""
|
||||
|
||||
season_dir: Path
|
||||
mode: ReleaseMode | None
|
||||
video_files: tuple[Path, ...]
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ShowTree:
|
||||
"""The full structural snapshot of a show on disk."""
|
||||
|
||||
show_root: Path
|
||||
season_folders: tuple[SeasonFolder, ...]
|
||||
|
||||
|
||||
def walk_show(
|
||||
show_root: Path,
|
||||
*,
|
||||
scanner: FilesystemScanner,
|
||||
kb: ReleaseKnowledge,
|
||||
) -> ShowTree:
|
||||
"""Walk ``show_root`` and return its structural tree.
|
||||
|
||||
The walker:
|
||||
|
||||
* lists direct children of ``show_root``,
|
||||
* keeps the directories whose name contains a ``Sxx`` token,
|
||||
* classifies each season folder as PACK / EPISODIC / unknown by
|
||||
inspecting its direct children (videos vs subfolders),
|
||||
* for EPISODIC, descends one extra level into each episode
|
||||
subfolder to collect its single video,
|
||||
* sorts season folders by name and video files by name within
|
||||
each folder.
|
||||
|
||||
The walker never raises — empty / unreadable / malformed
|
||||
directories surface as a ``SeasonFolder`` with ``mode=None`` and
|
||||
an empty ``video_files`` tuple.
|
||||
"""
|
||||
video_exts = {ext.lower() for ext in kb.video_extensions}
|
||||
season_folders: list[SeasonFolder] = []
|
||||
for entry in scanner.scan_dir(show_root):
|
||||
if not entry.is_dir or not _SEASON_TOKEN_RE.search(entry.name):
|
||||
continue
|
||||
season_folders.append(
|
||||
_classify_season(entry.path, scanner=scanner, video_exts=video_exts)
|
||||
)
|
||||
return ShowTree(
|
||||
show_root=show_root, season_folders=tuple(season_folders)
|
||||
)
|
||||
|
||||
|
||||
# --------------------------------------------------------------------------- #
|
||||
# Season-folder classification #
|
||||
# --------------------------------------------------------------------------- #
|
||||
|
||||
|
||||
def _classify_season(
|
||||
season_dir: Path,
|
||||
*,
|
||||
scanner: FilesystemScanner,
|
||||
video_exts: set[str],
|
||||
) -> SeasonFolder:
|
||||
"""Inspect one season folder and decide PACK / EPISODIC / unknown.
|
||||
|
||||
Looks only at direct children. For EPISODIC, descends one extra
|
||||
level into each subfolder to collect its single video. Mixed
|
||||
layouts (flat videos + subfolders) are reported as ``mode=None``
|
||||
so the orchestrator can skip them with a warning.
|
||||
"""
|
||||
flat_videos: list[Path] = []
|
||||
subdirs: list[Path] = []
|
||||
for child in scanner.scan_dir(season_dir):
|
||||
if child.is_file and child.suffix.lower() in video_exts:
|
||||
flat_videos.append(child.path)
|
||||
elif child.is_dir:
|
||||
subdirs.append(child.path)
|
||||
# Anything else (non-video files like .nfo, .srt at the season
|
||||
# root) is ignored — it doesn't affect classification.
|
||||
|
||||
has_flat = bool(flat_videos)
|
||||
has_subdirs = bool(subdirs)
|
||||
|
||||
if has_flat and has_subdirs:
|
||||
_LOG.warning(
|
||||
"walker: season folder %s mixes flat videos and subfolders — "
|
||||
"malformed layout, skipping",
|
||||
season_dir,
|
||||
)
|
||||
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
|
||||
|
||||
if has_flat:
|
||||
return SeasonFolder(
|
||||
season_dir=season_dir,
|
||||
mode=ReleaseMode.PACK,
|
||||
video_files=tuple(sorted(flat_videos)),
|
||||
)
|
||||
|
||||
if has_subdirs:
|
||||
episode_videos: list[Path] = []
|
||||
for sub in sorted(subdirs):
|
||||
videos_in_sub = [
|
||||
child.path
|
||||
for child in scanner.scan_dir(sub)
|
||||
if child.is_file and child.suffix.lower() in video_exts
|
||||
]
|
||||
if len(videos_in_sub) == 0:
|
||||
_LOG.warning(
|
||||
"walker: episode subfolder %s contains no video — skipping",
|
||||
sub,
|
||||
)
|
||||
continue
|
||||
if len(videos_in_sub) > 1:
|
||||
_LOG.warning(
|
||||
"walker: episode subfolder %s contains %d videos — "
|
||||
"malformed, skipping season %s",
|
||||
sub,
|
||||
len(videos_in_sub),
|
||||
season_dir,
|
||||
)
|
||||
return SeasonFolder(
|
||||
season_dir=season_dir, mode=None, video_files=()
|
||||
)
|
||||
episode_videos.append(videos_in_sub[0])
|
||||
return SeasonFolder(
|
||||
season_dir=season_dir,
|
||||
mode=ReleaseMode.EPISODIC,
|
||||
video_files=tuple(episode_videos),
|
||||
)
|
||||
|
||||
# No flat videos, no subdirs → empty season folder.
|
||||
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
|
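Review note: the PACK / EPISODIC / unknown decision in `_classify_season` can be sketched without the scanner port. The block below is a minimal sketch using `pathlib` directly; `Mode`, `VIDEO_EXTS`, and `classify` are hypothetical stand-ins for illustration, not names from this branch, and the extension set is an assumption (the real one comes from the knowledge base).

```python
from __future__ import annotations

from enum import Enum
from pathlib import Path


class Mode(Enum):
    """Hypothetical stand-in for ReleaseMode."""

    PACK = "pack"
    EPISODIC = "episodic"


# Assumed extension set; the diff reads it from kb.video_extensions instead.
VIDEO_EXTS = {".mkv", ".mp4"}


def _is_video(path: Path) -> bool:
    return path.is_file() and path.suffix.lower() in VIDEO_EXTS


def classify(season_dir: Path) -> tuple[Mode | None, tuple[Path, ...]]:
    """Return (mode, videos) for one season folder, inspecting direct children."""
    children = sorted(season_dir.iterdir())
    flat = [c for c in children if _is_video(c)]
    subdirs = [c for c in children if c.is_dir()]

    if flat and subdirs:
        # Mixed layout (flat videos plus subfolders) is treated as malformed.
        return None, ()
    if flat:
        # Videos sit directly in the season folder.
        return Mode.PACK, tuple(flat)

    videos: list[Path] = []
    for sub in subdirs:
        found = [f for f in sorted(sub.iterdir()) if _is_video(f)]
        if len(found) > 1:
            # More than one video in an episode folder invalidates the season.
            return None, ()
        if found:
            videos.append(found[0])
        # Episode folders without a video are simply skipped.
    if subdirs:
        return Mode.EPISODIC, tuple(videos)
    # No flat videos and no subfolders: empty season folder.
    return None, ()
```

As in the diff, non-video files at the season root (`.nfo`, `.srt`) do not influence the result, and a season whose episode subfolders are all empty still comes back as EPISODIC with no videos.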
||||
@@ -1,104 +0,0 @@
|
||||
"""Movie domain entities."""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
|
||||
from ..shared.media import AudioTrack, MediaWithTracks, SubtitleTrack
|
||||
from ..shared.value_objects import FilePath, FileSize, ImdbId
|
||||
from .value_objects import MovieTitle, Quality, ReleaseYear
|
||||
|
||||
|
||||
@dataclass(eq=False)
|
||||
class Movie(MediaWithTracks):
|
||||
"""
|
||||
Movie aggregate root for the movies domain.
|
||||
|
||||
Carries file metadata (path, size) and the tracks discovered by the
|
||||
ffprobe + subtitle scan pipeline. The track lists may be empty when the
|
||||
movie is known but not yet scanned, or when no file is downloaded.
|
||||
|
||||
Track helpers follow the same "C+" contract as ``Episode``: pass a
|
||||
``Language`` for cross-format matching, or a ``str`` for case-insensitive
|
||||
direct comparison.
|
||||
|
||||
Equality is identity-based: two ``Movie`` instances are equal iff they
|
||||
share the same ``imdb_id``, regardless of file/track contents. This is
|
||||
the DDD aggregate invariant — the aggregate is identified by its root id.
|
||||
"""
|
||||
|
||||
imdb_id: ImdbId
|
||||
title: MovieTitle
|
||||
release_year: ReleaseYear | None = None
|
||||
quality: Quality = Quality.UNKNOWN
|
||||
file_path: FilePath | None = None
|
||||
file_size: FileSize | None = None
|
||||
tmdb_id: int | None = None
|
||||
added_at: datetime = field(default_factory=datetime.now)
|
||||
audio_tracks: list[AudioTrack] = field(default_factory=list)
|
||||
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
|
||||
|
||||
def __post_init__(self):
|
||||
"""Validate movie entity."""
|
||||
# Ensure ImdbId is actually an ImdbId instance
|
||||
if not isinstance(self.imdb_id, ImdbId):
|
||||
if isinstance(self.imdb_id, str):
|
||||
self.imdb_id = ImdbId(self.imdb_id)
|
||||
else:
|
||||
raise ValueError(
|
||||
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
|
||||
)
|
||||
|
||||
# Ensure MovieTitle is actually a MovieTitle instance
|
||||
if not isinstance(self.title, MovieTitle):
|
||||
if isinstance(self.title, str):
|
||||
self.title = MovieTitle(self.title)
|
||||
else:
|
||||
raise ValueError(
|
||||
f"title must be MovieTitle or str, got {type(self.title)}"
|
||||
)
|
||||
|
||||
def __eq__(self, other: object) -> bool:
|
||||
if not isinstance(other, Movie):
|
||||
return NotImplemented
|
||||
return self.imdb_id == other.imdb_id
|
||||
|
||||
def __hash__(self) -> int:
|
||||
return hash(self.imdb_id)
|
||||
|
||||
# Track helpers (has_audio_in / audio_languages / has_subtitles_in /
|
||||
# has_forced_subs / subtitle_languages) come from MediaWithTracks.
|
||||
|
||||
def get_folder_name(self) -> str:
|
||||
"""
|
||||
Get the folder name for this movie.
|
||||
|
||||
Format: "Title (Year)"
|
||||
Example: "Inception (2010)"
|
||||
"""
|
||||
if self.release_year:
|
||||
return f"{self.title.value} ({self.release_year.value})"
|
||||
return self.title.value
|
||||
|
||||
def get_filename(self) -> str:
|
||||
"""
|
||||
Get the suggested filename for this movie.
|
||||
|
||||
Format: "Title.Year.Quality.ext"
|
||||
Example: "Inception.2010.1080p.mkv"
|
||||
"""
|
||||
parts = [self.title.normalized()]
|
||||
|
||||
if self.release_year:
|
||||
parts.append(str(self.release_year.value))
|
||||
|
||||
if self.quality != Quality.UNKNOWN:
|
||||
parts.append(self.quality.value)
|
||||
|
||||
# Extension will be added based on actual file
|
||||
return ".".join(parts)
|
||||
|
||||
def __str__(self) -> str:
|
||||
return f"{self.title.value} ({self.release_year.value if self.release_year else 'Unknown'})"
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"Movie(imdb_id={self.imdb_id}, title='{self.title.value}')"
|
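Review note: the two naming formats documented on the removed entity ("Title (Year)" for folders, "Title.Year.Quality" for files) can be illustrated with plain functions. This is a sketch only: `dot_name` is an assumed approximation of `to_dot_folder_name`, whose real implementation is not shown in this diff, and `folder_name` / `base_filename` are hypothetical names.

```python
from __future__ import annotations

import re


def dot_name(title: str) -> str:
    """Assumed sanitizer: drop special characters, turn whitespace runs into dots."""
    cleaned = re.sub(r"[^\w\s.-]", "", title)
    return re.sub(r"\s+", ".", cleaned.strip())


def folder_name(title: str, year: int | None = None) -> str:
    """Folder format: 'Title (Year)', or just the title when the year is unknown."""
    return f"{title} ({year})" if year else title


def base_filename(
    title: str, year: int | None = None, quality: str | None = None
) -> str:
    """File format without extension: 'Title.Year.Quality', skipping missing parts."""
    parts = [dot_name(title)]
    if year:
        parts.append(str(year))
    if quality:
        parts.append(quality)
    return ".".join(parts)


print(folder_name("Inception", 2010))
print(base_filename("Inception", 2010, "1080p"))
```

The extension is deliberately left out, matching the comment in `get_filename` that it is added from the actual file.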
||||
@@ -1,73 +0,0 @@
|
||||
"""Movie repository interfaces (abstract)."""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
from ..shared.value_objects import ImdbId
|
||||
from .entities import Movie
|
||||
|
||||
|
||||
class MovieRepository(ABC):
|
||||
"""
|
||||
Abstract repository for movie persistence.
|
||||
|
||||
This defines the interface that infrastructure implementations must follow.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def save(self, movie: Movie) -> None:
|
||||
"""
|
||||
Save a movie to the repository.
|
||||
|
||||
Args:
|
||||
movie: Movie entity to save
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_by_imdb_id(self, imdb_id: ImdbId) -> Movie | None:
|
||||
"""
|
||||
Find a movie by its IMDb ID.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID to search for
|
||||
|
||||
Returns:
|
||||
Movie if found, None otherwise
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_all(self) -> list[Movie]:
|
||||
"""
|
||||
Get all movies in the repository.
|
||||
|
||||
Returns:
|
||||
List of all movies
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def delete(self, imdb_id: ImdbId) -> bool:
|
||||
"""
|
||||
Delete a movie from the repository.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID of the movie to delete
|
||||
|
||||
Returns:
|
||||
True if deleted, False if not found
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def exists(self, imdb_id: ImdbId) -> bool:
|
||||
"""
|
||||
Check if a movie exists in the repository.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID to check
|
||||
|
||||
Returns:
|
||||
True if exists, False otherwise
|
||||
"""
|
||||
pass
|
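Review note: for readers who want to see what an implementation of this removed interface would look like, here is a minimal in-memory sketch. The `Movie` class is a simplified stand-in keyed by a plain string id, and `InMemoryMovieRepository` is a hypothetical name; neither is part of this branch.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    """Simplified stand-in for the domain entity (plain-string id)."""

    imdb_id: str
    title: str


class MovieRepository(ABC):
    """Same five operations as the interface in the diff."""

    @abstractmethod
    def save(self, movie: Movie) -> None: ...

    @abstractmethod
    def find_by_imdb_id(self, imdb_id: str) -> Movie | None: ...

    @abstractmethod
    def find_all(self) -> list[Movie]: ...

    @abstractmethod
    def delete(self, imdb_id: str) -> bool: ...

    @abstractmethod
    def exists(self, imdb_id: str) -> bool: ...


class InMemoryMovieRepository(MovieRepository):
    """Dict-backed implementation, useful as a test double."""

    def __init__(self) -> None:
        self._movies: dict[str, Movie] = {}

    def save(self, movie: Movie) -> None:
        # Saving twice under the same id overwrites the earlier entry.
        self._movies[movie.imdb_id] = movie

    def find_by_imdb_id(self, imdb_id: str) -> Movie | None:
        return self._movies.get(imdb_id)

    def find_all(self) -> list[Movie]:
        return list(self._movies.values())

    def delete(self, imdb_id: str) -> bool:
        # True if something was removed, False if the id was unknown.
        return self._movies.pop(imdb_id, None) is not None

    def exists(self, imdb_id: str) -> bool:
        return imdb_id in self._movies


repo = InMemoryMovieRepository()
repo.save(Movie("tt0000001", "Example"))  # arbitrary illustrative id
```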
||||
@@ -0,0 +1,91 @@
|
||||
"""Movie domain entities."""
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
|
||||
from .value_objects import MovieTitle, ReleaseYear
|
||||
|
||||
|
||||
@dataclass(frozen=True, eq=False)
|
||||
class Movie:
|
||||
"""
|
||||
Movie aggregate root for the movies domain.
|
||||
|
||||
TMDB-only aggregate: carries identity (``tmdb_id`` + optional
|
||||
``imdb_id``) plus the catalog facts that come from TMDB (``title``,
|
||||
``release_year``). Filesystem-side concerns (file path, quality,
|
||||
tracks, ``added_at``) live on :class:`alfred.domain.releases.entities.
|
||||
MovieRelease`, the per-movie release aggregate persisted alongside.
|
||||
|
||||
Frozen: rebuild via ``dataclasses.replace`` to project metadata
|
||||
updates (e.g. a TMDB refresh) onto a new instance.
|
||||
|
||||
Equality is identity-based on ``tmdb_id``: two ``Movie`` instances
|
||||
are equal iff they share the same primary key. ``imdb_id`` is a
|
||||
secondary anchor and not part of the identity.
|
||||
"""
|
||||
|
||||
tmdb_id: TmdbId
|
||||
title: MovieTitle
|
||||
imdb_id: ImdbId | None = None
|
||||
release_year: ReleaseYear | None = None
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if not isinstance(self.tmdb_id, TmdbId):
|
||||
raise ValueError(
|
||||
f"tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
|
||||
)
|
||||
if not isinstance(self.title, MovieTitle):
|
||||
if isinstance(self.title, str):
|
||||
object.__setattr__(self, "title", MovieTitle(self.title))
|
||||
else:
|
||||
raise ValueError(
|
||||
f"title must be MovieTitle or str, got {type(self.title)}"
|
||||
)
|
||||
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
|
||||
raise ValueError(
|
||||
f"imdb_id must be ImdbId or None, got {type(self.imdb_id)}"
|
||||
)
|
||||
|
||||
def __eq__(self, other: object) -> bool:
|
||||
if not isinstance(other, Movie):
|
||||
return NotImplemented
|
||||
return self.tmdb_id == other.tmdb_id
|
||||
|
||||
def __hash__(self) -> int:
|
||||
return hash(self.tmdb_id)
|
||||
|
||||
# WRONG
|
||||
def get_folder_name(self) -> str:
|
||||
"""
|
||||
Get the folder name for this movie.
|
||||
|
||||
Format: "Title (Year)"
|
||||
Example: "Inception (2010)"
|
||||
"""
|
||||
if self.release_year:
|
||||
return f"{self.title.value} ({self.release_year.value})"
|
||||
return self.title.value
|
||||
|
||||
# WRONG
|
||||
def get_filename(self) -> str:
|
||||
"""
|
||||
Get the suggested base filename (without extension) for this movie.
|
||||
|
||||
Format: ``Title.Year`` (quality lives on
|
||||
:class:`alfred.domain.releases.entities.MovieRelease` now and is
|
||||
appended by the release-aware caller — typically the rescan /
|
||||
organize flow, after Phase 4).
|
||||
|
||||
Example: ``Inception.2010``.
|
||||
"""
|
||||
parts = [self.title.normalized()]
|
||||
if self.release_year:
|
||||
parts.append(str(self.release_year.value))
|
||||
return ".".join(parts)
|
||||
|
||||
def __str__(self) -> str:
|
||||
return f"{self.title.value} ({self.release_year.value if self.release_year else 'Unknown'})"
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"Movie(tmdb_id={self.tmdb_id}, title='{self.title.value}')"
|
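Review note: the new entity combines three things that interact: `frozen=True`, coercion inside `__post_init__` via `object.__setattr__`, and identity-based equality. The sketch below shows them together on simplified stand-ins; `TmdbId` and `MovieTitle` here are one-field dataclasses invented for the example, not the real value objects, and the id used is arbitrary.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class TmdbId:
    """Simplified stand-in for the real value object."""

    value: int


@dataclass(frozen=True)
class MovieTitle:
    """Simplified stand-in for the real value object."""

    value: str


@dataclass(frozen=True, eq=False)
class Movie:
    tmdb_id: TmdbId
    title: MovieTitle
    release_year: int | None = None

    def __post_init__(self) -> None:
        # A frozen dataclass blocks normal assignment, so coercion has to go
        # through object.__setattr__.
        if isinstance(self.title, str):
            object.__setattr__(self, "title", MovieTitle(self.title))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Movie):
            return NotImplemented
        return self.tmdb_id == other.tmdb_id

    def __hash__(self) -> int:
        return hash(self.tmdb_id)


original = Movie(TmdbId(1), "Inception")          # str title is coerced
refreshed = replace(original, release_year=2010)  # new instance, same identity

try:
    original.release_year = 1999  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because equality and hashing look only at `tmdb_id`, `original` and `refreshed` collapse to one element in a set even though their `release_year` differs. That is the intended aggregate semantics, but it also means a set or dict keyed by `Movie` will not notice metadata updates.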
||||
@@ -1,6 +1,6 @@
|
||||
"""Movie domain exceptions."""
|
||||
|
||||
-from ..shared.exceptions import DomainException, NotFoundError
|
||||
+from ..shared_TO_CHECK.exceptions import DomainException, NotFoundError
|
||||
|
||||
|
||||
class MovieNotFound(NotFoundError):
|
||||
+3
-15
@@ -3,8 +3,7 @@
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
|
||||
-from ..shared.exceptions import ValidationError
|
||||
-from ..shared.value_objects import to_dot_folder_name
|
||||
+from ..shared_TO_CHECK.exceptions import ValidationError
|
||||
|
||||
|
||||
class Quality(Enum):
|
||||
@@ -56,18 +55,11 @@ class MovieTitle:
|
||||
f"Movie title must be a string, got {type(self.value)}"
|
||||
)
|
||||
|
||||
-if len(self.value) > 500:
|
||||
+if len(self.value) > 150:
|
||||
raise ValidationError(
|
||||
-f"Movie title too long: {len(self.value)} characters (max 500)"
|
||||
+f"Movie title too long: {len(self.value)} characters (max 150)"
|
||||
)
|
||||
|
||||
-def normalized(self) -> str:
|
||||
-"""
|
||||
-Return normalized title for file system usage.
|
||||
-
|
||||
-Removes special characters and replaces spaces with dots.
|
||||
-"""
|
||||
-return to_dot_folder_name(self.value)
|
||||
|
||||
def __str__(self) -> str:
|
||||
return self.value
|
||||
@@ -93,10 +85,6 @@ class ReleaseYear:
|
||||
f"Release year must be an integer, got {type(self.value)}"
|
||||
)
|
||||
|
||||
-# Movies started around 1888, and we shouldn't have movies from the future
|
||||
-if self.value < 1888 or self.value > 2100:
|
||||
-raise ValidationError(f"Invalid release year: {self.value}")
|
||||
|
||||
def __str__(self) -> str:
|
||||
return str(self.value)
|
||||
|
||||
@@ -1,6 +0,0 @@
|
||||
"""Release domain — release name parsing and naming conventions."""
|
||||
|
||||
from .services import parse_release
|
||||
from .value_objects import ParsedRelease
|
||||
|
||||
__all__ = ["ParsedRelease", "parse_release"]
|
||||
@@ -1,52 +0,0 @@
|
||||
"""ReleaseKnowledge port — the read-only query surface that
|
||||
``parse_release`` and ``ParsedRelease`` need from the release knowledge
|
||||
base, expressed as a structural Protocol so the domain never imports any
|
||||
concrete loader.
|
||||
|
||||
The concrete YAML-backed implementation lives in
|
||||
``alfred/infrastructure/knowledge/release_kb.py``. Tests can supply any
|
||||
object that satisfies this shape (e.g. a simple dataclass).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Protocol
|
||||
|
||||
|
||||
class ReleaseKnowledge(Protocol):
|
||||
"""Read-only snapshot of release-name parsing knowledge."""
|
||||
|
||||
# --- Token sets used by the tokenizer / matchers ---
|
||||
|
||||
resolutions: set[str]
|
||||
sources: set[str]
|
||||
codecs: set[str]
|
||||
language_tokens: set[str]
|
||||
forbidden_chars: set[str]
|
||||
hdr_extra: set[str]
|
||||
|
||||
# --- Structured knowledge (loaded from YAML as dicts) ---
|
||||
|
||||
audio: dict
|
||||
video_meta: dict
|
||||
editions: dict
|
||||
media_type_tokens: dict
|
||||
|
||||
# --- Tokenizer separators ---
|
||||
|
||||
separators: list[str]
|
||||
|
||||
# --- File-extension sets (used by application/infra modules that work
|
||||
# directly with filesystem paths, e.g. media-type detection, video
|
||||
# lookup). Domain parsing itself doesn't touch these. ---
|
||||
|
||||
video_extensions: set[str]
|
||||
non_video_extensions: set[str]
|
||||
subtitle_extensions: set[str]
|
||||
metadata_extensions: set[str]
|
||||
|
||||
# --- Filesystem sanitization (Option B: pre-sanitize at parse time) ---
|
||||
|
||||
def sanitize_for_fs(self, text: str) -> str:
|
||||
"""Strip filesystem-forbidden characters from ``text``."""
|
||||
...
|
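Review note: the docstring of this removed port says tests can supply any object with the right shape, such as a simple dataclass. A minimal sketch of that idea follows, using a trimmed Protocol that exposes only `separators` and a tokenizer that builds its pattern the same way as `_tokenize`. `SeparatorKnowledge`, `FakeKnowledge`, and the separator list are assumptions made for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Protocol


class SeparatorKnowledge(Protocol):
    """Trimmed, hypothetical slice of the port: only what the tokenizer reads."""

    separators: list[str]


@dataclass
class FakeKnowledge:
    """Test double; satisfies the Protocol by shape alone, with no inheritance."""

    # Assumed separator list; the real one is loaded from YAML.
    separators: list[str] = field(
        default_factory=lambda: [".", " ", "[", "]", "(", ")", "_"]
    )


def tokenize(name: str, kb: SeparatorKnowledge) -> list[str]:
    """Split on any run of configured separators and drop empty tokens."""
    # re.escape keeps characters such as "]" or "-" literal inside the class.
    pattern = "[" + re.escape("".join(kb.separators)) + "]+"
    return [tok for tok in re.split(pattern, name) if tok]


tokens = tokenize("The.Show S01E02 [1080p]", FakeKnowledge())
```

Note that "-" is not a separator in this list, so a trailing `x265-GROUP` stays one token, which is what the group-extraction logic in the parsing service relies on.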
||||
@@ -1,506 +0,0 @@
|
||||
"""Release domain — parsing service."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
|
||||
from .ports import ReleaseKnowledge
|
||||
from .value_objects import MediaTypeToken, ParsedRelease, ParsePath
|
||||
|
||||
|
||||
def _tokenize(name: str, kb: ReleaseKnowledge) -> list[str]:
|
||||
"""Split a release name on the configured separators, dropping empty tokens."""
|
||||
pattern = "[" + re.escape("".join(kb.separators)) + "]+"
|
||||
return [t for t in re.split(pattern, name) if t]
|
||||
|
||||
|
||||
def parse_release(name: str, kb: ReleaseKnowledge) -> ParsedRelease:
|
||||
"""
|
||||
Parse a release name and return a ParsedRelease.
|
||||
|
||||
Flow:
|
||||
1. Strip a leading/trailing [site.tag] if present (sets parse_path="sanitized").
|
||||
2. Check the remainder for truly forbidden chars (anything not in the
|
||||
configured separators list). If any remain → media_type="unknown",
|
||||
parse_path="ai", and the LLM handles it.
|
||||
3. Tokenize using the configured separators (".", " ", "[", "]", "(", ")", "_", ...)
|
||||
and run token-level matchers (season/episode, tech, languages, audio,
|
||||
video, edition, title, year).
|
||||
"""
|
||||
parse_path = ParsePath.DIRECT.value
|
||||
|
||||
# Always try to extract a bracket-enclosed site tag first.
|
||||
clean, site_tag = _strip_site_tag(name)
|
||||
if site_tag is not None:
|
||||
parse_path = ParsePath.SANITIZED.value
|
||||
|
||||
if not _is_well_formed(clean, kb):
|
||||
return ParsedRelease(
|
||||
raw=name,
|
||||
normalised=clean,
|
||||
title=clean,
|
||||
title_sanitized=kb.sanitize_for_fs(clean),
|
||||
year=None,
|
||||
season=None,
|
||||
episode=None,
|
||||
episode_end=None,
|
||||
quality=None,
|
||||
source=None,
|
||||
codec=None,
|
||||
group="UNKNOWN",
|
||||
tech_string="",
|
||||
media_type=MediaTypeToken.UNKNOWN.value,
|
||||
site_tag=site_tag,
|
||||
parse_path=ParsePath.AI.value,
|
||||
)
|
||||
|
||||
name = clean
|
||||
tokens = _tokenize(name, kb)
|
||||
|
||||
season, episode, episode_end = _extract_season_episode(tokens)
|
||||
quality, source, codec, group, tech_tokens = _extract_tech(tokens, kb)
|
||||
languages, lang_tokens = _extract_languages(tokens, kb)
|
||||
audio_codec, audio_channels, audio_tokens = _extract_audio(tokens, kb)
|
||||
bit_depth, hdr_format, video_tokens = _extract_video_meta(tokens, kb)
|
||||
edition, edition_tokens = _extract_edition(tokens, kb)
|
||||
title = _extract_title(
|
||||
tokens,
|
||||
tech_tokens | lang_tokens | audio_tokens | video_tokens | edition_tokens,
|
||||
kb,
|
||||
)
|
||||
year = _extract_year(tokens, title)
|
||||
media_type = _infer_media_type(
|
||||
season, quality, source, codec, year, edition, tokens, kb
|
||||
)
|
||||
|
||||
tech_parts = [p for p in [quality, source, codec] if p]
|
||||
tech_string = ".".join(tech_parts)
|
||||
|
||||
return ParsedRelease(
|
||||
raw=name,
|
||||
normalised=name,
|
||||
title=title,
|
||||
title_sanitized=kb.sanitize_for_fs(title),
|
||||
year=year,
|
||||
season=season,
|
||||
episode=episode,
|
||||
episode_end=episode_end,
|
||||
quality=quality,
|
||||
source=source,
|
||||
codec=codec,
|
||||
group=group,
|
||||
tech_string=tech_string,
|
||||
media_type=media_type,
|
||||
site_tag=site_tag,
|
||||
parse_path=parse_path,
|
||||
languages=languages,
|
||||
audio_codec=audio_codec,
|
||||
audio_channels=audio_channels,
|
||||
bit_depth=bit_depth,
|
||||
hdr_format=hdr_format,
|
||||
edition=edition,
|
||||
)
|
||||
|
||||
|
||||
def _infer_media_type(
|
||||
season: int | None,
|
||||
quality: str | None,
|
||||
source: str | None,
|
||||
codec: str | None,
|
||||
year: int | None,
|
||||
edition: str | None,
|
||||
tokens: list[str],
|
||||
kb: ReleaseKnowledge,
|
||||
) -> str:
|
||||
"""
|
||||
Infer media_type from token-level evidence only (no filesystem access).
|
||||
|
||||
- documentary : DOC token present
|
||||
- concert : CONCERT token present
|
||||
- tv_complete : INTEGRALE/COMPLETE token, no season
|
||||
- tv_show : season token found
|
||||
- movie : no season, at least one tech marker
|
||||
- unknown : no conclusive evidence
|
||||
"""
|
||||
upper_tokens = {t.upper() for t in tokens}
|
||||
|
||||
doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
|
||||
concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
|
||||
integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}
|
||||
|
||||
if upper_tokens & doc_tokens:
|
||||
return MediaTypeToken.DOCUMENTARY.value
|
||||
if upper_tokens & concert_tokens:
|
||||
return MediaTypeToken.CONCERT.value
|
||||
if (
|
||||
edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
|
||||
or upper_tokens & integrale_tokens
|
||||
) and season is None:
|
||||
return MediaTypeToken.TV_COMPLETE.value
|
||||
if season is not None:
|
||||
return MediaTypeToken.TV_SHOW.value
|
||||
if any([quality, source, codec, year]):
|
||||
return MediaTypeToken.MOVIE.value
|
||||
return MediaTypeToken.UNKNOWN.value
|
||||
|
||||
|
||||
def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
|
||||
"""Return True if name contains no forbidden characters per scene naming rules.
|
||||
|
||||
Characters listed as token separators (spaces, brackets, parens, …) are NOT
|
||||
considered malforming — the tokenizer handles them. Only truly broken chars
|
||||
like '@', '#', '!', '%' make a name malformed.
|
||||
"""
|
||||
tokenizable = set(kb.separators)
|
||||
return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
|
||||
|
||||
|
||||
def _strip_site_tag(name: str) -> tuple[str, str | None]:
|
||||
"""
|
||||
Strip a site watermark tag from the release name and return (clean_name, tag).
|
||||
|
||||
Handles two positions:
|
||||
- Prefix: "[ OxTorrent.vc ] The.Title.S01..."
|
||||
- Suffix: "The.Title.S01...-NTb[TGx]"
|
||||
|
||||
Anything between [...] is treated as a site tag.
|
||||
Returns (original_name, None) if no tag found.
|
||||
"""
|
||||
s = name.strip()
|
||||
|
||||
if s.startswith("["):
|
||||
close = s.find("]")
|
||||
if close != -1:
|
||||
tag = s[1:close].strip()
|
||||
remainder = s[close + 1 :].strip()
|
||||
if tag and remainder:
|
||||
return remainder, tag
|
||||
|
||||
if s.endswith("]"):
|
||||
open_bracket = s.rfind("[")
|
||||
if open_bracket != -1:
|
||||
tag = s[open_bracket + 1 : -1].strip()
|
||||
remainder = s[:open_bracket].strip()
|
||||
if tag and remainder:
|
||||
return remainder, tag
|
||||
|
||||
return s, None
|
||||
|
||||
|
||||
def _parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
|
||||
"""
|
||||
Parse a single token as a season/episode marker.
|
||||
|
||||
Handles:
|
||||
- SxxExx / SxxExxExx / Sxx (canonical scene form)
|
||||
- NxNN / NxNNxNN (alt form: 1x05, 12x07x08)
|
||||
|
||||
Returns (season, episode, episode_end) or None if not a season token.
|
||||
"""
|
||||
upper = tok.upper()
|
||||
|
||||
# SxxExx form
|
||||
if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
|
||||
season = int(upper[1:3])
|
||||
rest = upper[3:]
|
||||
|
||||
if not rest:
|
||||
return season, None, None
|
||||
|
||||
episodes: list[int] = []
|
||||
while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
|
||||
episodes.append(int(rest[1:3]))
|
||||
rest = rest[3:]
|
||||
|
||||
if not episodes:
|
||||
return None # malformed token like "S03XYZ"
|
||||
|
||||
return season, episodes[0], episodes[1] if len(episodes) >= 2 else None
|
||||
|
||||
# NxNN form — split on "X" (uppercased), all parts must be digits
|
||||
if "X" in upper:
|
||||
parts = upper.split("X")
|
||||
if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
|
||||
season = int(parts[0])
|
||||
episode = int(parts[1])
|
||||
episode_end = int(parts[2]) if len(parts) >= 3 else None
|
||||
return season, episode, episode_end
|
||||
|
||||
return None
|
||||
|
||||
|
||||
def _extract_season_episode(
|
||||
tokens: list[str],
|
||||
) -> tuple[int | None, int | None, int | None]:
|
||||
for tok in tokens:
|
||||
parsed = _parse_season_episode(tok)
|
||||
if parsed is not None:
|
||||
return parsed
|
||||
return None, None, None
|
||||
|
||||
|
||||
def _extract_tech(
|
||||
tokens: list[str],
|
||||
kb: ReleaseKnowledge,
|
||||
) -> tuple[str | None, str | None, str | None, str, set[str]]:
|
||||
"""
|
||||
Extract quality, source, codec, group from tokens.
|
||||
|
||||
Returns (quality, source, codec, group, tech_token_set).
|
||||
|
||||
Group extraction strategy (in priority order):
|
||||
1. Token where prefix is a known codec: x265-GROUP
|
||||
2. Rightmost token with a dash that isn't a known source
|
||||
"""
|
||||
quality: str | None = None
|
||||
source: str | None = None
|
||||
codec: str | None = None
|
||||
group = "UNKNOWN"
|
||||
tech_tokens: set[str] = set()
|
||||
|
||||
for tok in tokens:
|
||||
tl = tok.lower()
|
||||
|
||||
if tl in kb.resolutions:
|
||||
quality = tok
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
|
||||
if tl in kb.sources:
|
||||
source = tok
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
|
||||
if "-" in tok:
|
||||
parts = tok.rsplit("-", 1)
|
||||
# codec-GROUP (highest priority for group)
|
||||
if parts[0].lower() in kb.codecs:
|
||||
codec = parts[0]
|
||||
group = parts[1] if parts[1] else "UNKNOWN"
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
# source with dash: Web-DL, WEB-DL, etc.
|
||||
if parts[0].lower() in kb.sources or tok.lower().replace("-", "") in kb.sources:
|
||||
source = tok
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
|
||||
if tl in kb.codecs:
|
||||
codec = tok
|
||||
tech_tokens.add(tok)
|
||||
|
||||
# Fallback: rightmost token with a dash that isn't a known source
|
||||
if group == "UNKNOWN":
|
||||
for tok in reversed(tokens):
|
||||
if "-" in tok:
|
||||
parts = tok.rsplit("-", 1)
|
||||
tl = tok.lower()
|
||||
if tl in kb.sources or tok.lower().replace("-", "") in kb.sources:
|
||||
continue
|
||||
if parts[1]:
|
||||
group = parts[1]
|
||||
break
|
||||
|
||||
return quality, source, codec, group, tech_tokens
|
||||
|
||||
|
||||
def _is_year_token(tok: str) -> bool:
|
||||
"""Return True if tok is a 4-digit year between 1900 and 2099."""
|
||||
return len(tok) == 4 and tok.isdigit() and 1900 <= int(tok) <= 2099
|
||||
|
||||
|
||||
def _extract_title(
|
||||
tokens: list[str], tech_tokens: set[str], kb: ReleaseKnowledge
|
||||
) -> str:
|
||||
"""Extract the title portion: everything before the first season/year/tech token."""
|
||||
title_parts = []
|
||||
known_tech = kb.resolutions | kb.sources | kb.codecs
|
||||
for tok in tokens:
|
||||
if _parse_season_episode(tok) is not None:
|
||||
break
|
||||
if _is_year_token(tok):
|
||||
break
|
||||
if tok in tech_tokens or tok.lower() in known_tech:
|
||||
break
|
||||
if "-" in tok and any(p.lower() in kb.codecs | kb.sources for p in tok.split("-")):
|
||||
break
|
||||
title_parts.append(tok)
|
||||
|
||||
return ".".join(title_parts) if title_parts else tokens[0]
|
||||
|
||||
|
||||
def _extract_year(tokens: list[str], title: str) -> int | None:
|
||||
"""Extract a 4-digit year from tokens (only after the title)."""
|
||||
title_len = len(title.split("."))
|
||||
for tok in tokens[title_len:]:
|
||||
if _is_year_token(tok):
|
||||
return int(tok)
|
||||
return None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Sequence matcher
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _match_sequences(
|
||||
tokens: list[str],
|
||||
sequences: list[dict],
|
||||
key: str,
|
||||
) -> tuple[str | None, set[str]]:
|
||||
"""
|
||||
Try to match multi-token sequences against consecutive tokens.
|
||||
|
||||
Returns (matched_value, set_of_matched_tokens) or (None, empty_set).
|
||||
Sequences must be ordered most-specific first in the YAML.
|
||||
"""
|
||||
upper_tokens = [t.upper() for t in tokens]
|
||||
for seq in sequences:
|
||||
seq_upper = [s.upper() for s in seq["tokens"]]
|
||||
n = len(seq_upper)
|
||||
for i in range(len(upper_tokens) - n + 1):
|
||||
if upper_tokens[i : i + n] == seq_upper:
|
||||
matched = set(tokens[i : i + n])
|
||||
return seq[key], matched
|
||||
return None, set()
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Language extraction
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _extract_languages(
|
||||
tokens: list[str], kb: ReleaseKnowledge
|
||||
) -> tuple[list[str], set[str]]:
|
||||
"""Extract language tokens. Returns (languages, matched_token_set)."""
|
||||
languages = []
|
||||
lang_tokens: set[str] = set()
|
||||
for tok in tokens:
|
||||
if tok.upper() in kb.language_tokens:
|
||||
languages.append(tok.upper())
|
||||
lang_tokens.add(tok)
|
||||
return languages, lang_tokens
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Audio extraction
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _extract_audio(
|
||||
tokens: list[str], kb: ReleaseKnowledge,
|
||||
) -> tuple[str | None, str | None, set[str]]:
|
||||
"""
|
||||
Extract audio codec and channel layout.
|
||||
|
||||
Returns (audio_codec, audio_channels, matched_token_set).
|
||||
Sequences are tried first (DTS.HD.MA, TrueHD.Atmos, …), then single tokens.
|
||||
"""
|
||||
audio_codec: str | None = None
|
||||
audio_channels: str | None = None
|
||||
audio_tokens: set[str] = set()
|
||||
|
||||
known_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
|
||||
known_channels = set(kb.audio.get("channels", []))
|
||||
|
||||
# Try multi-token sequences first
|
||||
matched_codec, matched_set = _match_sequences(
|
||||
tokens, kb.audio.get("sequences", []), "codec"
|
||||
)
|
||||
if matched_codec:
|
||||
audio_codec = matched_codec
|
||||
audio_tokens |= matched_set
|
||||
|
||||
# Channel layouts like "5.1" or "7.1" are split into two tokens by normalize —
|
||||
# detect them as consecutive pairs "X" + "Y" where "X.Y" is a known channel.
|
||||
# The second token may have a "-GROUP" suffix (e.g. "1-KTH" → strip it).
|
||||
for i in range(len(tokens) - 1):
|
||||
second = tokens[i + 1].split("-")[0]
|
||||
candidate = f"{tokens[i]}.{second}"
|
||||
if candidate in known_channels and audio_channels is None:
|
||||
audio_channels = candidate
|
||||
audio_tokens.add(tokens[i])
|
||||
audio_tokens.add(tokens[i + 1])
|
||||
|
||||
for tok in tokens:
|
||||
if tok in audio_tokens:
|
||||
continue
|
||||
if tok.upper() in known_codecs and audio_codec is None:
|
||||
audio_codec = tok
|
||||
audio_tokens.add(tok)
|
||||
elif tok in known_channels and audio_channels is None:
|
||||
audio_channels = tok
|
||||
audio_tokens.add(tok)
|
||||
|
||||
return audio_codec, audio_channels, audio_tokens
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Video metadata extraction (bit depth, HDR)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _extract_video_meta(
|
||||
tokens: list[str], kb: ReleaseKnowledge,
|
||||
) -> tuple[str | None, str | None, set[str]]:
|
||||
"""
|
||||
Extract bit depth and HDR format.
|
||||
|
||||
Returns (bit_depth, hdr_format, matched_token_set).
|
||||
"""
|
||||
bit_depth: str | None = None
|
||||
hdr_format: str | None = None
|
||||
video_tokens: set[str] = set()
|
||||
|
||||
known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
|
||||
known_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
|
||||
|
||||
# Try HDR sequences first
|
||||
matched_hdr, matched_set = _match_sequences(
|
||||
tokens, kb.video_meta.get("sequences", []), "hdr"
|
||||
)
|
||||
if matched_hdr:
|
||||
hdr_format = matched_hdr
|
||||
video_tokens |= matched_set
|
||||
|
||||
for tok in tokens:
|
||||
if tok in video_tokens:
|
||||
continue
|
||||
if tok.upper() in known_hdr and hdr_format is None:
|
||||
hdr_format = tok.upper()
|
||||
video_tokens.add(tok)
|
||||
elif tok.lower() in known_depth and bit_depth is None:
|
||||
bit_depth = tok.lower()
|
||||
video_tokens.add(tok)
|
||||
|
||||
return bit_depth, hdr_format, video_tokens
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Edition extraction
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _extract_edition(
|
||||
tokens: list[str], kb: ReleaseKnowledge
|
||||
) -> tuple[str | None, set[str]]:
|
||||
"""
|
||||
Extract release edition (UNRATED, EXTENDED, DIRECTORS.CUT, …).
|
||||
|
||||
Returns (edition, matched_token_set).
|
||||
"""
|
||||
known_tokens = {t.upper() for t in kb.editions.get("tokens", [])}
|
||||
|
||||
# Try multi-token sequences first
|
||||
matched_edition, matched_set = _match_sequences(
|
||||
tokens, kb.editions.get("sequences", []), "edition"
|
||||
)
|
||||
if matched_edition:
|
||||
return matched_edition, matched_set
|
||||
|
||||
for tok in tokens:
|
||||
if tok.upper() in known_tokens:
|
||||
return tok.upper(), {tok}
|
||||
|
||||
return None, set()
|
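Review note: the audio, video-meta, and edition extractors in this removed file all try multi-token sequences before single tokens, and the docstring of `_match_sequences` requires the YAML to list the most specific sequence first. The sketch below reproduces that matching loop on made-up data to show why the ordering matters; `AUDIO_SEQUENCES` and its codec labels are hypothetical entries, not the contents of the real knowledge base.

```python
from __future__ import annotations


def match_sequence(
    tokens: list[str], sequences: list[dict], key: str
) -> tuple[str | None, set[str]]:
    """Return the value of the first sequence found as consecutive tokens."""
    upper = [t.upper() for t in tokens]
    for seq in sequences:  # first hit wins, so order encodes priority
        wanted = [s.upper() for s in seq["tokens"]]
        n = len(wanted)
        for i in range(len(upper) - n + 1):
            if upper[i : i + n] == wanted:
                return seq[key], set(tokens[i : i + n])
    return None, set()


# Hypothetical knowledge entries, most specific first.
AUDIO_SEQUENCES = [
    {"tokens": ["DTS", "HD", "MA"], "codec": "DTS-HD.MA"},
    {"tokens": ["DTS", "HD"], "codec": "DTS-HD"},
]

tokens = "Movie.2010.1080p.BluRay.DTS.HD.MA.x264-GRP".split(".")
codec, used = match_sequence(tokens, AUDIO_SEQUENCES, "codec")

# With the order reversed, the shorter sequence shadows the longer one.
shadowed, _ = match_sequence(tokens, list(reversed(AUDIO_SEQUENCES)), "codec")
```

The returned token set is what lets the caller exclude already-consumed tokens from later single-token matching and from title extraction.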
||||
@@ -0,0 +1,38 @@
|
||||
"""Filesystem release aggregates — what the user owns on disk.
|
||||
|
||||
This bounded context is intentionally separated from
|
||||
``alfred.domain.tv_shows`` / ``alfred.domain.movies`` (TMDB identity).
|
||||
A :class:`SeriesRelease` describes the physical files on disk for one
|
||||
show; a :class:`TVShow` describes the work as catalogued by TMDB. The
|
||||
two are linked by :class:`~alfred.domain.shared.value_objects.TmdbId`
|
||||
in the persistence layer, never by direct reference.
|
||||
|
||||
Not to be confused with ``alfred.domain.release`` (singular), which
|
||||
parses release **names** (strings → tokens). The two packages may be
|
||||
merged later; for now they coexist as separate concerns.
|
||||
"""
|
||||
|
||||
from .builders import SeasonReleaseBuilder, SeriesReleaseBuilder
|
||||
from .entities import (
|
||||
EpisodeRelease,
|
||||
MovieRelease,
|
||||
SeasonRelease,
|
||||
SeriesRelease,
|
||||
TrackProfile,
|
||||
)
|
||||
from .repositories import MovieReleaseRepository, SeriesReleaseRepository
|
||||
from .value_objects import EpisodeRange, ReleaseMode
|
||||
|
||||
__all__ = [
|
||||
"EpisodeRange",
|
||||
"EpisodeRelease",
|
||||
"MovieRelease",
|
||||
"MovieReleaseRepository",
|
||||
"ReleaseMode",
|
||||
"SeasonRelease",
|
||||
"SeasonReleaseBuilder",
|
||||
"SeriesRelease",
|
||||
"SeriesReleaseBuilder",
|
||||
"SeriesReleaseRepository",
|
||||
"TrackProfile",
|
||||
]
|
||||
@@ -0,0 +1,243 @@
|
||||
"""Builders for the filesystem release aggregates.
|
||||
|
||||
The aggregates are frozen — :class:`SeriesRelease`, :class:`SeasonRelease`,
|
||||
and :class:`EpisodeRelease` are ``@dataclass(frozen=True)`` and offer no
|
||||
mutation methods. All construction goes through these builders, which
|
||||
assemble the aggregate piece by piece and emit a frozen instance via
|
||||
``build()``.
|
||||
|
||||
Typical usage during a filesystem walk::
|
||||
|
||||
builder = SeriesReleaseBuilder(tmdb_id=TmdbId(84958), imdb_id=ImdbId("tt0804484"))
|
||||
sb = builder.season_builder(SeasonNumber(1), folder="Show.S01", mode=ReleaseMode.PACK)
|
||||
sb.add_episode(EpisodeRelease(
|
||||
episodes=EpisodeRange(EpisodeNumber(1), EpisodeNumber(1)),
|
||||
file_path=FilePath("Show.S01/Show.S01E01.mkv"),
|
||||
tracks=TrackProfile(),
|
||||
))
|
||||
release = builder.build()
|
||||
|
||||
Builders are **single-use scratchpads**: they hold mutable state during
|
||||
construction, then produce an immutable aggregate.
|
||||
|
||||
Invariants enforced at ``build()`` time:
|
||||
|
||||
* Seasons are emitted sorted by ``season_number``.
|
||||
* Episodes within each season are emitted sorted by their
|
||||
``EpisodeRange.start`` (so a season with ``E01-E03`` + ``E04`` is
|
||||
emitted in that order).
|
||||
* No two ``EpisodeRelease`` within a season may overlap (same TMDB
|
||||
episode covered by two distinct files) — raises ``ValidationError``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from ..shared_TO_CHECK.exceptions import ValidationError
|
||||
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
|
||||
from ..tv_shows.value_objects import SeasonNumber
|
||||
from .entities import (
|
||||
EpisodeRelease,
|
||||
SeasonRelease,
|
||||
SeriesRelease,
|
||||
)
|
||||
from .value_objects import ReleaseMode
|
||||
|
||||
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
# MovieReleaseBuilder
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
# ...
|
||||
|
||||
|
||||
|
||||
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
# SeasonReleaseBuilder
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
|
||||
|
||||
class SeasonReleaseBuilder:
|
||||
"""
|
||||
Mutable scratchpad for a :class:`SeasonRelease`.
|
||||
|
||||
Episodes are appended in arbitrary order; ``build()`` sorts them by
|
||||
their range start before emitting the frozen aggregate and verifies
|
||||
there are no overlapping ranges.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
season_number: SeasonNumber | int,
|
||||
*,
|
||||
folder: str,
|
||||
mode: ReleaseMode,
|
||||
) -> None:
|
||||
if isinstance(season_number, int):
|
||||
season_number = SeasonNumber(season_number)
|
||||
self._season_number: SeasonNumber = season_number
|
||||
self._folder: str = folder
|
||||
self._mode: ReleaseMode = mode
|
||||
self._episodes: list[EpisodeRelease] = []
|
||||
|
||||
@classmethod
|
||||
def from_existing(cls, season: SeasonRelease) -> SeasonReleaseBuilder:
|
||||
"""Seed a builder from an existing frozen :class:`SeasonRelease`."""
|
||||
builder = cls(
|
||||
season.season_number,
|
||||
folder=season.folder,
|
||||
mode=season.mode,
|
||||
)
|
||||
builder._episodes = list(season.episodes)
|
||||
return builder
|
||||
|
||||
@property
|
||||
def season_number(self) -> SeasonNumber:
|
||||
return self._season_number
|
||||
|
||||
@property
|
||||
def mode(self) -> ReleaseMode:
|
||||
return self._mode
|
||||
|
||||
def set_folder(self, folder: str) -> SeasonReleaseBuilder:
|
||||
self._folder = folder
|
||||
return self
|
||||
|
||||
def set_mode(self, mode: ReleaseMode) -> SeasonReleaseBuilder:
|
||||
self._mode = mode
|
||||
return self
|
||||
|
||||
def add_episode(self, episode: EpisodeRelease) -> SeasonReleaseBuilder:
|
||||
"""Append a physical-file :class:`EpisodeRelease` to this season."""
|
||||
self._episodes.append(episode)
|
||||
return self
|
||||
|
||||
def build(self) -> SeasonRelease:
|
||||
"""Emit a frozen :class:`SeasonRelease` with episodes sorted.
|
||||
|
||||
Raises :class:`ValidationError` if any two episode ranges overlap
|
||||
(same TMDB slot claimed by two distinct files).
|
||||
"""
|
||||
ordered = tuple(
|
||||
sorted(self._episodes, key=lambda ep: ep.episodes.start.value)
|
||||
)
|
||||
# Overlap check — ranges are inclusive on both ends, sorted by start.
|
||||
for prev, curr in zip(ordered, ordered[1:], strict=False):
|
||||
if curr.episodes.start.value <= prev.episodes.end.value:
|
||||
raise ValidationError(
|
||||
f"SeasonRelease season {self._season_number}: overlapping "
|
||||
f"episode ranges {prev.episodes} and {curr.episodes}"
|
||||
)
|
||||
return SeasonRelease(
|
||||
season_number=self._season_number,
|
||||
folder=self._folder,
|
||||
mode=self._mode,
|
||||
episodes=ordered,
|
||||
)
|
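The overlap check in `SeasonReleaseBuilder.build()` relies on one property: once inclusive ranges are sorted by start, comparing each range with its immediate predecessor is enough to catch any overlap. A minimal sketch, using plain `(start, end)` tuples as a hypothetical stand-in for `EpisodeRange`:

```python
from __future__ import annotations


def find_overlap(
    ranges: list[tuple[int, int]],
) -> tuple[tuple[int, int], tuple[int, int]] | None:
    """Return the first overlapping pair of inclusive ranges, or None."""
    ordered = sorted(ranges, key=lambda r: r[0])
    # After sorting by start, a range overlaps its predecessor when it
    # starts at or before the predecessor's (inclusive) end.
    for prev, curr in zip(ordered, ordered[1:]):
        if curr[0] <= prev[1]:
            return prev, curr
    return None


print(find_overlap([(4, 4), (1, 3)]))  # E04 + E01-E03: no overlap
print(find_overlap([(1, 3), (3, 4)]))  # E03 is claimed by two files
```

The comparison uses `<=` rather than `<` because both ends are inclusive: a file ending at E03 and another starting at E03 cover the same slot.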
||||
|
||||
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
# SeriesReleaseBuilder
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
|
||||
|
||||
class SeriesReleaseBuilder:
|
||||
"""
|
||||
Mutable scratchpad for the :class:`SeriesRelease` aggregate root.
|
||||
|
||||
Seasons are tracked via internal :class:`SeasonReleaseBuilder`
|
||||
instances keyed by :class:`SeasonNumber`.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
*,
|
||||
tmdb_id: TmdbId | int,
|
||||
imdb_id: ImdbId | str | None = None,
|
||||
) -> None:
|
||||
if isinstance(tmdb_id, int):
|
||||
tmdb_id = TmdbId(tmdb_id)
|
||||
if isinstance(imdb_id, str):
|
||||
imdb_id = ImdbId(imdb_id)
|
||||
self._tmdb_id: TmdbId = tmdb_id
|
||||
self._imdb_id: ImdbId | None = imdb_id
|
||||
self._season_builders: dict[SeasonNumber, SeasonReleaseBuilder] = {}
|
||||
|
||||
@classmethod
|
||||
def from_existing(cls, release: SeriesRelease) -> SeriesReleaseBuilder:
|
||||
"""Seed a builder from an existing frozen :class:`SeriesRelease`."""
|
||||
builder = cls(
|
||||
tmdb_id=release.tmdb_id,
|
||||
imdb_id=release.imdb_id,
|
||||
)
|
||||
for season in release.seasons:
|
||||
builder._season_builders[season.season_number] = (
|
||||
SeasonReleaseBuilder.from_existing(season)
|
||||
)
|
||||
return builder
|
||||
|
||||
# ── Top-level mutators ─────────────────────────────────────────────────
|
||||
|
||||
def set_imdb_id(self, imdb_id: ImdbId | str | None) -> SeriesReleaseBuilder:
|
||||
if isinstance(imdb_id, str):
|
||||
imdb_id = ImdbId(imdb_id)
|
||||
self._imdb_id = imdb_id
|
||||
return self
|
||||
|
||||
# ── Content ────────────────────────────────────────────────────────────
|
||||
|
||||
def season_builder(
|
||||
self,
|
||||
season_number: SeasonNumber | int,
|
||||
*,
|
||||
folder: str | None = None,
|
||||
mode: ReleaseMode | None = None,
|
||||
) -> SeasonReleaseBuilder:
|
||||
"""
|
||||
Return (creating if needed) the :class:`SeasonReleaseBuilder` for a
|
||||
season.
|
||||
|
||||
``folder`` and ``mode`` are required when the builder does not yet
|
||||
exist for this season; subsequent calls may pass them to override.
|
||||
"""
|
||||
if isinstance(season_number, int):
|
||||
season_number = SeasonNumber(season_number)
|
||||
sb = self._season_builders.get(season_number)
|
||||
if sb is None:
|
||||
if folder is None or mode is None:
|
||||
raise ValidationError(
|
||||
f"season_builder({season_number}): folder and mode "
|
||||
f"are required to create a new season builder"
|
||||
)
|
||||
sb = SeasonReleaseBuilder(season_number, folder=folder, mode=mode)
|
||||
self._season_builders[season_number] = sb
|
||||
else:
|
||||
if folder is not None:
|
||||
sb.set_folder(folder)
|
||||
if mode is not None:
|
||||
sb.set_mode(mode)
|
||||
return sb
|
||||
|
||||
def add_season(self, season: SeasonRelease) -> SeriesReleaseBuilder:
|
||||
"""
|
||||
Attach (or replace) a fully-built :class:`SeasonRelease`.
|
||||
|
||||
Replaces any existing season with the same number.
|
||||
"""
|
||||
self._season_builders[season.season_number] = (
|
||||
SeasonReleaseBuilder.from_existing(season)
|
||||
)
|
||||
return self
|
||||
|
||||
# ── Emit ───────────────────────────────────────────────────────────────
|
||||
|
||||
def build(self) -> SeriesRelease:
|
||||
"""Emit a frozen :class:`SeriesRelease` with seasons sorted by number."""
|
||||
ordered_seasons = tuple(
|
||||
self._season_builders[n].build()
|
||||
for n in sorted(self._season_builders, key=lambda x: x.value)
|
||||
)
|
||||
return SeriesRelease(
|
||||
tmdb_id=self._tmdb_id,
|
||||
imdb_id=self._imdb_id,
|
||||
seasons=ordered_seasons,
|
||||
)
|
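The get-or-create behaviour of `season_builder()` can be illustrated with a trimmed-down model. `SeasonScratch` and `SeriesScratch` are hypothetical names, plain `int` and `str` replace the value objects, and `ValueError` stands in for the project's `ValidationError`.

```python
from __future__ import annotations


class SeasonScratch:
    """Hypothetical, simplified stand-in for SeasonReleaseBuilder."""

    def __init__(self, number: int, folder: str, mode: str) -> None:
        self.number, self.folder, self.mode = number, folder, mode
        self.files: list[str] = []


class SeriesScratch:
    """Hypothetical, simplified stand-in for SeriesReleaseBuilder."""

    def __init__(self) -> None:
        self._seasons: dict[int, SeasonScratch] = {}

    def season(
        self, number: int, *, folder: str | None = None, mode: str | None = None
    ) -> SeasonScratch:
        sb = self._seasons.get(number)
        if sb is None:
            # First sighting of this season: both fields are mandatory.
            if folder is None or mode is None:
                raise ValueError(f"season({number}): folder and mode are required")
            sb = self._seasons[number] = SeasonScratch(number, folder, mode)
        else:
            # Later calls override only what the caller passes explicitly.
            if folder is not None:
                sb.folder = folder
            if mode is not None:
                sb.mode = mode
        return sb

    def season_numbers(self) -> list[int]:
        return sorted(self._seasons)


series = SeriesScratch()
series.season(2, folder="Show.S02", mode="pack").files.append("Show.S02E01.mkv")
series.season(1, folder="Show.S01", mode="pack")
series.season(2).files.append("Show.S02E02.mkv")  # reuses the existing builder
print(series.season_numbers(), series.season(2).files)
```

Keying the scratchpads by season number lets a filesystem walk visit folders in any order while still emitting seasons sorted at build time.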
||||
@@ -0,0 +1,217 @@
|
||||
"""Filesystem release aggregates.
|
||||
|
||||
The release domain models what the user owns on disk — one
|
||||
:class:`SeriesRelease` per show, one :class:`MovieRelease` per movie.
|
||||
TMDB identity (title, status, episode_count, …) lives in the
|
||||
``tv_shows`` / ``movies`` domains and is linked via the
|
||||
:class:`~alfred.domain.shared.value_objects.TmdbId` natural key.
|
||||
|
||||
All entities are frozen. Mutation goes through the builders in
|
||||
:mod:`alfred.domain.releases.builders`.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from datetime import datetime
|
||||
|
||||
from ..shared_TO_CHECK.exceptions import ValidationError
|
||||
from ..shared_TO_CHECK.media import AudioTrack, SubtitleTrack
|
||||
from ..shared_TO_CHECK.value_objects import FilePath, ImdbId, TmdbId
|
||||
from ..tv_shows.value_objects import SeasonNumber
|
||||
from .value_objects import EpisodeRange, ReleaseMode
|
||||
|
||||
__all__ = [
|
||||
"EpisodeRelease",
|
||||
"MovieRelease",
|
||||
"SeasonRelease",
|
||||
"SeriesRelease",
|
||||
"TrackProfile",
|
||||
]
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class TrackProfile:
|
||||
"""
|
||||
Audio + subtitle tracks of one physical file.
|
||||
|
||||
Tracks live per-file (not per-season): every ``EpisodeRelease`` and
|
||||
``MovieRelease`` carries its own ``TrackProfile``. Season-level
|
||||
aggregation is computed by the caller when needed.
|
||||
"""
|
||||
|
||||
audio_tracks: tuple[AudioTrack, ...] = ()
|
||||
subtitle_tracks: tuple[SubtitleTrack, ...] = ()
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class EpisodeRelease:
|
||||
"""
|
||||
One physical episode file (or multi-episode file) on disk.
|
||||
|
||||
:attr:`episodes` is an :class:`EpisodeRange` — a single ``.mkv``
|
||||
that covers ``S01E02E03`` carries ``EpisodeRange(start=E02, end=E03)``
|
||||
and is recorded once. The library index lists it under each covered
|
||||
slot (``E02``, ``E03``) for symmetric lookups.
|
||||
|
||||
:attr:`file_path` is **relative to the show root** (e.g.
|
||||
``"Show.S01/Show.S01E02.mkv"`` for PACK,
|
||||
``"Show.S01/Show.S01E02-RG/Show.S01E02-RG.mkv"`` for EPISODIC).
|
||||
The caller (repository) prepends the absolute show root when
|
||||
needed.
|
||||
"""
|
||||
|
||||
episodes: EpisodeRange
|
||||
file_path: FilePath
|
||||
tracks: TrackProfile = TrackProfile()
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SeasonRelease:
|
||||
"""
|
||||
All physical files on disk for one season of a show.
|
||||
|
||||
The :attr:`mode` flag records the filesystem layout:
|
||||
|
||||
* :attr:`ReleaseMode.PACK` — the season folder contains N video
|
||||
files directly. ``episodes`` lists each ``.mkv`` in the folder.
|
||||
* :attr:`ReleaseMode.EPISODIC` — the season folder contains N
|
||||
sub-folders, each with one episode. ``episodes`` lists each
|
||||
``(subfolder, file)`` pair.
|
||||
|
||||
:attr:`folder` is the season folder name, relative to the show root.
|
||||
|
||||
Invariant: every ``EpisodeRelease.episodes`` range stays within
|
||||
sane bounds (validated at construction). Cross-episode duplicate
|
||||
detection (two files claiming the same TMDB slot) is the
|
||||
builder's job, not the entity's.
|
||||
"""
|
||||
|
||||
season_number: SeasonNumber
|
||||
folder: str
|
||||
mode: ReleaseMode
|
||||
episodes: tuple[EpisodeRelease, ...] = ()
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if not isinstance(self.season_number, SeasonNumber):
|
||||
raise ValidationError(
|
||||
f"SeasonRelease.season_number must be SeasonNumber, "
|
||||
f"got {type(self.season_number)}"
|
||||
)
|
||||
if not isinstance(self.mode, ReleaseMode):
|
||||
raise ValidationError(
|
||||
f"SeasonRelease.mode must be ReleaseMode, got {type(self.mode)}"
|
||||
)
|
||||
if not isinstance(self.folder, str) or not self.folder:
|
||||
raise ValidationError(
|
||||
f"SeasonRelease.folder must be a non-empty string, "
|
||||
f"got {self.folder!r}"
|
||||
)
|
||||
|
||||
def episode_count(self) -> int:
|
||||
"""
|
||||
Total number of TMDB episode slots covered by all physical files.
|
||||
|
||||
Sums each :meth:`EpisodeRange.count` — a season with two files
|
||||
``E01`` + ``E02-E03`` returns ``3`` (one slot from the first
|
||||
file, two from the second).
|
||||
|
||||
Compared by the caller against the library index's TMDB
|
||||
``episode_count`` to detect incomplete seasons.
|
||||
"""
|
||||
return sum(ep.episodes.count() for ep in self.episodes)
|
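As a worked example of the `episode_count()` arithmetic: the sketch below assumes `EpisodeRange.count()` returns `end - start + 1` for an inclusive range (that method is not shown here), and uses a made-up TMDB total for the completeness comparison.

```python
def covered_slots(ranges: list[tuple[int, int]]) -> int:
    """Sum the sizes of inclusive (start, end) episode ranges."""
    return sum(end - start + 1 for start, end in ranges)


on_disk = [(1, 1), (2, 3)]   # two files: E01 and E02-E03
tmdb_episode_count = 10      # hypothetical value from the library index
owned = covered_slots(on_disk)
print(owned, owned < tmdb_episode_count)  # 3 True -> season is incomplete
```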
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SeriesRelease:
|
||||
"""
|
||||
All physical seasons on disk for one show.
|
||||
|
||||
Anchored to TMDB by :attr:`tmdb_id` (primary key). :attr:`imdb_id`
|
||||
is optional and stored as a secondary anchor — useful for the
|
||||
occasional show without TMDB coverage, and for cross-checking
|
||||
when both ids are known.
|
||||
|
||||
Seasons are exposed sorted by ``season_number`` (the builder
|
||||
enforces this on emit). No duplicate ``season_number`` is
|
||||
permitted across :attr:`seasons`.
|
||||
"""
|
||||
|
||||
tmdb_id: TmdbId
|
||||
imdb_id: ImdbId | None
|
||||
seasons: tuple[SeasonRelease, ...] = ()
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if not isinstance(self.tmdb_id, TmdbId):
|
||||
raise ValidationError(
|
||||
f"SeriesRelease.tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
|
||||
)
|
||||
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
|
||||
raise ValidationError(
|
||||
f"SeriesRelease.imdb_id must be ImdbId or None, "
|
||||
f"got {type(self.imdb_id)}"
|
||||
)
|
||||
seen: set[int] = set()
|
||||
for s in self.seasons:
|
||||
if s.season_number.value in seen:
|
||||
raise ValidationError(
|
||||
f"SeriesRelease has duplicate season "
|
||||
f"{s.season_number}"
|
||||
)
|
||||
seen.add(s.season_number.value)
|
||||
|
||||
def get_season(self, season_number: SeasonNumber) -> SeasonRelease | None:
|
||||
"""Return the :class:`SeasonRelease` for ``season_number`` or ``None``."""
|
||||
for s in self.seasons:
|
||||
if s.season_number == season_number:
|
||||
return s
|
||||
return None
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class MovieRelease:
|
||||
"""
|
||||
A single physical movie file on disk.
|
||||
|
||||
Anchored to TMDB by :attr:`tmdb_id`; :attr:`imdb_id` optional
|
||||
secondary anchor.
|
||||
|
||||
:attr:`folder` is the movie folder name relative to the
|
||||
``movies/`` library root. :attr:`file_path` is the video file
|
||||
name relative to the folder (movies are one folder, one file in
|
||||
Alfred's layout — no sub-folders).
|
||||
|
||||
:attr:`added_at` is the UTC timestamp at which the release was
|
||||
first observed in the library — set by the caller (organizer /
|
||||
rescan) when the aggregate is built. Persisted by the v2 movie
|
||||
sidecar; not derived from the filesystem (mtime drifts across
|
||||
moves and hard-links).
|
||||
"""
|
||||
|
||||
tmdb_id: TmdbId
|
||||
imdb_id: ImdbId | None
|
||||
folder: str
|
||||
file_path: FilePath
|
||||
added_at: datetime
|
||||
tracks: TrackProfile = TrackProfile()
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if not isinstance(self.tmdb_id, TmdbId):
|
||||
raise ValidationError(
|
||||
f"MovieRelease.tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
|
||||
)
|
||||
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
|
||||
raise ValidationError(
|
||||
f"MovieRelease.imdb_id must be ImdbId or None, "
|
||||
f"got {type(self.imdb_id)}"
|
||||
)
|
||||
if not isinstance(self.folder, str) or not self.folder:
|
||||
raise ValidationError(
|
||||
f"MovieRelease.folder must be a non-empty string, "
|
||||
f"got {self.folder!r}"
|
||||
)
|
||||
if not isinstance(self.added_at, datetime):
|
||||
raise ValidationError(
|
||||
f"MovieRelease.added_at must be datetime, "
|
||||
f"got {type(self.added_at)}"
|
||||
)
|
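The combination used by these entities (`frozen=True` plus validation in `__post_init__`) can be shown on a trimmed-down, hypothetical `Season` class. The project routes changes through its builders. `dataclasses.replace` appears below only to show that any newly constructed instance is validated again, and `ValueError` stands in for `ValidationError`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Season:
    """Hypothetical, trimmed-down stand-in for SeasonRelease."""

    number: int
    folder: str

    def __post_init__(self) -> None:
        # Runs on every construction, including the one done by replace().
        if not isinstance(self.folder, str) or not self.folder:
            raise ValueError(
                f"folder must be a non-empty string, got {self.folder!r}"
            )


s1 = Season(1, "Show.S01")
try:
    s1.folder = "renamed"  # frozen: attribute assignment is rejected
except FrozenInstanceError:
    print("assignment rejected")

s2 = replace(s1, folder="Show.S01.1080p")  # new instance, validated again
print(s1.folder, s2.folder)
```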
||||
@@ -0,0 +1,27 @@
|
||||
"""Release parser v2 — annotate-based pipeline.
|
||||
|
||||
This package is the future home of ``parse_release``. It restructures the
|
||||
parsing logic around a **tokenize → annotate → assemble** pipeline:
|
||||
|
||||
1. **tokenize**: split the release name into atomic tokens.
|
||||
2. **annotate**: walk tokens left-to-right, assigning each one a
|
||||
:class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
|
||||
injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
|
||||
3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.
|
||||
|
||||
The pipeline has three internal paths driven by the detected release group:
|
||||
|
||||
- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
|
||||
declared in ``knowledge/release/release_groups/<group>.yaml``.
|
||||
- **SHITTY**: unknown group, best-effort matching against the global
|
||||
knowledge sets, with a 0-100 confidence score.
|
||||
- **PATH OF PAIN**: score below threshold OR critical chunks missing —
|
||||
signaled to the caller, who decides whether to involve the LLM/user.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from .schema import GroupSchema, SchemaChunk
|
||||
from .tokens import Token, TokenRole
|
||||
|
||||
__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]
|
||||
@@ -0,0 +1,762 @@
|
||||
"""Annotate-based pipeline.
|
||||
|
||||
Three stages:
|
||||
|
||||
1. :func:`tokenize` — release name → ``list[Token]`` (all UNKNOWN), plus
|
||||
a separately-returned site tag (e.g. ``[YTS.MX]``) that is never
|
||||
tokenized.
|
||||
2. :func:`annotate` — promote each token's :class:`TokenRole` using the
|
||||
injected knowledge base. Two sub-passes:
|
||||
|
||||
a. **Structural** (schema-driven, EASY only). Detects the group at
|
||||
the right end, looks up its :class:`GroupSchema`, then matches
|
||||
the schema's chunk sequence against the token stream. Between
|
||||
two structural chunks, any number of unmatched tokens may
|
||||
remain — they are left UNKNOWN for the enricher pass to handle.
|
||||
b. **Enrichers** (non-positional). Walks UNKNOWN tokens and tags
|
||||
audio / video-meta / edition / language roles. Multi-token
|
||||
sequences (``DTS.HD.MA``, ``DV.HDR10``, ``DIRECTORS.CUT``) are
|
||||
matched first, single tokens after.
|
||||
|
||||
3. :func:`assemble` — fold annotated tokens into a
|
||||
:class:`~alfred.domain.release.value_objects.ParsedRelease`-compatible
|
||||
dict.
|
||||
|
||||
The pipeline is **pure**: no I/O, no TMDB, no probe. All knowledge
|
||||
arrives through ``kb: ReleaseKnowledge``.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from ..ports.knowledge import ReleaseKnowledge
|
||||
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import MediaTypeToken
|
||||
from .schema import GroupSchema
|
||||
from .tokens import Token, TokenRole
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stage 1 — tokenize
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def strip_site_tag(name: str) -> tuple[str, str | None]:
|
||||
"""Split off a ``[site.tag]`` prefix or suffix.
|
||||
|
||||
Returns ``(clean_name, tag)``. If no tag is found, returns
|
||||
``(name.strip(), None)``.
|
||||
"""
|
||||
s = name.strip()
|
||||
|
||||
if s.startswith("["):
|
||||
close = s.find("]")
|
||||
if close != -1:
|
||||
tag = s[1:close].strip()
|
||||
remainder = s[close + 1 :].strip()
|
||||
if tag and remainder:
|
||||
return remainder, tag
|
||||
|
||||
if s.endswith("]"):
|
||||
open_bracket = s.rfind("[")
|
||||
if open_bracket != -1:
|
||||
tag = s[open_bracket + 1 : -1].strip()
|
||||
remainder = s[:open_bracket].strip()
|
||||
if tag and remainder:
|
||||
return remainder, tag
|
||||
|
||||
return s, None
|
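A few inputs make the prefix and suffix handling of `strip_site_tag` concrete. The function body below is copied from the code above with its indentation restored, so the snippet runs on its own. The sample release names are invented.

```python
from __future__ import annotations


def strip_site_tag(name: str) -> tuple[str, str | None]:
    s = name.strip()
    if s.startswith("["):
        close = s.find("]")
        if close != -1:
            tag = s[1:close].strip()
            remainder = s[close + 1 :].strip()
            if tag and remainder:
                return remainder, tag
    if s.endswith("]"):
        open_bracket = s.rfind("[")
        if open_bracket != -1:
            tag = s[open_bracket + 1 : -1].strip()
            remainder = s[:open_bracket].strip()
            if tag and remainder:
                return remainder, tag
    return s, None


samples = (
    "[YTS.MX] Movie.2020.1080p",   # prefix tag
    "Movie.2020.1080p-GRP[TGx]",   # suffix tag
    "[OnlyATag]",                  # nothing left after stripping: kept as-is
    "Movie.2020",                  # no tag at all
)
for raw in samples:
    print(repr(raw), "->", strip_site_tag(raw))
```

The `tag and remainder` guard is what keeps a name consisting solely of a bracketed string from being reduced to an empty release name.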
||||
|
||||
|
||||
def tokenize(name: str, kb: ReleaseKnowledge) -> tuple[list[Token], str | None]:
|
||||
"""Split ``name`` into tokens after stripping any site tag.
|
||||
|
||||
String-ops style: replace every configured separator with a single
|
||||
NUL character, then split. NUL cannot legally appear in a release name, so
|
||||
it's a safe sentinel.
|
||||
"""
|
||||
clean, site_tag = strip_site_tag(name)
|
||||
|
||||
DELIM = "\x00"
|
||||
buf = clean
|
||||
for sep in kb.separators:
|
||||
if sep != DELIM:
|
||||
buf = buf.replace(sep, DELIM)
|
||||
|
||||
pieces = [p for p in buf.split(DELIM) if p]
|
||||
tokens = [Token(text=p, index=i) for i, p in enumerate(pieces)]
|
||||
return tokens, site_tag
|
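The sentinel-based split in `tokenize` can be tried without the `Token` class or the knowledge base. In this sketch the separator list is hypothetical (the real one comes from `kb.separators`) and the result is a plain list of strings.

```python
def split_on_separators(name: str, separators: list[str]) -> list[str]:
    """Collapse every separator to one sentinel, then split once."""
    delim = "\x00"
    buf = name
    for sep in separators:
        if sep != delim:
            buf = buf.replace(sep, delim)
    # Consecutive separators yield empty pieces; drop them.
    return [piece for piece in buf.split(delim) if piece]


# Hypothetical separators: "-" is deliberately absent, so "x265-GRP"
# survives as one token for the later group-detection step.
print(split_on_separators("The.Show_S01E02 1080p..x265-GRP", [".", "_", " "]))
```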
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers shared across passes
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _parse_season_episode(text: str) -> tuple[int, int | None, int | None] | None:
|
||||
"""Parse a single token as ``SxxExx`` / ``SxxExxExx`` / ``Sxx`` /
|
||||
``Sxx-yy`` (season range) / ``NxNN``.
|
||||
|
||||
Returns ``(season, episode, episode_end)`` or ``None`` if the token
|
||||
is not a season/episode marker. For ``Sxx-yy``, returns the first
|
||||
season with no episode info — the caller is expected to detect the
|
||||
range form and promote ``media_type`` to ``tv_complete`` separately.
|
||||
"""
|
||||
upper = text.upper()
|
||||
|
||||
# SxxExx form (and Sxx, Sxx-yy)
|
||||
if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
|
||||
season = int(upper[1:3])
|
||||
rest = upper[3:]
|
||||
|
||||
if not rest:
|
||||
return season, None, None
|
||||
|
||||
# Sxx-yy season-range form: capture the first season, treat as a
|
||||
# complete-series marker (no episode info).
|
||||
if (
|
||||
len(rest) == 3
|
||||
and rest[0] == "-"
|
||||
and rest[1:3].isdigit()
|
||||
):
|
||||
return season, None, None
|
||||
|
||||
episodes: list[int] = []
|
||||
while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
|
||||
episodes.append(int(rest[1:3]))
|
||||
rest = rest[3:]
|
||||
|
||||
if not episodes:
|
||||
return None
|
||||
# For chained multi-episode markers (E09E10E11), the range is the
|
||||
# first → last episode. Intermediate values are implied.
|
||||
return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None
|
||||
|
||||
# NxNN form
|
||||
if "X" in upper:
|
||||
parts = upper.split("X")
|
||||
if len(parts) >= 2 and all(p.isdigit() for p in parts):
|
||||
season = int(parts[0])
|
||||
episode = int(parts[1])
|
||||
episode_end = int(parts[2]) if len(parts) >= 3 else None
|
||||
return season, episode, episode_end
|
||||
|
||||
return None
|
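The accepted marker shapes are easier to see with concrete inputs. The function below is a condensed copy of `_parse_season_episode` under a public name, so the snippet is self-contained. The sample tokens are invented.

```python
from __future__ import annotations


def parse_season_episode(text: str) -> tuple[int, int | None, int | None] | None:
    upper = text.upper()
    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
        season = int(upper[1:3])
        rest = upper[3:]
        if not rest:
            return season, None, None  # Sxx
        if len(rest) == 3 and rest[0] == "-" and rest[1:3].isdigit():
            return season, None, None  # Sxx-yy: first season only
        episodes: list[int] = []
        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
            episodes.append(int(rest[1:3]))
            rest = rest[3:]
        if not episodes:
            return None
        # Chained markers collapse to (first, last).
        return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None
    if "X" in upper:
        parts = upper.split("X")
        if len(parts) >= 2 and all(p.isdigit() for p in parts):
            end = int(parts[2]) if len(parts) >= 3 else None
            return int(parts[0]), int(parts[1]), end
    return None


for tok in ("S01E02", "s01e09e10e11", "S03", "S01-05", "2x07", "1080p", "x264"):
    print(tok, "->", parse_season_episode(tok))
```

`x264` is rejected by the `NxNN` branch because splitting on `X` leaves an empty first part, which fails the digit check.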
||||
|
||||
|
||||
def _is_year(text: str) -> bool:
|
||||
"""Return True if ``text`` is a 4-digit year in [1900, 2099]."""
|
||||
return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099
|
||||
|
||||
|
||||
def _split_codec_group(text: str, kb: ReleaseKnowledge) -> tuple[str, str] | None:
|
||||
"""Split a ``codec-GROUP`` token into ``(codec, group)`` if it fits.
|
||||
|
||||
Returns ``None`` if the token doesn't match the ``codec-GROUP``
|
||||
shape. Handles the empty-group case (``x265-``) as ``(codec, "")``.
|
||||
"""
|
||||
if "-" not in text:
|
||||
return None
|
||||
head, _, tail = text.rpartition("-")
|
||||
if head.lower() in kb.codecs:
|
||||
return head, tail
|
||||
return None
|
||||
|
||||
|
||||
def _match_role(text: str, role: TokenRole, kb: ReleaseKnowledge) -> TokenRole | None:
|
||||
"""Return ``role`` if ``text`` matches it under ``kb``, else ``None``."""
|
||||
lower = text.lower()
|
||||
|
||||
if role is TokenRole.YEAR:
|
||||
return TokenRole.YEAR if _is_year(text) else None
|
||||
|
||||
if role is TokenRole.SEASON_EPISODE:
|
||||
return (
|
||||
TokenRole.SEASON_EPISODE
|
||||
if _parse_season_episode(text) is not None
|
||||
else None
|
||||
)
|
||||
|
||||
if role is TokenRole.RESOLUTION:
|
||||
return TokenRole.RESOLUTION if lower in kb.resolutions else None
|
||||
|
||||
if role is TokenRole.SOURCE:
|
||||
return TokenRole.SOURCE if lower in kb.sources else None
|
||||
|
||||
if role is TokenRole.CODEC:
|
||||
return TokenRole.CODEC if lower in kb.codecs else None
|
||||
|
||||
return None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stage 2a — group detection
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _detect_group(tokens: list[Token], kb: ReleaseKnowledge) -> tuple[str, int | None]:
|
||||
"""Identify the release group by walking tokens right-to-left.
|
||||
|
||||
Returns ``(group_name, token_index_carrying_group)``. ``index`` is
|
||||
``None`` when no token carries a group, in which case the name is ``"UNKNOWN"``.
|
||||
"""
|
||||
# Priority 1: codec-GROUP shape (clearest signal).
|
||||
for tok in reversed(tokens):
|
||||
split = _split_codec_group(tok.text, kb)
|
||||
if split is not None:
|
||||
_, group = split
|
||||
return (group or "UNKNOWN"), tok.index
|
||||
|
||||
# Priority 2: rightmost dash, excluding dashed sources (Web-DL, etc.).
|
||||
for tok in reversed(tokens):
|
||||
if "-" not in tok.text:
|
||||
continue
|
||||
head, _, tail = tok.text.rpartition("-")
|
||||
if (
|
||||
head.lower() in kb.sources
|
||||
or tok.text.lower().replace("-", "") in kb.sources
|
||||
):
|
||||
continue
|
||||
if tail:
|
||||
return tail, tok.index
|
||||
|
||||
return "UNKNOWN", None
|
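The two priorities of `_detect_group` can be exercised on plain strings. The codec and source sets below are hypothetical stand-ins for `kb.codecs` and `kb.sources`, and the function returns a list position instead of `Token.index`.

```python
from __future__ import annotations

# Hypothetical knowledge sets; the real ones come from ReleaseKnowledge.
CODECS = {"x264", "x265", "h264", "hevc"}
SOURCES = {"web", "webdl", "bluray"}


def detect_group(tokens: list[str]) -> tuple[str, int | None]:
    # Priority 1: a codec-GROUP token such as "x265-KONTRAST".
    for i in range(len(tokens) - 1, -1, -1):
        head, dash, tail = tokens[i].rpartition("-")
        if dash and head.lower() in CODECS:
            return (tail or "UNKNOWN"), i
    # Priority 2: the rightmost dashed token that is not a dashed source.
    for i in range(len(tokens) - 1, -1, -1):
        text = tokens[i]
        if "-" not in text:
            continue
        head, _, tail = text.rpartition("-")
        if head.lower() in SOURCES or text.lower().replace("-", "") in SOURCES:
            continue
        if tail:
            return tail, i
    return "UNKNOWN", None


print(detect_group(["Movie", "2020", "1080p", "WEB-DL", "x265-KONTRAST"]))
print(detect_group(["Movie", "2020", "1080p", "WEB-DL", "AAC-GRP"]))
print(detect_group(["Show", "S01", "Web-DL", "1080p"]))
```

In the last call the only dashed token is a source, so the result is `("UNKNOWN", None)`. A trailing `x265-` still returns a position, with `"UNKNOWN"` as the name.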
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stage 2b — structural annotation (schema-driven)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _annotate_structural(
|
||||
tokens: list[Token],
|
||||
kb: ReleaseKnowledge,
|
||||
schema: GroupSchema,
|
||||
group_token_index: int,
|
||||
) -> list[Token] | None:
|
||||
"""Annotate structural tokens following a known group schema.
|
||||
|
||||
Walks the schema's chunks against the body (tokens up to the group
|
||||
token). For each chunk, scans forward in the body for a matching
|
||||
token — tokens passed over without match are left UNKNOWN (the
|
||||
enricher pass will handle them).
|
||||
|
||||
Returns ``None`` if any mandatory chunk fails to find a match.
|
||||
"""
|
||||
result = list(tokens)
|
||||
|
||||
# The codec-GROUP token carries CODEC + GROUP. Split it now so the
|
||||
# schema walk knows the codec is "pre-consumed" at the end.
|
||||
group_token = result[group_token_index]
|
||||
cg_split = _split_codec_group(group_token.text, kb)
|
||||
codec_pre_consumed = False
|
||||
if cg_split is not None:
|
||||
codec, group = cg_split
|
||||
result[group_token_index] = group_token.with_role(
|
||||
TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
|
||||
)
|
||||
codec_pre_consumed = True
|
||||
else:
|
||||
head, _, tail = group_token.text.rpartition("-")
|
||||
result[group_token_index] = group_token.with_role(
|
||||
TokenRole.GROUP, group=tail or "UNKNOWN", prefix=head
|
||||
)
|
||||
|
||||
body_end = group_token_index # exclusive
|
||||
tok_idx = 0
|
||||
chunk_idx = 0
|
||||
|
||||
# 1) TITLE — leftmost contiguous tokens up to the first structural
|
||||
# boundary. Title is special because it can be multi-token.
|
||||
while (
|
||||
chunk_idx < len(schema.chunks)
|
||||
and schema.chunks[chunk_idx].role is TokenRole.TITLE
|
||||
):
|
||||
title_end = _find_title_end(result, body_end, kb)
|
||||
for i in range(tok_idx, title_end):
|
||||
result[i] = result[i].with_role(TokenRole.TITLE)
|
||||
tok_idx = title_end
|
||||
chunk_idx += 1
|
||||
|
||||
# 2) Remaining structural chunks. For each, scan forward in the body
|
||||
# for a matching token; tokens passed over remain UNKNOWN.
|
||||
for chunk in schema.chunks[chunk_idx:]:
|
||||
if chunk.role is TokenRole.GROUP:
|
||||
continue
|
||||
if chunk.role is TokenRole.CODEC and codec_pre_consumed:
|
||||
continue
|
||||
|
||||
match_idx = _find_chunk(result, tok_idx, body_end, chunk.role, kb)
|
||||
if match_idx is None:
|
||||
if chunk.optional:
|
||||
continue
|
||||
return None
|
||||
|
||||
result[match_idx] = result[match_idx].with_role(chunk.role)
|
||||
tok_idx = match_idx + 1
|
||||
|
||||
return result
|
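The forward scan over schema chunks can be reduced to a small model. This is a sketch under simplifying assumptions: roles are plain strings, a chunk is a hypothetical `(role, optional)` pair instead of a `SchemaChunk`, and the matchers are invented stand-ins for the knowledge-base lookups.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical matchers; the real pipeline consults ReleaseKnowledge.
MATCHERS: dict[str, Callable[[str], bool]] = {
    "year": lambda t: len(t) == 4 and t.isdigit(),
    "resolution": lambda t: t.lower() in {"720p", "1080p", "2160p"},
    "source": lambda t: t.lower() in {"bluray", "web", "webrip"},
}


def walk_schema(
    tokens: list[str], chunks: list[tuple[str, bool]], start: int = 0
) -> dict[int, str] | None:
    """Map token position -> role, or None if a mandatory chunk is missing."""
    roles: dict[int, str] = {}
    cursor = start
    for role, optional in chunks:
        hit = next(
            (i for i in range(cursor, len(tokens)) if MATCHERS[role](tokens[i])),
            None,
        )
        if hit is None:
            if optional:
                continue  # optional chunk absent: keep the cursor in place
            return None   # mandatory chunk absent: schema does not fit
        roles[hit] = role
        cursor = hit + 1  # tokens skipped over stay unannotated
    return roles


SCHEMA = [("year", True), ("resolution", False), ("source", False)]
print(walk_schema(["2020", "REPACK", "1080p", "DTS", "BluRay"], SCHEMA))
print(walk_schema(["2020", "BluRay"], SCHEMA))
```

In the first call `REPACK` and `DTS` are passed over without a role, which is the slot the enricher pass is meant to fill. The second call returns `None` because the mandatory resolution chunk has no match.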
||||
|
||||
|
||||
def _find_title_end(
|
||||
tokens: list[Token], body_end: int, kb: ReleaseKnowledge
|
||||
) -> int:
|
||||
"""Return the exclusive index where the title ends.
|
||||
|
||||
The title is the leftmost run of tokens whose text does not match
|
||||
any structural role (year, season/episode, resolution, source,
|
||||
codec). Enricher tokens (audio, HDR, language) are *not* boundaries
|
||||
because they can appear in the middle of the structural sequence;
|
||||
however, in canonical scene names they don't appear inside the title
|
||||
itself, so this heuristic holds in practice.
|
||||
"""
|
||||
for i in range(body_end):
|
||||
text = tokens[i].text
|
||||
if _parse_season_episode(text) is not None:
|
||||
return i
|
||||
if _is_year(text):
|
||||
return i
|
||||
lower = text.lower()
|
||||
if lower in kb.resolutions:
|
||||
return i
|
||||
if lower in kb.sources:
|
||||
return i
|
||||
if lower in kb.codecs:
|
||||
return i
|
||||
# codec-GROUP token (e.g. "x265-KONTRAST") or dashed source (Web-DL).
|
||||
if "-" in text:
|
||||
head, _, _ = text.rpartition("-")
|
||||
if (
|
||||
head.lower() in kb.codecs
|
||||
or head.lower() in kb.sources
|
||||
or text.lower().replace("-", "") in kb.sources
|
||||
):
|
||||
return i
|
||||
return body_end
|
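A reduced version of the title-boundary heuristic shows both the normal case and one edge case. The sketch covers only the year, resolution, source and codec checks. The season/episode and dashed-token checks are left out, and the knowledge sets are hypothetical.

```python
from __future__ import annotations

# Hypothetical stand-ins for kb.resolutions / kb.sources / kb.codecs.
RESOLUTIONS = {"720p", "1080p", "2160p"}
SOURCES = {"bluray", "web", "webdl"}
CODECS = {"x264", "x265"}


def is_year(text: str) -> bool:
    return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099


def title_end(tokens: list[str]) -> int:
    """Exclusive index of the first token that looks structural."""
    for i, text in enumerate(tokens):
        if is_year(text) or text.lower() in RESOLUTIONS | SOURCES | CODECS:
            return i
    return len(tokens)


print(title_end(["The", "Big", "Movie", "2020", "1080p"]))     # 3
print(title_end(["Blade", "Runner", "2049", "2017", "1080p"]))  # 2
```

The second call stops at `2049` because it falls inside the accepted year range, so a title that itself contains a year-like number is cut short under this heuristic.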
||||
|
||||
|
||||
def _find_chunk(
|
||||
tokens: list[Token],
|
||||
start: int,
|
||||
end: int,
|
||||
role: TokenRole,
|
||||
kb: ReleaseKnowledge,
|
||||
) -> int | None:
|
||||
"""Return the first index in ``[start, end)`` whose token matches ``role``.
|
||||
|
||||
Returns ``None`` if no token in the range matches. Tokens already
|
||||
annotated (non-UNKNOWN) are skipped — they belong to another chunk.
|
||||
"""
|
||||
for i in range(start, end):
|
||||
if tokens[i].role is not TokenRole.UNKNOWN:
|
||||
continue
|
||||
if _match_role(tokens[i].text, role, kb) is not None:
|
||||
return i
|
||||
return None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stage 2b' — SHITTY annotation (schema-less heuristic)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _annotate_shitty(
|
||||
tokens: list[Token],
|
||||
kb: ReleaseKnowledge,
|
||||
group_index: int | None,
|
||||
) -> list[Token]:
|
||||
"""Schema-less, dictionary-driven annotation.
|
||||
|
||||
SHITTY's job is narrow: for releases that *look* like scene names
|
||||
but don't have a registered group schema, tag every token whose text
|
||||
falls into a known YAML bucket (resolutions, codecs, sources, …).
|
||||
Anything we can't classify stays UNKNOWN. The leftmost run of
|
||||
UNKNOWN tokens becomes the title. Done.
|
||||
|
||||
Anything that requires more reasoning (parenthesized tech blocks,
|
||||
bare-dashed title fragments, year-disguised slug suffixes, …) is
|
||||
PATH OF PAIN territory and stays out of here on purpose.
|
||||
"""
|
||||
result = list(tokens)
|
||||
|
||||
# 1) Group token — split codec-GROUP or tag GROUP. Same logic as EASY.
|
||||
if group_index is not None:
|
||||
gt = result[group_index]
|
||||
cg_split = _split_codec_group(gt.text, kb)
|
||||
if cg_split is not None:
|
||||
codec, group = cg_split
|
||||
result[group_index] = gt.with_role(
|
||||
TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
|
||||
)
|
||||
else:
|
||||
_, _, tail = gt.text.rpartition("-")
|
||||
result[group_index] = gt.with_role(
|
||||
TokenRole.GROUP, group=tail or "UNKNOWN"
|
||||
)
|
||||
|
||||
# 2) Enrichers (audio / video-meta / edition / language).
|
||||
result = _annotate_enrichers(result, kb)
|
||||
|
||||
# 3) Single pass: tag each UNKNOWN token by looking it up in the kb
|
||||
# buckets. First match wins per token, first occurrence wins per
|
||||
# role (we don't overwrite an already-tagged role).
|
||||
matchers: list[tuple[TokenRole, callable]] = [
|
||||
(TokenRole.SEASON_EPISODE, lambda t: _parse_season_episode(t) is not None),
|
||||
(TokenRole.YEAR, _is_year),
|
||||
(TokenRole.RESOLUTION, lambda t: t.lower() in kb.resolutions),
|
||||
(TokenRole.DISTRIBUTOR, lambda t: t.upper() in kb.distributors),
|
||||
(TokenRole.SOURCE, lambda t: t.lower() in kb.sources),
|
||||
(TokenRole.CODEC, lambda t: t.lower() in kb.codecs),
|
||||
]
|
||||
seen: set[TokenRole] = set()
|
||||
|
||||
for i, tok in enumerate(result):
|
||||
if tok.role is not TokenRole.UNKNOWN:
|
||||
continue
|
||||
for role, matches in matchers:
|
||||
if role in seen:
|
||||
continue
|
||||
if matches(tok.text):
|
||||
result[i] = tok.with_role(role)
|
||||
seen.add(role)
|
||||
break
|
||||
|
||||
# 4) Title = leftmost contiguous UNKNOWN tokens.
|
||||
for i, tok in enumerate(result):
|
||||
if tok.role is not TokenRole.UNKNOWN:
|
||||
break
|
||||
result[i] = tok.with_role(TokenRole.TITLE)
|
||||
|
||||
return result
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stage 2c — enricher pass (non-positional roles)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _annotate_enrichers(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
|
||||
"""Tag the remaining UNKNOWN tokens with non-positional roles.
|
||||
|
||||
Multi-token sequences are matched first (so ``DTS.HD.MA`` wins over
|
||||
a single-token ``DTS``). For each sequence match, the first token
|
||||
receives the role + ``extra["sequence"]`` (the canonical joined
|
||||
value), and the trailing members are marked with the same role +
|
||||
``extra["sequence_member"]="True"`` so :func:`assemble` extracts the
|
||||
value only from the primary.
|
||||
"""
|
||||
result = list(tokens)
|
||||
|
||||
# Multi-token sequences first.
|
||||
_apply_sequences(
|
||||
result, kb.audio.get("sequences", []), "codec", TokenRole.AUDIO_CODEC
|
||||
)
|
||||
_apply_sequences(
|
||||
result, kb.video_meta.get("sequences", []), "hdr", TokenRole.HDR
|
||||
)
|
||||
_apply_sequences(
|
||||
result, kb.editions.get("sequences", []), "edition", TokenRole.EDITION
|
||||
)
|
||||
|
||||
# Single tokens.
|
||||
known_audio_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
|
||||
known_audio_channels = set(kb.audio.get("channels", []))
|
||||
known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
|
||||
known_bit_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
|
||||
known_editions = {t.upper() for t in kb.editions.get("tokens", [])}
|
||||
|
||||
# Channel layouts like "5.1" are tokenized as two tokens ("5", "1")
|
||||
# because "." is a separator. Detect consecutive pairs whose joined
|
||||
# value (without any trailing "-GROUP") is in the channel set.
|
||||
_detect_channel_pairs(result, known_audio_channels)
|
||||
|
||||
for i, tok in enumerate(result):
|
||||
if tok.role is not TokenRole.UNKNOWN:
|
||||
continue
|
||||
text = tok.text
|
||||
upper = text.upper()
|
||||
lower = text.lower()
|
||||
|
||||
if upper in known_audio_codecs:
|
||||
result[i] = tok.with_role(TokenRole.AUDIO_CODEC)
|
||||
continue
|
||||
if text in known_audio_channels:
|
||||
result[i] = tok.with_role(TokenRole.AUDIO_CHANNELS)
|
||||
continue
|
||||
if upper in known_hdr:
|
||||
result[i] = tok.with_role(TokenRole.HDR)
|
||||
continue
|
||||
if lower in known_bit_depth:
|
||||
result[i] = tok.with_role(TokenRole.BIT_DEPTH)
|
||||
continue
|
||||
if upper in known_editions:
|
||||
result[i] = tok.with_role(TokenRole.EDITION)
|
||||
continue
|
||||
if upper in kb.language_tokens:
|
||||
result[i] = tok.with_role(TokenRole.LANGUAGE)
|
||||
continue
|
||||
if upper in kb.distributors:
|
||||
result[i] = tok.with_role(TokenRole.DISTRIBUTOR)
|
||||
continue
|
||||
|
||||
return result
|
||||
|
||||
|
||||
def _apply_sequences(
|
||||
tokens: list[Token],
|
||||
sequences: list[dict],
|
||||
value_key: str,
|
||||
role: TokenRole,
|
||||
) -> None:
|
||||
"""Mark every non-overlapping occurrence of each sequence in place.
|
||||
|
||||
Mutates ``tokens`` (replacing entries with new role-tagged Token
|
||||
instances). Sequences in the YAML must be ordered most-specific
|
||||
first; the first match wins per starting position.
|
||||
"""
|
||||
if not sequences:
|
||||
return
|
||||
|
||||
upper_texts = [t.text.upper() for t in tokens]
|
||||
consumed: set[int] = set()
|
||||
|
||||
for seq in sequences:
|
||||
seq_upper = [s.upper() for s in seq["tokens"]]
|
||||
n = len(seq_upper)
|
||||
for start in range(len(tokens) - n + 1):
|
||||
if any(idx in consumed for idx in range(start, start + n)):
|
||||
continue
|
||||
if any(
|
||||
tokens[start + k].role is not TokenRole.UNKNOWN for k in range(n)
|
||||
):
|
||||
continue
|
||||
if upper_texts[start : start + n] == seq_upper:
|
||||
tokens[start] = tokens[start].with_role(
|
||||
role, sequence=seq[value_key]
|
||||
)
|
||||
for k in range(1, n):
|
||||
tokens[start + k] = tokens[start + k].with_role(
|
||||
role, sequence_member="True"
|
||||
)
|
||||
consumed.update(range(start, start + n))
|
||||
|
||||
|
||||
def _detect_channel_pairs(
|
||||
tokens: list[Token], known_channels: set[str]
|
||||
) -> None:
|
||||
"""Spot two consecutive numeric tokens that form a channel layout.
|
||||
|
||||
Example: ``["5", "1-KTH"]`` → joined ``"5.1"`` (after stripping the
|
||||
``-GROUP`` suffix on the second). The second token may be the trailing
|
||||
codec-GROUP token, in which case it's already tagged CODEC and we
|
||||
leave its role untouched rather than overwrite it.
|
||||
"""
|
||||
for i in range(len(tokens) - 1):
|
||||
first = tokens[i]
|
||||
second = tokens[i + 1]
|
||||
if first.role is not TokenRole.UNKNOWN:
|
||||
continue
|
||||
# Strip a "-GROUP" suffix on the second token before joining.
|
||||
second_text = second.text.split("-")[0]
|
||||
candidate = f"{first.text}.{second_text}"
|
||||
if candidate not in known_channels:
|
||||
continue
|
||||
# The first token carries the channel value. The second token is
|
||||
# marked as a sequence member only if it is still UNKNOWN (it may
|
||||
# be the codec-GROUP token, already tagged CODEC).
|
||||
tokens[i] = first.with_role(
|
||||
TokenRole.AUDIO_CHANNELS, sequence=candidate
|
||||
)
|
||||
if second.role is TokenRole.UNKNOWN:
|
||||
tokens[i + 1] = second.with_role(
|
||||
TokenRole.AUDIO_CHANNELS, sequence_member="True"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stage 2 entry point
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def annotate(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
|
||||
"""Annotate token roles.
|
||||
|
||||
Dispatch:
|
||||
|
||||
* If a group is detected AND has a known schema, run the EASY
|
||||
structural walk. If the schema walk aborts on a mandatory chunk
|
||||
mismatch, fall through to SHITTY (the heuristic still does better
|
||||
than giving up).
|
||||
* Otherwise run SHITTY — schema-less, best-effort, never aborts.
|
||||
|
||||
The enricher pass runs in both cases. The pipeline always returns a
|
||||
populated token list; downstream callers don't need to distinguish
|
||||
EASY vs SHITTY at this layer (the parse_path is decided in the
|
||||
service based on whether a schema matched).
|
||||
"""
|
||||
group_name, group_index = _detect_group(tokens, kb)
|
||||
|
||||
schema = kb.group_schema(group_name) if group_index is not None else None
|
||||
if schema is not None and group_index is not None:
|
||||
structural = _annotate_structural(tokens, kb, schema, group_index)
|
||||
if structural is not None:
|
||||
return _annotate_enrichers(structural, kb)
|
||||
|
||||
# SHITTY fallback — heuristic positional pass. ``_annotate_shitty``
|
||||
# runs its own enricher pass internally (it has to, so the title
|
||||
# scan stops at enricher-tagged tokens).
|
||||
return _annotate_shitty(tokens, kb, group_index)
|
||||
|
||||
|
||||
def has_known_schema(tokens: list[Token], kb: ReleaseKnowledge) -> bool:
|
||||
"""Return True if ``tokens`` would take the EASY path in :func:`annotate`."""
|
||||
group_name, group_index = _detect_group(tokens, kb)
|
||||
if group_index is None:
|
||||
return False
|
||||
return kb.group_schema(group_name) is not None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Stage 3 — assemble
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def assemble(
|
||||
annotated: list[Token],
|
||||
site_tag: str | None,
|
||||
raw_name: str,
|
||||
kb: ReleaseKnowledge,
|
||||
) -> dict:
|
||||
"""Fold annotated tokens into a ``ParsedRelease``-compatible dict.
|
||||
|
||||
Returns a dict (not a ``ParsedRelease`` instance) so the caller can
|
||||
layer in additional fields (``parse_path``, ``raw``, …) before
|
||||
instantiation.
|
||||
"""
|
||||
# Pure-punctuation tokens (e.g. a stray "-" left by ` - ` separators in
|
||||
# human-friendly release names) carry no title content and would leak
|
||||
# into the joined title as ``"Show.-.Episode"``. Drop them here.
|
||||
title_parts = [
|
||||
t.text
|
||||
for t in annotated
|
||||
if t.role is TokenRole.TITLE and any(c.isalnum() for c in t.text)
|
||||
]
|
||||
title = ".".join(title_parts) if title_parts else (
|
||||
annotated[0].text if annotated else raw_name
|
||||
)
|
||||
|
||||
year: int | None = None
|
||||
season: int | None = None
|
||||
episode: int | None = None
|
||||
episode_end: int | None = None
|
||||
quality: str | None = None
|
||||
source: str | None = None
|
||||
codec: str | None = None
|
||||
group = "UNKNOWN"
|
||||
audio_codec: str | None = None
|
||||
audio_channels: str | None = None
|
||||
bit_depth: str | None = None
|
||||
hdr_format: str | None = None
|
||||
edition: str | None = None
|
||||
distributor: str | None = None
|
||||
languages: list[str] = []
|
||||
is_season_range = False
|
||||
|
||||
for tok in annotated:
|
||||
# Skip non-primary members of a multi-token sequence.
|
||||
if tok.extra.get("sequence_member") == "True":
|
||||
continue
|
||||
|
||||
role = tok.role
|
||||
if role is TokenRole.YEAR:
|
||||
year = int(tok.text)
|
||||
elif role is TokenRole.SEASON_EPISODE:
|
||||
parsed = _parse_season_episode(tok.text)
|
||||
if parsed is not None:
|
||||
season, episode, episode_end = parsed
|
||||
# Detect Sxx-yy range form to flag it as a multi-season pack.
|
||||
upper = tok.text.upper()
|
||||
if (
|
||||
len(upper) == 6
|
||||
and upper[0] == "S"
|
||||
and upper[1:3].isdigit()
|
||||
and upper[3] == "-"
|
||||
and upper[4:6].isdigit()
|
||||
):
|
||||
is_season_range = True
|
||||
elif role is TokenRole.RESOLUTION:
|
||||
quality = tok.text
|
||||
elif role is TokenRole.SOURCE:
|
||||
source = tok.text
|
||||
elif role is TokenRole.CODEC:
|
||||
codec = tok.extra.get("codec", tok.text)
|
||||
if "group" in tok.extra:
|
||||
group = tok.extra["group"] or "UNKNOWN"
|
||||
elif role is TokenRole.GROUP:
|
||||
group = tok.extra.get("group", tok.text) or "UNKNOWN"
|
||||
elif role is TokenRole.AUDIO_CODEC:
|
||||
if audio_codec is None:
|
||||
audio_codec = tok.extra.get("sequence", tok.text)
|
||||
elif role is TokenRole.AUDIO_CHANNELS:
|
||||
if audio_channels is None:
|
||||
audio_channels = tok.extra.get("sequence", tok.text)
|
||||
elif role is TokenRole.BIT_DEPTH:
|
||||
if bit_depth is None:
|
||||
bit_depth = tok.text.lower()
|
||||
elif role is TokenRole.HDR:
|
||||
if hdr_format is None:
|
||||
hdr_format = tok.extra.get("sequence", tok.text.upper())
|
||||
elif role is TokenRole.EDITION:
|
||||
if edition is None:
|
||||
edition = tok.extra.get("sequence", tok.text.upper())
|
||||
elif role is TokenRole.LANGUAGE:
|
||||
languages.append(tok.text.upper())
|
||||
elif role is TokenRole.DISTRIBUTOR:
|
||||
if distributor is None:
|
||||
distributor = tok.text.upper()
|
||||
|
||||
# Media type heuristic. Doc/concert/integrale tokens win over the
|
||||
# generic tech-based fallback. We look across all tokens, including
|
||||
# those still tagged UNKNOWN, because the structural pass may leave
|
||||
# these markers unclassified; only the assemble step cares about them.
|
||||
upper_tokens = {tok.text.upper() for tok in annotated}
|
||||
doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
|
||||
concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
|
||||
integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}
|
||||
|
||||
if upper_tokens & doc_tokens:
|
||||
media_type = MediaTypeToken.DOCUMENTARY
|
||||
elif upper_tokens & concert_tokens:
|
||||
media_type = MediaTypeToken.CONCERT
|
||||
elif is_season_range:
|
||||
media_type = MediaTypeToken.TV_COMPLETE
|
||||
elif (
|
||||
edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
|
||||
or upper_tokens & integrale_tokens
|
||||
) and season is None:
|
||||
media_type = MediaTypeToken.TV_COMPLETE
|
||||
elif season is not None:
|
||||
media_type = MediaTypeToken.TV_SHOW
|
||||
elif any((quality, source, codec, year)):
|
||||
media_type = MediaTypeToken.MOVIE
|
||||
else:
|
||||
media_type = MediaTypeToken.UNKNOWN
|
||||
|
||||
return {
|
||||
"title": title,
|
||||
"title_sanitized": kb.sanitize_for_fs(title),
|
||||
"year": year,
|
||||
"season": season,
|
||||
"episode": episode,
|
||||
"episode_end": episode_end,
|
||||
"quality": quality,
|
||||
"source": source,
|
||||
"codec": codec,
|
||||
"group": group,
|
||||
"media_type": media_type,
|
||||
"site_tag": site_tag,
|
||||
"languages": tuple(languages),
|
||||
"audio_codec": audio_codec,
|
||||
"audio_channels": audio_channels,
|
||||
"bit_depth": bit_depth,
|
||||
"hdr_format": hdr_format,
|
||||
"edition": edition,
|
||||
"distributor": distributor,
|
||||
}
|
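The multi-token sequence matching described in `_apply_sequences` can be sketched in isolation. This is a minimal illustration under stated assumptions, not the module's code: `Tok` is a hypothetical stand-in for the real `Token` class, roles are plain strings, and the `SEQUENCES` entries are invented YAML content.

```python
from dataclasses import dataclass, field


@dataclass
class Tok:
    """Hypothetical stand-in for the pipeline's Token class."""

    text: str
    role: str = "UNKNOWN"
    extra: dict = field(default_factory=dict)


def apply_sequences(tokens, sequences, value_key, role):
    """Tag every non-overlapping occurrence of each sequence, in place."""
    upper = [t.text.upper() for t in tokens]
    for seq in sequences:
        want = [s.upper() for s in seq["tokens"]]
        n = len(want)
        for start in range(len(tokens) - n + 1):
            # Skip windows that touch a token tagged by an earlier match.
            if any(tokens[k].role != "UNKNOWN" for k in range(start, start + n)):
                continue
            if upper[start : start + n] != want:
                continue
            # The first token carries the canonical joined value.
            tokens[start] = Tok(tokens[start].text, role, {"sequence": seq[value_key]})
            # Trailing tokens are flagged so a later fold can ignore them.
            for k in range(start + 1, start + n):
                tokens[k] = Tok(tokens[k].text, role, {"sequence_member": "True"})


# Assumed YAML content, ordered most-specific first.
SEQUENCES = [
    {"tokens": ["DTS", "HD", "MA"], "codec": "DTS-HD.MA"},
    {"tokens": ["DTS", "HD"], "codec": "DTS-HD"},
]

toks = [Tok(t) for t in "Movie.2020.DTS.HD.MA.x264".split(".")]
apply_sequences(toks, SEQUENCES, "codec", "AUDIO_CODEC")
```

Because the three-token entry is listed first, it claims `DTS`, `HD` and `MA` before the shorter `DTS.HD` entry gets a chance, which is why the ordering of the YAML list matters.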
||||
@@ -0,0 +1,47 @@
|
||||
"""Group schema value objects.
|
||||
|
||||
A :class:`GroupSchema` describes the canonical chunk layout of releases
|
||||
from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
|
||||
contract: when a release ends in ``-<GROUP>`` and we know the group,
|
||||
the annotator walks the schema instead of running the heuristic SHITTY
|
||||
matchers.
|
||||
|
||||
Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
|
||||
by an infrastructure adapter and surfaced via the
|
||||
:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
|
||||
from .tokens import TokenRole
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SchemaChunk:
|
||||
"""One entry in a group's chunk order.
|
||||
|
||||
``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
|
||||
is True for chunks that may be absent (e.g. ``year`` on TV releases,
|
||||
``source`` on bare ELiTE TV releases).
|
||||
"""
|
||||
|
||||
role: TokenRole
|
||||
optional: bool = False
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class GroupSchema:
|
||||
"""Schema for a known release group.
|
||||
|
||||
``chunks`` is the left-to-right canonical order. The annotator walks
|
||||
tokens and chunks in lockstep: an optional chunk that doesn't match
|
||||
the current token is skipped (the chunk index advances, the token
|
||||
index stays), a mandatory chunk that doesn't match aborts the EASY
|
||||
path and falls back to SHITTY.
|
||||
"""
|
||||
|
||||
name: str
|
||||
separator: str
|
||||
chunks: tuple[SchemaChunk, ...]
|
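The lockstep walk described in the `GroupSchema` docstring can be sketched as follows. This is only an illustration of the skip/abort rule: `Chunk`, `walk` and `toy_match` are hypothetical names, roles are plain strings, title handling is left out, and treating leftover tokens as a failure is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for SchemaChunk, with a string role."""

    role: str
    optional: bool = False


def walk(texts, chunks, matches):
    """Return one role per token, or None when the schema does not fit."""
    roles = []
    ti = 0  # token index
    for chunk in chunks:
        if ti < len(texts) and matches(texts[ti], chunk.role):
            roles.append(chunk.role)
            ti += 1
        elif chunk.optional:
            continue  # chunk index advances, token index stays
        else:
            return None  # mandatory mismatch: caller falls back to SHITTY
    # Assumption of this sketch: unconsumed tokens also count as a misfit.
    return roles if ti == len(texts) else None


def toy_match(text, role):
    """Toy matcher with invented vocabularies, for illustration only."""
    if role == "year":
        return text.isdigit() and len(text) == 4
    vocab = {
        "resolution": {"720p", "1080p", "2160p"},
        "source": {"bluray", "webrip"},
        "codec": {"x264", "x265"},
    }
    return text.lower() in vocab.get(role, set())


SCHEMA = (
    Chunk("year", optional=True),
    Chunk("resolution"),
    Chunk("source", optional=True),
    Chunk("codec"),
)

with_year = walk(["2020", "1080p", "BluRay", "x265"], SCHEMA, toy_match)
without_optional = walk(["1080p", "x265"], SCHEMA, toy_match)
broken = walk(["2020", "BluRay", "x265"], SCHEMA, toy_match)
```

The third call returns `None` because the mandatory `resolution` chunk meets `BluRay`, which is the case where the annotator would abandon the EASY path.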
||||
@@ -0,0 +1,139 @@
|
||||
"""Parse-confidence scoring.
|
||||
|
||||
``parse_release`` returns a :class:`ParseReport` alongside its
|
||||
:class:`ParsedRelease`. The report carries:
|
||||
|
||||
- ``confidence``: integer 0–100 derived from which structural and
|
||||
technical fields got populated, minus a penalty per UNKNOWN token
|
||||
left in the annotated stream.
|
||||
- ``road``: which of the three roads the parse took
|
||||
(:class:`Road.EASY` / :class:`Road.SHITTY` / :class:`Road.PATH_OF_PAIN`).
|
||||
- ``unknown_tokens``: textual residue, useful for diagnostics.
|
||||
- ``missing_critical``: structural fields the score-tally found absent
|
||||
(e.g. ``("year", "media_type")``) — the caller can use this to drive
|
||||
PoP recovery (questions, LLM call).
|
||||
|
||||
All weights, penalties and thresholds come from the injected knowledge
|
||||
base (``kb.scoring``), itself loaded from
|
||||
``alfred/knowledge/release/scoring.yaml``. No magic numbers here.
|
||||
|
||||
The scoring functions are pure — they consume the annotated token list
|
||||
and the resulting :class:`ParsedRelease` and return the report. They are
|
||||
called by ``services.parse_release`` after ``assemble`` has run.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from enum import Enum
|
||||
|
||||
from ..ports.knowledge import ReleaseKnowledge
|
||||
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import ParsedRelease
|
||||
from .tokens import Token, TokenRole
|
||||
|
||||
|
||||
class Road(str, Enum):
|
||||
"""How the parser handled a given release name.
|
||||
|
||||
Distinct from :class:`~alfred.domain.release.value_objects.TokenizationRoute`,
|
||||
which records the tokenization route (DIRECT / SANITIZED / AI). Road
|
||||
is about confidence in the *result*, not the *method*.
|
||||
"""
|
||||
|
||||
EASY = "easy" # group schema matched — structural annotation
|
||||
SHITTY = "shitty" # no schema, dict-driven annotation, score ≥ threshold
|
||||
PATH_OF_PAIN = "path_of_pain" # score below threshold, needs help
|
||||
|
||||
|
||||
# Critical structural fields — their absence drives the
|
||||
# ``missing_critical`` list in the report.
|
||||
_CRITICAL_FIELDS: tuple[str, ...] = ("title", "media_type", "year")
|
||||
|
||||
|
||||
def _is_tv_shaped(parsed: ParsedRelease) -> bool:
|
||||
"""Season/episode weights only count for releases that *look* like TV."""
|
||||
return parsed.season is not None
|
||||
|
||||
|
||||
def compute_score(
|
||||
parsed: ParsedRelease,
|
||||
annotated: list[Token],
|
||||
kb: ReleaseKnowledge,
|
||||
) -> int:
|
||||
"""Compute a 0–100 confidence score for the parse.
|
||||
|
||||
Each populated field contributes its weight from
|
||||
``kb.scoring["weights"]``. Season/episode only count when the parse
|
||||
looks like TV. ``group == "UNKNOWN"`` is treated as absent.
|
||||
|
||||
Then a penalty is subtracted per residual UNKNOWN token in
|
||||
``annotated``, capped at ``penalties["max_unknown_penalty"]``.
|
||||
|
||||
Result is clamped to ``[0, 100]``.
|
||||
"""
|
||||
weights = kb.scoring["weights"]
|
||||
penalties = kb.scoring["penalties"]
|
||||
|
||||
score = 0
|
||||
if parsed.title:
|
||||
score += weights.get("title", 0)
|
||||
if parsed.media_type and parsed.media_type.value != "unknown":
|
||||
score += weights.get("media_type", 0)
|
||||
if parsed.year is not None:
|
||||
score += weights.get("year", 0)
|
||||
if _is_tv_shaped(parsed):
|
||||
if parsed.season is not None:
|
||||
score += weights.get("season", 0)
|
||||
if parsed.episode is not None:
|
||||
score += weights.get("episode", 0)
|
||||
if parsed.quality:
|
||||
score += weights.get("resolution", 0)
|
||||
if parsed.source:
|
||||
score += weights.get("source", 0)
|
||||
if parsed.codec:
|
||||
score += weights.get("codec", 0)
|
||||
if parsed.group and parsed.group != "UNKNOWN":
|
||||
score += weights.get("group", 0)
|
||||
|
||||
unknown_count = sum(1 for t in annotated if t.role is TokenRole.UNKNOWN)
|
||||
raw_penalty = unknown_count * penalties.get("unknown_token", 0)
|
||||
capped_penalty = min(raw_penalty, penalties.get("max_unknown_penalty", 0))
|
||||
score -= capped_penalty
|
||||
|
||||
return max(0, min(100, score))
|
||||
|
||||
|
||||
def collect_unknown_tokens(annotated: list[Token]) -> tuple[str, ...]:
|
||||
"""Return the text of every token still tagged UNKNOWN."""
|
||||
return tuple(t.text for t in annotated if t.role is TokenRole.UNKNOWN)
|
||||
|
||||
|
||||
def collect_missing_critical(parsed: ParsedRelease) -> tuple[str, ...]:
|
||||
"""Return the names of critical structural fields that are absent."""
|
||||
missing: list[str] = []
|
||||
if not parsed.title:
|
||||
missing.append("title")
|
||||
if not parsed.media_type or parsed.media_type.value == "unknown":
|
||||
missing.append("media_type")
|
||||
if parsed.year is None:
|
||||
missing.append("year")
|
||||
return tuple(missing)
|
||||
|
||||
|
||||
def decide_road(
|
||||
score: int,
|
||||
has_schema: bool,
|
||||
kb: ReleaseKnowledge,
|
||||
) -> Road:
|
||||
"""Pick the road the parse took.
|
||||
|
||||
EASY is decided structurally: if a known group schema matched, the
|
||||
annotation walked the schema, and that's enough — the score does not
|
||||
veto EASY. Otherwise the score decides between SHITTY and
|
||||
PATH_OF_PAIN using ``kb.scoring["thresholds"]["shitty_min"]``.
|
||||
"""
|
||||
if has_schema:
|
||||
return Road.EASY
|
||||
threshold = kb.scoring["thresholds"].get("shitty_min", 60)
|
||||
if score >= threshold:
|
||||
return Road.SHITTY
|
||||
return Road.PATH_OF_PAIN
|
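The scoring arithmetic and road decision above can be worked through with concrete numbers. The sketch below assumes invented weights, penalties and threshold; the real values live in `scoring.yaml` and are not shown in this diff. `score` and `road` are hypothetical helpers that take a set of populated field names instead of a `ParsedRelease`.

```python
# Assumed values for illustration; the real ones come from scoring.yaml.
SCORING = {
    "weights": {
        "title": 30, "media_type": 20, "year": 15,
        "resolution": 10, "source": 10, "codec": 10, "group": 5,
    },
    "penalties": {"unknown_token": 5, "max_unknown_penalty": 20},
    "thresholds": {"shitty_min": 60},
}


def score(populated, unknown_count, scoring=SCORING):
    """Sum the weights of populated fields, subtract the capped penalty."""
    weights = scoring["weights"]
    penalties = scoring["penalties"]
    total = sum(weights.get(name, 0) for name in populated)
    penalty = min(
        unknown_count * penalties["unknown_token"],
        penalties["max_unknown_penalty"],
    )
    return max(0, min(100, total - penalty))


def road(score_value, has_schema, scoring=SCORING):
    """A matched schema always means EASY; otherwise the score decides."""
    if has_schema:
        return "easy"
    if score_value >= scoring["thresholds"]["shitty_min"]:
        return "shitty"
    return "path_of_pain"


full = score({"title", "media_type", "year", "resolution", "source", "codec", "group"}, 0)
noisy = score({"title", "media_type"}, 6)  # 50 minus a penalty capped at 20
```

With these assumed numbers, six UNKNOWN tokens would cost 30 points uncapped, but the cap limits the loss to 20, so a noisy name is penalised without being driven straight to zero.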
||||
@@ -0,0 +1,120 @@
|
||||
"""Release domain — parsing service.
|
||||
|
||||
Thin orchestrator over the annotate-based pipeline in
|
||||
:mod:`alfred.domain.release.parser.pipeline`. Responsibilities:
|
||||
|
||||
* Strip a leading/trailing ``[site.tag]`` and decide ``parse_path``.
|
||||
* Reject malformed names (forbidden characters) → ``parse_path=AI`` so
|
||||
the LLM can clean them up.
|
||||
* Otherwise call the v2 pipeline (tokenize → annotate → assemble) and
|
||||
wrap the result in :class:`ParsedRelease`.
|
||||
* Score the result and decide the road (EASY / SHITTY / PATH_OF_PAIN)
|
||||
via :mod:`alfred.domain.release.parser.scoring`.
|
||||
|
||||
The public entry point is :func:`parse_release`, which returns
|
||||
``(ParsedRelease, ParseReport)``. The report carries the confidence
|
||||
score, the road, and diagnostic info for downstream callers.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from alfred.domain.releases_TO_CHECK.parser import scoring as _scoring, pipeline as _v2
|
||||
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import MediaTypeToken, ParsedRelease, ParseReport, TokenizationRoute
|
||||
|
||||
|
||||
def parse_release(
|
||||
name: str, kb: ReleaseKnowledge
|
||||
) -> tuple[ParsedRelease, ParseReport]:
|
||||
"""Parse a release name.
|
||||
|
||||
Returns a tuple ``(ParsedRelease, ParseReport)``. The structural VO
|
||||
is unchanged from the previous single-return contract; the report
|
||||
is new and carries the confidence score + road decision.
|
||||
|
||||
Flow:
|
||||
|
||||
1. Strip a leading/trailing ``[site.tag]`` if present (sets
|
||||
``parse_path="sanitized"``).
|
||||
2. If the remainder still contains truly forbidden chars (forbidden
|
||||
chars that are not configured separators), short-circuit to
|
||||
``media_type="unknown"`` / ``parse_path="ai"`` and emit a
|
||||
PATH_OF_PAIN report — the LLM handles these.
|
||||
3. Otherwise run the v2 pipeline: tokenize → annotate (EASY when a
|
||||
group schema is known, SHITTY otherwise) → assemble → score.
|
||||
"""
|
||||
parse_path = TokenizationRoute.DIRECT
|
||||
|
||||
# Apostrophes inside titles ("Don't", "L'avare") are common and should
|
||||
# not push the release through the AI fallback. Strip them up front so
|
||||
# both strip_site_tag and tokenize see "Dont" / "Lavare", which is good
|
||||
# enough for token-level matching. The raw name is preserved on the VO.
|
||||
working_name = name
|
||||
if "'" in working_name:
|
||||
working_name = working_name.replace("'", "")
|
||||
parse_path = TokenizationRoute.SANITIZED
|
||||
|
||||
clean, site_tag = _v2.strip_site_tag(working_name)
|
||||
if site_tag is not None:
|
||||
parse_path = TokenizationRoute.SANITIZED
|
||||
|
||||
if not _is_well_formed(clean, kb):
|
||||
parsed = ParsedRelease(
|
||||
raw=name,
|
||||
clean=clean,
|
||||
title=clean,
|
||||
title_sanitized=kb.sanitize_for_fs(clean),
|
||||
year=None,
|
||||
season=None,
|
||||
episode=None,
|
||||
episode_end=None,
|
||||
quality=None,
|
||||
source=None,
|
||||
codec=None,
|
||||
group="UNKNOWN",
|
||||
media_type=MediaTypeToken.UNKNOWN,
|
||||
site_tag=site_tag,
|
||||
parse_path=TokenizationRoute.AI,
|
||||
)
|
||||
report = ParseReport(
|
||||
confidence=0,
|
||||
road=_scoring.Road.PATH_OF_PAIN.value,
|
||||
unknown_tokens=(clean,),
|
||||
missing_critical=("title", "media_type", "year"),
|
||||
)
|
||||
return parsed, report
|
||||
|
||||
tokens, v2_tag = _v2.tokenize(working_name, kb)
|
||||
annotated = _v2.annotate(tokens, kb)
|
||||
fields = _v2.assemble(annotated, v2_tag, name, kb)
|
||||
|
||||
parsed = ParsedRelease(
|
||||
raw=name,
|
||||
clean=clean,
|
||||
parse_path=parse_path,
|
||||
**fields,
|
||||
)
|
||||
|
||||
has_schema = _v2.has_known_schema(tokens, kb)
|
||||
score = _scoring.compute_score(parsed, annotated, kb)
|
||||
road = _scoring.decide_road(score, has_schema, kb)
|
||||
report = ParseReport(
|
||||
confidence=score,
|
||||
road=road.value,
|
||||
unknown_tokens=_scoring.collect_unknown_tokens(annotated),
|
||||
missing_critical=_scoring.collect_missing_critical(parsed),
|
||||
)
|
||||
return parsed, report
|
||||
|
||||
|
||||
def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
|
||||
"""Return True if ``name`` contains no forbidden characters per scene
|
||||
naming rules.
|
||||
|
||||
Characters listed as token separators (spaces, brackets, parens, …)
|
||||
are NOT considered malforming — the tokenizer handles them. Only
|
||||
truly broken chars like ``@``, ``#``, ``!``, ``%`` make a name
|
||||
malformed.
|
||||
"""
|
||||
tokenizable = set(kb.separators)
|
||||
return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
|
||||
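The well-formedness rule in `_is_well_formed` is easy to check by hand. The sketch below uses invented separator and forbidden-character lists (the real ones come from the knowledge base) and a hypothetical free function in place of the `kb` lookup.

```python
# Assumed knowledge-base values, for illustration only.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]
FORBIDDEN = ["@", "#", "!", "%", "[", "]"]


def is_well_formed(name, separators=SEPARATORS, forbidden=FORBIDDEN):
    """True when no forbidden character survives the separator exemption."""
    tokenizable = set(separators)
    return not any(c in name for c in forbidden if c not in tokenizable)


bracketed = is_well_formed("Show.S01E01.[1080p].x264-GRP")
shouting = is_well_formed("Show!.S01E01.1080p.x264-GRP")
```

Brackets appear in both lists here, so the first name still passes: a character the tokenizer can split on never makes a name malformed, while the `!` in the second name does.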
Some files were not shown because too many files have changed in this diff