Compare commits
73 Commits
9f10f4e0ad
...
unfuck
| Author | SHA1 | Date | |
|---|---|---|---|
| 745dec39f5 | |||
| 42fa6139ed | |||
| 2df7843d8b | |||
| 28304bb162 | |||
| c62ae81275 | |||
| cffafa2e60 | |||
| b3abad4da4 | |||
| 7ff2e6bc4e | |||
| 8f31f880aa | |||
| 1efe9a82c1 | |||
| 0dc053881a | |||
| 97dc799a26 | |||
| fe9857aaed | |||
| cc334a7951 | |||
| 86222d95d1 | |||
| 9e48c70b8a | |||
| 7da0f887e7 | |||
| c22b2b78eb | |||
| 2f160644da | |||
| e65c1df229 | |||
| c0f6d01048 | |||
| de7030fa9c | |||
| 3622c95154 | |||
| c7c11180d9 | |||
| b0e275bd11 | |||
| 6c12c18a27 | |||
| 1427c8a54b | |||
| 8491edac22 | |||
| 02e478a157 | |||
| 3dc73a5214 | |||
| 88f156b7a4 | |||
| 5107cb32c0 | |||
| b7979c0f8b | |||
| 9f1ce94690 | |||
| 5e0ed11672 | |||
| 0246f85ef8 | |||
| e62dc90bd1 | |||
| 688c37bbec | |||
| 757e4045ee | |||
| c3767aacb6 | |||
| 5bcf22b408 | |||
| cfa9f54d9f | |||
| f0aaf50c97 | |||
| a09262b33f | |||
| 9c7cd66d2b | |||
| 83dbed887b | |||
| 0c9489e16b | |||
| 621bb96995 | |||
| 448ef3b79c | |||
| b1c7f35ffb | |||
| 5bbdc9081f | |||
| 5d7b214af2 | |||
| 18267d0165 | |||
| 19fe8a519a | |||
| a0d1846ff2 | |||
| 0fb59a4581 | |||
| e79ca462b8 | |||
| 03aa844d7d | |||
| c303efea48 | |||
| 5db350a1df | |||
| 12dc796ea2 | |||
| 9ddd85929e | |||
| ed7680b58f | |||
| b4c9efd13b | |||
| 98c688f29b | |||
| fcd80763e2 | |||
| 629387591f | |||
| 230a7ab88a | |||
| 3737f66851 | |||
| fd3bd1ad8c | |||
| 7dc7f0c241 | |||
| 075a827b0e | |||
| a2c917618f |
@@ -74,5 +74,11 @@ docs/
# .md files (project-level Markdown is brol-y; allow-list the ones we track)
*.md
!CHANGELOG.md
!/README.md
!specs/
!specs/**/*.md

# Private dev docs (separate git repo inside; see .claude/CLAUDE.md)
/.claude/

#
+880
@@ -15,8 +15,872 @@ callers).
## [Unreleased]

### Changed

- **`filesystem` infra + application rewritten as 5 atomic free
  functions.** On branch `unfuck`. Replaces the monolithic
  `FileManager` class + scattered helpers with five small, pure ops in
  `alfred/infrastructure/filesystem/`: `list_dir`, `create_dir`,
  `link_file`, `move_file`, `move_dir`. Each takes `pathlib.Path`
  arguments and raises typed exceptions from a dedicated hierarchy
  (`FilesystemError` → `SourceNotFound` / `DestinationExists` /
  `NotADirectory` / `NotAFile` / `PermissionDenied` / `CrossDevice` /
  `FilesystemOSError`) — no more `{"status": "ok" | "error"}` dicts at
  the infra boundary, no more `get_memory()` reads.
- **`filesystem` application: 5 use cases as free functions.** A
  matching `<op>_use_case(path, …, roots: DirectoryRoots)` wraps each
  infra op, guards inputs against escaping a new `DirectoryRoots` VO
  (downloads / torrents / movies / tv_shows), catches infra exceptions,
  and returns a frozen `<Op>Response` DTO. Roots are now injected, not
  pulled from the global memory singleton.

- **Agent tool wrappers partially re-wired** to the new use cases.
  `list_folder` now delegates to `list_dir_use_case`; `move_media`
  to `move_file_use_case`; `move_to_destination` chains
  `create_dir_use_case` + `move_file_use_case`; a new
  `create_directory` tool wraps `create_dir_use_case`. Roots are
  loaded once via a module-level `_load_directory_roots()` helper
  that reads the persisted memory (no more per-call singleton
  reads inside the use cases themselves).

### Removed

- `FileManager` / `MediaOrganizer` / `create_folder` / `move` from the
  public API of `alfred.infrastructure.filesystem`. Their files remain
  on disk renamed with an `_OLD` suffix (e.g. `file_manager_OLD.py`) so
  the migration can finish on a follow-up commit without losing
  reference material. They are no longer re-exported from `__init__`.
- `CreateSeedLinksUseCase` / `ListFolderUseCase` / `MoveMediaUseCase` /
  `ManageSubtitlesUseCase` / `resolve_destination` from the public API
  of `alfred.application.filesystem`. Same `_OLD` rename treatment.
  This intentionally breaks current tool wrappers and tests downstream
  — re-wiring is the next chunk of work on this branch.
- **Agent tools dropped during the refactor** (to be reintroduced
  when the matching domain/application code lands):
  `manage_subtitles`, `set_path_for_folder`, `create_seed_links`,
  `resolve_season_destination`, `resolve_episode_destination`,
  `resolve_movie_destination`, `resolve_series_destination`.
  Their wrappers are removed from `alfred.agent.tools.filesystem`;
  `alfred.agent.tools.__init__` now re-exports only what still
  imports cleanly. `find_media_imdb_id` (already broken before this
  branch — name no longer exported by `tools.api`) was also dropped
  from the package re-exports.

### Added

- **`.alfred` v2 — Phase 4: v2-shaped `rescan_show` + new
  `rescan_movie` + index anchor-warning + `tmdb_cache_ttl_days`
  setting.** Fourth and final structural phase of
  `specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`. The TV
  + movie rescan orchestrators now write v2 release aggregates
  (`SeriesRelease` / `MovieRelease`) via the concrete v2
  repositories; the library index keeps auto-healing from the new
  sidecars on its next read (no TMDB call from rescan — that stays
  Phase 5).
  - **`rescan_show`** moves from `alfred/application/library/` to
    `alfred/application/tv_shows/` (symmetry with the new
    `alfred/application/movies/`). New signature:
    `(show_root, *, tmdb_id: TmdbId, imdb_id: ImdbId | None = None,
    series_repo, scanner, prober, kb) -> SeriesRelease`.
  - **`rescan_movie`** (new — `alfred/application/movies/rescan.py`)
    locates the main video via `find_video_file`, runs
    `inspect_release` once, and writes the per-movie `.alfred`
    sidecar. `added_at = datetime.now(UTC)` on every rescan (the
    sidecar records reconciliation time, not filesystem mtime).
    Raises `MovieRescanFailed` when no video is found in the folder.
  - **PACK semantics in `rescan_show`**: a single-video + no-episode
    season becomes `SeasonRelease(mode=PACK, folder=…, episodes=())`.
    The slot map stays empty until the Phase 5 TMDB sync supplies
    `episode_count` — no fabricated `EpisodeRange` lands in the
    sidecar. *(Superseded by Phase 4b — see Fixed.)*
  - **`Settings.tmdb_cache_ttl_days: int = 14`** — placeholder for the
    Phase 5 TTL policy on library-index entries (`fetched_at + TTL`
    drives refresh decisions).
  - **Library-index anchor-mismatch warning** — both
    `DotAlfredTVShowLibraryIndex` and `DotAlfredMovieLibraryIndex` now
    cross-check each entry's `metadata.path` against the on-disk
    folder layout right after a successful parse. Drift is logged as a
    `WARNING` (one per missing folder, with `tmdb_id`); the heal path
    stays silent by construction (it always synthesizes from real
    folder names).
- **`.alfred` v2 — Phase 5: TMDB sync orchestrators.** Fifth phase
  of `specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`.
  Two new orchestrators refresh the library-root index's
  TMDB-cached fields from on-disk truth + a single TMDB call:
  - **`sync_show`** (`alfred/application/tv_shows/sync.py`) calls
    `TMDBClient.get_tv_show_info`, loads the release via
    `DotAlfredSeriesReleaseRepository.load_by_tmdb_id`, and upserts
    the result into `DotAlfredTVShowLibraryIndex`. Honors
    `Settings.tmdb_cache_ttl_days`; placeholder entries (auto-healed,
    `status == "unknown"`) always refresh; `force=True` overrides
    both gates. Raises `ShowNotFoundInLibrary` when neither index nor
    sidecar carries `tmdb_id`. Indexed shows with a missing per-show
    sidecar still get a fresh TMDB pass — slot map clears until
    rescan repopulates it.
  - **`sync_movie`** (`alfred/application/movies/sync.py`) is the
    movie-side parallel. Placeholder signature is `name ==
    metadata.path` (auto-heal copies the folder name into `name`;
    the sidecar schema requires `name` non-empty so we can't use
    `name == ""`). When the per-movie sidecar is gone but the
    index entry remains, sync warns and returns the existing entry
    unchanged (no upsert possible without a release).
  - **`TmdbMovieInfo` DTO + `TMDBClient.get_movie_info`** — symmetric
    to the existing `TmdbShowInfo` / `get_tv_show_info` pair. Carries
    `tmdb_id`, `imdb_id`, `title`, and `release_year` (parsed from
    TMDB's `release_date`).
  - **`load_by_tmdb_id` on the v2 release repositories.** The series
    repo returns `(SeriesRelease, show_folder_name)` so the sync
    orchestrator can feed `DotAlfredTVShowLibraryIndex.upsert(...,
    path=...)`; the movie repo returns `MovieRelease` alone (folder is
    on `release.folder` already) and is provided as a semantic alias
    of `find_by_tmdb_id` for symmetry.
  - **`alfred/application/exceptions.py`** — new module for the two
    shared `*NotFoundInLibrary` exceptions raised by the sync
    orchestrators (`ShowNotFoundInLibrary`, `MovieNotFoundInLibrary`).

### Fixed

- **PACK vs EPISODIC classification (Phase 4b).** The Phase 4
  walker + `rescan_show` logic classified seasons by parser output
  (does the filename carry `Exx`?), but PACK vs EPISODIC is a
  *structural* distinction:
  - **PACK** = season folder with N flat `SxxEyy` videos.
  - **EPISODIC** = season folder with N subfolders, each holding
    one video.

  The walker now descends two levels under `show_root` and
  classifies per season folder. Mixed (flat + subfolders) is
  malformed — warn and skip. `rescan_show` trusts the walker's
  mode and stops conflating "single un-numbered video" with PACK
  (that case is now skipped as malformed too). Tests rewritten
  against the real model. Supersedes the PACK-semantics bullet
  above in Added.

### Removed

- **v1 dot_alfred stack and its abstract domain ports.** Deleted
  `alfred/infrastructure/persistence/dot_alfred/{bridge,repository,
  serializer,sidecar}.py`, plus the
  `alfred/domain/{tv_shows,movies}/repositories.py` ABCs
  (`TVShowRepository` / `MovieRepository`) — zero callers after
  Phase 4. `dot_alfred/__init__.py` is rewritten as a v2-only
  re-export (four concrete repositories + `ShowFolderUnknown`).
- **`alfred/application/library/` package** (rescan + walker moved
  to `alfred/application/tv_shows/`).
- The two Phase 3 module-level test skips
  (`test_repository.py`, `test_serializer.py`) are lifted by
  deleting the quarantined files.
- **`MediaWithTracks` mixin + `track_lang_matches` helper** in
  `alfred.domain.shared.media`. Parked in Phase 4 pending a
  Phase 5 decision; zero callers across `alfred/` and `tests/`
  after the v2 aggregates landed, so both go.

### Internal

- **Suite**: 1233 → 1277 passing; 10 → 8 skips (only LLM-not-running
  skips remain — the Phase 3 quarantines are gone with their files).
- Phase 5 cleanup sweep returns zero hits for `MediaWithTracks`,
  v1 dot_alfred symbols, v1 sidecar names, and
  `alfred.application.library` — the v2 surface is the only one left.

### Changed

- **`.alfred` v2 — Phase 3: `TVShow` / `Movie` aggregates become
  TMDB-only.** Third phase of `specs/dot_alfred_v2.md` on branch
  `refactor/dot-alfred-v2`. Filesystem-side concerns (file paths,
  tracks, quality, mode, `added_at`) move to the `releases/` domain
  added in Phase 1; the TMDB aggregates now carry only identity +
  TMDB catalog facts.
  - **`TVShow`** — `tmdb_id: TmdbId` is now the **required primary
    key**; `imdb_id: ImdbId | None` is the optional secondary anchor.
    Added `status: str = "unknown"` (raw TMDB string, default matches
    the v2 library-index auto-heal placeholder). `episode_count`
    aggregates the TMDB-cached counts on each `Season` (was: sum of
    materialized `Episode` objects).
  - **`Season`** — added `episode_count: int = 0` (TMDB-cached,
    authoritative). **Removed**: `audio_tracks`, `subtitle_tracks`,
    and the `mode` property (release mode now lives only on
    `SeasonRelease.mode` — single source of truth).
  - **`Episode`** — slimmed to identity + title. **Removed**:
    `file_path`, `file_size`, `audio_tracks`, `subtitle_tracks`. The
    `MediaWithTracks` mixin is no longer in `Episode`'s MRO; on-disk
    facts live on the matching `EpisodeRelease` keyed by
    `(season_number, episode_number)`.
  - **`Movie`** — `tmdb_id: TmdbId` required, `imdb_id` optional.
    **Removed**: `file_path`, `file_size`, `quality`, `added_at`,
    `audio_tracks`, `subtitle_tracks`. `get_filename()` now returns
    `"Title.Year"` (quality lives on `MovieRelease` and is appended
    by a release-aware caller — Phase 4 wires this through
    `MediaOrganizer`).
  - **`TVShowBuilder` / `SeasonBuilder`** — constructor requires
    `tmdb_id: TmdbId`; `imdb_id` and `status` are optional.
    `SeasonBuilder.set_episode_count(int)` replaces the old
    `set_audio_tracks` / `set_subtitle_tracks` (tracks no longer
    persisted on `Season`).
  - **`MovieRelease` carries `added_at: datetime`** (required).
    Bumped `dot_alfred/v2` `SCHEMA_VERSION` from `1` → `2` to add
    `added_at: datetime` to `MovieReleaseSidecar`. Round-trip via
    Pydantic `mode="json"` (datetime ↔ ISO 8601 string). No migration
    code shipped — no v2.1 sidecars exist in the wild yet.
  - **No-coercion `TmdbId` contract.** `TVShow(tmdb_id=1396)` now raises
    — callers pass `TmdbId(1396)`. Same for `imdb_id: ImdbId | None`
    on `TVShow`/`Movie`. Honest type contract, no ergonomic shim.
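
The no-coercion contract can be illustrated with a small sketch. The class bodies below are assumptions: the changelog does not say which exception type is raised or what the field is called, so `TypeError` / `ValueError` and `value` are placeholders, and the `TVShow` here is reduced to two fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmdbId:
    value: int

    def __post_init__(self) -> None:
        # bool is a subclass of int in Python, so it must be rejected explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(f"TmdbId expects int, got {type(self.value).__name__}")
        if self.value <= 0:
            raise ValueError("TmdbId must be a positive integer")


@dataclass(frozen=True)
class TVShow:
    tmdb_id: TmdbId
    title: str

    def __post_init__(self) -> None:
        # No ergonomic shim: a bare int is not silently wrapped into a TmdbId.
        if not isinstance(self.tmdb_id, TmdbId):
            raise TypeError("tmdb_id must be a TmdbId instance")
```

With this shape, `TVShow(tmdb_id=TmdbId(1396), title="Breaking Bad")` constructs normally, while `TVShow(tmdb_id=1396, title="Breaking Bad")` raises at construction time.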

### Removed

- `Season.mode` property (derive from `SeasonRelease.mode` instead).
- `Episode.file_path` / `file_size` / `audio_tracks` /
  `subtitle_tracks`.
- `Movie.file_path` / `file_size` / `quality` / `added_at` /
  `audio_tracks` / `subtitle_tracks`.

### Internal

- v1 dot_alfred package (`bridge.py`, `repository.py`,
  `serializer.py`, `sidecar.py`), the abstract `TVShowRepository` /
  `MovieRepository` ports typed against the pre-Phase-3 aggregates,
  and `alfred/application/library/rescan.py` are **intentionally
  left in tree as a known-red island**. Their tests
  (`tests/infrastructure/persistence/dot_alfred/test_repository.py`,
  `test_serializer.py`, `tests/application/library/test_rescan.py`)
  are module-level skipped with a Phase 4 reference. Phase 4 rewrites
  `rescan_show` / introduces `rescan_movie` on top of the v2
  release repositories + library index, then deletes the v1 stack +
  the abstract ports + the quarantined tests in one swing.
- Test suite: 1216 passed, 11 skipped (8 pre-existing + 3 Phase-3
  quarantines), 4 xfailed. v2 round-trip tests now reference
  `SCHEMA_VERSION` instead of hard-coded `1` for future-proofing.

### Added

- **`.alfred` v2 — Phase 2: new persistence package + TMDB client
  extensions.** Second phase of `specs/dot_alfred_v2.md` on branch
  `refactor/dot-alfred-v2`. The new
  `alfred/infrastructure/persistence/dot_alfred/v2/` package ships
  the full v2 sidecar stack while leaving v1 (and the existing
  `TVShow` aggregate) untouched — Phase 3 is the cutover.
  - **Pydantic DTOs** — `SeriesReleaseSidecar` /
    `MovieReleaseSidecar` (per-item), `TVShowLibraryIndexSidecar` /
    `MovieLibraryIndexSidecar` (library-root index). All built on a
    common `_Strict` base (`extra="forbid"`, `frozen=True`) with a
    `@model_validator` enforcing `schema_version == 1`.
  - **Track entries** — `AudioTrackEntry` / `SubtitleEntry` (sidecar
    cache shape, slimmed from the domain track types). `SubtitleEntry`
    carries `is_forced` + `is_sdh` as explicit booleans (v1's
    `type: "sdh"` overload is gone).
  - **Serializer** — `read_yaml` / `atomic_write_yaml` helpers
    centralize YAML I/O and atomic writes (`.tmp + os.replace`).
    `SidecarSchemaError` wraps both YAML parse errors and Pydantic
    validation errors for uniform catch-and-skip semantics.
  - **Bridge** — lossless `domain ↔ sidecar` conversion for
    `SeriesRelease` / `MovieRelease` (round-trippable, including
    multi-episode ranges and `is_sdh` subtitles); one-way projection
    for library-index entries (`show_index_entry_from`,
    `movie_index_entry_from`) that flattens multi-episode files into
    per-TMDB-slot maps in `seasons[*].episodes`.
  - **Repositories** —
    `DotAlfredSeriesReleaseRepository` /
    `DotAlfredMovieReleaseRepository` walk `library_root/*/` with
    log+skip on corruption; **`DotAlfredTVShowLibraryIndex`** /
    **`DotAlfredMovieLibraryIndex`** auto-heal silently on missing or
    corrupt index files by rebuilding from the per-item sidecars
    (healed entries keep TMDB-cached fields as placeholders until the
    next sync repopulates them). Writes are atomic and never auto-heal
    (read paths handle that).
  - **TMDB client extensions** — `TmdbSeasonInfo` / `TmdbShowInfo`
    DTOs + `TMDBClient.get_tv_show_info(tmdb_id)` aggregating
    `/tv/{id}` + `/tv/{id}/external_ids`. The parsing logic is a pure
    function (`parse_tv_show_info`) testable without HTTP, with an
    injectable reference date for deterministic `aired` flag tests.
  - **`is_sdh` flag on `SubtitleTrack`.** Added to
    `alfred/domain/shared/media.py::SubtitleTrack` to mirror ffprobe's
    `hearing_impaired` disposition. Wired through the ffprobe layer
    (`ffprobe_prober.py`) and the v2 sidecar bridge so SDH information
    round-trips end-to-end. Defaults to `False` — backwards-compatible
    for every existing caller.
  - **37 v2 integration tests** on `tmp_path` covering round-trips
    (domain ↔ sidecar ↔ YAML ↔ domain), atomic writes (no `.tmp`
    leftovers), per-item log+skip on corruption / schema mismatch,
    movie anchor-mismatch warning, full upsert / find / delete on both
    library indexes, and the auto-heal path on missing / corrupt /
    schema-mismatched index files. **16 TMDB DTO tests** for the new
    `parse_tv_show_info` pure function.
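
The `.tmp + os.replace` pattern mentioned for the serializer can be sketched with the standard library alone. This is not the project's `atomic_write_yaml`: the name `atomic_write_text`, the `.tmp` naming scheme, and the `fsync` call are assumptions, and YAML dumping is left out so the example has no third-party dependency.

```python
import os
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write to a sibling temp file, then swap it into place in one step."""
    tmp = target.with_name(target.name + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before the swap
        # os.replace overwrites the destination; on POSIX the rename is atomic
        # when source and destination are on the same filesystem.
        os.replace(tmp, target)
    finally:
        # After a successful replace the temp file is already gone; after a
        # failure this removes the partial file so no `.tmp` is left behind.
        tmp.unlink(missing_ok=True)
```

Readers therefore see either the old file or the complete new one, which is what lets the read paths treat a parse failure as genuine corruption rather than a half-finished write.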

- **`.alfred` v2 — Phase 1: new `releases/` domain.** First step of
  `specs/dot_alfred_v2.md` on branch `refactor/dot-alfred-v2`. The
  new `alfred/domain/releases/` package introduces a filesystem-only
  bounded context separated from TMDB identity (the existing
  `tv_shows` / `movies` domains). It hosts:
  - **`EpisodeRange` VO** — covers single-episode files
    (`EpisodeRange(E02, E02)`) and multi-episode files
    (`EpisodeRange(E02, E04)` for `SxxE02E03E04.mkv`), with
    `count()` / `numbers()` / `is_single()` helpers.
  - **`ReleaseMode` enum** — `PACK` (N video files directly in the
    season folder) vs `EPISODIC` (N sub-folders, one episode each);
    classified by the walker, never re-derived.
  - **Aggregates** — `TrackProfile`, `EpisodeRelease`,
    `SeasonRelease` (with `episode_count()` summing each file's
    range), `SeriesRelease`, `MovieRelease`. All frozen
    dataclasses; mutation via `SeasonReleaseBuilder` /
    `SeriesReleaseBuilder` (mirror the v1 `TVShowBuilder` pattern,
    including `from_existing()` round-trip).
  - **Abstract ports** — `SeriesReleaseRepository`,
    `MovieReleaseRepository` (concrete `DotAlfred*` arrive in
    Phase 2).
  - **`TmdbId` VO** added to `alfred/domain/shared/value_objects.py`
    (positive int, rejects bool/str/float — symmetry with `ImdbId`).
  - 73 unit tests covering VO validation, entity invariants, builder
    sort + overlap detection, and `from_existing()` round-trips. v1
    code paths untouched at this stage; new domain coexists.
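
As an illustration of the `EpisodeRange` value object, here is a minimal sketch. The field names `start` / `end` and the validation rules are assumptions; only the three helper names and the single-file vs multi-episode examples come from the entry above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeRange:
    start: int  # first episode number covered by the file
    end: int    # last episode number covered by the file (inclusive)

    def __post_init__(self) -> None:
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"invalid episode range: {self.start}-{self.end}")

    def count(self) -> int:
        return self.end - self.start + 1

    def numbers(self) -> tuple[int, ...]:
        return tuple(range(self.start, self.end + 1))

    def is_single(self) -> bool:
        return self.start == self.end
```

Under these assumptions a file named `SxxE02E03E04.mkv` maps to `EpisodeRange(2, 4)`, and a season's `episode_count()` is the sum of `count()` over its files.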

- **`rescan_show` orchestrator
  (`alfred/application/library/rescan.py`).** Step 4 of the
  `specs/dot_alfred.md` plan. Walks an Alfred-managed show folder,
  runs the existing `inspect_release` pipeline on every video file it
  finds, and assembles a frozen `TVShow` aggregate persisted via the
  injected `TVShowRepository`. Reuses the release parser + ffprobe
  path verbatim — no duplicated parse/probe logic at the library
  layer. PACK vs EPISODIC inferred per season folder from the
  on-disk file count + parser output: a single video whose name
  carries no `Exx` token becomes a PACK season (tracks lifted to the
  season-level `audio_tracks` / `subtitle_tracks`), anything else
  becomes EPISODIC (one `Episode` per file). Episode paths are
  stored relative to the show root for portability. Files that fail
  to parse a season/episode number, or seasons with mixed numbers,
  are logged and skipped — the orchestrator never raises. Embedded
  subtitle tracks are captured from `ffprobe`; adjacent `.srt`
  files, multi-episode entries (`S01E01E02`), and TMDB-driven PACK
  detection are tracked as tech debt for a dedicated subtitles /
  ShowTracker session. 7 integration tests on `tmp_path` with the
  Foundation layout (S01 EPISODIC + S02 PACK) cover the round-trip
  through the real `.alfred` repository.
- **Show tree walker (`alfred/application/library/walker.py`).**
  Step 4a foundation. `walk_show(show_root, scanner, kb)` returns a
  `ShowTree(show_root, season_folders=tuple[SeasonFolder, ...])` —
  pure structural snapshot, no parsing, no probing. Season folders
  are detected by a `\bS\d{1,2}\b` token anywhere in the directory
  name (release-style naming, no Plex `Season 01` / `Specials`
  conventions). Video files are filtered against
  `kb.video_extensions`; no recursion into sub-sub-folders. 11 unit
  tests on `tmp_path` cover detection (case-insensitive, in-word
  rejection), filtering (subs, NFO, sample files), and edge cases
  (empty / missing show root).
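
The season-folder detection rule quoted for the walker can be tried out in isolation. The regex below is the one given in the entry; the helper name `is_season_folder` and the use of `re.IGNORECASE` to get the case-insensitive behaviour are assumptions about how it is wired up.

```python
import re

# Pattern from the changelog entry; IGNORECASE is an assumed way to obtain
# the case-insensitive matching the unit tests are said to cover.
SEASON_TOKEN = re.compile(r"\bS\d{1,2}\b", re.IGNORECASE)


def is_season_folder(name: str) -> bool:
    """True when a standalone S<1-2 digits> token appears in the name."""
    return SEASON_TOKEN.search(name) is not None
```

The word boundaries are what give the in-word rejection: `Foundation.S01.1080p.WEB` matches, while `Show.S01E01` does not, because the digits are followed directly by `E` and no boundary exists there. Plex-style names such as `Season 01` or `Specials` contain no such token and are ignored.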
- **Season-level audio/subtitle tracks
  (`alfred/domain/tv_shows/entities.py`,
  `alfred/domain/tv_shows/builders.py`).** `Season` now inherits
  from `MediaWithTracks` and carries `audio_tracks` /
  `subtitle_tracks` tuples (empty by default). Populated only in
  PACK mode (the single release covering the whole season); empty in
  EPISODIC mode where tracks live per-episode. `SeasonBuilder`
  gains `set_audio_tracks()` / `set_subtitle_tracks()` and forwards
  them through `from_existing()`. The bridge writes / reads them in
  the PACK branch via shared `_synth_audio_tracks` /
  `_synth_subtitle_tracks` helpers used for episodes too.

- **`DotAlfredTVShowRepository` — filesystem-backed implementation of
  the `TVShowRepository` port
  (`alfred/infrastructure/persistence/dot_alfred/repository.py`).**
  Step 3 of the `specs/dot_alfred.md` plan. Reads and writes one
  `.alfred` YAML file per show under a configurable `library_root`.
  `save(show)` writes atomically (`.alfred.tmp` + `os.replace`) into a
  folder that **must already exist** — the repository never invents a
  folder name (the upstream `MediaOrganizer` is in charge of placing
  files; the repo writes the sidecar next to them). `find_by_imdb_id` /
  `find_all` walk `library_root/*/`, loading each readable sidecar;
  folders without a sidecar return `None` / are skipped (no implicit
  cold scan — that is the job of the upcoming `rescan_show` tool).
  Corrupted YAML and schema violations are logged and skipped, never
  raised, so a single bad folder does not break the rest of the
  library. The repo keeps a tiny in-memory `imdb_id → folder_name`
  index populated on every successful read/save, so subsequent saves
  find the right destination without re-walking — useful when the show
  folder name diverges from `show.get_folder_name()` (custom 1080p / 4K
  variants). 20 integration tests on `tmp_path` cover the round-trip,
  cold folder / unknown id returns, multi-show `find_all`, corrupted /
  wrong-schema skipping, atomic write (no `.alfred.tmp` left behind),
  overwrite, and folder-name fallbacks.
- **Sidecar ↔ TVShow bridge
  (`alfred/infrastructure/persistence/dot_alfred/bridge.py`).**
  `to_sidecar(show, folder_paths=...)` summarizes the rich domain
  `AudioTrack` / `SubtitleTrack` to the sidecar's compact form (unique
  audio languages in track order; subtitle entries derived from
  `is_forced` and assumed `source="embedded"`). `from_sidecar(sidecar,
  title=...)` reconstructs the domain `TVShow` with synthesized tracks
  — one `AudioTrack` per language, one `SubtitleTrack` per entry, with
  ffprobe-only fields (`codec`, `channels`, `channel_layout`) left as
  `None`. The bridge is intentionally lossy on probe minutiae the
  sidecar does not store; this is the documented trade-off from the
  factual-only spec.

- **`.alfred` sidecar serializer
  (`alfred/infrastructure/persistence/dot_alfred/`).** Implements step 2
  of the `specs/dot_alfred.md` plan. Pure-dict in/out
  (`serialize(sidecar) -> dict`, `deserialize(data) -> ShowSidecar`) —
  YAML I/O lives in the repository layer (step 3) and is kept out for
  trivial testability. Ships the DTOs that mirror the YAML schema
  field-for-field (`ShowSidecar`, `SeasonSidecar`, `EpisodeSidecar`,
  `SubtitleEntry`). The sidecar acts as a **scan cache**: it stores
  only what is genuinely costly to recompute — folder/file paths
  (skipping the FS walk) and probed track metadata (skipping ffprobe).
  Release identifiers (group, source, quality, codec) live in folder
  and file names and are derived on demand by the parser — they are
  deliberately absent from the schema and rejected on deserialize. The
  serializer is **strict on schema**: unknown keys at any level raise
  `SidecarSchemaError`, missing required fields raise clearly, and
  `bool` cannot sneak in as a season/episode number. Optional fields
  (`tmdb_id`, empty `audio`/`subtitles`/`episodes`) are omitted from
  the output rather than emitted as `null` / `[]`. Tests cover
  round-trip equivalence (DTO → dict → DTO and DTO → YAML text → DTO),
  the Foundation S01 PACK case (real-world fixture with mixed sub
  types — superset captured at season scope), and a Breaking Bad S05
  EPISODIC case. An on-disk `tmp_path` fixture recreates the Foundation
  folder structure with placeholder files, ready to be reused by the
  upcoming repository walk tests in step 3.

- **`TVShowBuilder` / `SeasonBuilder` — sole construction surface for the
  TVShow aggregate** (`alfred/domain/tv_shows/builders.py`). The aggregate
  is now fully frozen; building goes through a mutable scratchpad that
  emits an immutable `TVShow` via `build()`. Both builders offer a
  `from_existing()` classmethod to seed from a current frozen aggregate
  and apply modifications. Episodes are emitted sorted by number within a
  season, seasons sorted by number within the show.
- **`SeasonMode` enum** (`PACK` / `EPISODIC`) in
  `alfred/domain/tv_shows/value_objects.py`. Computed at read time from
  the season's structural shape (`Season.mode` property): a season with
  no explicit episodes is `PACK` (a single release covering the whole
  season), a season with episodes is `EPISODIC` (currently airing, one
  release per episode). Never stored — the YAML sidecar encodes the
  mode via the presence/absence of the `episodes:` block.

### Changed

- **TVShow aggregate is now frozen all the way down.** `TVShow`,
  `Season` and `Episode` are all `@dataclass(frozen=True)`. Children
  are stored as ordered tuples (`tuple[Season, ...]`,
  `tuple[Episode, ...]`) sorted by their respective numbers, replacing
  the previous mutable dicts. Lookup helpers `TVShow.get_season(n)` and
  `Season.get_episode(n)` traverse the tuple lazily via `next()`. The
  former `add_episode` / `add_season` mutation methods are gone — all
  construction goes through `TVShowBuilder` / `SeasonBuilder`.

### Removed

- **ShowTracker-territory fields stripped from the TVShow aggregate.**
  The aggregate now models only what the `.alfred` sidecar stores
  (filesystem-observable facts + immutable identity). Dropped from the
  domain:
  - `TVShow.status` (`ShowStatus`) and the `ShowStatus` enum entirely,
    along with its TMDB string mapping (`from_string`).
  - `TVShow.expected_seasons`, `Season.expected_episodes`,
    `Season.aired_episodes`, `Season.name`.
  - `TVShow.collection_status()`, `is_complete_series()`,
    `missing_episodes()`, `is_ongoing()`, `is_ended()` and the
    `CollectionStatus` enum.
  - `Season.is_complete()`, `is_fully_aired()`, `missing_episodes()`
    and the `aired ≤ expected` validation.
  - `TVShow.add_episode()` / `TVShow.add_season()` /
    `Season.add_episode()` — replaced by the builder API.

  These concerns will reappear in a dedicated `ShowTracker` layer (to
  be designed) that combines the `.alfred` sidecar with live TMDB data
  to answer questions like "is this show complete?" or "are new
  episodes out?". Keeping volatile/derived state out of the aggregate
  matches the factual-only philosophy locked in `specs/dot_alfred.md`.
|
||||||
|
|
||||||
|
### Internal

- **Test suite rewritten for the new aggregate shape.**
  `tests/domain/test_tv_shows.py` now covers frozen invariants, builder
  ordering, last-write-wins on duplicates, `from_existing` round-trip,
  and `SeasonMode` derivation. `tests/infrastructure/test_filesystem_extras.py`
  helper simplified (no more `ShowStatus.ENDED` / `expected_seasons` on
  test shows). 1078 tests still green.

- **Design doc for `.alfred/` sidecar persistence
  (`specs/dot_alfred.md`).** First entry in the new `specs/` directory.
  Specifies a per-show `.alfred/` directory holding a `show.yaml` and
  one `season_NN.yaml` per season, used by the upcoming concrete
  `TVShowRepository` to cache parse/probe results and avoid full
  rescans on every library read. Covers schema, naming conventions,
  cache invalidation strategy (size + mtime), self-healing on
  drift, atomicity (`os.replace`), edge cases (legacy folders,
  corrupted sidecars, manual file removal), and a phased
  implementation plan. No code yet: spec only.

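The spec itself contains no code, so the following is only a sketch of two mechanisms it names: a size + mtime fingerprint for cache invalidation and an atomic sidecar write via `os.replace`. The helper names (`file_fingerprint`, `is_stale`, `atomic_write_text`) are hypothetical and not taken from `specs/dot_alfred.md`.

```python
import contextlib
import os
import tempfile
from pathlib import Path


def file_fingerprint(path: Path) -> tuple[int, int]:
    """Return (size, mtime in ns), the pair a sidecar could cache per file."""
    st = path.stat()
    return (st.st_size, st.st_mtime_ns)


def is_stale(path: Path, cached: tuple[int, int]) -> bool:
    # Any difference in size or mtime invalidates the cached entry.
    return file_fingerprint(path) != cached


def atomic_write_text(target: Path, text: str) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # os.replace swaps the file in one step, so readers never see a half-written sidecar.
        os.replace(tmp_name, target)
    except BaseException:
        # Leave no temp file behind if the write or the swap failed.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Creating the temp file next to the target keeps both on the same filesystem, which is what allows `os.replace` to behave atomically.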
### Internal

- **`specs/` is now tracked.** The repo-level `.gitignore` had a
  blanket `*.md` rule with only `CHANGELOG.md` allow-listed. Added
  explicit exceptions for `/README.md` (root only, which avoids
  unintentionally exposing fixture READMEs) and `specs/**/*.md` so the
  new design-doc directory ships with the project. Also added an
  explicit `/.claude/` ignore line for the private dev-docs sub-repo
  that sits inside the working tree but is versioned separately.

### Fixed

- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
  range.** The parser previously captured `episode=9, episode_end=10`
  and dropped E11+. It now returns `episode=first, episode_end=last`,
  with intermediate values implied. Fixture
  `shitty/archer_multi_episode/` updated from anti-regression-of-bug
  to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI
  fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
  previously parsed with `parse_path="ai"` and everything UNKNOWN
  because `'` is in the forbidden-chars list. Apostrophes are now
  pre-stripped before the well-formed check, so the parse completes
  normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
  the title text loses its apostrophe. `parse_path` becomes
  `sanitized` to surface the cleanup. Side win: PoP fixture
  `the_prodigy_full_chaos/` also moves from total failure to a
  partially correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as
  `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
  parsed as `media_type=movie` with `S01-06` glued onto the title.
  The parser now recognizes the range, sets `season=first`,
  `media_type=tv_complete`, and removes the marker from the title.
  `is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
  with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
  produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
  `｜`, …) carry no title content and are now filtered out. Side
  effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits, since the
  YouTube wide-pipe no longer leaks into the title.

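As an illustration of the multi-episode fix listed above, here is a sketch of collapsing an `SxxEyyEzz…` chain to its first and last episode. It uses a regular expression for brevity; the changelog does not say how the real parser matches these tokens, and the function name `parse_episode_chain` as well as the `None` end value for a single episode are assumptions.

```python
import re
from typing import Optional

# One season marker followed by one or more episode markers, e.g. S14E09E10E11.
_CHAIN = re.compile(r"S(\d{1,2})((?:E\d{1,3})+)", re.IGNORECASE)
_EPISODE = re.compile(r"E(\d{1,3})", re.IGNORECASE)


def parse_episode_chain(token: str) -> Optional[tuple[int, int, Optional[int]]]:
    """Return (season, episode, episode_end), or None if the token is not a chain."""
    match = _CHAIN.fullmatch(token)
    if match is None:
        return None
    season = int(match.group(1))
    episodes = [int(n) for n in _EPISODE.findall(match.group(2))]
    first, last = episodes[0], episodes[-1]
    # Intermediate episodes are implied by the range; only the ends are kept.
    return season, first, (last if len(episodes) > 1 else None)
```

With this shape, a three-episode chain and a two-episode chain go through the same code path, which is what prevents the tail from being dropped.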
### Added

- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
  token separator.** Added to `alfred/knowledge/release/separators.yaml`
  so CJK release names (and the occasional decorative YouTube-style use)
  tokenize cleanly instead of leaving the wide pipe glued onto an
  adjacent token. The tokenizer in
  `alfred/domain/release/parser/pipeline.py` already iterates the
  separator list as plain strings (no regex), so a multi-byte UTF-8
  separator works without any code change.

- **`InspectedResult.recommended_action` property**: a derived hint that
  collapses the orchestrator's go / wait / skip decision into a single
  value (`"process"` / `"ask_user"` / `"skip"`). Centralizes the
  exclusion logic that was previously dispersed across road /
  media_type / main_video checks at each call site. Ordering is part of
  the contract: `skip` (no main video, or media_type == `"other"`)
  wins over `ask_user` (media_type == `"unknown"` or road ==
  `"path_of_pain"`), which wins over `process`. Surfaced through the
  `analyze_release` tool so the LLM can route on it directly.
  6 new tests in `tests/application/test_inspect.py` cover the four
  branches and the precedence rules.
- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
  Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
  (the surface previously coupled to the concrete `LanguageRegistry`).
  Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
  depends on the Protocol, infrastructure provides the YAML-backed
  adapter. Tests in `tests/infrastructure/test_language_registry.py`.

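The precedence contract of `recommended_action` described above can be sketched like this. The class below is a hypothetical three-field stand-in for `InspectedResult`, which in reality carries more state; only the ordering of the checks is taken from the changelog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InspectedSketch:
    """Illustrative stand-in holding just the inputs the decision depends on."""

    media_type: str
    road: str
    main_video: Optional[str]

    @property
    def recommended_action(self) -> str:
        # 1. skip wins: nothing to process, or explicitly not media.
        if self.main_video is None or self.media_type == "other":
            return "skip"
        # 2. ask_user: the parse is too uncertain to act on automatically.
        if self.media_type == "unknown" or self.road == "path_of_pain":
            return "ask_user"
        # 3. Everything else can go straight through.
        return "process"
```

Keeping the checks in one property means a call site cannot accidentally test `road` before `main_video` and reach a different verdict.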
### Changed

- **`Movie` and `Episode` are now frozen dataclasses.** Both entities
  hold their track collections as `tuple[AudioTrack, ...]` and
  `tuple[SubtitleTrack, ...]` instead of mutable lists, and are
  `@dataclass(frozen=True, eq=False)` (identity-based equality
  preserved via `__eq__`/`__hash__`). `__post_init__` coercion uses
  `object.__setattr__` for the `imdb_id` / `title` /
  `season_number` / `episode_number` normalizations. To project
  enrichment results (probe output, file metadata) callers now rebuild
  via `dataclasses.replace(...)`. Pattern aligned with the recent
  `ParsedRelease` freeze. `MediaWithTracks` mixin contract updated to
  `tuple` accordingly. `Season` and `TVShow` remain mutable for now:
  freezing the aggregate root would cascade a full reconstruction on
  every `add_episode`, so that step is deferred.
- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name
  conflated "this might become a placed subtitle" with "this is what a
  scan pass produced". The class is the output of a scan/identify pass:
  language/format may still be `None`, confidence reflects how sure
  the classifier is, and `raw_tokens` holds the filename fragments
  under analysis. `SubtitleScanResult` says that directly. Pure rename
  with a refreshed docstring in `alfred/domain/subtitles/entities.py`;
  no behavior change. Touches the domain entity + `__init__` export,
  the matcher / identifier / utils services, the manage_subtitles use
  case, the placer, the metadata store, the shared-media cross-ref
  comment, and the seven test modules that imported the type.

- **`ParsedRelease` is now frozen; enrichment passes return new
  instances.** The VO was mutable so `detect_media_type` and
  `enrich_from_probe` could patch fields in place, a code smell in a
  value object whose identity *is* its content. `ParsedRelease` is now
  `@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]`
  instead of a `list[str]`. `enrich_from_probe` returns a new
  `ParsedRelease` via `dataclasses.replace` (only allocates when at
  least one field actually changed). `inspect_release` rebinds
  `parsed` after both `detect_media_type` (wrapped in `MediaTypeToken`
  to satisfy the strict isinstance check that now also runs on
  replace) and `enrich_from_probe`. Parser pipeline now packs
  `languages` as a tuple in the assemble dict. Callers updated:
  `inspect_release`, `testing/recognize_folders_in_downloads.py`, and
  the enrichment tests (22 call sites + language assertions switched
  to tuple literals).
- **`resolve_destination` use cases take `kb` / `prober` as required
  params; module-level singletons gone.** The four
  `resolve_{season,episode,movie,series}_destination` use cases now
  accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
  arguments, matching the shape of `inspect_release`. The module-level
  `_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
  singletons that previously lived in
  `alfred/application/filesystem/resolve_destination.py` are removed:
  the application layer no longer reaches into infrastructure. The
  singletons now live at the agent-tools frontier
  (`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
  instantiate them once and thread them through. `analyze_release` no
  longer needs the dirty `from ... import _KB` indirection. Tests
  inject their own stubs by keyword (`prober=_StubProber(...)`) instead
  of monkeypatching a module attribute.
- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
  collided with `pathlib.Path` in code-reading mental models, and was
  nearly identical to `parse_path` (the field that holds the value), making
  it harder than it needed to be to spot the type vs the attribute.
  `TokenizationRoute` says what it actually captures (DIRECT /
  SANITIZED / AI = how the name reached the tokenizer), and the class
  docstring now spells out the orthogonality with `Road` (EASY /
  SHITTY / PATH_OF_PAIN, which captures parser confidence on
  `ParseReport`). The `parse_path` field name stays unchanged, as do
  its string values, so YAML fixtures, the `analyze_release` tool
  spec, and any external consumer are untouched.
- **`enrich_from_probe` codec mappings moved to YAML.** The three
  hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
  `_CHANNEL_MAP`) translating ffprobe output to scene tokens
  (`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
  `alfred/knowledge/release/probe_mappings.yaml` and are loaded into
  `ReleaseKnowledge.probe_mappings` (new port field, populated by
  `YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
  parameter and reads the maps from there. Aligns with the CLAUDE.md
  rule that lookup tables of domain knowledge belong in YAML, not in
  Python, and opens the door to a future "learn new codec" pass.
  Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
  and all 22 sites in `tests/application/test_enrich_from_probe.py`.
- **`ParsedRelease.tech_string` is now a derived `@property`**
  (`alfred/domain/release/value_objects.py`). It computes
  `quality.source.codec` joined by dots on every access, so it stays in
  sync with the underlying fields by construction. The stored field is
  gone from the dataclass, the dict returned by `assemble()` no longer
  carries the key, `parse_release`'s malformed-name fallback drops the
  `tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
  it after filling `quality`/`source`/`codec`. Closes the
  parser/enrichment double-source-of-truth that `e79ca46` had to fix
  reactively. The fixtures runner now injects `tech_string` alongside
  `is_season_pack` since `asdict()` skips properties.
- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
  valid levels (global, release_group, movie, show, season, episode)
  was documented only in a docstring comment and validated nowhere.
  `RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
  serialization, `.value` access) while making the closed set explicit
  to type-checkers and IDEs. `to_dict()` emits `.value` strings so
  YAML output is unchanged.
- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
  `__init__`.** Same public API (accepts `str | Path`), same behavior,
  but the dataclass-generated `__init__` is no longer bypassed. One
  less smell in the shared VOs.
- **`Language` VO is strict by default; `Language.from_raw()` factory
  for normalization.** The previous `__post_init__` mutated `iso` and
  `aliases` via `object.__setattr__` on a frozen dataclass, a code
  smell hiding behind the dataclass facade. Split: the direct
  constructor now rejects un-normalized input (uppercase iso,
  whitespace in aliases, etc.), and `Language.from_raw()` handles
  arbitrary YAML/user input. Only one caller (LanguageRegistry loading
  the ISO YAML) needed migration.
- **`ParsedRelease.normalised` renamed to `clean`.** The field name
  promised "dots instead of spaces" but in practice held
  `raw - site_tag - apostrophes`, and was only used by `season_folder_name()`.
  Renamed and docstring corrected.
- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
  fields were already typed as `MediaTypeToken` / `ParsePath`, but a
  tolerant `__post_init__` coerced raw strings. With both classes
  being `(str, Enum)`, the coercion served no purpose. Strict
  constructor; `.value` no longer passed at call sites; dropped the
  unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.

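Two of the `ParsedRelease` changes listed above (the freeze with `dataclasses.replace`, and `tech_string` as a derived property) can be sketched together. `ParsedSketch` and `enrich_quality` are hypothetical names with a reduced field set; skipping empty parts when joining is also an assumption of this sketch.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ParsedSketch:
    """Reduced stand-in for ParsedRelease: a frozen value object."""

    title: str
    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None
    languages: tuple[str, ...] = ()

    @property
    def tech_string(self) -> str:
        # Recomputed on every access, so it cannot drift from the fields.
        parts = (self.quality, self.source, self.codec)
        return ".".join(p for p in parts if p)


def enrich_quality(parsed: ParsedSketch, probed: Optional[str]) -> ParsedSketch:
    """Fill a missing quality from probe data; return the same object if nothing changes."""
    if parsed.quality is not None or probed is None:
        return parsed
    return replace(parsed, quality=probed)


before = ParsedSketch(title="Honey.Dont", source="WEBRip", codec="x265")
after = enrich_quality(before, "2160p")
```

The caller has to rebind the result (`parsed = enrich_quality(parsed, ...)`), which is the pattern the changelog describes for `inspect_release`.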
### Removed

- **`settings.min_movie_size_bytes`**: orphan Pydantic field +
  validator. Its only consumer (`MovieService.validate_movie_file`)
  had been removed during an earlier refactor. The "real movie vs
  sample" rule now lives in extension-based exclusion
  (`application/release/supported_media.py`) and PoP. If a size
  threshold is ever needed, it'll go in a knowledge YAML, not in
  `settings`.

### Internal

- **Flattened `alfred.domain.shared.media/` package into a single
  `media.py` module.** The 6-file package (audio, video, subtitle,
  info, matching, tracks_mixin + `__init__`) collapsed into one ~250
  LoC module. All 12 import sites continue to resolve unchanged
  (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
  since Python treats `media.py` and `media/__init__.py`
  interchangeably for import paths. Easier to scan when the whole
  bounded context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the
  `LanguageRepository` port** instead of the concrete `LanguageRegistry`
  class. The default constructor still instantiates the concrete adapter
  when no repository is injected, so behavior is unchanged for existing
  callers. Opens the door to in-memory fakes in future tests without
  loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from
  `alfred.application.filesystem` to `alfred.application.release`**.
  They are inspection-pipeline helpers; their natural home is next to
  `inspect_release`, not next to the filesystem use cases. The move
  also eliminates a circular-import workaround in
  `resolve_destination.py`: `inspect_release` can now be imported at
  module top instead of lazily inside `_resolve_parsed`. Public
  surface is unchanged for callers that imported the helpers from
  their full module paths (the only call sites, namely `inspect.py`, two
  tests and one testing script, were updated in this commit).

### Added

- **`resolve_*_destination` use cases now consume `inspect_release`**.
  `resolve_episode_destination` and `resolve_movie_destination` reuse
  their existing `source_file` parameter as the inspection target;
  `resolve_season_destination` and `resolve_series_destination` gain
  a new **optional** `source_path` parameter (also threaded through
  the tool wrappers and YAML specs). When the path exists, ffprobe
  data fills tokens missing from the release name (e.g. quality) and
  refreshes `tech_string`, so the destination folder / file names
  end up more accurate. When the path is omitted or does not exist
  (back-compat callers), the use cases fall back to parse-only, the
  same behavior as before.

### Fixed

- **`enrich_from_probe` now refreshes `tech_string`** after filling
  `quality` / `source` / `codec`. Previously the field stayed at its
  parser-time value, so filename builders saw stale tech tokens even
  after a successful probe. New `TestTechString` class in
  `tests/application/test_enrich_from_probe.py` locks the behavior.

### Added

- **`inspect_release` orchestrator + `InspectedResult` VO**
  (`alfred/application/release/inspect.py`). Single composition of the
  four inspection layers: `parse_release` → `detect_media_type` (patches
  `parsed.media_type`) → `find_main_video` (top-level scan) →
  `prober.probe` + `enrich_from_probe` when a video exists and the
  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
  `InspectedResult(parsed, report, source_path, main_video, media_info,
  probe_used)` that downstream callers consume directly instead of
  rebuilding the same chain. `kb` and `prober` are injected, with no
  module-level singletons. Never raises.

### Changed

- **`analyze_release` tool now delegates to `inspect_release`**. Same
  output shape, plus two new fields: `confidence` (0–100) and `road`
  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
  both fields so the LLM can route releases by confidence.

- **`MediaProber` port now covers full media probing**: added
  `probe(video) -> MediaInfo | None` alongside the existing
  `list_subtitle_streams`. `FfprobeMediaProber` (in
  `alfred/infrastructure/probe/`) implements both methods and is now
  the single adapter shelling out to `ffprobe`. The standalone
  `alfred/infrastructure/filesystem/ffprobe.py` module was removed;
  all callers (tools, testing scripts) instantiate
  `FfprobeMediaProber` instead. Unblocks the upcoming
  `inspect_release` orchestrator, which depends on the port.

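A rough sketch of what a structural `MediaProber` port and a test stub might look like. Only the `probe(video) -> MediaInfo | None` shape and the `list_subtitle_streams` name come from the changelog; the `MediaInfo` field, the return type of `list_subtitle_streams`, and the `describe` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol


@dataclass(frozen=True)
class MediaInfo:
    # Single illustrative field; the real MediaInfo is richer.
    video_codec: Optional[str] = None


class MediaProber(Protocol):
    """Structural port: any object with these two methods qualifies."""

    def probe(self, video: Path) -> Optional[MediaInfo]: ...

    def list_subtitle_streams(self, video: Path) -> list[dict]: ...


class StubProber:
    """Test double that never shells out to ffprobe."""

    def __init__(self, info: Optional[MediaInfo]) -> None:
        self._info = info

    def probe(self, video: Path) -> Optional[MediaInfo]:
        return self._info

    def list_subtitle_streams(self, video: Path) -> list[dict]:
        return []


def describe(video: Path, *, prober: MediaProber) -> str:
    # The caller depends on the port only; the adapter is passed in by keyword.
    info = prober.probe(video)
    if info is None:
        return "unprobed"
    return info.video_codec or "unknown"
```

Because the port is a `Protocol`, `StubProber` needs no inheritance to be accepted wherever a `MediaProber` is expected.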
### Removed

- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
  `FfprobeMediaProber` adapter).

---

## [2026-05-20] — Release parser confidence scoring + exclusion

### Added

- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
  `is_supported_video(path, kb)` (extension-only check against
  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
  scan, lexicographically-first eligible file, returns `None` when no
  video qualifies; accepts a bare file as folder for single-file
  releases). No size threshold, no filename heuristics;
  PATH_OF_PAIN handles the exotic cases. Foundation for the future
  `inspect_release` orchestrator.

- **Release parser: parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
  critical fields. EASY is decided structurally (a group schema
  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
  YAML-configurable cutoff (default 60). Weights and penalties also
  live in `scoring.yaml`: title 30, media_type 20, year 15, season
  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
  -30. `Road` is a new enum, distinct from `ParsePath` (which records
  the tokenization route, not the confidence tier). `ReleaseKnowledge`
  port gains a `scoring: dict` field.

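The scoring rule described above can be worked through with a small sketch. The weights, the penalty and the cutoff are the defaults quoted in the entry; treating "tech" as `quality`, `source` and `codec`, comparing with `>=` at the cutoff, and the function names are assumptions of this example, not the contents of `scoring.py`.

```python
# Default values quoted in the changelog entry; the real ones live in scoring.yaml.
WEIGHTS = {
    "title": 30,
    "media_type": 20,
    "year": 15,
    "season": 10,
    "episode": 5,
    "quality": 5,  # "tech 5 each": assumed to mean quality, source and codec
    "source": 5,
    "codec": 5,
}
UNKNOWN_PENALTY = 5
MAX_PENALTY = 30
CUTOFF = 60


def confidence(fields: dict, unknown_tokens: int) -> int:
    """Sum the weights of populated fields, subtract the capped UNKNOWN penalty."""
    earned = sum(w for name, w in WEIGHTS.items() if fields.get(name) is not None)
    penalty = min(unknown_tokens * UNKNOWN_PENALTY, MAX_PENALTY)
    return max(0, min(100, earned - penalty))


def road(schema_matched: bool, score: int) -> str:
    # EASY is structural: a matched group schema wins regardless of score.
    if schema_matched:
        return "easy"
    return "shitty" if score >= CUTOFF else "path_of_pain"
```

For example, a release with title, media type, year and all three tech fields earns 30 + 20 + 15 + 15 = 80; one UNKNOWN token brings it to 75, which lands on the SHITTY side of the default cutoff.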
### Changed

- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
  ParseReport]` instead of returning a bare `ParsedRelease`. Call
  sites updated in `application/filesystem/resolve_destination.py` and
  `agent/tools/filesystem.py`. Tests updated accordingly.

---

## [2026-05-20] — Release parser v2 (EASY + SHITTY)

### Added

- **Release parser v2: EASY path live** (`alfred/domain/release/parser/`).
  A new annotate-based pipeline (tokenize → annotate → assemble) drives
  releases from known groups. Exposes `Token` (frozen VO with `index` +
  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
  and `GroupSchema` / `SchemaChunk` value objects.
  - `pipeline.tokenize`: string-ops separator split (no regex), strips
    a `[site.tag]` prefix/suffix first.
  - `pipeline.annotate`: detects the trailing group right-to-left
    (priority to `codec-GROUP` shape, fallback to any non-source dashed
    token), looks up its `GroupSchema`, then walks tokens and schema
    chunks in lockstep: optional chunks that don't match are skipped,
    mandatory mismatches abort EASY and return `None` so the caller can
    fall back to SHITTY.
  - `pipeline.assemble`: folds annotated tokens into a
    `ParsedRelease`-compatible dict.
  - `parse_release` (in `release.services`) tries the v2 EASY path first
    and falls through to the legacy SHITTY heuristic on `None`. Legacy
    SHITTY/PATH OF PAIN behavior is unchanged.
  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
    rarbg}.yaml` declare the canonical chunk order per group, loaded via
    new `ReleaseKnowledge.group_schema(name)` port method.
  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
    cover token VOs, site-tag stripping, group detection, schema-driven
    annotation (movie, TV episode, season pack with optional source),
    and field assembly.

- **Release parser v2: enricher pass** completes the EASY pipeline.
  The structural schema walk now tolerates non-positional tokens
  between chunks (instead of aborting on leftover tokens), and a second
  pass tags them with audio / video-meta / edition / language roles.
  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
  matched before single tokens. Channel layouts like `5.1` and `7.1`
  (split into two tokens by the `.` separator) are detected as
  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
  marker so `assemble` extracts the canonical value only from the
  primary token. KONTRAST releases with audio / HDR / edition / language
  metadata now produce a fully populated `ParsedRelease`.

- **Streaming distributor as a separate dimension** from encoding source.
  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
  platform that produced the release is now recorded distinctly. The
  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
  from `sources.yaml`.

- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
  each documenting an expected `ParsedRelease` plus the future `routing`
  (library / torrents / seed_hardlinks) for the upcoming `organize_media`
@@ -54,6 +918,22 @@ callers).

### Changed

- **Release parser v2: SHITTY simplified to dict-driven tagging**.
  The legacy ~480-line heuristic block in `release/services.py` is gone;
  `pipeline._annotate_shitty` does a single pass that looks each token
  up in the kb buckets (resolutions / sources / codecs / distributors /
  year / `SxxExx`) with first-match-wins semantics, and the leftmost
  contiguous UNKNOWN run becomes the title. `annotate()` no longer
  returns `None`: SHITTY is the always-on fallback when no group schema
  matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
  (`deutschland_franchise_box`, `sleaford_yt_slug`,
  `super_mario_bilingual`, `predator_space_separators`; the last one
  moved from `shitty/` → `path_of_pain/`) are now marked
  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
  `xfail_reason` field; the parametrized suite wires the xfail mark
  automatically.

- **`parse_release` tokenizer is now data-driven**: it splits on any character
  listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
  This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),

@@ -6,13 +6,13 @@ from collections.abc import AsyncGenerator
 from pathlib import Path
 from typing import Any

-from alfred.infrastructure.metadata import MetadataStore
-from alfred.infrastructure.persistence import get_memory
+from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
 from alfred.settings import settings

 from .prompt import PromptBuilder
 from .registry import Tool, make_tools
-from .workflows import WorkflowLoader
+from .workflows_TO_CHECK import WorkflowLoader

 logger = logging.getLogger(__name__)

@@ -3,12 +3,12 @@
 import json
 from typing import Any

-from alfred.infrastructure.persistence import get_memory
-from alfred.infrastructure.persistence.memory import MemoryRegistry
+from alfred.infrastructure.persistence_TO_CHECK import get_memory
+from alfred.infrastructure.persistence_TO_CHECK.memory import MemoryRegistry

 from .expressions import build_expressions_context
 from .registry import Tool
-from .workflows import WorkflowLoader
+from .workflows_TO_CHECK import WorkflowLoader

 # Tools that are always available, regardless of workflow scope.
 # Kept small on purpose — the noyau is what the agent uses to either

@@ -6,8 +6,8 @@ from collections.abc import Callable
 from dataclasses import dataclass
 from typing import Any

-from .tools.spec import ToolSpec, ToolSpecError
-from .tools.spec_loader import load_tool_specs
+from .tools_TO_CHECK.spec import ToolSpec, ToolSpecError
+from .tools_TO_CHECK.spec_loader import load_tool_specs

 logger = logging.getLogger(__name__)

@@ -130,10 +130,10 @@ def make_tools(settings) -> dict[str, Tool]:
     Returns:
         Dictionary mapping tool names to Tool objects.
     """
-    from .tools import api as api_tools  # noqa: PLC0415
-    from .tools import filesystem as fs_tools  # noqa: PLC0415
-    from .tools import language as lang_tools  # noqa: PLC0415
-    from .tools import workflow as wf_tools  # noqa: PLC0415
+    from .tools_TO_CHECK import api as api_tools  # noqa: PLC0415
+    from .tools_TO_CHECK import filesystem as fs_tools  # noqa: PLC0415
+    from .tools_TO_CHECK import language as lang_tools  # noqa: PLC0415
+    from .tools_TO_CHECK import workflow as wf_tools  # noqa: PLC0415

     tool_functions = [
         fs_tools.set_path_for_folder,

@@ -1,22 +0,0 @@
-"""Tools module - filesystem and API tools for the agent."""
-
-from .api import (
-    add_torrent_by_index,
-    add_torrent_to_qbittorrent,
-    find_media_imdb_id,
-    find_torrent,
-    get_torrent_by_index,
-)
-from .filesystem import list_folder, set_path_for_folder
-from .language import set_language
-
-__all__ = [
-    "set_path_for_folder",
-    "list_folder",
-    "find_media_imdb_id",
-    "find_torrent",
-    "get_torrent_by_index",
-    "add_torrent_to_qbittorrent",
-    "add_torrent_by_index",
-    "set_language",
-]
@@ -0,0 +1,23 @@
|
|||||||
|
"""Tools module — agent-exposed wrappers.
|
||||||
|
|
||||||
|
Re-exports are intentionally minimal during the ``unfuck`` refactor.
|
||||||
|
Tool wiring (registry / specs / LLM-facing surface) is the last
|
||||||
|
chunk of work on this branch; until then, importers should reach
|
||||||
|
into the submodules directly (``alfred.agent.tools.filesystem``, …).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .api import (
|
||||||
|
add_torrent_by_index,
|
||||||
|
add_torrent_to_qbittorrent,
|
||||||
|
find_torrent,
|
||||||
|
get_torrent_by_index,
|
||||||
|
)
|
||||||
|
from .language import set_language
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"find_torrent",
|
||||||
|
"get_torrent_by_index",
|
||||||
|
"add_torrent_to_qbittorrent",
|
||||||
|
"add_torrent_by_index",
|
||||||
|
"set_language",
|
||||||
|
]
|
||||||
@@ -3,35 +3,47 @@
|
|||||||
import logging
|
import logging
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from alfred.application.movies import SearchMovieUseCase
|
from alfred.application.movies_TO_CHECK import SearchMovieUseCase
|
||||||
from alfred.application.torrents import AddTorrentUseCase, SearchTorrentsUseCase
|
from alfred.application.torrents_TO_CHECK import AddTorrentUseCase, SearchTorrentsUseCase
|
||||||
from alfred.infrastructure.api.knaben import knaben_client
|
from alfred.application.tv_shows_TO_CHECK import SearchShowUseCase
|
||||||
from alfred.infrastructure.api.qbittorrent import qbittorrent_client
|
from alfred.infrastructure.api_TO_CHECK.knaben import knaben_client
|
||||||
from alfred.infrastructure.api.tmdb import tmdb_client
|
from alfred.infrastructure.api_TO_CHECK.qbittorrent import qbittorrent_client
|
||||||
from alfred.infrastructure.persistence import get_memory
|
from alfred.infrastructure.api_TO_CHECK.tmdb import tmdb_client
|
||||||
|
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
def find_media_imdb_id(media_title: str) -> dict[str, Any]:
|
def search_movies(media_title: str) -> dict[str, Any]:
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_media_imdb_id.yaml."""
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_movies.yaml."""
|
||||||
use_case = SearchMovieUseCase(tmdb_client)
|
use_case = SearchMovieUseCase(tmdb_client)
|
||||||
response = use_case.execute(media_title)
|
response = use_case.execute(media_title)
|
||||||
result = response.to_dict()
|
result = response.to_dict()
|
||||||
|
|
||||||
if result.get("status") == "ok":
|
if result.get("status") == "ok":
|
||||||
memory = get_memory()
|
memory = get_memory()
|
||||||
memory.stm.set_entity(
|
memory.stm.set_entity("last_movie_search", {"hits": result.get("hits", [])})
|
||||||
"last_media_search",
|
memory.stm.set_topic("searching_movie")
|
||||||
{
|
logger.debug(
|
||||||
"title": result.get("title"),
|
f"Stored movie search result in STM: {len(result.get('hits', []))} hits"
|
||||||
"imdb_id": result.get("imdb_id"),
|
)
|
||||||
"media_type": result.get("media_type"),
|
|
||||||
"tmdb_id": result.get("tmdb_id"),
|
return result
|
||||||
},
|
|
||||||
|
|
||||||
|
def search_shows(show_title: str) -> dict[str, Any]:
|
||||||
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_shows.yaml."""
|
||||||
|
use_case = SearchShowUseCase(tmdb_client)
|
||||||
|
response = use_case.execute(show_title)
|
||||||
|
result = response.to_dict()
|
||||||
|
|
||||||
|
if result.get("status") == "ok":
|
||||||
|
memory = get_memory()
|
||||||
|
memory.stm.set_entity("last_show_search", {"hits": result.get("hits", [])})
|
||||||
|
memory.stm.set_topic("searching_show")
|
||||||
|
logger.debug(
|
||||||
|
f"Stored show search result in STM: {len(result.get('hits', []))} hits"
|
||||||
)
|
)
|
||||||
memory.stm.set_topic("searching_media")
|
|
||||||
logger.debug(f"Stored media search result in STM: {result.get('title')}")
|
|
||||||
|
|
||||||
return result
|
return result
|
||||||
|
|
||||||
@@ -1,4 +1,20 @@
|
|||||||
"""Filesystem tools for folder management."""
|
"""Filesystem tools for folder management.
|
||||||
|
|
||||||
|
Thin wrappers around the 5 atomic filesystem use cases
|
||||||
|
(``alfred.application.filesystem``) plus a few self-contained tools
|
||||||
|
(``analyze_release``, ``probe_media``, ``learn``, …).
|
||||||
|
|
||||||
|
Tools removed during the ``unfuck`` filesystem refactor — to be
|
||||||
|
rewired in a later step:
|
||||||
|
- ``manage_subtitles`` (depends on the rewritten subtitle services)
|
||||||
|
- ``set_path_for_folder`` (no replacement use case yet)
|
||||||
|
- ``create_seed_links`` (flow has changed: hard-link straight to
|
||||||
|
library, no copy back; will be re-introduced per-file when the
|
||||||
|
organize-release workflow lands)
|
||||||
|
- ``resolve_season_destination`` / ``resolve_episode_destination``
|
||||||
|
/ ``resolve_movie_destination`` / ``resolve_series_destination``
|
||||||
|
(their use cases moved to ``_OLD`` files pending a rewrite)
|
||||||
|
"""
|
||||||
|
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import Any
|
from typing import Any
|
||||||
@@ -7,120 +23,136 @@ import yaml
|
|||||||
|
|
||||||
import alfred as _alfred_pkg
|
import alfred as _alfred_pkg
|
||||||
from alfred.application.filesystem import (
|
from alfred.application.filesystem import (
|
||||||
CreateSeedLinksUseCase,
|
DirectoryRoots,
|
||||||
ListFolderUseCase,
|
create_dir_use_case,
|
||||||
ManageSubtitlesUseCase,
|
list_dir_use_case,
|
||||||
MoveMediaUseCase,
|
move_file_use_case,
|
||||||
SetFolderPathUseCase,
|
|
||||||
)
|
)
|
||||||
from alfred.application.filesystem.detect_media_type import detect_media_type
|
from alfred.infrastructure.knowledge_TO_CHECK.release_kb import YamlReleaseKnowledge
|
||||||
from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
|
from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
|
||||||
from alfred.application.filesystem.resolve_destination import (
|
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||||
resolve_episode_destination as _resolve_episode_destination,
|
from alfred.infrastructure.probe_TO_CHECK import FfprobeMediaProber
|
||||||
)
|
|
||||||
from alfred.application.filesystem.resolve_destination import (
|
# Agent-tools frontier: this is the legitimate home for the singletons that
|
||||||
resolve_movie_destination as _resolve_movie_destination,
|
# back every LLM-exposed wrapper. The use cases below take ``kb`` / ``prober``
|
||||||
)
|
# as required params; tests inject their own stubs.
|
||||||
from alfred.application.filesystem.resolve_destination import (
|
_KB = YamlReleaseKnowledge()
|
||||||
resolve_season_destination as _resolve_season_destination,
|
_PROBER = FfprobeMediaProber()
|
||||||
)
|
|
||||||
from alfred.application.filesystem.resolve_destination import (
|
|
||||||
resolve_series_destination as _resolve_series_destination,
|
|
||||||
)
|
|
||||||
from alfred.infrastructure.filesystem import FileManager, create_folder, move
|
|
||||||
from alfred.infrastructure.filesystem.ffprobe import probe
|
|
||||||
from alfred.infrastructure.filesystem.find_video import find_video_file
|
|
||||||
from alfred.infrastructure.metadata import MetadataStore
|
|
||||||
from alfred.infrastructure.persistence import get_memory
|
|
||||||
|
|
||||||
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
|
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
|
||||||
|
|
||||||
|
|
||||||
|
class _RootsNotConfigured(Exception):
|
||||||
|
"""Raised when one of the 4 expected roots is missing from memory."""
|
||||||
|
|
||||||
|
def __init__(self, missing: list[str]):
|
||||||
|
super().__init__(f"Roots not configured: {missing}")
|
||||||
|
self.missing = missing
|
||||||
|
|
||||||
|
|
||||||
|
def _load_directory_roots() -> DirectoryRoots:
|
||||||
|
"""Build :class:`DirectoryRoots` from the persisted memory.
|
||||||
|
|
||||||
|
Reads:
|
||||||
|
- ``ltm.workspace.download`` → ``downloads``
|
||||||
|
- ``ltm.workspace.torrent`` → ``torrents``
|
||||||
|
- ``ltm.library_paths['movies']`` → ``movies``
|
||||||
|
- ``ltm.library_paths['tv_shows']`` → ``tv_shows``
|
||||||
|
|
||||||
|
Raises:
|
||||||
|
_RootsNotConfigured: if any of the four paths is unset.
|
||||||
|
"""
|
||||||
|
memory = get_memory()
|
||||||
|
downloads = memory.ltm.workspace.download
|
||||||
|
torrents = memory.ltm.workspace.torrent
|
||||||
|
movies = memory.ltm.library_paths.get("movies")
|
||||||
|
tv_shows = memory.ltm.library_paths.get("tv_shows")
|
||||||
|
|
||||||
|
missing: list[str] = []
|
||||||
|
if not downloads:
|
||||||
|
missing.append("downloads")
|
||||||
|
if not torrents:
|
||||||
|
missing.append("torrents")
|
||||||
|
if not movies:
|
||||||
|
missing.append("movies")
|
||||||
|
if not tv_shows:
|
||||||
|
missing.append("tv_shows")
|
||||||
|
if missing:
|
||||||
|
raise _RootsNotConfigured(missing)
|
||||||
|
|
||||||
|
return DirectoryRoots(
|
||||||
|
downloads=Path(downloads),
|
||||||
|
torrents=Path(torrents),
|
||||||
|
movies=Path(movies),
|
||||||
|
tv_shows=Path(tv_shows),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _roots_error(exc: _RootsNotConfigured) -> dict[str, Any]:
|
||||||
|
return {
|
||||||
|
"status": "error",
|
||||||
|
"error": "roots_not_configured",
|
||||||
|
"message": (
|
||||||
|
f"Missing roots: {exc.missing}. "
|
||||||
|
"Configure them via /set_path before using filesystem tools."
|
||||||
|
),
|
||||||
|
}
|
||||||
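Review note: the `_load_directory_roots` / `_roots_error` pair above collects every unset root before raising, so the caller gets one complete error payload instead of failing on the first missing path. The following is a minimal, self-contained sketch of that pattern. `Roots`, `RootsNotConfigured`, `load_roots` and the plain `config` dict are hypothetical stand-ins for `DirectoryRoots`, `_RootsNotConfigured`, `_load_directory_roots` and the persisted memory; they are not the branch's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any

ROOT_NAMES = ("downloads", "torrents", "movies", "tv_shows")


class RootsNotConfigured(Exception):
    """Stand-in for the diff's _RootsNotConfigured."""

    def __init__(self, missing: list[str]):
        super().__init__(f"Roots not configured: {missing}")
        self.missing = missing


@dataclass(frozen=True)
class Roots:
    """Stand-in for DirectoryRoots: the four configured paths."""

    downloads: Path
    torrents: Path
    movies: Path
    tv_shows: Path


def load_roots(config: dict[str, str | None]) -> Roots:
    """Build Roots from a plain dict (assumed here in place of memory)."""
    # Gather all missing names first so the error lists every one of them.
    missing = [name for name in ROOT_NAMES if not config.get(name)]
    if missing:
        raise RootsNotConfigured(missing)
    return Roots(**{name: Path(str(config[name])) for name in ROOT_NAMES})


def roots_error(exc: RootsNotConfigured) -> dict[str, Any]:
    """Translate the exception into the tool-level error payload."""
    return {
        "status": "error",
        "error": "roots_not_configured",
        "message": (
            f"Missing roots: {exc.missing}. "
            "Configure them via /set_path before using filesystem tools."
        ),
    }
```

Each tool wrapper can then use the same `try` / `except` shape: load the roots, return `roots_error(e)` on failure, and otherwise hand the roots to the use case.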
|
|
||||||
|
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
# 5 atomic filesystem tools — thin wrappers over the use cases.
|
||||||
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|
||||||
|
def list_folder(path: str) -> dict[str, Any]:
|
||||||
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
|
||||||
|
try:
|
||||||
|
roots = _load_directory_roots()
|
||||||
|
except _RootsNotConfigured as e:
|
||||||
|
return _roots_error(e)
|
||||||
|
return list_dir_use_case(Path(path), roots).to_dict()
|
||||||
|
|
||||||
|
|
||||||
|
def create_directory(path: str) -> dict[str, Any]:
|
||||||
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_directory.yaml."""
|
||||||
|
try:
|
||||||
|
roots = _load_directory_roots()
|
||||||
|
except _RootsNotConfigured as e:
|
||||||
|
return _roots_error(e)
|
||||||
|
return create_dir_use_case(Path(path), roots).to_dict()
|
||||||
|
|
||||||
|
|
||||||
def move_media(source: str, destination: str) -> dict[str, Any]:
|
def move_media(source: str, destination: str) -> dict[str, Any]:
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml."""
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml."""
|
||||||
file_manager = FileManager()
|
try:
|
||||||
use_case = MoveMediaUseCase(file_manager)
|
roots = _load_directory_roots()
|
||||||
return use_case.execute(source, destination).to_dict()
|
except _RootsNotConfigured as e:
|
||||||
|
return _roots_error(e)
|
||||||
|
return move_file_use_case(Path(source), Path(destination), roots).to_dict()
|
||||||
|
|
||||||
|
|
||||||
def move_to_destination(source: str, destination: str) -> dict[str, Any]:
|
def move_to_destination(source: str, destination: str) -> dict[str, Any]:
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml."""
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml.
|
||||||
parent = str(Path(destination).parent)
|
|
||||||
result = create_folder(parent)
|
Convenience tool that creates the destination's parent directory
|
||||||
if result["status"] != "ok":
|
if missing, then moves the file. Saves the LLM from having to
|
||||||
return result
|
chain ``create_directory`` + ``move_media`` explicitly.
|
||||||
return move(source, destination)
|
"""
|
||||||
|
try:
|
||||||
|
roots = _load_directory_roots()
|
||||||
|
except _RootsNotConfigured as e:
|
||||||
|
return _roots_error(e)
|
||||||
|
|
||||||
|
dst = Path(destination)
|
||||||
|
mkdir_resp = create_dir_use_case(dst.parent, roots)
|
||||||
|
if mkdir_resp.status != "ok":
|
||||||
|
return mkdir_resp.to_dict()
|
||||||
|
return move_file_use_case(Path(source), dst, roots).to_dict()
|
||||||
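Review note: `move_to_destination` above chains "create the parent directory" and "move the file", and returns early if the first step fails. The sketch below shows the same two-step flow using only the standard library. It is an illustration, not the branch's implementation: the roots guard is left out, `move_to_destination_sketch` is a hypothetical name, and the keys of the `ok` payload are assumptions (only the error codes mirror strings that appear elsewhere in this diff).

```python
import shutil
from pathlib import Path
from typing import Any


def move_to_destination_sketch(source: Path, destination: Path) -> dict[str, Any]:
    """Create destination.parent if missing, then move source to destination."""
    # Step 1: make sure the parent directory exists (idempotent).
    try:
        destination.parent.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        return {"status": "error", "error": "filesystem_os_error", "message": str(exc)}

    # Step 2: refuse obviously bad inputs before touching the file.
    if not source.is_file():
        return {"status": "error", "error": "source_not_found", "message": str(source)}
    if destination.exists():
        return {"status": "error", "error": "destination_exists", "message": str(destination)}

    # Step 3: perform the move itself.
    try:
        shutil.move(str(source), str(destination))
    except OSError as exc:
        return {"status": "error", "error": "filesystem_os_error", "message": str(exc)}
    return {"status": "ok", "source": str(source), "destination": str(destination)}
```

Returning the first failing step's payload unchanged, as the wrapper in the diff does with `mkdir_resp.to_dict()`, keeps the error code meaningful to the caller.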
|
|
||||||
|
|
||||||
def resolve_season_destination(
|
# ---------------------------------------------------------------------------
|
||||||
release_name: str,
|
# Self-contained tools — not impacted by the filesystem refactor.
|
||||||
tmdb_title: str,
|
# ---------------------------------------------------------------------------
|
||||||
tmdb_year: int,
|
|
||||||
confirmed_folder: str | None = None,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_season_destination.yaml."""
|
|
||||||
return _resolve_season_destination(
|
|
||||||
release_name, tmdb_title, tmdb_year, confirmed_folder
|
|
||||||
).to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def resolve_episode_destination(
|
|
||||||
release_name: str,
|
|
||||||
source_file: str,
|
|
||||||
tmdb_title: str,
|
|
||||||
tmdb_year: int,
|
|
||||||
tmdb_episode_title: str | None = None,
|
|
||||||
confirmed_folder: str | None = None,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_episode_destination.yaml."""
|
|
||||||
return _resolve_episode_destination(
|
|
||||||
release_name,
|
|
||||||
source_file,
|
|
||||||
tmdb_title,
|
|
||||||
tmdb_year,
|
|
||||||
tmdb_episode_title,
|
|
||||||
confirmed_folder,
|
|
||||||
).to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def resolve_movie_destination(
|
|
||||||
release_name: str,
|
|
||||||
source_file: str,
|
|
||||||
tmdb_title: str,
|
|
||||||
tmdb_year: int,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_movie_destination.yaml."""
|
|
||||||
return _resolve_movie_destination(
|
|
||||||
release_name, source_file, tmdb_title, tmdb_year
|
|
||||||
).to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def resolve_series_destination(
|
|
||||||
release_name: str,
|
|
||||||
tmdb_title: str,
|
|
||||||
tmdb_year: int,
|
|
||||||
confirmed_folder: str | None = None,
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_series_destination.yaml."""
|
|
||||||
return _resolve_series_destination(
|
|
||||||
release_name, tmdb_title, tmdb_year, confirmed_folder
|
|
||||||
).to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def create_seed_links(
|
|
||||||
library_file: str, original_download_folder: str
|
|
||||||
) -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_seed_links.yaml."""
|
|
||||||
file_manager = FileManager()
|
|
||||||
use_case = CreateSeedLinksUseCase(file_manager)
|
|
||||||
return use_case.execute(library_file, original_download_folder).to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def manage_subtitles(source_video: str, destination_video: str) -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/manage_subtitles.yaml."""
|
|
||||||
file_manager = FileManager()
|
|
||||||
use_case = ManageSubtitlesUseCase(file_manager)
|
|
||||||
return use_case.execute(source_video, destination_video).to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, Any]:
|
def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, Any]:
|
||||||
@@ -180,32 +212,12 @@ def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, An
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
def set_path_for_folder(folder_name: str, path_value: str) -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_path_for_folder.yaml."""
|
|
||||||
file_manager = FileManager()
|
|
||||||
use_case = SetFolderPathUseCase(file_manager)
|
|
||||||
response = use_case.execute(folder_name, path_value)
|
|
||||||
return response.to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
|
||||||
from alfred.application.filesystem.resolve_destination import _KB # noqa: PLC0415
|
from alfred.application.release_TO_CHECK import inspect_release # noqa: PLC0415
|
||||||
from alfred.domain.release.services import parse_release # noqa: PLC0415
|
|
||||||
|
|
||||||
path = Path(source_path)
|
|
||||||
parsed = parse_release(release_name, _KB)
|
|
||||||
parsed.media_type = detect_media_type(parsed, path, _KB)
|
|
||||||
|
|
||||||
probe_used = False
|
|
||||||
if parsed.media_type not in ("unknown", "other"):
|
|
||||||
video_file = find_video_file(path, _KB)
|
|
||||||
if video_file:
|
|
||||||
media_info = probe(video_file)
|
|
||||||
if media_info:
|
|
||||||
enrich_from_probe(parsed, media_info)
|
|
||||||
probe_used = True
|
|
||||||
|
|
||||||
|
result = inspect_release(release_name, Path(source_path), _KB, _PROBER)
|
||||||
|
parsed = result.parsed
|
||||||
return {
|
return {
|
||||||
"status": "ok",
|
"status": "ok",
|
||||||
"media_type": parsed.media_type,
|
"media_type": parsed.media_type,
|
||||||
@@ -227,7 +239,10 @@ def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
|||||||
"edition": parsed.edition,
|
"edition": parsed.edition,
|
||||||
"site_tag": parsed.site_tag,
|
"site_tag": parsed.site_tag,
|
||||||
"is_season_pack": parsed.is_season_pack,
|
"is_season_pack": parsed.is_season_pack,
|
||||||
"probe_used": probe_used,
|
"probe_used": result.probe_used,
|
||||||
|
"confidence": result.report.confidence,
|
||||||
|
"road": result.report.road,
|
||||||
|
"recommended_action": result.recommended_action,
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@@ -241,7 +256,7 @@ def probe_media(source_path: str) -> dict[str, Any]:
|
|||||||
"message": f"{source_path} does not exist",
|
"message": f"{source_path} does not exist",
|
||||||
}
|
}
|
||||||
|
|
||||||
media_info = probe(path)
|
media_info = _PROBER.probe(path)
|
||||||
if media_info is None:
|
if media_info is None:
|
||||||
return {
|
return {
|
||||||
"status": "error",
|
"status": "error",
|
||||||
@@ -285,14 +300,6 @@ def probe_media(source_path: str) -> dict[str, Any]:
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
def list_folder(folder_type: str, path: str = ".") -> dict[str, Any]:
|
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
|
|
||||||
file_manager = FileManager()
|
|
||||||
use_case = ListFolderUseCase(file_manager)
|
|
||||||
response = use_case.execute(folder_type, path)
|
|
||||||
return response.to_dict()
|
|
||||||
|
|
||||||
|
|
||||||
def read_release_metadata(release_path: str) -> dict[str, Any]:
|
def read_release_metadata(release_path: str) -> dict[str, Any]:
|
||||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/read_release_metadata.yaml."""
|
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/read_release_metadata.yaml."""
|
||||||
path = Path(release_path)
|
path = Path(release_path)
|
||||||
@@ -3,7 +3,7 @@
|
|||||||
import logging
|
import logging
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from alfred.infrastructure.persistence import get_memory
|
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
+3
@@ -80,3 +80,6 @@ returns:
|
|||||||
site_tag: Source-site tag if present.
|
site_tag: Source-site tag if present.
|
||||||
is_season_pack: True when the folder contains a full season.
|
is_season_pack: True when the folder contains a full season.
|
||||||
probe_used: True when ffprobe successfully enriched the result.
|
probe_used: True when ffprobe successfully enriched the result.
|
||||||
|
confidence: Parser confidence score, 0–100 (higher = more reliable).
|
||||||
|
road: "Parser road: 'easy' (group schema matched), 'shitty' (heuristic but acceptable), or 'path_of_pain' (low confidence — ask the user before auto-routing)."
|
||||||
|
recommended_action: "Orchestrator hint: 'process' (go straight to resolve_*_destination), 'ask_user' (media_type unknown or road=path_of_pain — confirm with the user first), or 'skip' (no main video, or media_type=other — nothing to organize)."
|
||||||
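Review note: the three `recommended_action` values documented above ('process', 'ask_user', 'skip') suggest a simple dispatch on the caller's side. A minimal sketch of how an orchestrator might branch on them follows. The function name `next_step` and the returned step labels are hypothetical; only the three action strings come from the spec.

```python
from typing import Any


def next_step(analysis: dict[str, Any]) -> str:
    """Map an analyze_release result to a (hypothetical) orchestrator step."""
    action = analysis.get("recommended_action")
    if action == "process":
        # Parser is confident: go straight to destination resolution.
        return "resolve_destination"
    if action == "skip":
        # No main video, or media_type == "other": nothing to organize.
        return "nothing_to_do"
    # "ask_user" and any unexpected or missing value fall back to the
    # conservative path: confirm with the user before auto-routing.
    return "confirm_with_user"
```

Treating unknown values like 'ask_user' is a design choice made for this sketch, so that a new action added to the spec later does not silently trigger a file move.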
+11
@@ -61,6 +61,17 @@ parameters:
|
|||||||
one.
|
one.
|
||||||
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
||||||
|
|
||||||
|
source_path:
|
||||||
|
description: |
|
||||||
|
Absolute path to the release folder on disk. Optional.
|
||||||
|
why_needed: |
|
||||||
|
When provided, the tool runs ffprobe on the main video inside the
|
||||||
|
folder and uses the probe data to fill quality/codec tokens that
|
||||||
|
may be missing from the release name. The enriched tech tokens
|
||||||
|
end up in the destination folder name, so providing source_path
|
||||||
|
gives more accurate names for releases with sparse metadata.
|
||||||
|
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||||
|
|
||||||
returns:
|
returns:
|
||||||
ok:
|
ok:
|
||||||
description: Paths resolved unambiguously; ready to move.
|
description: Paths resolved unambiguously; ready to move.
|
||||||
+10
@@ -56,6 +56,16 @@ parameters:
|
|||||||
Forces the use case to use this exact folder name and skip detection.
|
Forces the use case to use this exact folder name and skip detection.
|
||||||
example: The.Wire.2002.1080p.BluRay.x265-GROUP
|
example: The.Wire.2002.1080p.BluRay.x265-GROUP
|
||||||
|
|
||||||
|
source_path:
|
||||||
|
description: |
|
||||||
|
Absolute path to the release folder on disk. Optional.
|
||||||
|
why_needed: |
|
||||||
|
When provided, the tool runs ffprobe on the main video inside the
|
||||||
|
folder and uses probe data to fill quality/codec tokens that may
|
||||||
|
be missing from the release name, producing a more accurate
|
||||||
|
destination folder name.
|
||||||
|
example: /downloads/The.Wire.S01-S05.1080p.BluRay.x265-GROUP
|
||||||
|
|
||||||
returns:
|
returns:
|
||||||
ok:
|
ok:
|
||||||
description: Path resolved; ready to move the pack.
|
description: Path resolved; ready to move the pack.
|
||||||
@@ -9,9 +9,9 @@ to reason over the full set.
|
|||||||
import logging
|
import logging
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from alfred.infrastructure.persistence import get_memory
|
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||||
|
|
||||||
from ..workflows import WorkflowLoader
|
from ..workflows_TO_CHECK import WorkflowLoader
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
+1
-1
@@ -15,7 +15,7 @@ from alfred.agent.agent import Agent
|
|||||||
from alfred.agent.llm.deepseek import DeepSeekClient
|
from alfred.agent.llm.deepseek import DeepSeekClient
|
||||||
from alfred.agent.llm.exceptions import LLMAPIError, LLMConfigurationError
|
from alfred.agent.llm.exceptions import LLMAPIError, LLMConfigurationError
|
||||||
from alfred.agent.llm.ollama import OllamaClient
|
from alfred.agent.llm.ollama import OllamaClient
|
||||||
from alfred.infrastructure.persistence import get_memory, init_memory
|
from alfred.infrastructure.persistence_TO_CHECK import get_memory, init_memory
|
||||||
from alfred.settings import settings
|
from alfred.settings import settings
|
||||||
|
|
||||||
logging.basicConfig(
|
logging.basicConfig(
|
||||||
|
|||||||
@@ -0,0 +1,26 @@
|
|||||||
|
"""Application-layer exceptions shared across orchestrators.
|
||||||
|
|
||||||
|
Kept in a dedicated module (rather than inside each orchestrator's
|
||||||
|
file) because the sync flows for TV shows and movies raise structurally
|
||||||
|
identical "not found in library" errors — pulling them out makes the
|
||||||
|
shared semantics explicit and avoids cross-imports between the
|
||||||
|
``tv_shows`` and ``movies`` packages.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
|
||||||
|
class ShowNotFoundInLibrary(LookupError):
|
||||||
|
"""Raised when no on-disk TV show carries the requested ``tmdb_id``.
|
||||||
|
|
||||||
|
The sync orchestrator raises this when both the library index and
|
||||||
|
the per-show release repository return ``None`` for a lookup —
|
||||||
|
there is nothing on disk to refresh TMDB facts against.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
|
class MovieNotFoundInLibrary(LookupError):
|
||||||
|
"""Raised when no on-disk movie carries the requested ``tmdb_id``.
|
||||||
|
|
||||||
|
Symmetric to :class:`ShowNotFoundInLibrary` for the movies library.
|
||||||
|
"""
|
||||||
@@ -1,47 +1,42 @@
|
|||||||
"""Filesystem use cases."""
|
"""Filesystem application layer — 5 atomic use cases as free functions.
|
||||||
|
|
||||||
from .create_seed_links import CreateSeedLinksUseCase
|
Each use case:
|
||||||
|
- accepts :class:`pathlib.Path` inputs plus a :class:`DirectoryRoots` VO,
|
||||||
|
- guards inputs against escaping configured roots,
|
||||||
|
- calls the matching infra op,
|
||||||
|
- catches :class:`~alfred.infrastructure.filesystem.FilesystemError` and
|
||||||
|
returns a frozen DTO with a normalized error code.
|
||||||
|
|
||||||
|
No global state, no ``get_memory()``. Roots are injected.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .create_dir import create_dir_use_case
|
||||||
|
from .directory_roots import DirectoryRoots
|
||||||
from .dto import (
|
from .dto import (
|
||||||
CreateSeedLinksResponse,
|
CreateDirResponse,
|
||||||
ListFolderResponse,
|
LinkFileResponse,
|
||||||
ManageSubtitlesResponse,
|
ListDirResponse,
|
||||||
MoveMediaResponse,
|
MoveDirResponse,
|
||||||
PlacedSubtitle,
|
MoveFileResponse,
|
||||||
SetFolderPathResponse,
|
|
||||||
)
|
)
|
||||||
from .list_folder import ListFolderUseCase
|
from .link_file import link_file_use_case
|
||||||
from .manage_subtitles import ManageSubtitlesUseCase
|
from .list_dir import list_dir_use_case
|
||||||
from .move_media import MoveMediaUseCase
|
from .move_dir import move_dir_use_case
|
||||||
from .resolve_destination import (
|
from .move_file import move_file_use_case
|
||||||
ResolvedEpisodeDestination,
|
|
||||||
ResolvedMovieDestination,
|
|
||||||
ResolvedSeasonDestination,
|
|
||||||
ResolvedSeriesDestination,
|
|
||||||
resolve_episode_destination,
|
|
||||||
resolve_movie_destination,
|
|
||||||
resolve_season_destination,
|
|
||||||
resolve_series_destination,
|
|
||||||
)
|
|
||||||
from .set_folder_path import SetFolderPathUseCase
|
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
"SetFolderPathUseCase",
|
# use cases
|
||||||
"ListFolderUseCase",
|
"list_dir_use_case",
|
||||||
"CreateSeedLinksUseCase",
|
"create_dir_use_case",
|
||||||
"MoveMediaUseCase",
|
"link_file_use_case",
|
||||||
"ManageSubtitlesUseCase",
|
"move_file_use_case",
|
||||||
"ResolvedSeasonDestination",
|
"move_dir_use_case",
|
||||||
"ResolvedEpisodeDestination",
|
# VO
|
||||||
"ResolvedMovieDestination",
|
"DirectoryRoots",
|
||||||
"ResolvedSeriesDestination",
|
# DTOs
|
||||||
"resolve_season_destination",
|
"ListDirResponse",
|
||||||
"resolve_episode_destination",
|
"CreateDirResponse",
|
||||||
"resolve_movie_destination",
|
"LinkFileResponse",
|
||||||
"resolve_series_destination",
|
"MoveFileResponse",
|
||||||
"SetFolderPathResponse",
|
"MoveDirResponse",
|
||||||
"ListFolderResponse",
|
|
||||||
"CreateSeedLinksResponse",
|
|
||||||
"MoveMediaResponse",
|
|
||||||
"ManageSubtitlesResponse",
|
|
||||||
"PlacedSubtitle",
|
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -0,0 +1,41 @@
|
|||||||
|
"""Internal helpers: mapping infra exceptions → error codes.
|
||||||
|
|
||||||
|
Kept private (``_errors``) — only the 5 use cases in this package use
|
||||||
|
it. Centralizes the exception → code translation so every use case
|
||||||
|
returns consistent error payloads.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from alfred.infrastructure.filesystem import (
|
||||||
|
CrossDevice,
|
||||||
|
DestinationExists,
|
||||||
|
FilesystemError,
|
||||||
|
FilesystemOSError,
|
||||||
|
NotADirectory,
|
||||||
|
NotAFile,
|
||||||
|
PermissionDenied,
|
||||||
|
SourceNotFound,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Application-layer error codes (guard violations, not infra).
|
||||||
|
PATH_NOT_ALLOWED = "path_not_allowed"
|
||||||
|
|
||||||
|
|
||||||
|
def code_for(exc: FilesystemError) -> str:
|
||||||
|
"""Return the snake-case error code for an infra exception."""
|
||||||
|
if isinstance(exc, SourceNotFound):
|
||||||
|
return "source_not_found"
|
||||||
|
if isinstance(exc, DestinationExists):
|
||||||
|
return "destination_exists"
|
||||||
|
if isinstance(exc, NotADirectory):
|
||||||
|
return "not_a_directory"
|
||||||
|
if isinstance(exc, NotAFile):
|
||||||
|
return "not_a_file"
|
||||||
|
if isinstance(exc, PermissionDenied):
|
||||||
|
return "permission_denied"
|
||||||
|
if isinstance(exc, CrossDevice):
|
||||||
|
return "cross_device"
|
||||||
|
if isinstance(exc, FilesystemOSError):
|
||||||
|
return "filesystem_os_error"
|
||||||
|
return "filesystem_error"
|
||||||
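Review note: in an `isinstance` chain like `code_for` above, the order of the checks matters whenever one exception type subclasses another, because the first match wins. The sketch below shows the same translation as an ordered lookup table. The exception classes are local stand-ins, and the hierarchy (`PermissionDenied` deriving from `FilesystemOSError`) is assumed purely for illustration; the diff does not show how `alfred.infrastructure.filesystem` actually defines them.

```python
class FilesystemError(Exception):
    """Stand-in base class for infra filesystem errors."""


class SourceNotFound(FilesystemError):
    pass


class FilesystemOSError(FilesystemError):
    pass


class PermissionDenied(FilesystemOSError):
    """Assumed (for this sketch only) to be a subclass of FilesystemOSError."""


# Most specific types first: the first matching entry wins.
_CODES: tuple[tuple[type[FilesystemError], str], ...] = (
    (SourceNotFound, "source_not_found"),
    (PermissionDenied, "permission_denied"),
    (FilesystemOSError, "filesystem_os_error"),
)


def code_for(exc: FilesystemError) -> str:
    """Return the snake-case error code for an infra exception."""
    for exc_type, code in _CODES:
        if isinstance(exc, exc_type):
            return code
    # Anything not listed falls back to the generic code.
    return "filesystem_error"
```

If the generic `FilesystemOSError` entry were listed before `PermissionDenied` under this assumed hierarchy, permission errors would be reported as `filesystem_os_error`.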
@@ -0,0 +1,33 @@
|
|||||||
|
"""create_dir use case — create a directory under one of the configured roots."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.infrastructure.filesystem import FilesystemError, create_dir
|
||||||
|
|
||||||
|
from ._errors import PATH_NOT_ALLOWED, code_for
|
||||||
|
from .directory_roots import DirectoryRoots
|
||||||
|
from .dto import CreateDirResponse
|
||||||
|
|
||||||
|
|
||||||
|
def create_dir_use_case(path: Path, roots: DirectoryRoots) -> CreateDirResponse:
|
||||||
|
"""Create directory ``path`` (and any missing parents) provided it
|
||||||
|
lives under one of the configured roots.
|
||||||
|
|
||||||
|
Idempotent on the infra side: re-running on an existing directory
|
||||||
|
returns ``status="ok"``.
|
||||||
|
"""
|
||||||
|
if not roots.contains(path):
|
||||||
|
return CreateDirResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Path is outside configured roots: {path}",
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
create_dir(path)
|
||||||
|
except FilesystemError as e:
|
||||||
|
return CreateDirResponse(status="error", error=code_for(e), message=str(e))
|
||||||
|
|
||||||
|
return CreateDirResponse(status="ok", path=path)
|
||||||
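Review note: `create_dir_use_case` above follows a guard-then-call shape: reject paths outside the configured roots, call the infra op, and translate failures into an error code. The sketch below reproduces that shape with the standard library only. `Roots`, its `contains` implementation (resolve both sides, then `Path.is_relative_to`) and the dict payloads are assumptions made for this example; the diff does not show how `DirectoryRoots.contains` is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Roots:
    """Hypothetical stand-in for DirectoryRoots."""

    allowed: tuple[Path, ...]

    def contains(self, path: Path) -> bool:
        # Assumption: resolving both sides keeps ".." segments and
        # symlinked prefixes from escaping a root.
        resolved = path.resolve()
        return any(resolved.is_relative_to(root.resolve()) for root in self.allowed)


def create_dir_sketch(path: Path, roots: Roots) -> dict[str, str]:
    """Create path (and missing parents) if it lives under an allowed root."""
    # Guard: application-level check, no filesystem write yet.
    if not roots.contains(path):
        return {
            "status": "error",
            "error": "path_not_allowed",
            "message": f"Path is outside configured roots: {path}",
        }
    # Infra call: idempotent thanks to exist_ok=True.
    try:
        path.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        return {"status": "error", "error": "filesystem_os_error", "message": str(exc)}
    return {"status": "ok", "path": str(path)}
```

Because the roots are passed in rather than read from global state, the function can be exercised against a temporary directory with no memory or settings set up.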
+1
-1
@@ -3,7 +3,7 @@
|
|||||||
import logging
|
import logging
|
||||||
|
|
||||||
from alfred.infrastructure.filesystem import FileManager
|
from alfred.infrastructure.filesystem import FileManager
|
||||||
from alfred.infrastructure.persistence import get_memory
|
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||||
|
|
||||||
from .dto import CreateSeedLinksResponse
|
from .dto import CreateSeedLinksResponse
|
||||||
|
|
||||||
@@ -0,0 +1,56 @@
|
|||||||
|
"""DirectoryRoots — VO carrying the configured filesystem roots.
|
||||||
|
|
||||||
|
Replaces the ad-hoc ``get_memory().ltm.workspace.<x>`` lookups that were
|
||||||
|
sprinkled across the filesystem use cases. By making roots an explicit
|
||||||
|
input, use cases become pure (no global state read) and easy to test.
|
||||||
|
|
||||||
|
The roots are read once at the tool wrapper boundary (where the agent
|
||||||
|
config lives) and threaded through the use cases.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class DirectoryRoots:
|
||||||
|
"""Configured roots of Alfred's filesystem.
|
||||||
|
|
||||||
|
All paths must be absolute and existing directories — validation is
|
||||||
|
expected at the boundary that builds this VO.
|
||||||
|
|
||||||
|
Attributes:
|
||||||
|
downloads: where qBittorrent drops finished torrents.
|
||||||
|
torrents: where seeding hard-links live (mirrors downloads/).
|
||||||
|
movies: library root for movies.
|
||||||
|
tv_shows: library root for TV shows.
|
||||||
|
"""
|
||||||
|
|
||||||
|
downloads: Path
|
||||||
|
torrents: Path
|
||||||
|
movies: Path
|
||||||
|
tv_shows: Path
|
||||||
|
|
||||||
|
def all(self) -> tuple[Path, ...]:
|
||||||
|
"""Return every configured root, in declaration order."""
|
||||||
|
return (self.downloads, self.torrents, self.movies, self.tv_shows)
|
||||||
|
|
||||||
|
def contains(self, path: Path) -> bool:
|
||||||
|
"""Return True if ``path`` is inside one of the configured roots.
|
||||||
|
|
||||||
|
Uses ``Path.resolve()`` to handle symlinks and ``..`` segments,
|
||||||
|
then ``relative_to`` for an exact within-root check.
|
||||||
|
"""
|
||||||
|
try:
|
||||||
|
resolved = path.resolve()
|
||||||
|
except OSError:
|
||||||
|
return False
|
||||||
|
for root in self.all():
|
||||||
|
try:
|
||||||
|
resolved.relative_to(root.resolve())
|
||||||
|
return True
|
||||||
|
except (ValueError, OSError):
|
||||||
|
continue
|
||||||
|
return False
|
||||||
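The `contains` check above resolves the path and then uses `relative_to` instead of comparing string prefixes. The sketch below reproduces that logic as a free function (`is_within` is a name invented for this illustration) and runs it against a temporary directory to show two cases a naive prefix check could get wrong.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def is_within(path: Path, roots: tuple[Path, ...]) -> bool:
    """Return True if ``path`` resolves to a location under one of ``roots``."""
    try:
        resolved = path.resolve()
    except OSError:
        return False
    for root in roots:
        try:
            # relative_to compares whole path components, not characters.
            resolved.relative_to(root.resolve())
            return True
        except (ValueError, OSError):
            continue
    return False


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    movies = base / "movies"
    movies.mkdir()
    roots = (movies,)

    # A path that does not exist yet still resolves (non-strict mode).
    inside = is_within(movies / "Film (2020)" / "film.mkv", roots)
    # ".." is collapsed by resolve(), so the path lands outside the root.
    escaped = is_within(movies / ".." / "secrets.txt", roots)
    # Shares the "movies" string prefix but is a different directory.
    sibling = is_within(base / "movies_backup" / "film.mkv", roots)
```

Here `inside` is accepted while `escaped` and `sibling` are rejected; the sibling case is the one a plain `str.startswith` comparison would let through.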
@@ -1,19 +1,28 @@
|
|||||||
"""Filesystem application DTOs."""
|
"""DTOs for the 5 atomic filesystem use cases.
|
||||||
|
|
||||||
|
Each use case returns a small frozen dataclass tagged with a ``status``
|
||||||
|
field. On error, ``error`` (machine-readable code) and ``message``
|
||||||
|
(human-readable) are populated; on success, the relevant payload
|
||||||
|
fields are.
|
||||||
|
|
||||||
|
Error codes mirror the infrastructure exception types (lowercased,
|
||||||
|
snake-cased) — e.g. ``SourceNotFound`` → ``"source_not_found"`` — plus
|
||||||
|
the application-layer ``"path_not_allowed"`` for guard violations.
|
||||||
|
"""
|
||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
from dataclasses import dataclass
|
from dataclasses import dataclass, field
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass(frozen=True)
|
||||||
class CopyMediaResponse:
|
class ListDirResponse:
|
||||||
"""Response from copying a media file."""
|
"""Response from ``list_dir_use_case``."""
|
||||||
|
|
||||||
status: str
|
status: str # "ok" | "error"
|
||||||
source: str | None = None
|
path: Path | None = None
|
||||||
destination: str | None = None
|
entries: tuple[Path, ...] = ()
|
||||||
filename: str | None = None
|
|
||||||
size: int | None = None
|
|
||||||
error: str | None = None
|
error: str | None = None
|
||||||
message: str | None = None
|
message: str | None = None
|
||||||
|
|
||||||
@@ -22,22 +31,33 @@ class CopyMediaResponse:
|
|||||||
return {"status": self.status, "error": self.error, "message": self.message}
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
return {
|
return {
|
||||||
"status": self.status,
|
"status": self.status,
|
||||||
"source": self.source,
|
"path": str(self.path) if self.path else None,
|
||||||
"destination": self.destination,
|
"entries": [str(p) for p in self.entries],
|
||||||
"filename": self.filename,
|
|
||||||
"size": self.size,
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass(frozen=True)
|
||||||
class MoveMediaResponse:
|
class CreateDirResponse:
|
||||||
"""Response from moving a media file."""
|
"""Response from ``create_dir_use_case``."""
|
||||||
|
|
||||||
status: str
|
status: str
|
||||||
source: str | None = None
|
path: Path | None = None
|
||||||
destination: str | None = None
|
error: str | None = None
|
||||||
filename: str | None = None
|
message: str | None = None
|
||||||
size: int | None = None
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
if self.error:
|
||||||
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
|
return {"status": self.status, "path": str(self.path) if self.path else None}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class LinkFileResponse:
|
||||||
|
"""Response from ``link_file_use_case``."""
|
||||||
|
|
||||||
|
status: str
|
||||||
|
source: Path | None = None
|
||||||
|
destination: Path | None = None
|
||||||
error: str | None = None
|
error: str | None = None
|
||||||
message: str | None = None
|
message: str | None = None
|
||||||
|
|
||||||
@@ -46,125 +66,18 @@ class MoveMediaResponse:
|
|||||||
return {"status": self.status, "error": self.error, "message": self.message}
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
return {
|
return {
|
||||||
"status": self.status,
|
"status": self.status,
|
||||||
"source": self.source,
|
"source": str(self.source) if self.source else None,
|
||||||
"destination": self.destination,
|
"destination": str(self.destination) if self.destination else None,
|
||||||
"filename": self.filename,
|
|
||||||
"size": self.size,
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass(frozen=True)
|
||||||
class SetFolderPathResponse:
|
class MoveFileResponse:
|
||||||
"""Response from setting a folder path."""
|
"""Response from ``move_file_use_case``."""
|
||||||
|
|
||||||
status: str
|
status: str
|
||||||
folder_name: str | None = None
|
source: Path | None = None
|
||||||
path: str | None = None
|
destination: Path | None = None
|
||||||
error: str | None = None
|
|
||||||
message: str | None = None
|
|
||||||
|
|
||||||
def to_dict(self):
|
|
||||||
"""Convert to dict for agent compatibility."""
|
|
||||||
result = {"status": self.status}
|
|
||||||
|
|
||||||
if self.error:
|
|
||||||
result["error"] = self.error
|
|
||||||
result["message"] = self.message
|
|
||||||
else:
|
|
||||||
if self.folder_name:
|
|
||||||
result["folder_name"] = self.folder_name
|
|
||||||
if self.path:
|
|
||||||
result["path"] = self.path
|
|
||||||
|
|
||||||
return result
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class PlacedSubtitle:
|
|
||||||
"""One subtitle file successfully placed."""
|
|
||||||
|
|
||||||
source: str
|
|
||||||
destination: str
|
|
||||||
filename: str
|
|
||||||
|
|
||||||
def to_dict(self) -> dict:
|
|
||||||
return {
|
|
||||||
"source": self.source,
|
|
||||||
"destination": self.destination,
|
|
||||||
"filename": self.filename,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class UnresolvedTrack:
|
|
||||||
"""A subtitle track that needs agent clarification before placement."""
|
|
||||||
|
|
||||||
raw_tokens: list[str]
|
|
||||||
file_path: str | None = None
|
|
||||||
file_size_kb: float | None = None
|
|
||||||
reason: str = "" # "unknown_language" | "low_confidence"
|
|
||||||
|
|
||||||
def to_dict(self) -> dict:
|
|
||||||
return {
|
|
||||||
"raw_tokens": self.raw_tokens,
|
|
||||||
"file_path": self.file_path,
|
|
||||||
"file_size_kb": self.file_size_kb,
|
|
||||||
"reason": self.reason,
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class AvailableSubtitle:
|
|
||||||
"""One subtitle track available on an embedded media item."""
|
|
||||||
|
|
||||||
language: str # ISO 639-2 code
|
|
||||||
subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
|
|
||||||
|
|
||||||
def to_dict(self) -> dict:
|
|
||||||
return {"language": self.language, "type": self.subtitle_type}
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class ManageSubtitlesResponse:
|
|
||||||
"""Response from the manage_subtitles use case."""
|
|
||||||
|
|
||||||
status: str # "ok" | "needs_clarification" | "error"
|
|
||||||
video_path: str | None = None
|
|
||||||
placed: list[PlacedSubtitle] | None = None
|
|
||||||
skipped_count: int = 0
|
|
||||||
unresolved: list[UnresolvedTrack] | None = None
|
|
||||||
available: list[AvailableSubtitle] | None = None # embedded tracks summary
|
|
||||||
error: str | None = None
|
|
||||||
message: str | None = None
|
|
||||||
|
|
||||||
def to_dict(self) -> dict:
|
|
||||||
if self.error:
|
|
||||||
return {"status": self.status, "error": self.error, "message": self.message}
|
|
||||||
result = {
|
|
||||||
"status": self.status,
|
|
||||||
"video_path": self.video_path,
|
|
||||||
"placed": [p.to_dict() for p in (self.placed or [])],
|
|
||||||
"placed_count": len(self.placed or []),
|
|
||||||
"skipped_count": self.skipped_count,
|
|
||||||
}
|
|
||||||
if self.unresolved:
|
|
||||||
result["unresolved"] = [u.to_dict() for u in self.unresolved]
|
|
||||||
result["unresolved_count"] = len(self.unresolved)
|
|
||||||
if self.available:
|
|
||||||
result["available"] = [a.to_dict() for a in self.available]
|
|
||||||
return result
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class CreateSeedLinksResponse:
|
|
||||||
"""Response from creating seed links for a torrent."""
|
|
||||||
|
|
||||||
status: str
|
|
||||||
torrent_subfolder: str | None = None
|
|
||||||
linked_file: str | None = None
|
|
||||||
copied_files: list[str] | None = None
|
|
||||||
copied_count: int = 0
|
|
||||||
skipped: list[str] | None = None
|
|
||||||
error: str | None = None
|
error: str | None = None
|
||||||
message: str | None = None
|
message: str | None = None
|
||||||
|
|
||||||
@@ -173,41 +86,26 @@ class CreateSeedLinksResponse:
|
|||||||
return {"status": self.status, "error": self.error, "message": self.message}
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
return {
|
return {
|
||||||
"status": self.status,
|
"status": self.status,
|
||||||
"torrent_subfolder": self.torrent_subfolder,
|
"source": str(self.source) if self.source else None,
|
||||||
"linked_file": self.linked_file,
|
"destination": str(self.destination) if self.destination else None,
|
||||||
"copied_files": self.copied_files or [],
|
|
||||||
"copied_count": self.copied_count,
|
|
||||||
"skipped": self.skipped or [],
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass(frozen=True)
|
||||||
class ListFolderResponse:
|
class MoveDirResponse:
|
||||||
"""Response from listing a folder."""
|
"""Response from ``move_dir_use_case``."""
|
||||||
|
|
||||||
status: str
|
status: str
|
||||||
folder_type: str | None = None
|
source: Path | None = None
|
||||||
path: str | None = None
|
destination: Path | None = None
|
||||||
entries: list[str] | None = None
|
|
||||||
count: int | None = None
|
|
||||||
error: str | None = None
|
error: str | None = None
|
||||||
message: str | None = None
|
message: str | None = None
|
||||||
|
|
||||||
def to_dict(self):
|
def to_dict(self) -> dict:
|
||||||
"""Convert to dict for agent compatibility."""
|
|
||||||
result = {"status": self.status}
|
|
||||||
|
|
||||||
if self.error:
|
if self.error:
|
||||||
result["error"] = self.error
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
result["message"] = self.message
|
return {
|
||||||
else:
|
"status": self.status,
|
||||||
if self.folder_type:
|
"source": str(self.source) if self.source else None,
|
||||||
result["folder_type"] = self.folder_type
|
"destination": str(self.destination) if self.destination else None,
|
||||||
if self.path:
|
}
|
||||||
result["path"] = self.path
|
|
||||||
if self.entries is not None:
|
|
||||||
result["entries"] = self.entries
|
|
||||||
if self.count is not None:
|
|
||||||
result["count"] = self.count
|
|
||||||
|
|
||||||
return result
|
|
||||||
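The new DTOs in this file share one shape: a frozen dataclass holding `Path` values, with a `to_dict` that returns only the error fields on failure and stringified paths on success. A self-contained sketch of that shape follows; `MoveResponse` is an illustrative name and not one of the classes in the diff.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path


@dataclass(frozen=True)
class MoveResponse:
    """Illustrative DTO mirroring the source/destination responses."""

    status: str  # "ok" | "error"
    source: Path | None = None
    destination: Path | None = None
    error: str | None = None
    message: str | None = None

    def to_dict(self) -> dict:
        # On error, expose only the status and the error details.
        if self.error:
            return {"status": self.status, "error": self.error, "message": self.message}
        # On success, convert Path objects to plain strings.
        return {
            "status": self.status,
            "source": str(self.source) if self.source else None,
            "destination": str(self.destination) if self.destination else None,
        }


ok = MoveResponse(
    status="ok",
    source=Path("/data/downloads/a.mkv"),
    destination=Path("/data/movies/a.mkv"),
)
failed = MoveResponse(status="error", error="path_not_allowed", message="outside roots")

# frozen=True turns attribute assignment into an error.
try:
    ok.status = "error"
    mutated = True
except FrozenInstanceError:
    mutated = False
```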
|
|||||||
@@ -0,0 +1,188 @@
|
|||||||
|
"""Filesystem application DTOs."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CopyMediaResponse:
|
||||||
|
"""Response from copying a media file."""
|
||||||
|
|
||||||
|
status: str
|
||||||
|
source: str | None = None
|
||||||
|
destination: str | None = None
|
||||||
|
filename: str | None = None
|
||||||
|
size: int | None = None
|
||||||
|
error: str | None = None
|
||||||
|
message: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
if self.error:
|
||||||
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
|
return {
|
||||||
|
"status": self.status,
|
||||||
|
"source": self.source,
|
||||||
|
"destination": self.destination,
|
||||||
|
"filename": self.filename,
|
||||||
|
"size": self.size,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class MoveMediaResponse:
|
||||||
|
"""Response from moving a media file."""
|
||||||
|
|
||||||
|
status: str
|
||||||
|
source: str | None = None
|
||||||
|
destination: str | None = None
|
||||||
|
filename: str | None = None
|
||||||
|
size: int | None = None
|
||||||
|
error: str | None = None
|
||||||
|
message: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
if self.error:
|
||||||
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
|
return {
|
||||||
|
"status": self.status,
|
||||||
|
"source": self.source,
|
||||||
|
"destination": self.destination,
|
||||||
|
"filename": self.filename,
|
||||||
|
"size": self.size,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class PlacedSubtitle:
|
||||||
|
"""One subtitle file successfully placed."""
|
||||||
|
|
||||||
|
source: str
|
||||||
|
destination: str
|
||||||
|
filename: str
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return {
|
||||||
|
"source": self.source,
|
||||||
|
"destination": self.destination,
|
||||||
|
"filename": self.filename,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class UnresolvedTrack:
|
||||||
|
"""A subtitle track that needs agent clarification before placement."""
|
||||||
|
|
||||||
|
raw_tokens: list[str]
|
||||||
|
file_path: str | None = None
|
||||||
|
file_size_kb: float | None = None
|
||||||
|
reason: str = "" # "unknown_language" | "low_confidence"
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return {
|
||||||
|
"raw_tokens": self.raw_tokens,
|
||||||
|
"file_path": self.file_path,
|
||||||
|
"file_size_kb": self.file_size_kb,
|
||||||
|
"reason": self.reason,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class AvailableSubtitle:
|
||||||
|
"""One subtitle track available on an embedded media item."""
|
||||||
|
|
||||||
|
language: str # ISO 639-2 code
|
||||||
|
subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
return {"language": self.language, "type": self.subtitle_type}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ManageSubtitlesResponse:
|
||||||
|
"""Response from the manage_subtitles use case."""
|
||||||
|
|
||||||
|
status: str # "ok" | "needs_clarification" | "error"
|
||||||
|
video_path: str | None = None
|
||||||
|
placed: list[PlacedSubtitle] | None = None
|
||||||
|
skipped_count: int = 0
|
||||||
|
unresolved: list[UnresolvedTrack] | None = None
|
||||||
|
available: list[AvailableSubtitle] | None = None # embedded tracks summary
|
||||||
|
error: str | None = None
|
||||||
|
message: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
if self.error:
|
||||||
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
|
result = {
|
||||||
|
"status": self.status,
|
||||||
|
"video_path": self.video_path,
|
||||||
|
"placed": [p.to_dict() for p in (self.placed or [])],
|
||||||
|
"placed_count": len(self.placed or []),
|
||||||
|
"skipped_count": self.skipped_count,
|
||||||
|
}
|
||||||
|
if self.unresolved:
|
||||||
|
result["unresolved"] = [u.to_dict() for u in self.unresolved]
|
||||||
|
result["unresolved_count"] = len(self.unresolved)
|
||||||
|
if self.available:
|
||||||
|
result["available"] = [a.to_dict() for a in self.available]
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class CreateSeedLinksResponse:
|
||||||
|
"""Response from creating seed links for a torrent."""
|
||||||
|
|
||||||
|
status: str
|
||||||
|
torrent_subfolder: str | None = None
|
||||||
|
linked_file: str | None = None
|
||||||
|
copied_files: list[str] | None = None
|
||||||
|
copied_count: int = 0
|
||||||
|
skipped: list[str] | None = None
|
||||||
|
error: str | None = None
|
||||||
|
message: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
if self.error:
|
||||||
|
return {"status": self.status, "error": self.error, "message": self.message}
|
||||||
|
return {
|
||||||
|
"status": self.status,
|
||||||
|
"torrent_subfolder": self.torrent_subfolder,
|
||||||
|
"linked_file": self.linked_file,
|
||||||
|
"copied_files": self.copied_files or [],
|
||||||
|
"copied_count": self.copied_count,
|
||||||
|
"skipped": self.skipped or [],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ListFolderResponse:
|
||||||
|
"""Response from listing a folder."""
|
||||||
|
|
||||||
|
status: str
|
||||||
|
folder_type: str | None = None # TODO: should be a property
|
||||||
|
path: str | None = None # TODO: should not be None; should be a Path
|
||||||
|
entries: list[str] | None = None # TODO: should not be None; default to an empty list of Path
|
||||||
|
count: int | None = None # TODO: useless; remove
|
||||||
|
error: str | None = None
|
||||||
|
message: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self):
|
||||||
|
"""Convert to dict for agent compatibility."""
|
||||||
|
result = {"status": self.status}
|
||||||
|
|
||||||
|
if self.error:
|
||||||
|
result["error"] = self.error
|
||||||
|
result["message"] = self.message
|
||||||
|
else:
|
||||||
|
if self.folder_type:
|
||||||
|
result["folder_type"] = self.folder_type
|
||||||
|
if self.path:
|
||||||
|
result["path"] = self.path
|
||||||
|
if self.entries is not None:
|
||||||
|
result["entries"] = self.entries
|
||||||
|
if self.count is not None:
|
||||||
|
result["count"] = self.count
|
||||||
|
|
||||||
|
return result
|
||||||
@@ -1,82 +0,0 @@
|
|||||||
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from alfred.domain.release.value_objects import ParsedRelease
|
|
||||||
from alfred.domain.shared.media import MediaInfo
|
|
||||||
|
|
||||||
# Map ffprobe codec names to scene-style codec tokens
|
|
||||||
_VIDEO_CODEC_MAP = {
|
|
||||||
"hevc": "x265",
|
|
||||||
"h264": "x264",
|
|
||||||
"h265": "x265",
|
|
||||||
"av1": "AV1",
|
|
||||||
"vp9": "VP9",
|
|
||||||
"mpeg4": "XviD",
|
|
||||||
}
|
|
||||||
|
|
||||||
# Map ffprobe audio codec names to scene-style tokens
|
|
||||||
_AUDIO_CODEC_MAP = {
|
|
||||||
"eac3": "EAC3",
|
|
||||||
"ac3": "AC3",
|
|
||||||
"dts": "DTS",
|
|
||||||
"truehd": "TrueHD",
|
|
||||||
"aac": "AAC",
|
|
||||||
"flac": "FLAC",
|
|
||||||
"opus": "OPUS",
|
|
||||||
"mp3": "MP3",
|
|
||||||
"pcm_s16le": "PCM",
|
|
||||||
"pcm_s24le": "PCM",
|
|
||||||
}
|
|
||||||
|
|
||||||
# Map channel count to standard layout string
|
|
||||||
_CHANNEL_MAP = {
|
|
||||||
8: "7.1",
|
|
||||||
6: "5.1",
|
|
||||||
2: "2.0",
|
|
||||||
1: "1.0",
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
def enrich_from_probe(parsed: ParsedRelease, info: MediaInfo) -> None:
|
|
||||||
"""
|
|
||||||
Fill None fields in parsed using data from ffprobe MediaInfo.
|
|
||||||
|
|
||||||
Only overwrites fields that are currently None — token-level values
|
|
||||||
from the release name always take priority.
|
|
||||||
Mutates parsed in place.
|
|
||||||
"""
|
|
||||||
if parsed.quality is None and info.resolution:
|
|
||||||
parsed.quality = info.resolution
|
|
||||||
|
|
||||||
if parsed.codec is None and info.video_codec:
|
|
||||||
parsed.codec = _VIDEO_CODEC_MAP.get(
|
|
||||||
info.video_codec.lower(), info.video_codec.upper()
|
|
||||||
)
|
|
||||||
|
|
||||||
if parsed.bit_depth is None and info.video_codec:
|
|
||||||
# ffprobe exposes bit depth via pix_fmt — not in MediaInfo yet, skip for now
|
|
||||||
pass
|
|
||||||
|
|
||||||
# Audio — use the default track, fallback to first
|
|
||||||
default_track = next((t for t in info.audio_tracks if t.is_default), None)
|
|
||||||
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
|
|
||||||
|
|
||||||
if track:
|
|
||||||
if parsed.audio_codec is None and track.codec:
|
|
||||||
parsed.audio_codec = _AUDIO_CODEC_MAP.get(
|
|
||||||
track.codec.lower(), track.codec.upper()
|
|
||||||
)
|
|
||||||
|
|
||||||
if parsed.audio_channels is None and track.channels:
|
|
||||||
parsed.audio_channels = _CHANNEL_MAP.get(
|
|
||||||
track.channels, f"{track.channels}ch"
|
|
||||||
)
|
|
||||||
|
|
||||||
# Languages — merge ffprobe languages with token-level ones
|
|
||||||
# "und" = undetermined, not useful
|
|
||||||
if info.audio_languages:
|
|
||||||
existing = set(parsed.languages)
|
|
||||||
for lang in info.audio_languages:
|
|
||||||
if lang.lower() != "und" and lang.upper() not in existing:
|
|
||||||
parsed.languages.append(lang)
|
|
||||||
@@ -0,0 +1,40 @@
|
|||||||
|
"""link_file use case — hard-link a file from one root to another."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.infrastructure.filesystem import FilesystemError, link_file
|
||||||
|
|
||||||
|
from ._errors import PATH_NOT_ALLOWED, code_for
|
||||||
|
from .directory_roots import DirectoryRoots
|
||||||
|
from .dto import LinkFileResponse
|
||||||
|
|
||||||
|
|
||||||
|
def link_file_use_case(
|
||||||
|
src: Path, dst: Path, roots: DirectoryRoots
|
||||||
|
) -> LinkFileResponse:
|
||||||
|
"""Hard-link ``src`` to ``dst``. Both must be under configured roots.
|
||||||
|
|
||||||
|
The destination parent must already exist — the caller is expected
|
||||||
|
to have created it via ``create_dir_use_case`` if needed.
|
||||||
|
"""
|
||||||
|
if not roots.contains(src):
|
||||||
|
return LinkFileResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Source is outside configured roots: {src}",
|
||||||
|
)
|
||||||
|
if not roots.contains(dst):
|
||||||
|
return LinkFileResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Destination is outside configured roots: {dst}",
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
link_file(src, dst)
|
||||||
|
except FilesystemError as e:
|
||||||
|
return LinkFileResponse(status="error", error=code_for(e), message=str(e))
|
||||||
|
|
||||||
|
return LinkFileResponse(status="ok", source=src, destination=dst)
|
||||||
@@ -0,0 +1,34 @@
|
|||||||
|
"""list_dir use case — list a directory after guarding it within roots."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.infrastructure.filesystem import FilesystemError, list_dir
|
||||||
|
|
||||||
|
from ._errors import PATH_NOT_ALLOWED, code_for
|
||||||
|
from .directory_roots import DirectoryRoots
|
||||||
|
from .dto import ListDirResponse
|
||||||
|
|
||||||
|
|
||||||
|
def list_dir_use_case(path: Path, roots: DirectoryRoots) -> ListDirResponse:
|
||||||
|
"""List the immediate children of ``path`` if it lives under one of
|
||||||
|
the configured roots.
|
||||||
|
|
||||||
|
Returns a :class:`ListDirResponse`. On guard failure, status is
|
||||||
|
``"error"`` with ``error="path_not_allowed"``. On infra failure,
|
||||||
|
status is ``"error"`` with a code mapped from the raised exception.
|
||||||
|
"""
|
||||||
|
if not roots.contains(path):
|
||||||
|
return ListDirResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Path is outside configured roots: {path}",
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
entries = list_dir(path)
|
||||||
|
except FilesystemError as e:
|
||||||
|
return ListDirResponse(status="error", error=code_for(e), message=str(e))
|
||||||
|
|
||||||
|
return ListDirResponse(status="ok", path=path, entries=tuple(entries))
|
||||||
+18
-18
@@ -3,25 +3,25 @@
|
|||||||
import logging
|
import logging
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from alfred.domain.shared.value_objects import ImdbId
|
from alfred.application.subtitles_TO_CHECK.placer import (
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
|
||||||
from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
|
|
||||||
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
|
|
||||||
from alfred.domain.subtitles.services.pattern_detector import PatternDetector
|
|
||||||
from alfred.application.subtitles.placer import (
|
|
||||||
PlacedTrack,
|
PlacedTrack,
|
||||||
SubtitlePlacer,
|
SubtitlePlacer,
|
||||||
_build_dest_name,
|
_build_dest_name,
|
||||||
)
|
)
|
||||||
from alfred.domain.subtitles.services.utils import available_subtitles
|
from alfred.domain.shared_TO_CHECK.value_objects import ImdbId
|
||||||
from alfred.domain.subtitles.value_objects import ScanStrategy
|
from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
|
||||||
|
from alfred.domain.subtitles_TO_CHECK.services.identifier import SubtitleIdentifier
|
||||||
|
from alfred.domain.subtitles_TO_CHECK.services.matcher import SubtitleMatcher
|
||||||
|
from alfred.domain.subtitles_TO_CHECK.services.pattern_detector import PatternDetector
|
||||||
|
from alfred.domain.subtitles_TO_CHECK.services.utils import available_subtitles
|
||||||
|
from alfred.domain.subtitles_TO_CHECK.value_objects import ScanStrategy
|
||||||
from alfred.infrastructure.filesystem.scanner import PathlibFilesystemScanner
|
from alfred.infrastructure.filesystem.scanner import PathlibFilesystemScanner
|
||||||
from alfred.infrastructure.knowledge.subtitles.base import SubtitleKnowledgeBase
|
from alfred.infrastructure.knowledge_TO_CHECK.subtitles.base import SubtitleKnowledgeBase
|
||||||
from alfred.infrastructure.knowledge.subtitles.loader import KnowledgeLoader
|
from alfred.infrastructure.knowledge_TO_CHECK.subtitles.loader import KnowledgeLoader
|
||||||
from alfred.infrastructure.persistence.context import get_memory
|
from alfred.infrastructure.persistence_TO_CHECK.context import get_memory
|
||||||
from alfred.infrastructure.probe.ffprobe_prober import FfprobeMediaProber
|
from alfred.infrastructure.probe_TO_CHECK.ffprobe_prober import FfprobeMediaProber
|
||||||
from alfred.infrastructure.subtitle.metadata_store import SubtitleMetadataStore
|
from alfred.infrastructure.subtitle_TO_CHECK.metadata_store import SubtitleMetadataStore
|
||||||
from alfred.infrastructure.subtitle.rule_repository import RuleSetRepository
|
from alfred.infrastructure.subtitle_TO_CHECK.rule_repository import RuleSetRepository
|
||||||
|
|
||||||
from .dto import (
|
from .dto import (
|
||||||
AvailableSubtitle,
|
AvailableSubtitle,
|
||||||
@@ -278,7 +278,7 @@ class ManageSubtitlesUseCase:
|
|||||||
|
|
||||||
|
|
||||||
def _to_unresolved_dto(
|
def _to_unresolved_dto(
|
||||||
track: SubtitleCandidate, min_confidence: float = 0.7
|
track: SubtitleScanResult, min_confidence: float = 0.7
|
||||||
) -> UnresolvedTrack:
|
) -> UnresolvedTrack:
|
||||||
reason = "unknown_language" if track.language is None else "low_confidence"
|
reason = "unknown_language" if track.language is None else "low_confidence"
|
||||||
return UnresolvedTrack(
|
return UnresolvedTrack(
|
||||||
@@ -291,10 +291,10 @@ def _to_unresolved_dto(
|
|||||||
|
|
||||||
def _pair_placed_with_tracks(
|
def _pair_placed_with_tracks(
|
||||||
placed: list[PlacedTrack],
|
placed: list[PlacedTrack],
|
||||||
tracks: list[SubtitleCandidate],
|
tracks: list[SubtitleScanResult],
|
||||||
) -> list[tuple[PlacedTrack, SubtitleCandidate]]:
|
) -> list[tuple[PlacedTrack, SubtitleScanResult]]:
|
||||||
"""
|
"""
|
||||||
Pair each PlacedTrack with its originating SubtitleCandidate by source path.
|
Pair each PlacedTrack with its originating SubtitleScanResult by source path.
|
||||||
Falls back to positional matching if paths don't align.
|
Falls back to positional matching if paths don't align.
|
||||||
"""
|
"""
|
||||||
track_by_path = {t.file_path: t for t in tracks if t.file_path}
|
track_by_path = {t.file_path: t for t in tracks if t.file_path}
|
||||||
@@ -0,0 +1,36 @@
|
|||||||
|
"""move_dir use case — move a directory tree between configured roots."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.infrastructure.filesystem import FilesystemError, move_dir
|
||||||
|
|
||||||
|
from ._errors import PATH_NOT_ALLOWED, code_for
|
||||||
|
from .directory_roots import DirectoryRoots
|
||||||
|
from .dto import MoveDirResponse
|
||||||
|
|
||||||
|
|
||||||
|
def move_dir_use_case(
|
||||||
|
src: Path, dst: Path, roots: DirectoryRoots
|
||||||
|
) -> MoveDirResponse:
|
||||||
|
"""Move directory ``src`` to ``dst``. Both must be under configured roots."""
|
||||||
|
if not roots.contains(src):
|
||||||
|
return MoveDirResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Source is outside configured roots: {src}",
|
||||||
|
)
|
||||||
|
if not roots.contains(dst):
|
||||||
|
return MoveDirResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Destination is outside configured roots: {dst}",
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
move_dir(src, dst)
|
||||||
|
except FilesystemError as e:
|
||||||
|
return MoveDirResponse(status="error", error=code_for(e), message=str(e))
|
||||||
|
|
||||||
|
return MoveDirResponse(status="ok", source=src, destination=dst)
|
||||||
@@ -0,0 +1,36 @@
|
|||||||
|
"""move_file use case — move a file between configured roots."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.infrastructure.filesystem import FilesystemError, move_file
|
||||||
|
|
||||||
|
from ._errors import PATH_NOT_ALLOWED, code_for
|
||||||
|
from .directory_roots import DirectoryRoots
|
||||||
|
from .dto import MoveFileResponse
|
||||||
|
|
||||||
|
|
||||||
|
def move_file_use_case(
|
||||||
|
src: Path, dst: Path, roots: DirectoryRoots
|
||||||
|
) -> MoveFileResponse:
|
||||||
|
"""Move file ``src`` to ``dst``. Both must be under configured roots."""
|
||||||
|
if not roots.contains(src):
|
||||||
|
return MoveFileResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Source is outside configured roots: {src}",
|
||||||
|
)
|
||||||
|
if not roots.contains(dst):
|
||||||
|
return MoveFileResponse(
|
||||||
|
status="error",
|
||||||
|
error=PATH_NOT_ALLOWED,
|
||||||
|
message=f"Destination is outside configured roots: {dst}",
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
move_file(src, dst)
|
||||||
|
except FilesystemError as e:
|
||||||
|
return MoveFileResponse(status="error", error=code_for(e), message=str(e))
|
||||||
|
|
||||||
|
return MoveFileResponse(status="ok", source=src, destination=dst)
|
||||||
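Both move use cases gate on `roots.contains(...)` before touching the filesystem. The diff does not show `DirectoryRoots` itself, so the following is only a minimal sketch of how such a containment check could behave. The class name `SketchRoots`, the sample paths, and the resolve-then-compare strategy are assumptions, not the project's implementation.

```python
from pathlib import Path


class SketchRoots:
    """Hypothetical stand-in for DirectoryRoots (not the real class)."""

    def __init__(self, *roots: Path) -> None:
        # Resolve once so later comparisons ignore "..", "." and symlinks.
        self._roots = tuple(r.resolve() for r in roots)

    def contains(self, path: Path) -> bool:
        resolved = path.resolve()
        # Path.is_relative_to is available from Python 3.9 onwards.
        return any(resolved.is_relative_to(root) for root in self._roots)


# Assumed example roots; real roots come from configuration.
roots = SketchRoots(Path("/srv/media"), Path("/srv/downloads"))
inside = roots.contains(Path("/srv/media/tv/Show"))
escaped = roots.contains(Path("/srv/media/../../etc/passwd"))
```

Resolving before comparing is what would keep a `..` segment from slipping past the guard; a plain string-prefix check would likely accept the second path.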
+55
-15
@@ -22,16 +22,35 @@ import logging
|
|||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.application.release_TO_CHECK import inspect_release
|
||||||
from alfred.domain.release import parse_release
|
from alfred.domain.release import parse_release
|
||||||
from alfred.domain.release.ports import ReleaseKnowledge
|
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||||
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
from alfred.domain.release.value_objects import ParsedRelease
|
||||||
from alfred.infrastructure.persistence import get_memory
|
from alfred.domain.shared_TO_CHECK.ports import MediaProber
|
||||||
|
from alfred.infrastructure.persistence_TO_CHECK import get_memory
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
# Single module-level knowledge instance. YAML is loaded once at first import.
|
|
||||||
# Tests that need a custom KB can monkeypatch this attribute.
|
def _resolve_parsed(
|
||||||
_KB: ReleaseKnowledge = YamlReleaseKnowledge()
|
release_name: str,
|
||||||
|
source_path: str | None,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
|
) -> ParsedRelease:
|
||||||
|
"""Pick the right entry point depending on whether we have a path.
|
||||||
|
|
||||||
|
When ``source_path`` is provided and points to something that exists,
|
||||||
|
we run the full inspection pipeline so probe data can refresh tech
|
||||||
|
fields (which feed every filename builder). Otherwise we fall back
|
||||||
|
to a parse-only path — same behavior as before.
|
||||||
|
"""
|
||||||
|
if source_path:
|
||||||
|
path = Path(source_path)
|
||||||
|
if path.exists():
|
||||||
|
return inspect_release(release_name, path, kb, prober).parsed
|
||||||
|
parsed, _ = parse_release(release_name, kb)
|
||||||
|
return parsed
|
||||||
|
|
||||||
|
|
||||||
def _find_existing_tvshow_folders(
|
def _find_existing_tvshow_folders(
|
||||||
@@ -236,13 +255,20 @@ def resolve_season_destination(
|
|||||||
release_name: str,
|
release_name: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
confirmed_folder: str | None = None,
|
confirmed_folder: str | None = None,
|
||||||
|
source_path: str | None = None,
|
||||||
) -> ResolvedSeasonDestination:
|
) -> ResolvedSeasonDestination:
|
||||||
"""
|
"""
|
||||||
Compute destination paths for a season pack.
|
Compute destination paths for a season pack.
|
||||||
|
|
||||||
Returns series_folder + season_folder. No file paths — the whole
|
Returns series_folder + season_folder. No file paths — the whole
|
||||||
source folder is moved as-is into season_folder.
|
source folder is moved as-is into season_folder.
|
||||||
|
|
||||||
|
When ``source_path`` points to the release on disk, the parser is
|
||||||
|
augmented with ffprobe data so tech tokens missing from the release
|
||||||
|
name (quality / codec) end up in the folder names.
|
||||||
"""
|
"""
|
||||||
tv_root = _get_tv_root()
|
tv_root = _get_tv_root()
|
||||||
if not tv_root:
|
if not tv_root:
|
||||||
@@ -252,8 +278,8 @@ def resolve_season_destination(
|
|||||||
message="TV show library path is not configured.",
|
message="TV show library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
|
|
||||||
resolved = _resolve_series_folder(
|
resolved = _resolve_series_folder(
|
||||||
@@ -286,6 +312,8 @@ def resolve_episode_destination(
|
|||||||
source_file: str,
|
source_file: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
tmdb_episode_title: str | None = None,
|
tmdb_episode_title: str | None = None,
|
||||||
confirmed_folder: str | None = None,
|
confirmed_folder: str | None = None,
|
||||||
) -> ResolvedEpisodeDestination:
|
) -> ResolvedEpisodeDestination:
|
||||||
@@ -293,6 +321,8 @@ def resolve_episode_destination(
|
|||||||
Compute destination paths for a single episode file.
|
Compute destination paths for a single episode file.
|
||||||
|
|
||||||
Returns series_folder + season_folder + library_file (full path to .mkv).
|
Returns series_folder + season_folder + library_file (full path to .mkv).
|
||||||
|
``source_file`` doubles as the inspection target — when it exists,
|
||||||
|
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||||
"""
|
"""
|
||||||
tv_root = _get_tv_root()
|
tv_root = _get_tv_root()
|
||||||
if not tv_root:
|
if not tv_root:
|
||||||
@@ -302,11 +332,11 @@ def resolve_episode_destination(
|
|||||||
message="TV show library path is not configured.",
|
message="TV show library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||||
ext = Path(source_file).suffix
|
ext = Path(source_file).suffix
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
tmdb_episode_title_safe = (
|
tmdb_episode_title_safe = (
|
||||||
_KB.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
|
kb.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
|
||||||
)
|
)
|
||||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
|
|
||||||
@@ -345,11 +375,15 @@ def resolve_movie_destination(
|
|||||||
source_file: str,
|
source_file: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
) -> ResolvedMovieDestination:
|
) -> ResolvedMovieDestination:
|
||||||
"""
|
"""
|
||||||
Compute destination paths for a movie file.
|
Compute destination paths for a movie file.
|
||||||
|
|
||||||
Returns movie_folder + library_file (full path to .mkv).
|
Returns movie_folder + library_file (full path to .mkv).
|
||||||
|
``source_file`` doubles as the inspection target — when it exists,
|
||||||
|
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||||
"""
|
"""
|
||||||
memory = get_memory()
|
memory = get_memory()
|
||||||
movies_root = memory.ltm.library_paths.get("movie")
|
movies_root = memory.ltm.library_paths.get("movie")
|
||||||
@@ -360,9 +394,9 @@ def resolve_movie_destination(
|
|||||||
message="Movie library path is not configured.",
|
message="Movie library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||||
ext = Path(source_file).suffix
|
ext = Path(source_file).suffix
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
|
|
||||||
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
|
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
|
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
|
||||||
@@ -384,12 +418,18 @@ def resolve_series_destination(
|
|||||||
release_name: str,
|
release_name: str,
|
||||||
tmdb_title: str,
|
tmdb_title: str,
|
||||||
tmdb_year: int,
|
tmdb_year: int,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
confirmed_folder: str | None = None,
|
confirmed_folder: str | None = None,
|
||||||
|
source_path: str | None = None,
|
||||||
) -> ResolvedSeriesDestination:
|
) -> ResolvedSeriesDestination:
|
||||||
"""
|
"""
|
||||||
Compute destination path for a complete multi-season series pack.
|
Compute destination path for a complete multi-season series pack.
|
||||||
|
|
||||||
Returns only series_folder — the whole pack lands directly inside it.
|
Returns only series_folder — the whole pack lands directly inside it.
|
||||||
|
|
||||||
|
When ``source_path`` points to the release on disk, ffprobe
|
||||||
|
enrichment refreshes tech tokens missing from the release name.
|
||||||
"""
|
"""
|
||||||
tv_root = _get_tv_root()
|
tv_root = _get_tv_root()
|
||||||
if not tv_root:
|
if not tv_root:
|
||||||
@@ -399,8 +439,8 @@ def resolve_series_destination(
|
|||||||
message="TV show library path is not configured.",
|
message="TV show library path is not configured.",
|
||||||
)
|
)
|
||||||
|
|
||||||
parsed = parse_release(release_name, _KB)
|
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||||
tmdb_title_safe = _KB.sanitize_for_fs(tmdb_title)
|
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||||
|
|
||||||
resolved = _resolve_series_folder(
|
resolved = _resolve_series_folder(
|
||||||
@@ -1,50 +0,0 @@
|
|||||||
"""Set folder path use case."""
|
|
||||||
|
|
||||||
import logging
|
|
||||||
|
|
||||||
from alfred.infrastructure.filesystem import FileManager
|
|
||||||
|
|
||||||
from .dto import SetFolderPathResponse
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
class SetFolderPathUseCase:
|
|
||||||
"""
|
|
||||||
Use case for setting a folder path in configuration.
|
|
||||||
|
|
||||||
This orchestrates the FileManager to set folder paths.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, file_manager: FileManager):
|
|
||||||
"""
|
|
||||||
Initialize use case.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
file_manager: FileManager instance
|
|
||||||
"""
|
|
||||||
self.file_manager = file_manager
|
|
||||||
|
|
||||||
def execute(self, folder_name: str, path_value: str) -> SetFolderPathResponse:
|
|
||||||
"""
|
|
||||||
Set a folder path in configuration.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
folder_name: Name of folder to set (download, tvshow, movie, torrent)
|
|
||||||
path_value: Absolute path to the folder
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
SetFolderPathResponse with success or error information
|
|
||||||
"""
|
|
||||||
result = self.file_manager.set_folder_path(folder_name, path_value)
|
|
||||||
|
|
||||||
if result.get("status") == "ok":
|
|
||||||
return SetFolderPathResponse(
|
|
||||||
status="ok",
|
|
||||||
folder_name=result.get("folder_name"),
|
|
||||||
path=result.get("path"),
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
return SetFolderPathResponse(
|
|
||||||
status="error", error=result.get("error"), message=result.get("message")
|
|
||||||
)
|
|
||||||
@@ -1,44 +0,0 @@
|
|||||||
"""Movie application DTOs."""
|
|
||||||
|
|
||||||
from dataclasses import dataclass
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class SearchMovieResponse:
|
|
||||||
"""Response from searching for a movie."""
|
|
||||||
|
|
||||||
status: str
|
|
||||||
imdb_id: str | None = None
|
|
||||||
title: str | None = None
|
|
||||||
media_type: str | None = None
|
|
||||||
tmdb_id: int | None = None
|
|
||||||
overview: str | None = None
|
|
||||||
release_date: str | None = None
|
|
||||||
vote_average: float | None = None
|
|
||||||
error: str | None = None
|
|
||||||
message: str | None = None
|
|
||||||
|
|
||||||
def to_dict(self):
|
|
||||||
"""Convert to dict for agent compatibility."""
|
|
||||||
result = {"status": self.status}
|
|
||||||
|
|
||||||
if self.error:
|
|
||||||
result["error"] = self.error
|
|
||||||
result["message"] = self.message
|
|
||||||
else:
|
|
||||||
if self.imdb_id:
|
|
||||||
result["imdb_id"] = self.imdb_id
|
|
||||||
if self.title:
|
|
||||||
result["title"] = self.title
|
|
||||||
if self.media_type:
|
|
||||||
result["media_type"] = self.media_type
|
|
||||||
if self.tmdb_id:
|
|
||||||
result["tmdb_id"] = self.tmdb_id
|
|
||||||
if self.overview:
|
|
||||||
result["overview"] = self.overview
|
|
||||||
if self.release_date:
|
|
||||||
result["release_date"] = self.release_date
|
|
||||||
if self.vote_average:
|
|
||||||
result["vote_average"] = self.vote_average
|
|
||||||
|
|
||||||
return result
|
|
||||||
@@ -1,93 +0,0 @@
|
|||||||
"""Search movie use case."""
|
|
||||||
|
|
||||||
import logging
|
|
||||||
|
|
||||||
from alfred.infrastructure.api.tmdb import (
|
|
||||||
TMDBAPIError,
|
|
||||||
TMDBClient,
|
|
||||||
TMDBConfigurationError,
|
|
||||||
TMDBNotFoundError,
|
|
||||||
)
|
|
||||||
|
|
||||||
from .dto import SearchMovieResponse
|
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
class SearchMovieUseCase:
|
|
||||||
"""
|
|
||||||
Use case for searching a movie and retrieving its IMDb ID.
|
|
||||||
|
|
||||||
This orchestrates the TMDB API client to find movie information.
|
|
||||||
"""
|
|
||||||
|
|
||||||
def __init__(self, tmdb_client: TMDBClient):
|
|
||||||
"""
|
|
||||||
Initialize use case.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
tmdb_client: TMDB API client
|
|
||||||
"""
|
|
||||||
self.tmdb_client = tmdb_client
|
|
||||||
|
|
||||||
def execute(self, media_title: str) -> SearchMovieResponse:
|
|
||||||
"""
|
|
||||||
Search for a movie by title.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
media_title: Title of the movie to search for
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
SearchMovieResponse with movie information or error
|
|
||||||
"""
|
|
||||||
try:
|
|
||||||
# Use the TMDB client to search for media
|
|
||||||
result = self.tmdb_client.search_media(media_title)
|
|
||||||
|
|
||||||
# Check if IMDb ID was found
|
|
||||||
if result.imdb_id:
|
|
||||||
logger.info(f"IMDb ID found for '{media_title}': {result.imdb_id}")
|
|
||||||
return SearchMovieResponse(
|
|
||||||
status="ok",
|
|
||||||
imdb_id=result.imdb_id,
|
|
||||||
title=result.title,
|
|
||||||
media_type=result.media_type,
|
|
||||||
tmdb_id=result.tmdb_id,
|
|
||||||
overview=result.overview,
|
|
||||||
release_date=result.release_date,
|
|
||||||
vote_average=result.vote_average,
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
logger.warning(f"No IMDb ID available for '{media_title}'")
|
|
||||||
return SearchMovieResponse(
|
|
||||||
status="ok",
|
|
||||||
title=result.title,
|
|
||||||
media_type=result.media_type,
|
|
||||||
tmdb_id=result.tmdb_id,
|
|
||||||
error="no_imdb_id",
|
|
||||||
message=f"No IMDb ID available for '{result.title}'",
|
|
||||||
)
|
|
||||||
|
|
||||||
except TMDBNotFoundError as e:
|
|
||||||
logger.info(f"Media not found: {e}")
|
|
||||||
return SearchMovieResponse(
|
|
||||||
status="error", error="not_found", message=str(e)
|
|
||||||
)
|
|
||||||
|
|
||||||
except TMDBConfigurationError as e:
|
|
||||||
logger.error(f"TMDB configuration error: {e}")
|
|
||||||
return SearchMovieResponse(
|
|
||||||
status="error", error="configuration_error", message=str(e)
|
|
||||||
)
|
|
||||||
|
|
||||||
except TMDBAPIError as e:
|
|
||||||
logger.error(f"TMDB API error: {e}")
|
|
||||||
return SearchMovieResponse(
|
|
||||||
status="error", error="api_error", message=str(e)
|
|
||||||
)
|
|
||||||
|
|
||||||
except ValueError as e:
|
|
||||||
logger.error(f"Validation error: {e}")
|
|
||||||
return SearchMovieResponse(
|
|
||||||
status="error", error="validation_failed", message=str(e)
|
|
||||||
)
|
|
||||||
+3
-2
@@ -1,9 +1,10 @@
|
|||||||
"""Movie use cases."""
|
"""Movie use cases."""
|
||||||
|
|
||||||
from .dto import SearchMovieResponse
|
from .dto import MovieHit, SearchMovieResponse
|
||||||
from .search_movie import SearchMovieUseCase
|
from .search_movie import SearchMovieUseCase
|
||||||
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
"SearchMovieUseCase",
|
"MovieHit",
|
||||||
"SearchMovieResponse",
|
"SearchMovieResponse",
|
||||||
|
"SearchMovieUseCase",
|
||||||
]
|
]
|
||||||
@@ -0,0 +1,40 @@
|
|||||||
|
"""Movie application DTOs."""
|
||||||
|
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class MovieHit:
|
||||||
|
"""One movie hit, flattened for transport to the agent."""
|
||||||
|
|
||||||
|
tmdb_id: int
|
||||||
|
title: str
|
||||||
|
release_year: int | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
out: dict = {"tmdb_id": self.tmdb_id, "title": self.title}
|
||||||
|
if self.release_year is not None:
|
||||||
|
out["release_year"] = self.release_year
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class SearchMovieResponse:
|
||||||
|
"""Response from searching for a movie."""
|
||||||
|
|
||||||
|
status: str
|
||||||
|
hits: list[MovieHit] = field(default_factory=list)
|
||||||
|
error: str | None = None
|
||||||
|
message: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self):
|
||||||
|
"""Convert to dict for agent compatibility."""
|
||||||
|
result: dict = {"status": self.status}
|
||||||
|
|
||||||
|
if self.error:
|
||||||
|
result["error"] = self.error
|
||||||
|
result["message"] = self.message
|
||||||
|
else:
|
||||||
|
result["hits"] = [h.to_dict() for h in self.hits]
|
||||||
|
|
||||||
|
return result
|
||||||
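To illustrate the new response shape, the sketch below restates the two dataclasses from this file and serializes one success and one error response. The titles, IDs, and error text are made-up sample values, and `Optional` is used instead of `int | None` only so the snippet also runs on older Python versions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class MovieHit:
    tmdb_id: int
    title: str
    release_year: Optional[int] = None

    def to_dict(self) -> dict:
        out: dict = {"tmdb_id": self.tmdb_id, "title": self.title}
        if self.release_year is not None:
            out["release_year"] = self.release_year
        return out


@dataclass
class SearchMovieResponse:
    status: str
    hits: list = field(default_factory=list)
    error: Optional[str] = None
    message: Optional[str] = None

    def to_dict(self) -> dict:
        result: dict = {"status": self.status}
        if self.error:
            # Error responses carry no hits at all.
            result["error"] = self.error
            result["message"] = self.message
        else:
            result["hits"] = [h.to_dict() for h in self.hits]
        return result


ok = SearchMovieResponse(
    status="ok",
    hits=[MovieHit(1, "Example Movie", 1999), MovieHit(2, "Undated Movie")],
).to_dict()
failed = SearchMovieResponse(
    status="error", error="api_error", message="boom"
).to_dict()
```

A hit without a year simply omits the `release_year` key, and an empty search still yields `"hits": []` rather than an error.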
@@ -0,0 +1,60 @@
|
|||||||
|
"""Search movie use case."""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
|
||||||
|
from alfred.infrastructure.api_TO_CHECK.tmdb import (
|
||||||
|
TMDBAPIError,
|
||||||
|
TMDBClient,
|
||||||
|
TMDBConfigurationError,
|
||||||
|
)
|
||||||
|
|
||||||
|
from .dto import MovieHit, SearchMovieResponse
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class SearchMovieUseCase:
|
||||||
|
"""List movies matching a free-text query via TMDB ``/search/movie``.
|
||||||
|
|
||||||
|
The use case is a thin orchestrator: it asks the client for hits,
|
||||||
|
flattens domain VOs into agent-friendly primitives, and wraps
|
||||||
|
errors. It deliberately does **not** look up ``imdb_id`` —
|
||||||
|
enrichment is the caller's job (via :meth:`TMDBClient.get_movie_info`
|
||||||
|
on a chosen ``tmdb_id``).
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, tmdb_client: TMDBClient):
|
||||||
|
self.tmdb_client = tmdb_client
|
||||||
|
|
||||||
|
def execute(self, media_title: str) -> SearchMovieResponse:
|
||||||
|
try:
|
||||||
|
results = self.tmdb_client.search_movies(media_title)
|
||||||
|
|
||||||
|
hits = [
|
||||||
|
MovieHit(
|
||||||
|
tmdb_id=r.tmdb_id.value,
|
||||||
|
title=str(r.title),
|
||||||
|
release_year=r.release_year.value if r.release_year else None,
|
||||||
|
)
|
||||||
|
for r in results
|
||||||
|
]
|
||||||
|
logger.info(f"search_movies({media_title!r}) → {len(hits)} hits")
|
||||||
|
return SearchMovieResponse(status="ok", hits=hits)
|
||||||
|
|
||||||
|
except TMDBConfigurationError as e:
|
||||||
|
logger.error(f"TMDB configuration error: {e}")
|
||||||
|
return SearchMovieResponse(
|
||||||
|
status="error", error="configuration_error", message=str(e)
|
||||||
|
)
|
||||||
|
|
||||||
|
except TMDBAPIError as e:
|
||||||
|
logger.error(f"TMDB API error: {e}")
|
||||||
|
return SearchMovieResponse(
|
||||||
|
status="error", error="api_error", message=str(e)
|
||||||
|
)
|
||||||
|
|
||||||
|
except ValueError as e:
|
||||||
|
logger.error(f"Validation error: {e}")
|
||||||
|
return SearchMovieResponse(
|
||||||
|
status="error", error="validation_failed", message=str(e)
|
||||||
|
)
|
||||||
@@ -0,0 +1,20 @@
|
|||||||
|
"""Release application layer — orchestrators sitting between domain
|
||||||
|
parsing and infrastructure I/O.
|
||||||
|
|
||||||
|
Public surface:
|
||||||
|
|
||||||
|
- :func:`is_supported_video` / :func:`find_main_video` — pre-pipeline
|
||||||
|
filesystem helpers (extension-only filtering, top-level video pick).
|
||||||
|
- :func:`inspect_release` / :class:`InspectedResult` — full inspection
|
||||||
|
pipeline combining parse + filesystem refinement + probe enrichment.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .inspect import InspectedResult, inspect_release
|
||||||
|
from .supported_media import find_main_video, is_supported_video
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"InspectedResult",
|
||||||
|
"find_main_video",
|
||||||
|
"inspect_release",
|
||||||
|
"is_supported_video",
|
||||||
|
]
|
||||||
+1
-1
@@ -19,7 +19,7 @@ from __future__ import annotations
|
|||||||
|
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from alfred.domain.release.ports import ReleaseKnowledge
|
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||||
from alfred.domain.release.value_objects import ParsedRelease
|
from alfred.domain.release.value_objects import ParsedRelease
|
||||||
|
|
||||||
|
|
||||||
@@ -0,0 +1,74 @@
|
|||||||
|
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import replace
|
||||||
|
|
||||||
|
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||||
|
from alfred.domain.release.value_objects import ParsedRelease
|
||||||
|
from alfred.domain.shared_TO_CHECK.media import MediaInfo
|
||||||
|
|
||||||
|
|
||||||
|
def enrich_from_probe(
|
||||||
|
parsed: ParsedRelease, info: MediaInfo, kb: ReleaseKnowledge
|
||||||
|
) -> ParsedRelease:
|
||||||
|
"""
|
||||||
|
Return a new ParsedRelease with None fields filled from ffprobe MediaInfo.
|
||||||
|
|
||||||
|
Only overwrites fields that are currently None — token-level values
|
||||||
|
from the release name always take priority. ``ParsedRelease`` is
|
||||||
|
frozen; this returns a new instance via :func:`dataclasses.replace`.
|
||||||
|
|
||||||
|
Translation tables (ffprobe codec name → scene token, channel count
|
||||||
|
→ layout) live in ``kb.probe_mappings`` (loaded from
|
||||||
|
``alfred/knowledge/release/probe_mappings.yaml``). When ffprobe
|
||||||
|
reports a value with no mapping entry, the fallback is the uppercase
|
||||||
|
raw value so unknown codecs still surface in a predictable form.
|
||||||
|
"""
|
||||||
|
mappings = kb.probe_mappings
|
||||||
|
video_codec_map: dict[str, str] = mappings.get("video_codec", {})
|
||||||
|
audio_codec_map: dict[str, str] = mappings.get("audio_codec", {})
|
||||||
|
channel_map: dict[int, str] = mappings.get("audio_channels", {})
|
||||||
|
|
||||||
|
updates: dict[str, object] = {}
|
||||||
|
|
||||||
|
if parsed.quality is None and info.resolution:
|
||||||
|
updates["quality"] = info.resolution
|
||||||
|
|
||||||
|
if parsed.codec is None and info.video_codec:
|
||||||
|
updates["codec"] = video_codec_map.get(
|
||||||
|
info.video_codec.lower(), info.video_codec.upper()
|
||||||
|
)
|
||||||
|
|
||||||
|
# bit_depth: ffprobe exposes it via pix_fmt — not in MediaInfo yet, skip.
|
||||||
|
|
||||||
|
# Audio — use the default track, fallback to first
|
||||||
|
default_track = next((t for t in info.audio_tracks if t.is_default), None)
|
||||||
|
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
|
||||||
|
|
||||||
|
if track:
|
||||||
|
if parsed.audio_codec is None and track.codec:
|
||||||
|
updates["audio_codec"] = audio_codec_map.get(
|
||||||
|
track.codec.lower(), track.codec.upper()
|
||||||
|
)
|
||||||
|
|
||||||
|
if parsed.audio_channels is None and track.channels:
|
||||||
|
updates["audio_channels"] = channel_map.get(
|
||||||
|
track.channels, f"{track.channels}ch"
|
||||||
|
)
|
||||||
|
|
||||||
|
# Languages — merge ffprobe languages with token-level ones
|
||||||
|
# "und" = undetermined, not useful
|
||||||
|
if info.audio_languages:
|
||||||
|
existing_upper = {lang.upper() for lang in parsed.languages}
|
||||||
|
new_languages = list(parsed.languages)
|
||||||
|
for lang in info.audio_languages:
|
||||||
|
if lang.lower() != "und" and lang.upper() not in existing_upper:
|
||||||
|
new_languages.append(lang)
|
||||||
|
existing_upper.add(lang.upper())
|
||||||
|
if len(new_languages) != len(parsed.languages):
|
||||||
|
updates["languages"] = tuple(new_languages)
|
||||||
|
|
||||||
|
if not updates:
|
||||||
|
return parsed
|
||||||
|
return replace(parsed, **updates)
|
||||||
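The fill-only-when-`None` rule above can be sketched in isolation. This is a trimmed-down illustration, not the module itself: `SketchParsed`, `fill_missing`, and the contents of `VIDEO_CODEC_MAP` are hypothetical, and only the quality and codec fields are modeled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SketchParsed:
    """Hypothetical, trimmed-down stand-in for ParsedRelease."""

    quality: Optional[str] = None
    codec: Optional[str] = None


# Assumed sample mapping; the real table is loaded by the knowledge base.
VIDEO_CODEC_MAP = {"hevc": "x265", "h264": "x264"}


def fill_missing(
    parsed: SketchParsed,
    resolution: Optional[str],
    video_codec: Optional[str],
) -> SketchParsed:
    updates = {}
    if parsed.quality is None and resolution:
        updates["quality"] = resolution
    if parsed.codec is None and video_codec:
        # Unknown codecs fall back to the upper-cased raw value.
        updates["codec"] = VIDEO_CODEC_MAP.get(
            video_codec.lower(), video_codec.upper()
        )
    # Same object when nothing changed, a new frozen instance otherwise.
    return replace(parsed, **updates) if updates else parsed


enriched = fill_missing(SketchParsed(quality="1080p"), "2160p", "hevc")
unknown = fill_missing(SketchParsed(), None, "vp9")
full = SketchParsed("720p", "x264")
unchanged = fill_missing(full, "1080p", "hevc")
```

In the first call the name-derived `1080p` wins over the probed `2160p`, which mirrors the "token-level values always take priority" rule in the docstring.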
@@ -0,0 +1,192 @@
|
|||||||
|
"""Release inspection orchestrator — the canonical "look at this thing"
|
||||||
|
entry point.
|
||||||
|
|
||||||
|
``inspect_release`` is the single composition of the four layers we
|
||||||
|
care about for a freshly-arrived release:
|
||||||
|
|
||||||
|
1. **Parse the name** — :func:`alfred.domain.release.services.parse_release`
|
||||||
|
gives a ``ParsedRelease`` plus a ``ParseReport`` (confidence + road).
|
||||||
|
2. **Refine the media type** — :func:`detect_media_type` uses the
|
||||||
|
on-disk extension mix to override any token-level guess (e.g. a
|
||||||
|
bare ``.iso`` folder becomes ``"other"``). ``ParsedRelease`` is
|
||||||
|
frozen, so the refined value is rebound onto a new instance via
|
||||||
|
:func:`dataclasses.replace`.
|
||||||
|
3. **Pick the main video** — :func:`find_main_video` runs a top-level
|
||||||
|
scan over the source path. If nothing qualifies the result still
|
||||||
|
completes; downstream callers decide what to do with a videoless
|
||||||
|
release.
|
||||||
|
4. **Probe the video** — the injected :class:`MediaProber` fills in
|
||||||
|
missing technical fields via :func:`enrich_from_probe`. Skipped
|
||||||
|
when there is no main video or when ``media_type`` ended up in
|
||||||
|
``{"unknown", "other"}`` (the probe would tell us nothing useful).
|
||||||
|
|
||||||
|
The return type is :class:`InspectedResult`, a frozen VO that bundles
|
||||||
|
everything downstream callers need (``analyze_release`` tool,
|
||||||
|
``resolve_destination``, future workflow stages) without forcing them
|
||||||
|
to redo the same four calls.
|
||||||
|
|
||||||
|
Design notes:
|
||||||
|
|
||||||
|
- **Application layer.** This module touches both domain
|
||||||
|
(``parse_release``) and infrastructure (``MediaProber`` port). That
|
||||||
|
is exactly application's job — orchestrate.
|
||||||
|
- **Knowledge base is injected.** ``inspect_release`` takes ``kb`` and
|
||||||
|
``prober`` as parameters; no module-level singletons here. Callers
|
||||||
|
(the tool wrapper, tests) decide what to plug in.
|
||||||
|
- **No in-place mutation.** ``ParsedRelease`` is frozen, so the refined
|
||||||
|
``media_type`` and the fields filled by ``enrich_from_probe`` are
|
||||||
|
applied through :func:`dataclasses.replace`, which returns a new
|
||||||
|
instance. The outer ``InspectedResult`` is frozen as well, so the
|
||||||
|
*bundle* is immutable from the caller's perspective.
|
||||||
|
- **Never raises.** Filesystem / probe errors surface as ``None``
|
||||||
|
fields on the result, never as exceptions — same contract as the
|
||||||
|
underlying adapters.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass, replace
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.application.release_TO_CHECK.detect_media_type import detect_media_type
|
||||||
|
from alfred.application.release_TO_CHECK.enrich_from_probe import enrich_from_probe
|
||||||
|
from alfred.application.release_TO_CHECK.supported_media import find_main_video
|
||||||
|
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||||
|
from alfred.domain.releases_TO_CHECK.parser.services import parse_release
|
||||||
|
from alfred.domain.release.value_objects import (
|
||||||
|
MediaTypeToken,
|
||||||
|
ParsedRelease,
|
||||||
|
ParseReport,
|
||||||
|
)
|
||||||
|
from alfred.domain.shared_TO_CHECK.media import MediaInfo
|
||||||
|
from alfred.domain.shared_TO_CHECK.ports import MediaProber
|
||||||
|
|
||||||
|
# Media types for which a probe carries no useful information.
|
||||||
|
_NON_PROBABLE_MEDIA_TYPES = frozenset({"unknown", "other"})
|
||||||
|
|
||||||
|
# Media types for which there's nothing for the organizer to do.
|
||||||
|
# ``other`` covers things like games / ISOs / archives sitting on the
|
||||||
|
# downloads folder. ``unknown`` does NOT belong here — those need a
|
||||||
|
# user decision, not a skip.
|
||||||
|
_SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
|
||||||
|
|
||||||
|
# Roads that signal the parser couldn't reach a confident answer on its
|
||||||
|
# own. ``Road`` values are kept as strings on the report to avoid a
|
||||||
|
# cross-package import here.
|
||||||
|
_ASK_USER_ROADS = frozenset({"path_of_pain"})
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class InspectedResult:
|
||||||
|
"""The full picture of a release: parsed name + filesystem reality.
|
||||||
|
|
||||||
|
Bundles everything the downstream pipeline needs after a single
|
||||||
|
inspection pass:
|
||||||
|
|
||||||
|
- ``parsed`` — :class:`ParsedRelease`, with ``media_type`` already
|
||||||
|
refined by :func:`detect_media_type` and ``None`` tech fields
|
||||||
|
filled in by :func:`enrich_from_probe` when a probe ran.
|
||||||
|
- ``report`` — :class:`ParseReport` from the parser (confidence +
|
||||||
|
road, untouched by inspection).
|
||||||
|
- ``source_path`` — the path the inspector was pointed at (file or
|
||||||
|
folder), as supplied by the caller.
|
||||||
|
- ``main_video`` — the canonical video file inside ``source_path``,
|
||||||
|
or ``None`` if no eligible file was found.
|
||||||
|
- ``media_info`` — the :class:`MediaInfo` snapshot when a probe
|
||||||
|
succeeded; ``None`` when no video was probed (no main video, or
|
||||||
|
``media_type`` in ``{"unknown", "other"}``) or when ffprobe
|
||||||
|
failed.
|
||||||
|
- ``probe_used`` — ``True`` iff ``media_info`` is non-``None`` and
|
||||||
|
``enrich_from_probe`` actually ran. Explicit flag so callers
|
||||||
|
don't have to re-derive the condition.
|
||||||
|
- ``recommended_action`` — derived hint for the orchestrator (see
|
||||||
|
property docstring). Encodes the exclusion / clarification /
|
||||||
|
go-ahead decision in one place so downstream callers don't
|
||||||
|
re-implement the same checks.
|
||||||
|
"""
|
||||||
|
|
||||||
|
parsed: ParsedRelease
|
||||||
|
report: ParseReport
|
||||||
|
source_path: Path
|
||||||
|
main_video: Path | None
|
||||||
|
media_info: MediaInfo | None
|
||||||
|
probe_used: bool
|
||||||
|
|
||||||
|
    @property
    def recommended_action(self) -> str:
        """Return one of ``"skip"`` / ``"ask_user"`` / ``"process"``.

        - ``"skip"`` — nothing to organize:
            * the source has no main video file, **or**
            * ``media_type`` is ``"other"`` (games / ISOs / archives).
        - ``"ask_user"`` — a decision is required before any action:
            * ``media_type`` is ``"unknown"`` (parser couldn't classify), **or**
            * the parse landed on ``Road.PATH_OF_PAIN``
              (low-confidence, malformed name, etc.).
        - ``"process"`` — everything else: a confident parse with a
          usable media type and a main video on disk. The orchestrator
          can move straight to the planning step.

        The check ordering matters: ``"skip"`` wins over ``"ask_user"``
        because if there's no video to organize, no question to the
        user can change that. ``"ask_user"`` then wins over
        ``"process"`` because a confident parse alone isn't enough if
        the type or road still flag uncertainty.
        """
        if self.main_video is None:
            return "skip"
        if self.parsed.media_type.value in _SKIPPABLE_MEDIA_TYPES:
            return "skip"
        if self.parsed.media_type.value == "unknown":
            return "ask_user"
        if self.report.road in _ASK_USER_ROADS:
            return "ask_user"
        return "process"
|
||||||
|
|
||||||
|
|
||||||
|
def inspect_release(
|
||||||
|
release_name: str,
|
||||||
|
source_path: Path,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
prober: MediaProber,
|
||||||
|
) -> InspectedResult:
|
||||||
|
"""Run the full inspection pipeline on ``release_name`` /
|
||||||
|
``source_path``.
|
||||||
|
|
||||||
|
See module docstring for the four-step flow. ``kb`` and ``prober``
|
||||||
|
are injected so the caller controls the knowledge base layering
|
||||||
|
and the probe adapter (real ffprobe in production, stubs in tests).
|
||||||
|
|
||||||
|
Never raises. A missing or unreadable ``source_path`` simply
|
||||||
|
results in ``main_video=None`` and ``media_info=None``.
|
||||||
|
"""
|
||||||
|
parsed, report = parse_release(release_name, kb)
|
||||||
|
|
||||||
|
# Step 2: refine media_type from the on-disk extension mix.
|
||||||
|
# detect_media_type tolerates non-existent paths (returns parsed.media_type
|
||||||
|
# untouched), so no need to guard here. ParsedRelease is frozen — use
|
||||||
|
# dataclasses.replace to rebind with the refined value.
|
||||||
|
refined_media_type = MediaTypeToken(detect_media_type(parsed, source_path, kb))
|
||||||
|
if refined_media_type != parsed.media_type:
|
||||||
|
parsed = replace(parsed, media_type=refined_media_type)
|
||||||
|
|
||||||
|
# Step 3: pick the canonical main video (top-level scan only).
|
||||||
|
main_video = find_main_video(source_path, kb)
|
||||||
|
|
||||||
|
# Step 4: probe + enrich, when it makes sense.
|
||||||
|
media_info: MediaInfo | None = None
|
||||||
|
probe_used = False
|
||||||
|
if main_video is not None and parsed.media_type not in _NON_PROBABLE_MEDIA_TYPES:
|
||||||
|
media_info = prober.probe(main_video)
|
||||||
|
if media_info is not None:
|
||||||
|
parsed = enrich_from_probe(parsed, media_info, kb)
|
||||||
|
probe_used = True
|
||||||
|
|
||||||
|
return InspectedResult(
|
||||||
|
parsed=parsed,
|
||||||
|
report=report,
|
||||||
|
source_path=source_path,
|
||||||
|
main_video=main_video,
|
||||||
|
media_info=media_info,
|
||||||
|
probe_used=probe_used,
|
||||||
|
)
|
||||||
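The check ordering described in the ``recommended_action`` docstring can be sketched as a standalone function. This is a minimal illustration rather than the project's code: the contents of ``SKIPPABLE_MEDIA_TYPES`` and ``ASK_USER_ROADS`` are assumptions inferred from the docstring (the diff references the constants without showing them), and plain strings stand in for the real ``media_type`` and ``Road`` values.

```python
from pathlib import Path
from typing import Optional

# Assumed contents: the diff references these constants without showing them.
SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
ASK_USER_ROADS = frozenset({"path_of_pain"})


def recommended_action(main_video: Optional[Path], media_type: str, road: str) -> str:
    """Mirror the documented ordering: skip first, then ask_user, then process."""
    # 1. Nothing on disk to organize: no question to the user can fix that.
    if main_video is None or media_type in SKIPPABLE_MEDIA_TYPES:
        return "skip"
    # 2. A video exists, but the type or the parse road still flags uncertainty.
    if media_type == "unknown" or road in ASK_USER_ROADS:
        return "ask_user"
    # 3. Confident parse, usable media type, main video present.
    return "process"
```

With these assumptions, a release with no main video is skipped even when its type is ``"unknown"``, which is the precedence the docstring argues for.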
@@ -0,0 +1,74 @@
|
|||||||
|
"""Pre-pipeline exclusion — decide which files are worth parsing.
|
||||||
|
|
||||||
|
These helpers live one notch above the domain: they touch the
|
||||||
|
filesystem (``Path.iterdir``, ``Path.suffix``) but carry no parsing
|
||||||
|
logic of their own. The goal is to filter out non-video files and pick
|
||||||
|
the canonical "main video" from a release folder *before* anything
|
||||||
|
hits :func:`~alfred.domain.release.parse_release`.
|
||||||
|
|
||||||
|
Design notes (Phase A bis, 2026-05-20):
|
||||||
|
|
||||||
|
- **Extension is the sole eligibility criterion.** A file is supported
|
||||||
|
iff its suffix is in ``kb.video_extensions``. No size threshold, no
|
||||||
|
filename heuristics ("sample", "trailer", …). If a release packs a
|
||||||
|
bloated featurette or names its sample alphabetically before the
|
||||||
|
main feature, that's PATH_OF_PAIN territory — not this layer's job.
|
||||||
|
|
||||||
|
- **Top-level scan only.** ``find_main_video`` does not descend into
|
||||||
|
subdirectories. Releases that wrap the main video in ``Sample/`` or
|
||||||
|
similar are non-scene-standard and handled by the orchestrator
|
||||||
|
upstream.
|
||||||
|
|
||||||
|
- **Lexicographic tie-break.** When several candidates qualify
|
||||||
|
(legitimate for season packs), we return the first by alphabetical
|
||||||
|
order. Deterministic, no size-based ranking.
|
||||||
|
|
||||||
|
- **Direct ``Path`` I/O.** No ``FilesystemScanner`` port — this layer
|
||||||
|
is application, not domain. If isolation becomes necessary for
|
||||||
|
testing scale, we'll introduce a port then.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.domain.releases_TO_CHECK.ports.knowledge import ReleaseKnowledge
|
||||||
|
|
||||||
|
|
||||||
|
def is_supported_video(path: Path, kb: ReleaseKnowledge) -> bool:
    """Return True when ``path`` is a video file the parser should
    consider.

    The check is purely extension-based: ``path.suffix.lower()`` must
    belong to ``kb.video_extensions``. ``path`` must also be a regular
    file — directories and broken symlinks return False.
    """
    if not path.is_file():
        return False
    return path.suffix.lower() in kb.video_extensions
|
||||||
|
|
||||||
|
|
||||||
|
def find_main_video(folder: Path, kb: ReleaseKnowledge) -> Path | None:
    """Return the canonical main video file inside ``folder``, or
    ``None`` if there isn't one.

    Behavior:

    - Top-level scan only — subdirectories are ignored.
    - Eligibility is :func:`is_supported_video`.
    - When several files qualify, the lexicographically first one wins.
    - When ``folder`` itself is a video file, it is returned as-is
      (single-file releases are valid).
    - When ``folder`` doesn't exist or isn't a directory (and isn't a
      video file either), returns ``None``.
    """
    if folder.is_file():
        return folder if is_supported_video(folder, kb) else None

    if not folder.is_dir():
        return None

    candidates = sorted(
        child for child in folder.iterdir() if is_supported_video(child, kb)
    )
    return candidates[0] if candidates else None
|
||||||
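To make the selection rules of ``find_main_video`` concrete, here is a self-contained sketch that runs against a temporary directory. It is an illustration under stated assumptions: ``VIDEO_EXTENSIONS`` is a hypothetical stand-in for ``kb.video_extensions``, and the ``kb`` parameter is dropped so the example runs without the ``alfred`` package.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for ``kb.video_extensions``.
VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4", ".avi"})


def is_supported_video(path: Path) -> bool:
    # Regular file with a known video suffix; the suffix check is case-insensitive.
    return path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS


def find_main_video(folder: Path) -> Optional[Path]:
    if folder.is_file():
        return folder if is_supported_video(folder) else None
    if not folder.is_dir():
        return None
    # Top-level scan only; the first candidate in sorted order wins.
    candidates = sorted(c for c in folder.iterdir() if is_supported_video(c))
    return candidates[0] if candidates else None


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        release = Path(tmp) / "Show.S01.1080p.WEB-DL"
        release.mkdir()
        (release / "Show.S01E02.mkv").touch()
        (release / "Show.S01E01.MKV").touch()      # upper-case suffix still counts
        (release / "Show.S01.nfo").touch()         # wrong extension, ignored
        (release / "Sample").mkdir()
        (release / "Sample" / "aaa.mkv").touch()   # nested, never scanned
        picked = find_main_video(release)
        single = find_main_video(release / "Show.S01E02.mkv")
        missing = find_main_video(release / "does-not-exist")
        return {
            "picked": picked.name if picked else None,
            "single": single.name if single else None,
            "missing": missing,
        }
```

In this layout ``demo()`` picks ``Show.S01E01.MKV`` for the folder, returns the file itself for the single-file case, and ``None`` for the missing path. The nested ``Sample/aaa.mkv`` is never considered, but a top-level file that sorts before the main feature would win, which is the trade-off the design notes accept.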
+7
-7
@@ -5,13 +5,13 @@ import os
|
|||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
from alfred.domain.subtitles.entities import SubtitleCandidate
|
from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
|
||||||
from alfred.domain.subtitles.value_objects import SubtitleType
|
from alfred.domain.subtitles_TO_CHECK.value_objects import SubtitleType
|
||||||
|
|
||||||
logger = logging.getLogger(__name__)
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
def _build_dest_name(track: SubtitleCandidate, video_stem: str) -> str:
|
def _build_dest_name(track: SubtitleScanResult, video_stem: str) -> str:
|
||||||
"""
|
"""
|
||||||
Build the destination filename for a subtitle track.
|
Build the destination filename for a subtitle track.
|
||||||
|
|
||||||
@@ -41,7 +41,7 @@ class PlacedTrack:
|
|||||||
@dataclass
|
@dataclass
|
||||||
class PlaceResult:
|
class PlaceResult:
|
||||||
placed: list[PlacedTrack]
|
placed: list[PlacedTrack]
|
||||||
skipped: list[tuple[SubtitleCandidate, str]] # (track, reason)
|
skipped: list[tuple[SubtitleScanResult, str]] # (track, reason)
|
||||||
|
|
||||||
@property
|
@property
|
||||||
def placed_count(self) -> int:
|
def placed_count(self) -> int:
|
||||||
@@ -54,7 +54,7 @@ class PlaceResult:
|
|||||||
|
|
||||||
class SubtitlePlacer:
|
class SubtitlePlacer:
|
||||||
"""
|
"""
|
||||||
Hard-links matched SubtitleCandidate files next to a destination video.
|
Hard-links matched SubtitleScanResult files next to a destination video.
|
||||||
|
|
||||||
Uses the same hard-link strategy as FileManager.copy_file:
|
Uses the same hard-link strategy as FileManager.copy_file:
|
||||||
instant, no data duplication, qBittorrent keeps seeding.
|
instant, no data duplication, qBittorrent keeps seeding.
|
||||||
@@ -64,11 +64,11 @@ class SubtitlePlacer:
|
|||||||
|
|
||||||
def place(
|
def place(
|
||||||
self,
|
self,
|
||||||
tracks: list[SubtitleCandidate],
|
tracks: list[SubtitleScanResult],
|
||||||
destination_video: Path,
|
destination_video: Path,
|
||||||
) -> PlaceResult:
|
) -> PlaceResult:
|
||||||
placed: list[PlacedTrack] = []
|
placed: list[PlacedTrack] = []
|
||||||
skipped: list[tuple[SubtitleCandidate, str]] = []
|
skipped: list[tuple[SubtitleScanResult, str]] = []
|
||||||
|
|
||||||
dest_dir = destination_video.parent
|
dest_dir = destination_video.parent
|
||||||
|
|
||||||
+1
-1
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
import logging
|
import logging
|
||||||
|
|
||||||
from alfred.infrastructure.api.qbittorrent import (
|
from alfred.infrastructure.api_TO_CHECK.qbittorrent import (
|
||||||
QBittorrentAPIError,
|
QBittorrentAPIError,
|
||||||
QBittorrentAuthError,
|
QBittorrentAuthError,
|
||||||
QBittorrentClient,
|
QBittorrentClient,
|
||||||
+1
-1
@@ -2,7 +2,7 @@
|
|||||||
|
|
||||||
import logging
|
import logging
|
||||||
|
|
||||||
from alfred.infrastructure.api.knaben import (
|
from alfred.infrastructure.api_TO_CHECK.knaben import (
|
||||||
KnabenAPIError,
|
KnabenAPIError,
|
||||||
KnabenClient,
|
KnabenClient,
|
||||||
KnabenNotFoundError,
|
KnabenNotFoundError,
|
||||||
@@ -0,0 +1,21 @@
|
|||||||
|
"""TV-show orchestrators — operate on the Alfred-managed TV library tree.
|
||||||
|
|
||||||
|
The TV library is a directory of show folders (one per TV show), each
|
||||||
|
holding season folders containing video files. Modules here walk this
|
||||||
|
tree and reconstruct on-disk :class:`SeriesRelease` aggregates by
|
||||||
|
reusing the existing release pipeline (``inspect_release``) rather
|
||||||
|
than duplicating its parse/probe logic.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .dto import SearchShowResponse, ShowHit
|
||||||
|
from .search_show import SearchShowUseCase
|
||||||
|
from .walker import SeasonFolder, ShowTree, walk_show
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"SearchShowResponse",
|
||||||
|
"SearchShowUseCase",
|
||||||
|
"SeasonFolder",
|
||||||
|
"ShowHit",
|
||||||
|
"ShowTree",
|
||||||
|
"walk_show",
|
||||||
|
]
|
||||||
@@ -0,0 +1,39 @@
|
|||||||
|
"""TV show application DTOs."""
|
||||||
|
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ShowHit:
|
||||||
|
"""One TV-show hit, flattened for transport to the agent."""
|
||||||
|
|
||||||
|
tmdb_id: int
|
||||||
|
name: str
|
||||||
|
first_air_year: int | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
out: dict = {"tmdb_id": self.tmdb_id, "name": self.name}
|
||||||
|
if self.first_air_year is not None:
|
||||||
|
out["first_air_year"] = self.first_air_year
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class SearchShowResponse:
|
||||||
|
"""Response from searching for a TV show."""
|
||||||
|
|
||||||
|
status: str
|
||||||
|
hits: list[ShowHit] = field(default_factory=list)
|
||||||
|
error: str | None = None
|
||||||
|
message: str | None = None
|
||||||
|
|
||||||
|
def to_dict(self) -> dict:
|
||||||
|
result: dict = {"status": self.status}
|
||||||
|
|
||||||
|
if self.error:
|
||||||
|
result["error"] = self.error
|
||||||
|
result["message"] = self.message
|
||||||
|
else:
|
||||||
|
result["hits"] = [h.to_dict() for h in self.hits]
|
||||||
|
|
||||||
|
return result
|
||||||
@@ -0,0 +1,59 @@
|
|||||||
|
"""Search TV show use case."""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
|
||||||
|
from alfred.infrastructure.api_TO_CHECK.tmdb import (
|
||||||
|
TMDBAPIError,
|
||||||
|
TMDBClient,
|
||||||
|
TMDBConfigurationError,
|
||||||
|
)
|
||||||
|
|
||||||
|
from .dto import SearchShowResponse, ShowHit
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class SearchShowUseCase:
|
||||||
|
"""List TV shows matching a free-text query via TMDB ``/search/tv``.
|
||||||
|
|
||||||
|
Symmetric to :class:`alfred.application.movies.SearchMovieUseCase`:
|
||||||
|
thin orchestrator, flattens domain VOs into agent-friendly
|
||||||
|
primitives, no ``imdb_id`` enrichment (caller follows up with
|
||||||
|
:meth:`TMDBClient.get_tv_show_info` on a chosen ``tmdb_id``).
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(self, tmdb_client: TMDBClient):
|
||||||
|
self.tmdb_client = tmdb_client
|
||||||
|
|
||||||
|
def execute(self, show_title: str) -> SearchShowResponse:
|
||||||
|
try:
|
||||||
|
results = self.tmdb_client.search_shows(show_title)
|
||||||
|
|
||||||
|
hits = [
|
||||||
|
ShowHit(
|
||||||
|
tmdb_id=r.tmdb_id.value,
|
||||||
|
name=r.name,
|
||||||
|
first_air_year=r.first_air_year,
|
||||||
|
)
|
||||||
|
for r in results
|
||||||
|
]
|
||||||
|
logger.info("search_shows(%r) → %d hits", show_title, len(hits))
|
||||||
|
return SearchShowResponse(status="ok", hits=hits)
|
||||||
|
|
||||||
|
except TMDBConfigurationError as e:
|
||||||
|
logger.error("TMDB configuration error: %s", e)
|
||||||
|
return SearchShowResponse(
|
||||||
|
status="error", error="configuration_error", message=str(e)
|
||||||
|
)
|
||||||
|
|
||||||
|
except TMDBAPIError as e:
|
||||||
|
logger.error("TMDB API error: %s", e)
|
||||||
|
return SearchShowResponse(
|
||||||
|
status="error", error="api_error", message=str(e)
|
||||||
|
)
|
||||||
|
|
||||||
|
except ValueError as e:
|
||||||
|
logger.error("Validation error: %s", e)
|
||||||
|
return SearchShowResponse(
|
||||||
|
status="error", error="validation_failed", message=str(e)
|
||||||
|
)
|
||||||
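The response shape produced by the search use case can be sketched without the TMDB client. In the example below, ``FakeTMDBError`` and the ``backend`` callable are hypothetical stand-ins for ``TMDBAPIError`` and ``TMDBClient.search_shows``, and the ids are placeholders rather than real TMDB ids. Only the DTO logic follows the diff.

```python
from dataclasses import dataclass, field
from typing import Optional


class FakeTMDBError(Exception):
    """Hypothetical stand-in for ``TMDBAPIError``."""


@dataclass(frozen=True)
class ShowHit:
    tmdb_id: int
    name: str
    first_air_year: Optional[int] = None

    def to_dict(self) -> dict:
        out: dict = {"tmdb_id": self.tmdb_id, "name": self.name}
        if self.first_air_year is not None:  # omit the key when the year is unknown
            out["first_air_year"] = self.first_air_year
        return out


@dataclass
class SearchShowResponse:
    status: str
    hits: list = field(default_factory=list)
    error: Optional[str] = None
    message: Optional[str] = None

    def to_dict(self) -> dict:
        result: dict = {"status": self.status}
        if self.error:
            result["error"] = self.error
            result["message"] = self.message
        else:
            result["hits"] = [h.to_dict() for h in self.hits]
        return result


def search(query: str, backend) -> SearchShowResponse:
    """``backend`` is any callable returning (tmdb_id, name, year) tuples."""
    try:
        rows = backend(query)
    except FakeTMDBError as exc:
        # Failures become a structured error payload instead of propagating.
        return SearchShowResponse(status="error", error="api_error", message=str(exc))
    return SearchShowResponse(status="ok", hits=[ShowHit(*row) for row in rows])
```

A successful call serializes to ``{"status": "ok", "hits": [...]}`` and a failing backend to ``{"status": "error", "error": "api_error", "message": ...}``, so the agent always receives a dictionary and never an exception.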
@@ -0,0 +1,208 @@
|
|||||||
|
"""Show tree walker — minimal filesystem traversal of a TV show folder.
|
||||||
|
|
||||||
|
The walker is intentionally dumb: it lists season folders, classifies
|
||||||
|
each one as PACK or EPISODIC by **inspecting its filesystem
|
||||||
|
structure**, and hands the orchestrator a flat list of video files
|
||||||
|
per season. It does not parse release names, run ffprobe, or
|
||||||
|
classify subtitle files. All of that intelligence lives in the
|
||||||
|
existing release pipeline (``inspect_release`` + downstream
|
||||||
|
services); the walker just hands the orchestrator the paths to feed
|
||||||
|
into that pipeline.
|
||||||
|
|
||||||
|
Folder convention
|
||||||
|
-----------------
|
||||||
|
|
||||||
|
Inside an Alfred-managed library, a show root looks like::
|
||||||
|
|
||||||
|
Foundation/
|
||||||
|
Foundation.S01.1080p.WEB-DL.x265-GROUP/ ← PACK season
|
||||||
|
Foundation.S01E01.1080p.WEB-DL.x265.mkv ← flat video
|
||||||
|
Foundation.S01E02.1080p.WEB-DL.x265.mkv
|
||||||
|
...
|
||||||
|
Foundation.S02/ ← EPISODIC season
|
||||||
|
Foundation.S02E01.1080p.WEB-DL.x265-GROUP/ ← episode subfolder
|
||||||
|
Foundation.S02E01.1080p.WEB-DL.x265-GROUP.mkv
|
||||||
|
Foundation.S02E02.1080p.WEB-DL.x265-OTHER/
|
||||||
|
Foundation.S02E02.1080p.WEB-DL.x265-OTHER.mkv
|
||||||
|
|
||||||
|
The walker recognizes a season folder by a ``Sxx`` token anywhere in
|
||||||
|
its name (case-insensitive). It does **not** care about Plex-style
|
||||||
|
names (``Season 01``, ``Specials``) — the Alfred library uses
|
||||||
|
release-style folder names only.
|
||||||
|
|
||||||
|
PACK vs EPISODIC is a **structural distinction**, not a naming one:
|
||||||
|
|
||||||
|
* **PACK** — season folder contains N flat video files. No
|
||||||
|
subfolders.
|
||||||
|
* **EPISODIC** — season folder contains N subfolders, each holding
|
||||||
|
exactly one video.
|
||||||
|
|
||||||
|
A season folder that mixes the two layouts (some flat videos AND
|
||||||
|
some subfolders) is malformed: the walker reports
|
||||||
|
``mode=None`` and an empty ``video_files`` tuple so the
|
||||||
|
orchestrator can warn and skip it.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import re
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
|
||||||
|
from alfred.domain.releases_TO_CHECK.value_objects import ReleaseMode
|
||||||
|
from alfred.domain.shared_TO_CHECK.ports import FilesystemScanner
|
||||||
|
|
||||||
|
_LOG = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Matches any ``Sxx`` token (1-2 digits) bounded by non-alphanumerics.
|
||||||
|
# Examples that match: ``Foundation.S01.1080p``, ``S2.Pack``, ``BBC.s10.bluray``.
|
||||||
|
# Examples that don't: ``Sample``, ``Soundtrack``, ``2024.S0E1`` (``S`` must be followed by 1-2 digits and then a non-alphanumeric boundary).
|
||||||
|
_SEASON_TOKEN_RE = re.compile(r"(?<![A-Za-z0-9])s(\d{1,2})(?![A-Za-z0-9])", re.IGNORECASE)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class SeasonFolder:
|
||||||
|
"""One season folder discovered inside a show root.
|
||||||
|
|
||||||
|
``mode`` is set by the walker from the FS structure:
|
||||||
|
|
||||||
|
* :attr:`ReleaseMode.PACK` — ``video_files`` lists the season
|
||||||
|
folder's flat videos.
|
||||||
|
* :attr:`ReleaseMode.EPISODIC` — ``video_files`` lists each
|
||||||
|
episode subfolder's single video.
|
||||||
|
* ``None`` — the folder is empty, malformed (mixed layout), or
|
||||||
|
otherwise unclassifiable. ``video_files`` is empty. The
|
||||||
|
orchestrator decides whether to warn/skip.
|
||||||
|
"""
|
||||||
|
|
||||||
|
season_dir: Path
|
||||||
|
mode: ReleaseMode | None
|
||||||
|
video_files: tuple[Path, ...]
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class ShowTree:
|
||||||
|
"""The full structural snapshot of a show on disk."""
|
||||||
|
|
||||||
|
show_root: Path
|
||||||
|
season_folders: tuple[SeasonFolder, ...]
|
||||||
|
|
||||||
|
|
||||||
|
def walk_show(
|
||||||
|
show_root: Path,
|
||||||
|
*,
|
||||||
|
scanner: FilesystemScanner,
|
||||||
|
kb: ReleaseKnowledge,
|
||||||
|
) -> ShowTree:
|
||||||
|
"""Walk ``show_root`` and return its structural tree.
|
||||||
|
|
||||||
|
The walker:
|
||||||
|
|
||||||
|
* lists direct children of ``show_root``,
|
||||||
|
* keeps the directories whose name contains a ``Sxx`` token,
|
||||||
|
* classifies each season folder as PACK / EPISODIC / unknown by
|
||||||
|
inspecting its direct children (videos vs subfolders),
|
||||||
|
* for EPISODIC, descends one extra level into each episode
|
||||||
|
subfolder to collect its single video,
|
||||||
|
* sorts season folders by name and video files by name within
|
||||||
|
each folder.
|
||||||
|
|
||||||
|
The walker never raises — empty / unreadable / malformed
|
||||||
|
directories surface as a ``SeasonFolder`` with ``mode=None`` and
|
||||||
|
an empty ``video_files`` tuple.
|
||||||
|
"""
|
||||||
|
video_exts = {ext.lower() for ext in kb.video_extensions}
|
||||||
|
season_folders: list[SeasonFolder] = []
|
||||||
|
for entry in scanner.scan_dir(show_root):
|
||||||
|
if not entry.is_dir or not _SEASON_TOKEN_RE.search(entry.name):
|
||||||
|
continue
|
||||||
|
season_folders.append(
|
||||||
|
_classify_season(entry.path, scanner=scanner, video_exts=video_exts)
|
||||||
|
)
|
||||||
|
return ShowTree(
|
||||||
|
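show_root=show_root,
# Sort by folder name, as the docstring promises; scan order is not guaranteed.
season_folders=tuple(sorted(season_folders, key=lambda s: s.season_dir.name)),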
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
# Season-folder classification #
|
||||||
|
# --------------------------------------------------------------------------- #
|
||||||
|
|
||||||
|
|
||||||
|
def _classify_season(
|
||||||
|
season_dir: Path,
|
||||||
|
*,
|
||||||
|
scanner: FilesystemScanner,
|
||||||
|
video_exts: set[str],
|
||||||
|
) -> SeasonFolder:
|
||||||
|
"""Inspect one season folder and decide PACK / EPISODIC / unknown.
|
||||||
|
|
||||||
|
Looks only at direct children. For EPISODIC, descends one extra
|
||||||
|
level into each subfolder to collect its single video. Mixed
|
||||||
|
layouts (flat videos + subfolders) are reported as ``mode=None``
|
||||||
|
so the orchestrator can skip them with a warning.
|
||||||
|
"""
|
||||||
|
flat_videos: list[Path] = []
|
||||||
|
subdirs: list[Path] = []
|
||||||
|
for child in scanner.scan_dir(season_dir):
|
||||||
|
if child.is_file and child.suffix.lower() in video_exts:
|
||||||
|
flat_videos.append(child.path)
|
||||||
|
elif child.is_dir:
|
||||||
|
subdirs.append(child.path)
|
||||||
|
# Anything else (non-video files like .nfo, .srt at the season
|
||||||
|
# root) is ignored — it doesn't affect classification.
|
||||||
|
|
||||||
|
has_flat = bool(flat_videos)
|
||||||
|
has_subdirs = bool(subdirs)
|
||||||
|
|
||||||
|
if has_flat and has_subdirs:
|
||||||
|
_LOG.warning(
|
||||||
|
"walker: season folder %s mixes flat videos and subfolders — "
|
||||||
|
"malformed layout, skipping",
|
||||||
|
season_dir,
|
||||||
|
)
|
||||||
|
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
|
||||||
|
|
||||||
|
if has_flat:
|
||||||
|
return SeasonFolder(
|
||||||
|
season_dir=season_dir,
|
||||||
|
mode=ReleaseMode.PACK,
|
||||||
|
video_files=tuple(sorted(flat_videos)),
|
||||||
|
)
|
||||||
|
|
||||||
|
if has_subdirs:
|
||||||
|
episode_videos: list[Path] = []
|
||||||
|
for sub in sorted(subdirs):
|
||||||
|
videos_in_sub = [
|
||||||
|
child.path
|
||||||
|
for child in scanner.scan_dir(sub)
|
||||||
|
if child.is_file and child.suffix.lower() in video_exts
|
||||||
|
]
|
||||||
|
if len(videos_in_sub) == 0:
|
||||||
|
_LOG.warning(
|
||||||
|
"walker: episode subfolder %s contains no video — skipping",
|
||||||
|
sub,
|
||||||
|
)
|
||||||
|
continue
|
||||||
|
if len(videos_in_sub) > 1:
|
||||||
|
_LOG.warning(
|
||||||
|
"walker: episode subfolder %s contains %d videos — "
|
||||||
|
"malformed, skipping season %s",
|
||||||
|
sub,
|
||||||
|
len(videos_in_sub),
|
||||||
|
season_dir,
|
||||||
|
)
|
||||||
|
return SeasonFolder(
|
||||||
|
season_dir=season_dir, mode=None, video_files=()
|
||||||
|
)
|
||||||
|
episode_videos.append(videos_in_sub[0])
|
||||||
|
return SeasonFolder(
|
||||||
|
season_dir=season_dir,
|
||||||
|
mode=ReleaseMode.EPISODIC,
|
||||||
|
video_files=tuple(episode_videos),
|
||||||
|
)
|
||||||
|
|
||||||
|
# No flat videos, no subdirs → empty season folder.
|
||||||
|
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
|
||||||
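The season-token regex and the structural PACK / EPISODIC rule can be exercised in isolation. The sketch below reuses the regex from the diff verbatim, but everything else is an assumption made for illustration: ``pathlib`` replaces the ``FilesystemScanner`` port, ``VIDEO_EXTENSIONS`` stands in for ``kb.video_extensions``, and plain strings replace ``ReleaseMode``. It also skips the walker's "exactly one video per episode subfolder" check and uses distinct ``"malformed"`` / ``"empty"`` labels where the walker reports ``mode=None`` for both.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

SEASON_TOKEN_RE = re.compile(r"(?<![A-Za-z0-9])s(\d{1,2})(?![A-Za-z0-9])", re.IGNORECASE)
VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4"})  # hypothetical stand-in


def season_number(name: str) -> Optional[int]:
    """Return the season number carried by a folder name, if any."""
    match = SEASON_TOKEN_RE.search(name)
    return int(match.group(1)) if match else None


def classify(season_dir: Path) -> str:
    """Classify by structure only: flat videos, subfolders, both, or neither."""
    children = list(season_dir.iterdir())
    has_flat = any(c.is_file() and c.suffix.lower() in VIDEO_EXTENSIONS for c in children)
    has_subdirs = any(c.is_dir() for c in children)
    if has_flat and has_subdirs:
        return "malformed"
    if has_flat:
        return "pack"
    if has_subdirs:
        return "episodic"
    return "empty"


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pack = root / "Show.S01.1080p"
        pack.mkdir()
        (pack / "Show.S01E01.mkv").touch()
        (pack / "Show.S01E02.mkv").touch()
        episode = root / "Show.S02" / "Show.S02E01.1080p-GROUP"
        episode.mkdir(parents=True)
        (episode / "Show.S02E01.mkv").touch()
        mixed = root / "Show.S03"
        (mixed / "Show.S03E01.1080p-GROUP").mkdir(parents=True)
        (mixed / "Show.S03E02.mkv").touch()
        (root / "Show.S04").mkdir()
        return {d.name: classify(d) for d in sorted(root.iterdir())}
```

Note that an episode folder name such as ``Foundation.S02E01.1080p`` does not match the season regex, because the digits are followed by ``E``. That is what keeps episode subfolders from being mistaken for season folders.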
@@ -1,104 +0,0 @@
|
|||||||
"""Movie domain entities."""
|
|
||||||
|
|
||||||
from dataclasses import dataclass, field
|
|
||||||
from datetime import datetime
|
|
||||||
|
|
||||||
from ..shared.media import AudioTrack, MediaWithTracks, SubtitleTrack
|
|
||||||
from ..shared.value_objects import FilePath, FileSize, ImdbId
|
|
||||||
from .value_objects import MovieTitle, Quality, ReleaseYear
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass(eq=False)
|
|
||||||
class Movie(MediaWithTracks):
|
|
||||||
"""
|
|
||||||
Movie aggregate root for the movies domain.
|
|
||||||
|
|
||||||
Carries file metadata (path, size) and the tracks discovered by the
|
|
||||||
ffprobe + subtitle scan pipeline. The track lists may be empty when the
|
|
||||||
movie is known but not yet scanned, or when no file is downloaded.
|
|
||||||
|
|
||||||
Track helpers follow the same "C+" contract as ``Episode``: pass a
|
|
||||||
``Language`` for cross-format matching, or a ``str`` for case-insensitive
|
|
||||||
direct comparison.
|
|
||||||
|
|
||||||
Equality is identity-based: two ``Movie`` instances are equal iff they
|
|
||||||
share the same ``imdb_id``, regardless of file/track contents. This is
|
|
||||||
the DDD aggregate invariant — the aggregate is identified by its root id.
|
|
||||||
"""
|
|
||||||
|
|
||||||
imdb_id: ImdbId
|
|
||||||
title: MovieTitle
|
|
||||||
release_year: ReleaseYear | None = None
|
|
||||||
quality: Quality = Quality.UNKNOWN
|
|
||||||
file_path: FilePath | None = None
|
|
||||||
file_size: FileSize | None = None
|
|
||||||
tmdb_id: int | None = None
|
|
||||||
added_at: datetime = field(default_factory=datetime.now)
|
|
||||||
audio_tracks: list[AudioTrack] = field(default_factory=list)
|
|
||||||
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
|
|
||||||
|
|
||||||
def __post_init__(self):
|
|
||||||
"""Validate movie entity."""
|
|
||||||
# Ensure ImdbId is actually an ImdbId instance
|
|
||||||
if not isinstance(self.imdb_id, ImdbId):
|
|
||||||
if isinstance(self.imdb_id, str):
|
|
||||||
self.imdb_id = ImdbId(self.imdb_id)
|
|
||||||
else:
|
|
||||||
raise ValueError(
|
|
||||||
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
|
|
||||||
)
|
|
||||||
|
|
||||||
# Ensure MovieTitle is actually a MovieTitle instance
|
|
||||||
if not isinstance(self.title, MovieTitle):
|
|
||||||
if isinstance(self.title, str):
|
|
||||||
self.title = MovieTitle(self.title)
|
|
||||||
else:
|
|
||||||
raise ValueError(
|
|
||||||
f"title must be MovieTitle or str, got {type(self.title)}"
|
|
||||||
)
|
|
||||||
|
|
||||||
def __eq__(self, other: object) -> bool:
|
|
||||||
if not isinstance(other, Movie):
|
|
||||||
return NotImplemented
|
|
||||||
return self.imdb_id == other.imdb_id
|
|
||||||
|
|
||||||
def __hash__(self) -> int:
|
|
||||||
return hash(self.imdb_id)
|
|
||||||
|
|
||||||
# Track helpers (has_audio_in / audio_languages / has_subtitles_in /
|
|
||||||
# has_forced_subs / subtitle_languages) come from MediaWithTracks.
|
|
||||||
|
|
||||||
def get_folder_name(self) -> str:
|
|
||||||
"""
|
|
||||||
Get the folder name for this movie.
|
|
||||||
|
|
||||||
Format: "Title (Year)"
|
|
||||||
Example: "Inception (2010)"
|
|
||||||
"""
|
|
||||||
if self.release_year:
|
|
||||||
return f"{self.title.value} ({self.release_year.value})"
|
|
||||||
return self.title.value
|
|
||||||
|
|
||||||
def get_filename(self) -> str:
|
|
||||||
"""
|
|
||||||
Get the suggested filename for this movie.
|
|
||||||
|
|
||||||
Format: "Title.Year.Quality.ext"
|
|
||||||
Example: "Inception.2010.1080p.mkv"
|
|
||||||
"""
|
|
||||||
parts = [self.title.normalized()]
|
|
||||||
|
|
||||||
if self.release_year:
|
|
||||||
parts.append(str(self.release_year.value))
|
|
||||||
|
|
||||||
if self.quality != Quality.UNKNOWN:
|
|
||||||
parts.append(self.quality.value)
|
|
||||||
|
|
||||||
# Extension will be added based on actual file
|
|
||||||
return ".".join(parts)
|
|
||||||
|
|
||||||
def __str__(self) -> str:
|
|
||||||
return f"{self.title.value} ({self.release_year.value if self.release_year else 'Unknown'})"
|
|
||||||
|
|
||||||
def __repr__(self) -> str:
|
|
||||||
return f"Movie(imdb_id={self.imdb_id}, title='{self.title.value}')"
|
|
||||||
@@ -1,73 +0,0 @@
|
|||||||
"""Movie repository interfaces (abstract)."""
|
|
||||||
|
|
||||||
from abc import ABC, abstractmethod
|
|
||||||
|
|
||||||
from ..shared.value_objects import ImdbId
|
|
||||||
from .entities import Movie
|
|
||||||
|
|
||||||
|
|
||||||
class MovieRepository(ABC):
|
|
||||||
"""
|
|
||||||
Abstract repository for movie persistence.
|
|
||||||
|
|
||||||
This defines the interface that infrastructure implementations must follow.
|
|
||||||
"""
|
|
||||||
|
|
||||||
@abstractmethod
|
|
||||||
def save(self, movie: Movie) -> None:
|
|
||||||
"""
|
|
||||||
Save a movie to the repository.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
movie: Movie entity to save
|
|
||||||
"""
|
|
||||||
pass
|
|
||||||
|
|
||||||
@abstractmethod
|
|
||||||
def find_by_imdb_id(self, imdb_id: ImdbId) -> Movie | None:
|
|
||||||
"""
|
|
||||||
Find a movie by its IMDb ID.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
imdb_id: IMDb ID to search for
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
Movie if found, None otherwise
|
|
||||||
"""
|
|
||||||
pass
|
|
||||||
|
|
||||||
@abstractmethod
|
|
||||||
def find_all(self) -> list[Movie]:
|
|
||||||
"""
|
|
||||||
Get all movies in the repository.
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
List of all movies
|
|
||||||
"""
|
|
||||||
pass
|
|
||||||
|
|
||||||
@abstractmethod
|
|
||||||
def delete(self, imdb_id: ImdbId) -> bool:
|
|
||||||
"""
|
|
||||||
Delete a movie from the repository.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
imdb_id: IMDb ID of the movie to delete
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
True if deleted, False if not found
|
|
||||||
"""
|
|
||||||
pass
|
|
||||||
|
|
||||||
@abstractmethod
|
|
||||||
def exists(self, imdb_id: ImdbId) -> bool:
|
|
||||||
"""
|
|
||||||
Check if a movie exists in the repository.
|
|
||||||
|
|
||||||
Args:
|
|
||||||
imdb_id: IMDb ID to check
|
|
||||||
|
|
||||||
Returns:
|
|
||||||
True if exists, False otherwise
|
|
||||||
"""
|
|
||||||
pass
|
|
||||||
@@ -0,0 +1,91 @@
|
|||||||
|
"""Movie domain entities."""
|
||||||
|
|
||||||
|
from dataclasses import dataclass
|
||||||
|
|
||||||
|
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
|
||||||
|
from .value_objects import MovieTitle, ReleaseYear
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True, eq=False)
|
||||||
|
class Movie:
|
||||||
|
"""
|
||||||
|
Movie aggregate root for the movies domain.
|
||||||
|
|
||||||
|
TMDB-only aggregate: carries identity (``tmdb_id`` + optional
|
||||||
|
``imdb_id``) plus the catalog facts that come from TMDB (``title``,
|
||||||
|
``release_year``). Filesystem-side concerns (file path, quality,
|
||||||
|
tracks, ``added_at``) live on
:class:`alfred.domain.releases.entities.MovieRelease`, the per-movie
release aggregate persisted alongside.
|
||||||
|
|
||||||
|
Frozen: rebuild via ``dataclasses.replace`` to project metadata
|
||||||
|
updates (e.g. a TMDB refresh) onto a new instance.
|
||||||
|
|
||||||
|
Equality is identity-based on ``tmdb_id``: two ``Movie`` instances
|
||||||
|
are equal iff they share the same primary key. ``imdb_id`` is a
|
||||||
|
secondary anchor and not part of the identity.
|
||||||
|
"""
|
||||||
|
|
||||||
|
tmdb_id: TmdbId
|
||||||
|
title: MovieTitle
|
||||||
|
imdb_id: ImdbId | None = None
|
||||||
|
release_year: ReleaseYear | None = None
|
||||||
|
|
||||||
|
def __post_init__(self) -> None:
|
||||||
|
if not isinstance(self.tmdb_id, TmdbId):
|
||||||
|
raise ValueError(
|
||||||
|
f"tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
|
||||||
|
)
|
||||||
|
if not isinstance(self.title, MovieTitle):
|
||||||
|
if isinstance(self.title, str):
|
||||||
|
object.__setattr__(self, "title", MovieTitle(self.title))
|
||||||
|
else:
|
||||||
|
raise ValueError(
|
||||||
|
f"title must be MovieTitle or str, got {type(self.title)}"
|
||||||
|
)
|
||||||
|
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
|
||||||
|
raise ValueError(
|
||||||
|
f"imdb_id must be ImdbId or None, got {type(self.imdb_id)}"
|
||||||
|
)
|
||||||
|
|
||||||
|
def __eq__(self, other: object) -> bool:
|
||||||
|
if not isinstance(other, Movie):
|
||||||
|
return NotImplemented
|
||||||
|
return self.tmdb_id == other.tmdb_id
|
||||||
|
|
||||||
|
def __hash__(self) -> int:
|
||||||
|
return hash(self.tmdb_id)
|
||||||
|
|
||||||
|
# WRONG
|
||||||
|
def get_folder_name(self) -> str:
|
||||||
|
"""
|
||||||
|
Get the folder name for this movie.
|
||||||
|
|
||||||
|
Format: "Title (Year)"
|
||||||
|
Example: "Inception (2010)"
|
||||||
|
"""
|
||||||
|
if self.release_year:
|
||||||
|
return f"{self.title.value} ({self.release_year.value})"
|
||||||
|
return self.title.value
|
||||||
|
|
||||||
|
# WRONG
|
||||||
|
def get_filename(self) -> str:
|
||||||
|
"""
|
||||||
|
Get the suggested base filename (without extension) for this movie.
|
||||||
|
|
||||||
|
Format: ``Title.Year`` (quality lives on
|
||||||
|
:class:`alfred.domain.releases.entities.MovieRelease` now and is
|
||||||
|
appended by the release-aware caller — typically the rescan /
|
||||||
|
organize flow, after Phase 4).
|
||||||
|
|
||||||
|
Example: ``Inception.2010``.
|
||||||
|
"""
|
||||||
|
parts = [self.title.normalized()]
|
||||||
|
if self.release_year:
|
||||||
|
parts.append(str(self.release_year.value))
|
||||||
|
return ".".join(parts)
|
||||||
|
|
||||||
|
def __str__(self) -> str:
|
||||||
|
return f"{self.title.value} ({self.release_year.value if self.release_year else 'Unknown'})"
|
||||||
|
|
||||||
|
def __repr__(self) -> str:
|
||||||
|
return f"Movie(tmdb_id={self.tmdb_id}, title='{self.title.value}')"
|
||||||
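Note on the `Movie` entity above: `__eq__` and `__hash__` both key on `tmdb_id` alone, so two instances with different titles still count as the same movie. The following is a minimal sketch of that identity rule, not the project's class: `MovieSketch` is a hypothetical stand-in that uses plain `int`/`str` fields instead of the `TmdbId`/`MovieTitle` value objects, and the IDs are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: keep the hand-written __eq__/__hash__ below
class MovieSketch:
    """Hypothetical stand-in for Movie; plain ints/strs for brevity."""

    tmdb_id: int
    title: str
    release_year: int | None = None

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, MovieSketch):
            return NotImplemented  # let Python try the reflected comparison
        return self.tmdb_id == other.tmdb_id

    def __hash__(self) -> int:
        # Must agree with __eq__: equal objects need equal hashes.
        return hash(self.tmdb_id)


a = MovieSketch(27205, "Inception", 2010)
b = MovieSketch(27205, "Inception (re-tagged)")  # same id, different title
c = MovieSketch(603, "The Matrix", 1999)

unique = {a, b, c}  # a and b collapse into one entry
```

Keeping `__hash__` consistent with `__eq__` is what lets such entities be deduplicated in sets or used as dict keys.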
@@ -1,6 +1,6 @@
|
|||||||
"""Movie domain exceptions."""
|
"""Movie domain exceptions."""
|
||||||
|
|
||||||
from ..shared.exceptions import DomainException, NotFoundError
|
from ..shared_TO_CHECK.exceptions import DomainException, NotFoundError
|
||||||
|
|
||||||
|
|
||||||
class MovieNotFound(NotFoundError):
|
class MovieNotFound(NotFoundError):
|
||||||
+3
-15
@@ -3,8 +3,7 @@
|
|||||||
from dataclasses import dataclass
|
from dataclasses import dataclass
|
||||||
from enum import Enum
|
from enum import Enum
|
||||||
|
|
||||||
from ..shared.exceptions import ValidationError
|
from ..shared_TO_CHECK.exceptions import ValidationError
|
||||||
from ..shared.value_objects import to_dot_folder_name
|
|
||||||
|
|
||||||
|
|
||||||
class Quality(Enum):
|
class Quality(Enum):
|
||||||
@@ -56,18 +55,11 @@ class MovieTitle:
|
|||||||
f"Movie title must be a string, got {type(self.value)}"
|
f"Movie title must be a string, got {type(self.value)}"
|
||||||
)
|
)
|
||||||
|
|
||||||
if len(self.value) > 500:
|
if len(self.value) > 150:
|
||||||
raise ValidationError(
|
raise ValidationError(
|
||||||
f"Movie title too long: {len(self.value)} characters (max 500)"
|
f"Movie title too long: {len(self.value)} characters (max 150)"
|
||||||
)
|
)
|
||||||
|
|
||||||
def normalized(self) -> str:
|
|
||||||
"""
|
|
||||||
Return normalized title for file system usage.
|
|
||||||
|
|
||||||
Removes special characters and replaces spaces with dots.
|
|
||||||
"""
|
|
||||||
return to_dot_folder_name(self.value)
|
|
||||||
|
|
||||||
def __str__(self) -> str:
|
def __str__(self) -> str:
|
||||||
return self.value
|
return self.value
|
||||||
@@ -93,10 +85,6 @@ class ReleaseYear:
|
|||||||
f"Release year must be an integer, got {type(self.value)}"
|
f"Release year must be an integer, got {type(self.value)}"
|
||||||
)
|
)
|
||||||
|
|
||||||
# Movies started around 1888, and we shouldn't have movies from the future
|
|
||||||
if self.value < 1888 or self.value > 2100:
|
|
||||||
raise ValidationError(f"Invalid release year: {self.value}")
|
|
||||||
|
|
||||||
def __str__(self) -> str:
|
def __str__(self) -> str:
|
||||||
return str(self.value)
|
return str(self.value)
|
||||||
|
|
||||||
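Note on the `MovieTitle` length check changed in this hunk (500 to 150): the pattern is a frozen value object that validates itself in `__post_init__`. A minimal sketch under stated assumptions follows: `TitleSketch`, `MAX_TITLE_LENGTH`, and the local `ValidationError` are hypothetical names, and only the two checks visible in the diff (type and length) are reproduced.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_TITLE_LENGTH = 150  # assumption: mirrors the new limit shown in the diff


class ValidationError(ValueError):
    """Local stand-in for the project's ValidationError."""


@dataclass(frozen=True)
class TitleSketch:
    """Hypothetical value object: immutable and validated on construction."""

    value: str

    def __post_init__(self) -> None:
        if not isinstance(self.value, str):
            raise ValidationError(
                f"Movie title must be a string, got {type(self.value)}"
            )
        if len(self.value) > MAX_TITLE_LENGTH:
            raise ValidationError(
                f"Movie title too long: {len(self.value)} characters "
                f"(max {MAX_TITLE_LENGTH})"
            )


def try_build(raw: object) -> TitleSketch | None:
    """Return a TitleSketch, or None when validation rejects the input."""
    try:
        return TitleSketch(raw)  # type: ignore[arg-type]
    except ValidationError:
        return None


ok = try_build("Inception")
too_long = try_build("x" * (MAX_TITLE_LENGTH + 1))
at_limit = try_build("x" * MAX_TITLE_LENGTH)
```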
@@ -1,6 +0,0 @@
|
|||||||
"""Release domain — release name parsing and naming conventions."""
|
|
||||||
|
|
||||||
from .services import parse_release
|
|
||||||
from .value_objects import ParsedRelease
|
|
||||||
|
|
||||||
__all__ = ["ParsedRelease", "parse_release"]
|
|
||||||
@@ -1,52 +0,0 @@
|
|||||||
"""ReleaseKnowledge port — the read-only query surface that
|
|
||||||
``parse_release`` and ``ParsedRelease`` need from the release knowledge
|
|
||||||
base, expressed as a structural Protocol so the domain never imports any
|
|
||||||
concrete loader.
|
|
||||||
|
|
||||||
The concrete YAML-backed implementation lives in
|
|
||||||
``alfred/infrastructure/knowledge/release_kb.py``. Tests can supply any
|
|
||||||
object that satisfies this shape (e.g. a simple dataclass).
|
|
||||||
"""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from typing import Protocol
|
|
||||||
|
|
||||||
|
|
||||||
class ReleaseKnowledge(Protocol):
|
|
||||||
"""Read-only snapshot of release-name parsing knowledge."""
|
|
||||||
|
|
||||||
# --- Token sets used by the tokenizer / matchers ---
|
|
||||||
|
|
||||||
resolutions: set[str]
|
|
||||||
sources: set[str]
|
|
||||||
codecs: set[str]
|
|
||||||
language_tokens: set[str]
|
|
||||||
forbidden_chars: set[str]
|
|
||||||
hdr_extra: set[str]
|
|
||||||
|
|
||||||
# --- Structured knowledge (loaded from YAML as dicts) ---
|
|
||||||
|
|
||||||
audio: dict
|
|
||||||
video_meta: dict
|
|
||||||
editions: dict
|
|
||||||
media_type_tokens: dict
|
|
||||||
|
|
||||||
# --- Tokenizer separators ---
|
|
||||||
|
|
||||||
separators: list[str]
|
|
||||||
|
|
||||||
# --- File-extension sets (used by application/infra modules that work
|
|
||||||
# directly with filesystem paths, e.g. media-type detection, video
|
|
||||||
# lookup). Domain parsing itself doesn't touch these. ---
|
|
||||||
|
|
||||||
video_extensions: set[str]
|
|
||||||
non_video_extensions: set[str]
|
|
||||||
subtitle_extensions: set[str]
|
|
||||||
metadata_extensions: set[str]
|
|
||||||
|
|
||||||
# --- Filesystem sanitization (Option B: pre-sanitize at parse time) ---
|
|
||||||
|
|
||||||
def sanitize_for_fs(self, text: str) -> str:
|
|
||||||
"""Strip filesystem-forbidden characters from ``text``."""
|
|
||||||
...
|
|
||||||
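Note on the `ReleaseKnowledge` port above: because it is a structural `Protocol`, any object with the right attributes and methods satisfies it, with no inheritance required. The sketch below illustrates that idea with a trimmed-down, hypothetical protocol (`MiniKnowledge`) and a dataclass test double (`FakeKnowledge`); the separator and forbidden-character values are made up for the example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Protocol


class MiniKnowledge(Protocol):
    """Hypothetical, trimmed-down stand-in for the ReleaseKnowledge port."""

    separators: list[str]
    forbidden_chars: set[str]

    def sanitize_for_fs(self, text: str) -> str: ...


@dataclass
class FakeKnowledge:
    """Test double: does not inherit from the protocol, only matches its shape."""

    separators: list[str] = field(default_factory=lambda: [".", " ", "_"])
    forbidden_chars: set[str] = field(default_factory=lambda: {"@", "#", "/"})

    def sanitize_for_fs(self, text: str) -> str:
        return "".join(c for c in text if c not in self.forbidden_chars)


def tokenize(name: str, kb: MiniKnowledge) -> list[str]:
    """Split on the configured separators, dropping empty tokens."""
    pattern = "[" + re.escape("".join(kb.separators)) + "]+"
    return [t for t in re.split(pattern, name) if t]


kb = FakeKnowledge()
tokens = tokenize("The.Title_2010 1080p", kb)
clean = kb.sanitize_for_fs("AC/DC #1")
```

A static type checker accepts `FakeKnowledge` wherever `MiniKnowledge` is expected, which is what keeps the domain code free of any import from the concrete loader.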
@@ -1,506 +0,0 @@
|
|||||||
"""Release domain — parsing service."""
|
|
||||||
|
|
||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
import re
|
|
||||||
|
|
||||||
from .ports import ReleaseKnowledge
|
|
||||||
from .value_objects import MediaTypeToken, ParsedRelease, ParsePath
|
|
||||||
|
|
||||||
|
|
||||||
def _tokenize(name: str, kb: ReleaseKnowledge) -> list[str]:
|
|
||||||
"""Split a release name on the configured separators, dropping empty tokens."""
|
|
||||||
pattern = "[" + re.escape("".join(kb.separators)) + "]+"
|
|
||||||
return [t for t in re.split(pattern, name) if t]
|
|
||||||
|
|
||||||
|
|
||||||
def parse_release(name: str, kb: ReleaseKnowledge) -> ParsedRelease:
|
|
||||||
"""
|
|
||||||
Parse a release name and return a ParsedRelease.
|
|
||||||
|
|
||||||
Flow:
|
|
||||||
1. Strip a leading/trailing [site.tag] if present (sets parse_path="sanitized").
|
|
||||||
2. Check the remainder for truly forbidden chars (anything not in the
|
|
||||||
configured separators list). If any remain → media_type="unknown",
|
|
||||||
parse_path="ai", and the LLM handles it.
|
|
||||||
3. Tokenize using the configured separators (".", " ", "[", "]", "(", ")", "_", ...)
|
|
||||||
and run token-level matchers (season/episode, tech, languages, audio,
|
|
||||||
video, edition, title, year).
|
|
||||||
"""
|
|
||||||
parse_path = ParsePath.DIRECT.value
|
|
||||||
|
|
||||||
# Always try to extract a bracket-enclosed site tag first.
|
|
||||||
clean, site_tag = _strip_site_tag(name)
|
|
||||||
if site_tag is not None:
|
|
||||||
parse_path = ParsePath.SANITIZED.value
|
|
||||||
|
|
||||||
if not _is_well_formed(clean, kb):
|
|
||||||
return ParsedRelease(
|
|
||||||
raw=name,
|
|
||||||
normalised=clean,
|
|
||||||
title=clean,
|
|
||||||
title_sanitized=kb.sanitize_for_fs(clean),
|
|
||||||
year=None,
|
|
||||||
season=None,
|
|
||||||
episode=None,
|
|
||||||
episode_end=None,
|
|
||||||
quality=None,
|
|
||||||
source=None,
|
|
||||||
codec=None,
|
|
||||||
group="UNKNOWN",
|
|
||||||
tech_string="",
|
|
||||||
media_type=MediaTypeToken.UNKNOWN.value,
|
|
||||||
site_tag=site_tag,
|
|
||||||
parse_path=ParsePath.AI.value,
|
|
||||||
)
|
|
||||||
|
|
||||||
name = clean
|
|
||||||
tokens = _tokenize(name, kb)
|
|
||||||
|
|
||||||
season, episode, episode_end = _extract_season_episode(tokens)
|
|
||||||
quality, source, codec, group, tech_tokens = _extract_tech(tokens, kb)
|
|
||||||
languages, lang_tokens = _extract_languages(tokens, kb)
|
|
||||||
audio_codec, audio_channels, audio_tokens = _extract_audio(tokens, kb)
|
|
||||||
bit_depth, hdr_format, video_tokens = _extract_video_meta(tokens, kb)
|
|
||||||
edition, edition_tokens = _extract_edition(tokens, kb)
|
|
||||||
title = _extract_title(
|
|
||||||
tokens,
|
|
||||||
tech_tokens | lang_tokens | audio_tokens | video_tokens | edition_tokens,
|
|
||||||
kb,
|
|
||||||
)
|
|
||||||
year = _extract_year(tokens, title)
|
|
||||||
media_type = _infer_media_type(
|
|
||||||
season, quality, source, codec, year, edition, tokens, kb
|
|
||||||
)
|
|
||||||
|
|
||||||
tech_parts = [p for p in [quality, source, codec] if p]
|
|
||||||
tech_string = ".".join(tech_parts)
|
|
||||||
|
|
||||||
return ParsedRelease(
|
|
||||||
raw=name,
|
|
||||||
normalised=name,
|
|
||||||
title=title,
|
|
||||||
title_sanitized=kb.sanitize_for_fs(title),
|
|
||||||
year=year,
|
|
||||||
season=season,
|
|
||||||
episode=episode,
|
|
||||||
episode_end=episode_end,
|
|
||||||
quality=quality,
|
|
||||||
source=source,
|
|
||||||
codec=codec,
|
|
||||||
group=group,
|
|
||||||
tech_string=tech_string,
|
|
||||||
media_type=media_type,
|
|
||||||
site_tag=site_tag,
|
|
||||||
parse_path=parse_path,
|
|
||||||
languages=languages,
|
|
||||||
audio_codec=audio_codec,
|
|
||||||
audio_channels=audio_channels,
|
|
||||||
bit_depth=bit_depth,
|
|
||||||
hdr_format=hdr_format,
|
|
||||||
edition=edition,
|
|
||||||
)
|
|
||||||
|
|
||||||
|
|
||||||
def _infer_media_type(
|
|
||||||
season: int | None,
|
|
||||||
quality: str | None,
|
|
||||||
source: str | None,
|
|
||||||
codec: str | None,
|
|
||||||
year: int | None,
|
|
||||||
edition: str | None,
|
|
||||||
tokens: list[str],
|
|
||||||
kb: ReleaseKnowledge,
|
|
||||||
) -> str:
|
|
||||||
"""
|
|
||||||
Infer media_type from token-level evidence only (no filesystem access).
|
|
||||||
|
|
||||||
- documentary : DOC token present
|
|
||||||
- concert : CONCERT token present
|
|
||||||
- tv_complete : COMPLETE/INTEGRALE/COLLECTION edition or integrale token, no season
|
|
||||||
- tv_show : season token found
|
|
||||||
- movie : no season, at least one tech marker (quality, source, codec) or a year
|
|
||||||
- unknown : no conclusive evidence
|
|
||||||
"""
|
|
||||||
upper_tokens = {t.upper() for t in tokens}
|
|
||||||
|
|
||||||
doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
|
|
||||||
concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
|
|
||||||
integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}
|
|
||||||
|
|
||||||
if upper_tokens & doc_tokens:
|
|
||||||
return MediaTypeToken.DOCUMENTARY.value
|
|
||||||
if upper_tokens & concert_tokens:
|
|
||||||
return MediaTypeToken.CONCERT.value
|
|
||||||
if (
|
|
||||||
edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
|
|
||||||
or upper_tokens & integrale_tokens
|
|
||||||
) and season is None:
|
|
||||||
return MediaTypeToken.TV_COMPLETE.value
|
|
||||||
if season is not None:
|
|
||||||
return MediaTypeToken.TV_SHOW.value
|
|
||||||
if any([quality, source, codec, year]):
|
|
||||||
return MediaTypeToken.MOVIE.value
|
|
||||||
return MediaTypeToken.UNKNOWN.value
|
|
||||||
|
|
||||||
|
|
||||||
def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
|
|
||||||
"""Return True if name contains no forbidden characters per scene naming rules.
|
|
||||||
|
|
||||||
Characters listed as token separators (spaces, brackets, parens, …) are NOT
|
|
||||||
considered malforming — the tokenizer handles them. Only truly broken chars
|
|
||||||
like '@', '#', '!', '%' make a name malformed.
|
|
||||||
"""
|
|
||||||
tokenizable = set(kb.separators)
|
|
||||||
return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
|
|
||||||
|
|
||||||
|
|
||||||
def _strip_site_tag(name: str) -> tuple[str, str | None]:
|
|
||||||
"""
|
|
||||||
Strip a site watermark tag from the release name and return (clean_name, tag).
|
|
||||||
|
|
||||||
Handles two positions:
|
|
||||||
- Prefix: "[ OxTorrent.vc ] The.Title.S01..."
|
|
||||||
- Suffix: "The.Title.S01...-NTb[TGx]"
|
|
||||||
|
|
||||||
Anything between [...] is treated as a site tag.
|
|
||||||
Returns (original_name, None) if no tag found.
|
|
||||||
"""
|
|
||||||
s = name.strip()
|
|
||||||
|
|
||||||
if s.startswith("["):
|
|
||||||
close = s.find("]")
|
|
||||||
if close != -1:
|
|
||||||
tag = s[1:close].strip()
|
|
||||||
remainder = s[close + 1 :].strip()
|
|
||||||
if tag and remainder:
|
|
||||||
return remainder, tag
|
|
||||||
|
|
||||||
if s.endswith("]"):
|
|
||||||
open_bracket = s.rfind("[")
|
|
||||||
if open_bracket != -1:
|
|
||||||
tag = s[open_bracket + 1 : -1].strip()
|
|
||||||
remainder = s[:open_bracket].strip()
|
|
||||||
if tag and remainder:
|
|
||||||
return remainder, tag
|
|
||||||
|
|
||||||
return s, None
|
|
||||||
|
|
||||||
|
|
||||||
def _parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
|
|
||||||
"""
|
|
||||||
Parse a single token as a season/episode marker.
|
|
||||||
|
|
||||||
Handles:
|
|
||||||
- SxxExx / SxxExxExx / Sxx (canonical scene form)
|
|
||||||
- NxNN / NxNNxNN (alt form: 1x05, 12x07x08)
|
|
||||||
|
|
||||||
Returns (season, episode, episode_end) or None if not a season token.
|
|
||||||
"""
|
|
||||||
upper = tok.upper()
|
|
||||||
|
|
||||||
# SxxExx form
|
|
||||||
if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
|
|
||||||
season = int(upper[1:3])
|
|
||||||
rest = upper[3:]
|
|
||||||
|
|
||||||
if not rest:
|
|
||||||
return season, None, None
|
|
||||||
|
|
||||||
episodes: list[int] = []
|
|
||||||
while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
|
|
||||||
episodes.append(int(rest[1:3]))
|
|
||||||
rest = rest[3:]
|
|
||||||
|
|
||||||
if not episodes:
|
|
||||||
return None # malformed token like "S03XYZ"
|
|
||||||
|
|
||||||
return season, episodes[0], episodes[1] if len(episodes) >= 2 else None
|
|
||||||
|
|
||||||
# NxNN form — split on "X" (uppercased), all parts must be digits
|
|
||||||
if "X" in upper:
|
|
||||||
parts = upper.split("X")
|
|
||||||
if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
|
|
||||||
season = int(parts[0])
|
|
||||||
episode = int(parts[1])
|
|
||||||
episode_end = int(parts[2]) if len(parts) >= 3 else None
|
|
||||||
return season, episode, episode_end
|
|
||||||
|
|
||||||
return None
|
|
||||||
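Note on `_parse_season_episode` above: the same two token shapes (`SxxExx` and `NxNN`) can also be expressed with regular expressions. The sketch below is an alternative formulation, not the code from this diff, and it is deliberately stricter: it rejects tokens with trailing junk after the episode part and accepts at most two episode numbers in the `NxNN` form. All names are hypothetical.

```python
from __future__ import annotations

import re

# S01, S01E02, S01E02E03 (two-digit season and episodes, case-insensitive)
_SXXEXX = re.compile(r"S(\d{2})((?:E\d{2})*)", re.IGNORECASE)
# 1x05, 12x07x08
_NXNN = re.compile(r"(\d+)X(\d+)(?:X(\d+))?", re.IGNORECASE)


def parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
    """Return (season, episode, episode_end), or None if tok is not a marker."""
    m = _SXXEXX.fullmatch(tok)
    if m:
        season = int(m.group(1))
        episodes = [int(e) for e in re.findall(r"\d{2}", m.group(2))]
        first = episodes[0] if episodes else None
        end = episodes[1] if len(episodes) >= 2 else None
        return season, first, end

    m = _NXNN.fullmatch(tok)
    if m:
        end = int(m.group(3)) if m.group(3) else None
        return int(m.group(1)), int(m.group(2)), end

    return None


results = {
    tok: parse_season_episode(tok)
    for tok in ["S01", "s03e10", "S01E02E03", "1x05", "12x07x08", "S03XYZ", "x265"]
}
```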
|
|
||||||
|
|
||||||
def _extract_season_episode(
|
|
||||||
tokens: list[str],
|
|
||||||
) -> tuple[int | None, int | None, int | None]:
|
|
||||||
for tok in tokens:
|
|
||||||
parsed = _parse_season_episode(tok)
|
|
||||||
if parsed is not None:
|
|
||||||
return parsed
|
|
||||||
return None, None, None
|
|
||||||
|
|
||||||
|
|
||||||
def _extract_tech(
|
|
||||||
tokens: list[str],
|
|
||||||
kb: ReleaseKnowledge,
|
|
||||||
) -> tuple[str | None, str | None, str | None, str, set[str]]:
|
|
||||||
"""
|
|
||||||
Extract quality, source, codec, group from tokens.
|
|
||||||
|
|
||||||
Returns (quality, source, codec, group, tech_token_set).
|
|
||||||
|
|
||||||
Group extraction strategy (in priority order):
|
|
||||||
1. Token where prefix is a known codec: x265-GROUP
|
|
||||||
2. Rightmost token with a dash that isn't a known source
|
|
||||||
"""
|
|
||||||
quality: str | None = None
|
|
||||||
source: str | None = None
|
|
||||||
codec: str | None = None
|
|
||||||
group = "UNKNOWN"
|
|
||||||
tech_tokens: set[str] = set()
|
|
||||||
|
|
||||||
for tok in tokens:
|
|
||||||
tl = tok.lower()
|
|
||||||
|
|
||||||
if tl in kb.resolutions:
|
|
||||||
quality = tok
|
|
||||||
tech_tokens.add(tok)
|
|
||||||
continue
|
|
||||||
|
|
||||||
if tl in kb.sources:
|
|
||||||
source = tok
|
|
||||||
tech_tokens.add(tok)
|
|
||||||
continue
|
|
||||||
|
|
||||||
if "-" in tok:
|
|
||||||
parts = tok.rsplit("-", 1)
|
|
||||||
# codec-GROUP (highest priority for group)
|
|
||||||
if parts[0].lower() in kb.codecs:
|
|
||||||
codec = parts[0]
|
|
||||||
group = parts[1] if parts[1] else "UNKNOWN"
|
|
||||||
tech_tokens.add(tok)
|
|
||||||
continue
|
|
||||||
# source with dash: Web-DL, WEB-DL, etc.
|
|
||||||
if parts[0].lower() in kb.sources or tok.lower().replace("-", "") in kb.sources:
|
|
||||||
source = tok
|
|
||||||
tech_tokens.add(tok)
|
|
||||||
continue
|
|
||||||
|
|
||||||
if tl in kb.codecs:
|
|
||||||
codec = tok
|
|
||||||
tech_tokens.add(tok)
|
|
||||||
|
|
||||||
# Fallback: rightmost token with a dash that isn't a known source
|
|
||||||
if group == "UNKNOWN":
|
|
||||||
for tok in reversed(tokens):
|
|
||||||
if "-" in tok:
|
|
||||||
parts = tok.rsplit("-", 1)
|
|
||||||
tl = tok.lower()
|
|
||||||
if tl in kb.sources or tok.lower().replace("-", "") in kb.sources:
|
|
||||||
continue
|
|
||||||
if parts[1]:
|
|
||||||
group = parts[1]
|
|
||||||
break
|
|
||||||
|
|
||||||
return quality, source, codec, group, tech_tokens
|
|
||||||
|
|
||||||
|
|
||||||
def _is_year_token(tok: str) -> bool:
|
|
||||||
"""Return True if tok is a 4-digit year between 1900 and 2099."""
|
|
||||||
return len(tok) == 4 and tok.isdigit() and 1900 <= int(tok) <= 2099
|
|
||||||
|
|
||||||
|
|
||||||
def _extract_title(
|
|
||||||
tokens: list[str], tech_tokens: set[str], kb: ReleaseKnowledge
|
|
||||||
) -> str:
|
|
||||||
"""Extract the title portion: everything before the first season/year/tech token."""
|
|
||||||
title_parts = []
|
|
||||||
known_tech = kb.resolutions | kb.sources | kb.codecs
|
|
||||||
for tok in tokens:
|
|
||||||
if _parse_season_episode(tok) is not None:
|
|
||||||
break
|
|
||||||
if _is_year_token(tok):
|
|
||||||
break
|
|
||||||
if tok in tech_tokens or tok.lower() in known_tech:
|
|
||||||
break
|
|
||||||
if "-" in tok and any(p.lower() in kb.codecs | kb.sources for p in tok.split("-")):
|
|
||||||
break
|
|
||||||
title_parts.append(tok)
|
|
||||||
|
|
||||||
return ".".join(title_parts) if title_parts else (tokens[0] if tokens else "")
|
|
||||||
|
|
||||||
|
|
||||||
def _extract_year(tokens: list[str], title: str) -> int | None:
|
|
||||||
"""Extract a 4-digit year from tokens (only after the title)."""
|
|
||||||
title_len = len(title.split("."))
|
|
||||||
for tok in tokens[title_len:]:
|
|
||||||
if _is_year_token(tok):
|
|
||||||
return int(tok)
|
|
||||||
return None
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
# Sequence matcher
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
def _match_sequences(
|
|
||||||
tokens: list[str],
|
|
||||||
sequences: list[dict],
|
|
||||||
key: str,
|
|
||||||
) -> tuple[str | None, set[str]]:
|
|
||||||
"""
|
|
||||||
Try to match multi-token sequences against consecutive tokens.
|
|
||||||
|
|
||||||
Returns (matched_value, set_of_matched_tokens) or (None, empty_set).
|
|
||||||
Sequences must be ordered most-specific first in the YAML.
|
|
||||||
"""
|
|
||||||
upper_tokens = [t.upper() for t in tokens]
|
|
||||||
for seq in sequences:
|
|
||||||
seq_upper = [s.upper() for s in seq["tokens"]]
|
|
||||||
n = len(seq_upper)
|
|
||||||
for i in range(len(upper_tokens) - n + 1):
|
|
||||||
if upper_tokens[i : i + n] == seq_upper:
|
|
||||||
matched = set(tokens[i : i + n])
|
|
||||||
return seq[key], matched
|
|
||||||
return None, set()
|
|
||||||
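Note on `_match_sequences` above: the docstring's requirement that sequences be ordered most-specific first matters because the first matching sequence wins. The sketch below restates the sliding-window match as a standalone function and shows the effect of the ordering; the sequence entries are hypothetical examples, not the contents of the project's YAML.

```python
from __future__ import annotations


def match_sequences(
    tokens: list[str], sequences: list[dict], key: str
) -> tuple[str | None, set[str]]:
    """Return the value of the first sequence found as consecutive tokens."""
    upper = [t.upper() for t in tokens]
    for seq in sequences:
        wanted = [s.upper() for s in seq["tokens"]]
        n = len(wanted)
        # Slide a window of size n over the token list.
        for i in range(len(upper) - n + 1):
            if upper[i : i + n] == wanted:
                return seq[key], set(tokens[i : i + n])
    return None, set()


# Hypothetical knowledge entries, most specific first.
AUDIO_SEQUENCES = [
    {"tokens": ["DTS", "HD", "MA"], "codec": "DTS-HD.MA"},
    {"tokens": ["DTS", "HD"], "codec": "DTS-HD"},
]

tokens = ["Movie", "2010", "1080p", "BluRay", "dts", "hd", "ma", "x264-GRP"]

codec, matched = match_sequences(tokens, AUDIO_SEQUENCES, "codec")
# With the order reversed, the shorter sequence shadows the longer one.
shadowed, _ = match_sequences(tokens, list(reversed(AUDIO_SEQUENCES)), "codec")
```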
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
# Language extraction
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
def _extract_languages(
|
|
||||||
tokens: list[str], kb: ReleaseKnowledge
|
|
||||||
) -> tuple[list[str], set[str]]:
|
|
||||||
"""Extract language tokens. Returns (languages, matched_token_set)."""
|
|
||||||
languages = []
|
|
||||||
lang_tokens: set[str] = set()
|
|
||||||
for tok in tokens:
|
|
||||||
if tok.upper() in kb.language_tokens:
|
|
||||||
languages.append(tok.upper())
|
|
||||||
lang_tokens.add(tok)
|
|
||||||
return languages, lang_tokens
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
# Audio extraction
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
def _extract_audio(
|
|
||||||
tokens: list[str], kb: ReleaseKnowledge,
|
|
||||||
) -> tuple[str | None, str | None, set[str]]:
|
|
||||||
"""
|
|
||||||
Extract audio codec and channel layout.
|
|
||||||
|
|
||||||
Returns (audio_codec, audio_channels, matched_token_set).
|
|
||||||
Sequences are tried first (DTS.HD.MA, TrueHD.Atmos, …), then single tokens.
|
|
||||||
"""
|
|
||||||
audio_codec: str | None = None
|
|
||||||
audio_channels: str | None = None
|
|
||||||
audio_tokens: set[str] = set()
|
|
||||||
|
|
||||||
known_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
|
|
||||||
known_channels = set(kb.audio.get("channels", []))
|
|
||||||
|
|
||||||
# Try multi-token sequences first
|
|
||||||
matched_codec, matched_set = _match_sequences(
|
|
||||||
tokens, kb.audio.get("sequences", []), "codec"
|
|
||||||
)
|
|
||||||
if matched_codec:
|
|
||||||
audio_codec = matched_codec
|
|
||||||
audio_tokens |= matched_set
|
|
||||||
|
|
||||||
# Channel layouts like "5.1" or "7.1" are split into two tokens by normalize —
|
|
||||||
# detect them as consecutive pairs "X" + "Y" where "X.Y" is a known channel.
|
|
||||||
# The second token may have a "-GROUP" suffix (e.g. "1-KTH" → strip it).
|
|
||||||
for i in range(len(tokens) - 1):
|
|
||||||
second = tokens[i + 1].split("-")[0]
|
|
||||||
candidate = f"{tokens[i]}.{second}"
|
|
||||||
if candidate in known_channels and audio_channels is None:
|
|
||||||
audio_channels = candidate
|
|
||||||
audio_tokens.add(tokens[i])
|
|
||||||
audio_tokens.add(tokens[i + 1])
|
|
||||||
|
|
||||||
for tok in tokens:
|
|
||||||
if tok in audio_tokens:
|
|
||||||
continue
|
|
||||||
if tok.upper() in known_codecs and audio_codec is None:
|
|
||||||
audio_codec = tok
|
|
||||||
audio_tokens.add(tok)
|
|
||||||
elif tok in known_channels and audio_channels is None:
|
|
||||||
audio_channels = tok
|
|
||||||
audio_tokens.add(tok)
|
|
||||||
|
|
||||||
return audio_codec, audio_channels, audio_tokens
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
# Video metadata extraction (bit depth, HDR)
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
def _extract_video_meta(
|
|
||||||
tokens: list[str], kb: ReleaseKnowledge,
|
|
||||||
) -> tuple[str | None, str | None, set[str]]:
|
|
||||||
"""
|
|
||||||
Extract bit depth and HDR format.
|
|
||||||
|
|
||||||
Returns (bit_depth, hdr_format, matched_token_set).
|
|
||||||
"""
|
|
||||||
bit_depth: str | None = None
|
|
||||||
hdr_format: str | None = None
|
|
||||||
video_tokens: set[str] = set()
|
|
||||||
|
|
||||||
known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
|
|
||||||
known_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
|
|
||||||
|
|
||||||
# Try HDR sequences first
|
|
||||||
matched_hdr, matched_set = _match_sequences(
|
|
||||||
tokens, kb.video_meta.get("sequences", []), "hdr"
|
|
||||||
)
|
|
||||||
if matched_hdr:
|
|
||||||
hdr_format = matched_hdr
|
|
||||||
video_tokens |= matched_set
|
|
||||||
|
|
||||||
for tok in tokens:
|
|
||||||
if tok in video_tokens:
|
|
||||||
continue
|
|
||||||
if tok.upper() in known_hdr and hdr_format is None:
|
|
||||||
hdr_format = tok.upper()
|
|
||||||
video_tokens.add(tok)
|
|
||||||
elif tok.lower() in known_depth and bit_depth is None:
|
|
||||||
bit_depth = tok.lower()
|
|
||||||
video_tokens.add(tok)
|
|
||||||
|
|
||||||
return bit_depth, hdr_format, video_tokens
|
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
# Edition extraction
|
|
||||||
# ---------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
|
||||||
def _extract_edition(
|
|
||||||
tokens: list[str], kb: ReleaseKnowledge
|
|
||||||
) -> tuple[str | None, set[str]]:
|
|
||||||
"""
|
|
||||||
Extract release edition (UNRATED, EXTENDED, DIRECTORS.CUT, …).
|
|
||||||
|
|
||||||
Returns (edition, matched_token_set).
|
|
||||||
"""
|
|
||||||
known_tokens = {t.upper() for t in kb.editions.get("tokens", [])}
|
|
||||||
|
|
||||||
# Try multi-token sequences first
|
|
||||||
matched_edition, matched_set = _match_sequences(
|
|
||||||
tokens, kb.editions.get("sequences", []), "edition"
|
|
||||||
)
|
|
||||||
if matched_edition:
|
|
||||||
return matched_edition, matched_set
|
|
||||||
|
|
||||||
for tok in tokens:
|
|
||||||
if tok.upper() in known_tokens:
|
|
||||||
return tok.upper(), {tok}
|
|
||||||
|
|
||||||
return None, set()
|
|
||||||
@@ -0,0 +1,38 @@
|
|||||||
|
"""Filesystem release aggregates — what the user owns on disk.
|
||||||
|
|
||||||
|
This bounded context is intentionally separated from
|
||||||
|
``alfred.domain.tv_shows`` / ``alfred.domain.movies`` (TMDB identity).
|
||||||
|
A :class:`SeriesRelease` describes the physical files on disk for one
|
||||||
|
show; a :class:`TVShow` describes the work as catalogued by TMDB. The
|
||||||
|
two are linked by :class:`~alfred.domain.shared.value_objects.TmdbId`
|
||||||
|
in the persistence layer, never by direct reference.
|
||||||
|
|
||||||
|
Not to be confused with ``alfred.domain.release`` (singular), which
|
||||||
|
parses release **names** (strings → tokens). The two packages may be
|
||||||
|
merged later; for now they coexist as separate concerns.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from .builders import SeasonReleaseBuilder, SeriesReleaseBuilder
|
||||||
|
from .entities import (
|
||||||
|
EpisodeRelease,
|
||||||
|
MovieRelease,
|
||||||
|
SeasonRelease,
|
||||||
|
SeriesRelease,
|
||||||
|
TrackProfile,
|
||||||
|
)
|
||||||
|
from .repositories import MovieReleaseRepository, SeriesReleaseRepository
|
||||||
|
from .value_objects import EpisodeRange, ReleaseMode
|
||||||
|
|
||||||
|
__all__ = [
|
||||||
|
"EpisodeRange",
|
||||||
|
"EpisodeRelease",
|
||||||
|
"MovieRelease",
|
||||||
|
"MovieReleaseRepository",
|
||||||
|
"ReleaseMode",
|
||||||
|
"SeasonRelease",
|
||||||
|
"SeasonReleaseBuilder",
|
||||||
|
"SeriesRelease",
|
||||||
|
"SeriesReleaseBuilder",
|
||||||
|
"SeriesReleaseRepository",
|
||||||
|
"TrackProfile",
|
||||||
|
]
|
||||||
@@ -0,0 +1,243 @@
|
|||||||
|
"""Builders for the filesystem release aggregates.
|
||||||
|
|
||||||
|
The aggregates are frozen — :class:`SeriesRelease`, :class:`SeasonRelease`,
|
||||||
|
and :class:`EpisodeRelease` are ``@dataclass(frozen=True)`` and offer no
|
||||||
|
mutation methods. All construction goes through these builders, which
|
||||||
|
assemble the aggregate piece by piece and emit a frozen instance via
|
||||||
|
``build()``.
|
||||||
|
|
||||||
|
Typical usage during a filesystem walk::
|
||||||
|
|
||||||
|
builder = SeriesReleaseBuilder(tmdb_id=TmdbId(84958), imdb_id=ImdbId("tt0804484"))
|
||||||
|
sb = builder.season_builder(SeasonNumber(1), folder="Show.S01", mode=ReleaseMode.PACK)
|
||||||
|
sb.add_episode(EpisodeRelease(
|
||||||
|
episodes=EpisodeRange(EpisodeNumber(1), EpisodeNumber(1)),
|
||||||
|
file_path=FilePath("Show.S01/Show.S01E01.mkv"),
|
||||||
|
tracks=TrackProfile(),
|
||||||
|
))
|
||||||
|
release = builder.build()
|
||||||
|
|
||||||
|
Builders are **single-use scratchpads**: they hold mutable state during
|
||||||
|
construction, then produce an immutable aggregate.
|
||||||
|
|
||||||
|
Invariants enforced at ``build()`` time:
|
||||||
|
|
||||||
|
* Seasons are emitted sorted by ``season_number``.
|
||||||
|
* Episodes within each season are emitted sorted by their
|
||||||
|
``EpisodeRange.start`` (so a season with ``E01-E03`` + ``E04`` is
|
||||||
|
emitted in that order).
|
||||||
|
* No two ``EpisodeRelease`` within a season may overlap (same TMDB
|
||||||
|
episode covered by two distinct files) — raises ``ValidationError``.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from ..shared_TO_CHECK.exceptions import ValidationError
|
||||||
|
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
|
||||||
|
from ..tv_shows.value_objects import SeasonNumber
|
||||||
|
from .entities import (
|
||||||
|
EpisodeRelease,
|
||||||
|
SeasonRelease,
|
||||||
|
SeriesRelease,
|
||||||
|
)
|
||||||
|
from .value_objects import ReleaseMode
|
||||||
|
|
||||||
|
|
||||||
|
# ════════════════════════════════════════════════════════════════════════════
|
||||||
|
# MovieReleaseBuilder
|
||||||
|
# ════════════════════════════════════════════════════════════════════════════
|
||||||
|
# ...
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
# ════════════════════════════════════════════════════════════════════════════
|
||||||
|
# SeasonReleaseBuilder
|
||||||
|
# ════════════════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
|
|
||||||
|
class SeasonReleaseBuilder:
|
||||||
|
"""
|
||||||
|
Mutable scratchpad for a :class:`SeasonRelease`.
|
||||||
|
|
||||||
|
Episodes are appended in arbitrary order; ``build()`` sorts them by
|
||||||
|
their range start before emitting the frozen aggregate and verifies
|
||||||
|
there are no overlapping ranges.
|
||||||
|
"""
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
season_number: SeasonNumber | int,
|
||||||
|
*,
|
||||||
|
folder: str,
|
||||||
|
mode: ReleaseMode,
|
||||||
|
) -> None:
|
||||||
|
if isinstance(season_number, int):
|
||||||
|
season_number = SeasonNumber(season_number)
|
||||||
|
self._season_number: SeasonNumber = season_number
|
||||||
|
self._folder: str = folder
|
||||||
|
self._mode: ReleaseMode = mode
|
||||||
|
self._episodes: list[EpisodeRelease] = []
|
||||||
|
|
||||||
|
@classmethod
|
||||||
|
def from_existing(cls, season: SeasonRelease) -> SeasonReleaseBuilder:
|
||||||
|
"""Seed a builder from an existing frozen :class:`SeasonRelease`."""
|
||||||
|
builder = cls(
|
||||||
|
season.season_number,
|
||||||
|
folder=season.folder,
|
||||||
|
mode=season.mode,
|
||||||
|
)
|
||||||
|
builder._episodes = list(season.episodes)
|
||||||
|
return builder
|
||||||
|
|
||||||
|
@property
|
||||||
|
def season_number(self) -> SeasonNumber:
|
||||||
|
return self._season_number
|
||||||
|
|
||||||
|
@property
|
||||||
|
def mode(self) -> ReleaseMode:
|
||||||
|
return self._mode
|
||||||
|
|
||||||
|
def set_folder(self, folder: str) -> SeasonReleaseBuilder:
|
||||||
|
self._folder = folder
|
||||||
|
return self
|
||||||
|
|
||||||
|
def set_mode(self, mode: ReleaseMode) -> SeasonReleaseBuilder:
|
||||||
|
self._mode = mode
|
||||||
|
return self
|
||||||
|
|
||||||
|
def add_episode(self, episode: EpisodeRelease) -> SeasonReleaseBuilder:
|
||||||
|
"""Append a physical-file :class:`EpisodeRelease` to this season."""
|
||||||
|
self._episodes.append(episode)
|
||||||
|
return self
|
||||||
|
|
||||||
|
def build(self) -> SeasonRelease:
|
||||||
|
"""Emit a frozen :class:`SeasonRelease` with episodes sorted.
|
||||||
|
|
||||||
|
Raises :class:`ValidationError` if any two episode ranges overlap
|
||||||
|
(same TMDB slot claimed by two distinct files).
|
||||||
|
"""
|
||||||
|
ordered = tuple(
|
||||||
|
sorted(self._episodes, key=lambda ep: ep.episodes.start.value)
|
||||||
|
)
|
||||||
|
# Overlap check — ranges are inclusive on both ends, sorted by start.
|
||||||
|
for prev, curr in zip(ordered, ordered[1:], strict=False):
|
||||||
|
if curr.episodes.start.value <= prev.episodes.end.value:
|
||||||
|
raise ValidationError(
|
||||||
|
f"SeasonRelease season {self._season_number}: overlapping "
|
||||||
|
f"episode ranges {prev.episodes} and {curr.episodes}"
|
||||||
|
)
|
||||||
|
return SeasonRelease(
|
||||||
|
season_number=self._season_number,
|
||||||
|
folder=self._folder,
|
||||||
|
mode=self._mode,
|
||||||
|
episodes=ordered,
|
||||||
|
)
|
||||||
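Note on the overlap check in `SeasonReleaseBuilder.build()` above: once ranges are sorted by start, comparing each range only with its immediate predecessor is enough to detect any overlap, because if a later range overlapped an earlier one, the range directly after that earlier one would overlap it too. The sketch below isolates that check with a hypothetical `Range` type using plain integers in place of `EpisodeRange`/`EpisodeNumber`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Hypothetical inclusive episode range (start and end both included)."""

    start: int
    end: int


def find_overlap(ranges: list[Range]) -> tuple[Range, Range] | None:
    """Return the first overlapping adjacent pair after sorting, or None."""
    ordered = sorted(ranges, key=lambda r: r.start)
    for prev, curr in zip(ordered, ordered[1:]):
        # Inclusive ends: E01-E03 followed by E03 claims episode 3 twice.
        if curr.start <= prev.end:
            return prev, curr
    return None


clean = find_overlap([Range(4, 4), Range(1, 3)])     # E01-E03 + E04
clash = find_overlap([Range(3, 5), Range(1, 3)])     # episode 3 claimed twice
```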
|
|
||||||
|
|
||||||
|
# ════════════════════════════════════════════════════════════════════════════
# SeriesReleaseBuilder
# ════════════════════════════════════════════════════════════════════════════


class SeriesReleaseBuilder:
    """
    Mutable scratchpad for the :class:`SeriesRelease` aggregate root.

    Seasons are tracked via internal :class:`SeasonReleaseBuilder`
    instances keyed by :class:`SeasonNumber`.
    """

    def __init__(
        self,
        *,
        tmdb_id: TmdbId | int,
        imdb_id: ImdbId | str | None = None,
    ) -> None:
        if isinstance(tmdb_id, int):
            tmdb_id = TmdbId(tmdb_id)
        if isinstance(imdb_id, str):
            imdb_id = ImdbId(imdb_id)
        self._tmdb_id: TmdbId = tmdb_id
        self._imdb_id: ImdbId | None = imdb_id
        self._season_builders: dict[SeasonNumber, SeasonReleaseBuilder] = {}

    @classmethod
    def from_existing(cls, release: SeriesRelease) -> SeriesReleaseBuilder:
        """Seed a builder from an existing frozen :class:`SeriesRelease`."""
        builder = cls(
            tmdb_id=release.tmdb_id,
            imdb_id=release.imdb_id,
        )
        for season in release.seasons:
            builder._season_builders[season.season_number] = (
                SeasonReleaseBuilder.from_existing(season)
            )
        return builder

    # ── Top-level mutators ─────────────────────────────────────────────────

    def set_imdb_id(self, imdb_id: ImdbId | str | None) -> SeriesReleaseBuilder:
        if isinstance(imdb_id, str):
            imdb_id = ImdbId(imdb_id)
        self._imdb_id = imdb_id
        return self

    # ── Content ────────────────────────────────────────────────────────────

    def season_builder(
        self,
        season_number: SeasonNumber | int,
        *,
        folder: str | None = None,
        mode: ReleaseMode | None = None,
    ) -> SeasonReleaseBuilder:
        """
        Return (creating if needed) the :class:`SeasonReleaseBuilder` for a
        season.

        ``folder`` and ``mode`` are required when the builder does not yet
        exist for this season; on later calls they are optional and, when
        given, override the stored values.
        """
        if isinstance(season_number, int):
            season_number = SeasonNumber(season_number)
        sb = self._season_builders.get(season_number)
        if sb is None:
            if folder is None or mode is None:
                raise ValidationError(
                    f"season_builder({season_number}): folder and mode "
                    "are required to create a new season builder"
                )
            sb = SeasonReleaseBuilder(season_number, folder=folder, mode=mode)
            self._season_builders[season_number] = sb
        else:
            if folder is not None:
                sb.set_folder(folder)
            if mode is not None:
                sb.set_mode(mode)
        return sb

    def add_season(self, season: SeasonRelease) -> SeriesReleaseBuilder:
        """
        Attach a fully-built :class:`SeasonRelease`, replacing any existing
        season with the same number.

        The season is re-wrapped via :meth:`SeasonReleaseBuilder.from_existing`
        so it stays editable until :meth:`build` is called.
        """
        self._season_builders[season.season_number] = (
            SeasonReleaseBuilder.from_existing(season)
        )
        return self

    # ── Emit ───────────────────────────────────────────────────────────────

    def build(self) -> SeriesRelease:
        """Emit a frozen :class:`SeriesRelease` with seasons sorted by number."""
        ordered_seasons = tuple(
            self._season_builders[n].build()
            for n in sorted(self._season_builders, key=lambda x: x.value)
        )
        return SeriesRelease(
            tmdb_id=self._tmdb_id,
            imdb_id=self._imdb_id,
            seasons=ordered_seasons,
        )
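The get-or-create contract of `season_builder` (both arguments required on first use, optional overrides afterwards) can be sketched without the domain types. Everything below is a simplified illustration under stated assumptions: `SeasonDraft` and `SeriesDraft` are hypothetical stand-ins, season numbers are plain `int`s, modes are plain strings, and `ValueError` replaces the project's `ValidationError`.

```python
from __future__ import annotations


class SeasonDraft:
    """Hypothetical stand-in for SeasonReleaseBuilder (folder and mode only)."""

    def __init__(self, folder: str, mode: str) -> None:
        self.folder = folder
        self.mode = mode


class SeriesDraft:
    """Hypothetical stand-in for SeriesReleaseBuilder."""

    def __init__(self) -> None:
        self._seasons: dict[int, SeasonDraft] = {}

    def season(
        self,
        number: int,
        *,
        folder: str | None = None,
        mode: str | None = None,
    ) -> SeasonDraft:
        draft = self._seasons.get(number)
        if draft is None:
            # First use: both values are mandatory.
            if folder is None or mode is None:
                raise ValueError(f"season {number}: folder and mode are required")
            draft = SeasonDraft(folder, mode)
            self._seasons[number] = draft
        else:
            # Later uses: only the values that were passed are overridden.
            if folder is not None:
                draft.folder = folder
            if mode is not None:
                draft.mode = mode
        return draft


series = SeriesDraft()
first = series.season(1, folder="Show.S01", mode="pack")
again = series.season(1, mode="episodic")
```

Both calls return the same object, so `first.mode` is `"episodic"` after the second call while `first.folder` is unchanged. Asking for an unknown season without `folder` and `mode` raises.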
@@ -0,0 +1,217 @@
"""Filesystem release aggregates.

The release domain models what the user owns on disk — one
:class:`SeriesRelease` per show, one :class:`MovieRelease` per movie.
TMDB identity (title, status, episode_count, …) lives in the
``tv_shows`` / ``movies`` domains and is linked via the
:class:`~alfred.domain.shared.value_objects.TmdbId` natural key.

All entities are frozen. Mutation goes through the builders in
:mod:`alfred.domain.releases.builders`.
"""

from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

from ..shared_TO_CHECK.exceptions import ValidationError
from ..shared_TO_CHECK.media import AudioTrack, SubtitleTrack
from ..shared_TO_CHECK.value_objects import FilePath, ImdbId, TmdbId
from ..tv_shows.value_objects import SeasonNumber
from .value_objects import EpisodeRange, ReleaseMode

__all__ = [
    "EpisodeRelease",
    "MovieRelease",
    "SeasonRelease",
    "SeriesRelease",
    "TrackProfile",
]


@dataclass(frozen=True)
class TrackProfile:
    """
    Audio + subtitle tracks of one physical file.

    Tracks live per-file (not per-season): every ``EpisodeRelease`` and
    ``MovieRelease`` carries its own ``TrackProfile``. Season-level
    aggregation is computed by the caller when needed.
    """

    audio_tracks: tuple[AudioTrack, ...] = ()
    subtitle_tracks: tuple[SubtitleTrack, ...] = ()


@dataclass(frozen=True)
class EpisodeRelease:
    """
    One physical episode file (or multi-episode file) on disk.

    :attr:`episodes` is an :class:`EpisodeRange` — a single ``.mkv``
    that covers ``S01E02E03`` carries ``EpisodeRange(start=E02, end=E03)``
    and is recorded once. The library index lists it under each covered
    slot (``E02``, ``E03``) for symmetric lookups.

    :attr:`file_path` is **relative to the show root** (e.g.
    ``"Show.S01/Show.S01E02.mkv"`` for PACK,
    ``"Show.S01/Show.S01E02-RG/Show.S01E02-RG.mkv"`` for EPISODIC).
    The caller (repository) prepends the absolute show root when
    needed.
    """

    episodes: EpisodeRange
    file_path: FilePath
    tracks: TrackProfile = TrackProfile()


@dataclass(frozen=True)
class SeasonRelease:
    """
    All physical files on disk for one season of a show.

    The :attr:`mode` flag records the filesystem layout:

    * :attr:`ReleaseMode.PACK` — the season folder contains N video
      files directly. ``episodes`` lists each ``.mkv`` in the folder.
    * :attr:`ReleaseMode.EPISODIC` — the season folder contains N
      sub-folders, each with one episode. ``episodes`` lists each
      ``(subfolder, file)`` pair.

    :attr:`folder` is the season folder name, relative to the show root.

    Invariant: every ``EpisodeRelease.episodes`` range stays within
    sane bounds (validated at construction). Cross-episode duplicate
    detection (two files claiming the same TMDB slot) is the
    builder's job, not the entity's.
    """

    season_number: SeasonNumber
    folder: str
    mode: ReleaseMode
    episodes: tuple[EpisodeRelease, ...] = ()

    def __post_init__(self) -> None:
        if not isinstance(self.season_number, SeasonNumber):
            raise ValidationError(
                f"SeasonRelease.season_number must be SeasonNumber, "
                f"got {type(self.season_number)}"
            )
        if not isinstance(self.mode, ReleaseMode):
            raise ValidationError(
                f"SeasonRelease.mode must be ReleaseMode, got {type(self.mode)}"
            )
        if not isinstance(self.folder, str) or not self.folder:
            raise ValidationError(
                f"SeasonRelease.folder must be a non-empty string, "
                f"got {self.folder!r}"
            )

    def episode_count(self) -> int:
        """
        Total number of TMDB episode slots covered by all physical files.

        Sums each :meth:`EpisodeRange.count` — a season with two files
        ``E01`` + ``E02-E03`` returns ``3`` (one slot from the first
        file, two from the second).

        Compared by the caller against the library index's TMDB
        ``episode_count`` to detect incomplete seasons.
        """
        return sum(ep.episodes.count() for ep in self.episodes)


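The slot arithmetic behind `episode_count` can be shown with a small stand-alone sketch. It assumes that `EpisodeRange.count()` returns the inclusive length of the range, which is what the docstring example implies; `SlotRange`, `covered_slots` and `missing_slots` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotRange:
    """Hypothetical stand-in for EpisodeRange with integer bounds."""

    start: int
    end: int

    def count(self) -> int:
        # Inclusive on both ends: E02-E03 covers two slots.
        return self.end - self.start + 1


def covered_slots(ranges) -> int:
    """Sum the slots covered by every physical file of a season."""
    return sum(r.count() for r in ranges)


def missing_slots(ranges, tmdb_episode_count: int) -> int:
    """How many TMDB slots have no file yet (never negative)."""
    return max(tmdb_episode_count - covered_slots(ranges), 0)


files = [SlotRange(1, 1), SlotRange(2, 3)]  # E01 plus a double episode E02-E03
total = covered_slots(files)
gap = missing_slots(files, tmdb_episode_count=10)
```

With these two files `total` is 3, and against a ten-episode season `gap` is 7. The sum is only meaningful when the ranges do not overlap, which is why the builder rejects overlapping ranges before emitting a season.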
@dataclass(frozen=True)
class SeriesRelease:
    """
    All physical seasons on disk for one show.

    Anchored to TMDB by :attr:`tmdb_id` (primary key). :attr:`imdb_id`
    is optional and stored as a secondary anchor — useful for the
    occasional show without TMDB coverage, and for cross-checking
    when both ids are known.

    Seasons are exposed sorted by ``season_number`` (the builder
    enforces this on emit). No duplicate ``season_number`` is
    permitted across :attr:`seasons`.
    """

    tmdb_id: TmdbId
    imdb_id: ImdbId | None
    seasons: tuple[SeasonRelease, ...] = ()

    def __post_init__(self) -> None:
        if not isinstance(self.tmdb_id, TmdbId):
            raise ValidationError(
                f"SeriesRelease.tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
            )
        if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
            raise ValidationError(
                f"SeriesRelease.imdb_id must be ImdbId or None, "
                f"got {type(self.imdb_id)}"
            )
        seen: set[int] = set()
        for s in self.seasons:
            if s.season_number.value in seen:
                raise ValidationError(
                    f"SeriesRelease has duplicate season "
                    f"{s.season_number}"
                )
            seen.add(s.season_number.value)

    def get_season(self, season_number: SeasonNumber) -> SeasonRelease | None:
        """Return the :class:`SeasonRelease` for ``season_number`` or ``None``."""
        for s in self.seasons:
            if s.season_number == season_number:
                return s
        return None


@dataclass(frozen=True)
class MovieRelease:
    """
    A single physical movie file on disk.

    Anchored to TMDB by :attr:`tmdb_id`; :attr:`imdb_id` optional
    secondary anchor.

    :attr:`folder` is the movie folder name relative to the
    ``movies/`` library root. :attr:`file_path` is the video file
    name relative to the folder (movies are one folder, one file in
    Alfred's layout — no sub-folders).

    :attr:`added_at` is the UTC timestamp at which the release was
    first observed in the library — set by the caller (organizer /
    rescan) when the aggregate is built. Persisted by the v2 movie
    sidecar; not derived from the filesystem (mtime drifts across
    moves and hard-links).
    """

    tmdb_id: TmdbId
    imdb_id: ImdbId | None
    folder: str
    file_path: FilePath
    added_at: datetime
    tracks: TrackProfile = TrackProfile()

    def __post_init__(self) -> None:
        if not isinstance(self.tmdb_id, TmdbId):
            raise ValidationError(
                f"MovieRelease.tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
            )
        if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
            raise ValidationError(
                f"MovieRelease.imdb_id must be ImdbId or None, "
                f"got {type(self.imdb_id)}"
            )
        if not isinstance(self.folder, str) or not self.folder:
            raise ValidationError(
                f"MovieRelease.folder must be a non-empty string, "
                f"got {self.folder!r}"
            )
        if not isinstance(self.added_at, datetime):
            raise ValidationError(
                f"MovieRelease.added_at must be datetime, "
                f"got {type(self.added_at)}"
            )
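The entities in this file combine `@dataclass(frozen=True)` with type checks in `__post_init__`. The pattern is sketched below on a reduced, hypothetical `Clip` class; `ValueError` stands in for the project's `ValidationError`, and only two of the `MovieRelease` fields are modelled.

```python
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Clip:
    """Hypothetical two-field reduction of MovieRelease."""

    folder: str
    added_at: datetime

    def __post_init__(self) -> None:
        # Validation only reads fields, so it is compatible with frozen=True.
        if not isinstance(self.folder, str) or not self.folder:
            raise ValueError(f"folder must be a non-empty string, got {self.folder!r}")
        if not isinstance(self.added_at, datetime):
            raise ValueError(f"added_at must be datetime, got {type(self.added_at)}")


clip = Clip(folder="Movie.2019", added_at=datetime(2024, 1, 1, tzinfo=timezone.utc))

try:
    clip.folder = "Other"  # frozen instances reject attribute assignment
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

Construction with an empty `folder` raises, and assignment after construction raises `FrozenInstanceError`, which is why changes go through the builders. As in the original `MovieRelease` check, the sketch verifies that `added_at` is a `datetime` but not that it is timezone-aware.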
@@ -0,0 +1,27 @@
"""Release parser v2 — annotate-based pipeline.

This package is the future home of ``parse_release``. It restructures the
parsing logic around a **tokenize → annotate → assemble** pipeline:

1. **tokenize**: split the release name into atomic tokens.
2. **annotate**: walk tokens left-to-right, assigning each one a
   :class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
   injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.

The pipeline has three internal paths driven by the detected release group:

- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
  declared in ``knowledge/release/release_groups/<group>.yaml``.
- **SHITTY**: unknown group, best-effort matching against the global
  knowledge sets, with a 0-100 confidence score.
- **PATH OF PAIN**: score below threshold OR critical chunks missing —
  signaled to the caller, who decides whether to involve the LLM/user.
"""

from __future__ import annotations

from .schema import GroupSchema, SchemaChunk
from .tokens import Token, TokenRole

__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]
@@ -0,0 +1,762 @@
"""Annotate-based pipeline.

Three stages:

1. :func:`tokenize` — release name → ``list[Token]`` (all UNKNOWN), plus
   a separately-returned site tag (e.g. ``[YTS.MX]``) that is never
   tokenized.
2. :func:`annotate` — promote each token's :class:`TokenRole` using the
   injected knowledge base. Two sub-passes:

   a. **Structural** (schema-driven, EASY only). Detects the group at
      the right end, looks up its :class:`GroupSchema`, then matches
      the schema's chunk sequence against the token stream. Between
      two structural chunks, any number of unmatched tokens may
      remain — they are left UNKNOWN for the enricher pass to handle.
   b. **Enrichers** (non-positional). Walks UNKNOWN tokens and tags
      audio / video-meta / edition / language roles. Multi-token
      sequences (``DTS.HD.MA``, ``DV.HDR10``, ``DIRECTORS.CUT``) are
      matched first, single tokens after.

3. :func:`assemble` — fold annotated tokens into a
   :class:`~alfred.domain.release.value_objects.ParsedRelease`-compatible
   dict.

The pipeline is **pure**: no I/O, no TMDB, no probe. All knowledge
arrives through ``kb: ReleaseKnowledge``.
"""

from __future__ import annotations

from ..ports.knowledge import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import MediaTypeToken
from .schema import GroupSchema
from .tokens import Token, TokenRole

# ---------------------------------------------------------------------------
# Stage 1 — tokenize
# ---------------------------------------------------------------------------


def strip_site_tag(name: str) -> tuple[str, str | None]:
    """Split off a ``[site.tag]`` prefix or suffix.

    Returns ``(clean_name, tag)``. If no tag is found, returns
    ``(name.strip(), None)``.
    """
    s = name.strip()

    if s.startswith("["):
        close = s.find("]")
        if close != -1:
            tag = s[1:close].strip()
            remainder = s[close + 1 :].strip()
            if tag and remainder:
                return remainder, tag

    if s.endswith("]"):
        open_bracket = s.rfind("[")
        if open_bracket != -1:
            tag = s[open_bracket + 1 : -1].strip()
            remainder = s[:open_bracket].strip()
            if tag and remainder:
                return remainder, tag

    return s, None


def tokenize(name: str, kb: ReleaseKnowledge) -> tuple[list[Token], str | None]:
    """Split ``name`` into tokens after stripping any site tag.

    String-ops style: replace every configured separator with a single
    NUL byte then split. NUL cannot legally appear in a release name, so
    it's a safe sentinel.
    """
    clean, site_tag = strip_site_tag(name)

    DELIM = "\x00"
    buf = clean
    for sep in kb.separators:
        if sep != DELIM:
            buf = buf.replace(sep, DELIM)

    pieces = [p for p in buf.split(DELIM) if p]
    tokens = [Token(text=p, index=i) for i, p in enumerate(pieces)]
    return tokens, site_tag


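The NUL-sentinel splitting used by `tokenize` can be tried on plain strings. The sketch below is a reduced illustration: `split_on_separators` is a hypothetical helper, and the separator list passed to it is an assumption standing in for `kb.separators`, whose actual contents are not shown here.

```python
def split_on_separators(name: str, separators: list[str]) -> list[str]:
    """Split ``name`` on every separator by funnelling them into one sentinel."""
    delim = "\x00"  # sentinel assumed never to occur in a release name
    buf = name
    for sep in separators:
        if sep != delim:
            buf = buf.replace(sep, delim)
    # Runs of separators produce empty pieces; drop them.
    return [piece for piece in buf.split(delim) if piece]


pieces = split_on_separators("The.Show_S01E02 1080p..x265-GRP", [".", "_", " "])
```

`pieces` comes out as `["The", "Show", "S01E02", "1080p", "x265-GRP"]`. The dash is deliberately absent from the assumed separator list, so a `codec-GROUP` token survives as one piece for the later group-detection step.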
# ---------------------------------------------------------------------------
# Helpers shared across passes
# ---------------------------------------------------------------------------


def _parse_season_episode(text: str) -> tuple[int, int | None, int | None] | None:
    """Parse a single token as ``SxxExx`` / ``SxxExxExx`` / ``Sxx`` /
    ``Sxx-yy`` (season range) / ``NxNN``.

    Returns ``(season, episode, episode_end)`` or ``None`` if the token
    is not a season/episode marker. For ``Sxx-yy``, returns the first
    season with no episode info — the caller is expected to detect the
    range form and promote ``media_type`` to ``tv_complete`` separately.
    """
    upper = text.upper()

    # SxxExx form (and Sxx, Sxx-yy)
    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
        season = int(upper[1:3])
        rest = upper[3:]

        if not rest:
            return season, None, None

        # Sxx-yy season-range form: capture the first season, treat as a
        # complete-series marker (no episode info).
        if (
            len(rest) == 3
            and rest[0] == "-"
            and rest[1:3].isdigit()
        ):
            return season, None, None

        episodes: list[int] = []
        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
            episodes.append(int(rest[1:3]))
            rest = rest[3:]

        if not episodes:
            return None
        # For chained multi-episode markers (E09E10E11), the range is the
        # first → last episode. Intermediate values are implied.
        return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None

    # NxNN form
    if "X" in upper:
        parts = upper.split("X")
        # str.isdigit() is already False for an empty part.
        if len(parts) >= 2 and all(p.isdigit() for p in parts):
            season = int(parts[0])
            episode = int(parts[1])
            episode_end = int(parts[2]) if len(parts) >= 3 else None
            return season, episode, episode_end

    return None


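For comparison, the `Sxx` / `SxxExx` / `SxxExxExx` shapes handled by `_parse_season_episode` can also be described with a regular expression. This is an alternative sketch, not the string-slicing approach used above: `parse_sxxexx` is a hypothetical name, it does not cover the `Sxx-yy` or `NxNN` forms, and it is stricter in that it rejects trailing characters after the episode chain.

```python
import re

# S + two digits, followed by zero or more E + two digits groups.
_SXXEXX = re.compile(r"S(\d{2})((?:E\d{2})*)", re.IGNORECASE)
_EPISODE = re.compile(r"E(\d{2})", re.IGNORECASE)


def parse_sxxexx(text: str):
    """Return (season, episode, episode_end) or None."""
    match = _SXXEXX.fullmatch(text)
    if match is None:
        return None
    season = int(match.group(1))
    episodes = [int(e) for e in _EPISODE.findall(match.group(2))]
    if not episodes:
        return season, None, None
    # Chained markers (E09E10E11) collapse to first and last episode.
    return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None


single = parse_sxxexx("S01E02")
chained = parse_sxxexx("s01e09e10e11")
season_only = parse_sxxexx("S03")
```

`single` is `(1, 2, None)`, `chained` is `(1, 9, 11)` and `season_only` is `(3, None, None)`; a plain word such as `"Show"` yields `None`.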
def _is_year(text: str) -> bool:
    """Return True if ``text`` is a 4-digit year in [1900, 2099]."""
    return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099


def _split_codec_group(text: str, kb: ReleaseKnowledge) -> tuple[str, str] | None:
    """Split a ``codec-GROUP`` token into ``(codec, group)`` if it fits.

    Returns ``None`` if the token doesn't match the ``codec-GROUP``
    shape. Handles the empty-group case (``x265-``) as ``(codec, "")``.
    """
    if "-" not in text:
        return None
    head, _, tail = text.rpartition("-")
    if head.lower() in kb.codecs:
        return head, tail
    return None


def _match_role(text: str, role: TokenRole, kb: ReleaseKnowledge) -> TokenRole | None:
    """Return ``role`` if ``text`` matches it under ``kb``, else ``None``."""
    lower = text.lower()

    if role is TokenRole.YEAR:
        return TokenRole.YEAR if _is_year(text) else None

    if role is TokenRole.SEASON_EPISODE:
        return (
            TokenRole.SEASON_EPISODE
            if _parse_season_episode(text) is not None
            else None
        )

    if role is TokenRole.RESOLUTION:
        return TokenRole.RESOLUTION if lower in kb.resolutions else None

    if role is TokenRole.SOURCE:
        return TokenRole.SOURCE if lower in kb.sources else None

    if role is TokenRole.CODEC:
        return TokenRole.CODEC if lower in kb.codecs else None

    return None


# ---------------------------------------------------------------------------
# Stage 2a — group detection
# ---------------------------------------------------------------------------


def _detect_group(tokens: list[Token], kb: ReleaseKnowledge) -> tuple[str, int | None]:
    """Identify the release group by walking tokens right-to-left.

    Returns ``(group_name, token_index_carrying_group)``. The index is
    ``None`` when no group is found (no trailing ``-`` in the stream);
    the name is then ``"UNKNOWN"``. A ``codec-`` token with an empty
    group also yields ``"UNKNOWN"``, but keeps its token index.
    """
    # Priority 1: codec-GROUP shape (clearest signal).
    for tok in reversed(tokens):
        split = _split_codec_group(tok.text, kb)
        if split is not None:
            _, group = split
            return (group or "UNKNOWN"), tok.index

    # Priority 2: rightmost dash, excluding dashed sources (Web-DL, etc.).
    for tok in reversed(tokens):
        if "-" not in tok.text:
            continue
        head, _, tail = tok.text.rpartition("-")
        if (
            head.lower() in kb.sources
            or tok.text.lower().replace("-", "") in kb.sources
        ):
            continue
        if tail:
            return tail, tok.index

    return "UNKNOWN", None


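The two-priority group detection above can be exercised on plain strings. The sketch below restates the same logic under stated assumptions: tokens are bare `str` values whose list position plays the role of `Token.index`, and the `CODECS` / `SOURCES` sets are small hypothetical samples standing in for `kb.codecs` and `kb.sources`.

```python
from __future__ import annotations

CODECS = {"x264", "x265", "h264", "hevc"}  # assumed sample of kb.codecs
SOURCES = {"web", "webdl", "web-dl", "bluray"}  # assumed sample of kb.sources


def detect_group(tokens: list[str]) -> tuple[str, int | None]:
    """Return (group_name, index) by scanning tokens right-to-left."""
    # Priority 1: a codec-GROUP token is the clearest signal.
    for index in range(len(tokens) - 1, -1, -1):
        head, dash, tail = tokens[index].rpartition("-")
        if dash and head.lower() in CODECS:
            return (tail or "UNKNOWN"), index

    # Priority 2: rightmost dashed token that is not a dashed source.
    for index in range(len(tokens) - 1, -1, -1):
        text = tokens[index]
        head, dash, tail = text.rpartition("-")
        if not dash:
            continue
        if head.lower() in SOURCES or text.lower().replace("-", "") in SOURCES:
            continue
        if tail:
            return tail, index

    return "UNKNOWN", None


easy = detect_group(["Show", "S01E02", "1080p", "WEB-DL", "x265-KONTRAST"])
fallback = detect_group(["Movie", "2019", "WEB-DL", "1080p-GRP"])
absent = detect_group(["Movie", "2019", "WEB-DL"])
```

`easy` resolves through the codec-GROUP rule to `("KONTRAST", 4)`, `fallback` through the rightmost-dash rule to `("GRP", 3)`, and `absent` is `("UNKNOWN", None)` because the only dashed token is a source.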
# ---------------------------------------------------------------------------
# Stage 2b — structural annotation (schema-driven)
# ---------------------------------------------------------------------------


def _annotate_structural(
    tokens: list[Token],
    kb: ReleaseKnowledge,
    schema: GroupSchema,
    group_token_index: int,
) -> list[Token] | None:
    """Annotate structural tokens following a known group schema.

    Walks the schema's chunks against the body (tokens up to the group
    token). For each chunk, scans forward in the body for a matching
    token — tokens passed over without match are left UNKNOWN (the
    enricher pass will handle them).

    Returns ``None`` if any mandatory chunk fails to find a match.
    """
    result = list(tokens)

    # The codec-GROUP token carries CODEC + GROUP. Split it now so the
    # schema walk knows the codec is "pre-consumed" at the end.
    group_token = result[group_token_index]
    cg_split = _split_codec_group(group_token.text, kb)
    codec_pre_consumed = False
    if cg_split is not None:
        codec, group = cg_split
        result[group_token_index] = group_token.with_role(
            TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
        )
        codec_pre_consumed = True
    else:
        head, _, tail = group_token.text.rpartition("-")
        result[group_token_index] = group_token.with_role(
            TokenRole.GROUP, group=tail or "UNKNOWN", prefix=head
        )

    body_end = group_token_index  # exclusive
    tok_idx = 0
    chunk_idx = 0

    # 1) TITLE — leftmost contiguous tokens up to the first structural
    #    boundary. Title is special because it can be multi-token.
    while (
        chunk_idx < len(schema.chunks)
        and schema.chunks[chunk_idx].role is TokenRole.TITLE
    ):
        title_end = _find_title_end(result, body_end, kb)
        for i in range(tok_idx, title_end):
            result[i] = result[i].with_role(TokenRole.TITLE)
        tok_idx = title_end
        chunk_idx += 1

    # 2) Remaining structural chunks. For each, scan forward in the body
    #    for a matching token; tokens passed over remain UNKNOWN.
    for chunk in schema.chunks[chunk_idx:]:
        if chunk.role is TokenRole.GROUP:
            continue
        if chunk.role is TokenRole.CODEC and codec_pre_consumed:
            continue

        match_idx = _find_chunk(result, tok_idx, body_end, chunk.role, kb)
        if match_idx is None:
            if chunk.optional:
                continue
            return None

        result[match_idx] = result[match_idx].with_role(chunk.role)
        tok_idx = match_idx + 1

    return result


def _find_title_end(
    tokens: list[Token], body_end: int, kb: ReleaseKnowledge
) -> int:
    """Return the exclusive index where the title ends.

    The title is the leftmost run of tokens whose text does not match
    any structural role (year, season/episode, resolution, source,
    codec). Enricher tokens (audio, HDR, language) are *not* boundaries
    because they can appear in the middle of the structural sequence;
    however, in canonical scene names they don't appear inside the title
    itself, so this heuristic holds in practice.
    """
    for i in range(body_end):
        text = tokens[i].text
        if _parse_season_episode(text) is not None:
            return i
        if _is_year(text):
            return i
        lower = text.lower()
        if lower in kb.resolutions:
            return i
        if lower in kb.sources:
            return i
        if lower in kb.codecs:
            return i
        # codec-GROUP token (e.g. "x265-KONTRAST") or dashed source (Web-DL).
        if "-" in text:
            head, _, _ = text.rpartition("-")
            if (
                head.lower() in kb.codecs
                or head.lower() in kb.sources
                or text.lower().replace("-", "") in kb.sources
            ):
                return i
    return body_end


def _find_chunk(
    tokens: list[Token],
    start: int,
    end: int,
    role: TokenRole,
    kb: ReleaseKnowledge,
) -> int | None:
    """Return the first index in ``[start, end)`` whose token matches ``role``.

    Returns ``None`` if no token in the range matches. Tokens already
    annotated (non-UNKNOWN) are skipped — they belong to another chunk.
    """
    for i in range(start, end):
        if tokens[i].role is not TokenRole.UNKNOWN:
            continue
        if _match_role(tokens[i].text, role, kb) is not None:
            return i
    return None


# ---------------------------------------------------------------------------
# Stage 2b' — SHITTY annotation (schema-less heuristic)
# ---------------------------------------------------------------------------


def _annotate_shitty(
    tokens: list[Token],
    kb: ReleaseKnowledge,
    group_index: int | None,
) -> list[Token]:
    """Schema-less, dictionary-driven annotation.

    SHITTY's job is narrow: for releases that *look* like scene names
    but don't have a registered group schema, tag every token whose text
    falls into a known YAML bucket (resolutions, codecs, sources, …).
    Anything we can't classify stays UNKNOWN. The leftmost run of
    UNKNOWN tokens becomes the title. Done.

    Anything that requires more reasoning (parenthesized tech blocks,
    bare-dashed title fragments, year-disguised slug suffixes, …) is
    PATH OF PAIN territory and stays out of here on purpose.
    """
    result = list(tokens)

    # 1) Group token — split codec-GROUP or tag GROUP. Same logic as EASY.
    if group_index is not None:
        gt = result[group_index]
        cg_split = _split_codec_group(gt.text, kb)
        if cg_split is not None:
            codec, group = cg_split
            result[group_index] = gt.with_role(
                TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
            )
        else:
            _, _, tail = gt.text.rpartition("-")
            result[group_index] = gt.with_role(
                TokenRole.GROUP, group=tail or "UNKNOWN"
            )

    # 2) Enrichers (audio / video-meta / edition / language).
    result = _annotate_enrichers(result, kb)

    # 3) Single pass: tag each UNKNOWN token by looking it up in the kb
    #    buckets. First match wins per token, first occurrence wins per
    #    role (we don't overwrite an already-tagged role).
    # Each entry pairs a role with a predicate over the token text. The
    # built-in ``callable`` is a function, not a type, so the list is left
    # to type inference rather than annotated with it.
    matchers = [
        (TokenRole.SEASON_EPISODE, lambda t: _parse_season_episode(t) is not None),
        (TokenRole.YEAR, _is_year),
        (TokenRole.RESOLUTION, lambda t: t.lower() in kb.resolutions),
        (TokenRole.DISTRIBUTOR, lambda t: t.upper() in kb.distributors),
        (TokenRole.SOURCE, lambda t: t.lower() in kb.sources),
        (TokenRole.CODEC, lambda t: t.lower() in kb.codecs),
    ]
    seen: set[TokenRole] = set()

    for i, tok in enumerate(result):
        if tok.role is not TokenRole.UNKNOWN:
            continue
        for role, matches in matchers:
            if role in seen:
                continue
            if matches(tok.text):
                result[i] = tok.with_role(role)
                seen.add(role)
                break

    # 4) Title = leftmost contiguous UNKNOWN tokens.
    for i, tok in enumerate(result):
        if tok.role is not TokenRole.UNKNOWN:
            break
        result[i] = tok.with_role(TokenRole.TITLE)

    return result


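The "first match wins per token, first occurrence wins per role" rule of the schema-less pass, followed by the leftmost-UNKNOWN title sweep, can be sketched on plain strings. This is an illustration under stated assumptions: roles are bare strings instead of `TokenRole` members, only three roles are modelled, and the resolution and codec sets are hypothetical samples of the knowledge base.

```python
def tag_tokens(tokens: list[str]) -> list[str]:
    """Return one role name per token, mirroring the schema-less pass."""
    resolutions = {"720p", "1080p", "2160p"}  # assumed sample values
    codecs = {"x264", "x265"}  # assumed sample values
    matchers = [
        ("YEAR", lambda t: len(t) == 4 and t.isdigit() and 1900 <= int(t) <= 2099),
        ("RESOLUTION", lambda t: t.lower() in resolutions),
        ("CODEC", lambda t: t.lower() in codecs),
    ]
    roles = ["UNKNOWN"] * len(tokens)
    seen: set[str] = set()

    for i, text in enumerate(tokens):
        for role, matches in matchers:
            if role in seen:
                continue  # this role was already claimed by an earlier token
            if matches(text):
                roles[i] = role
                seen.add(role)
                break  # first matching role wins for this token

    # Title = leftmost contiguous run of still-UNKNOWN tokens.
    for i, role in enumerate(roles):
        if role != "UNKNOWN":
            break
        roles[i] = "TITLE"
    return roles


tagged = tag_tokens(["The", "Show", "2019", "1080p", "2020", "x264"])
tricky = tag_tokens(["2012", "2009", "1080p"])
```

In `tagged`, the second year-like token `2020` stays `UNKNOWN` because `YEAR` was already claimed by `2019`. In `tricky`, a title that is itself a year (`2012`) is tagged `YEAR` and no title is found, which is the kind of case the docstring hands over to PATH OF PAIN.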
# ---------------------------------------------------------------------------
# Stage 2c — enricher pass (non-positional roles)
# ---------------------------------------------------------------------------


def _annotate_enrichers(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
    """Tag the remaining UNKNOWN tokens with non-positional roles.

    Multi-token sequences are matched first (so ``DTS.HD.MA`` wins over
    a single-token ``DTS``). For each sequence match, the first token
    receives the role + ``extra["sequence"]`` (the canonical joined
    value), and the trailing members are marked with the same role +
    ``extra["sequence_member"]="True"`` so :func:`assemble` extracts the
    value only from the primary.
    """
    result = list(tokens)

    # Multi-token sequences first.
    _apply_sequences(
        result, kb.audio.get("sequences", []), "codec", TokenRole.AUDIO_CODEC
    )
    _apply_sequences(
        result, kb.video_meta.get("sequences", []), "hdr", TokenRole.HDR
    )
    _apply_sequences(
        result, kb.editions.get("sequences", []), "edition", TokenRole.EDITION
    )

    # Single tokens.
    known_audio_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
    known_audio_channels = set(kb.audio.get("channels", []))
    known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
    known_bit_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
    known_editions = {t.upper() for t in kb.editions.get("tokens", [])}

    # Channel layouts like "5.1" are tokenized as two tokens ("5", "1")
    # because "." is a separator. Detect consecutive pairs whose joined
    # value (without any trailing "-GROUP") is in the channel set.
    _detect_channel_pairs(result, known_audio_channels)

    for i, tok in enumerate(result):
        if tok.role is not TokenRole.UNKNOWN:
            continue
        text = tok.text
        upper = text.upper()
        lower = text.lower()

        if upper in known_audio_codecs:
            result[i] = tok.with_role(TokenRole.AUDIO_CODEC)
            continue
        if text in known_audio_channels:
            result[i] = tok.with_role(TokenRole.AUDIO_CHANNELS)
            continue
        if upper in known_hdr:
            result[i] = tok.with_role(TokenRole.HDR)
            continue
        if lower in known_bit_depth:
            result[i] = tok.with_role(TokenRole.BIT_DEPTH)
            continue
        if upper in known_editions:
            result[i] = tok.with_role(TokenRole.EDITION)
            continue
        if upper in kb.language_tokens:
            result[i] = tok.with_role(TokenRole.LANGUAGE)
            continue
        if upper in kb.distributors:
            result[i] = tok.with_role(TokenRole.DISTRIBUTOR)
            continue

    return result


def _apply_sequences(
    tokens: list[Token],
    sequences: list[dict],
    value_key: str,
    role: TokenRole,
) -> None:
    """Mark the first occurrence of each sequence in place.

    Mutates ``tokens`` (replacing entries with new role-tagged Token
    instances). Sequences in the YAML must be ordered most-specific
    first; the first match wins per starting position.
    """
    if not sequences:
        return

    upper_texts = [t.text.upper() for t in tokens]
    consumed: set[int] = set()

    for seq in sequences:
        seq_upper = [s.upper() for s in seq["tokens"]]
        n = len(seq_upper)
        for start in range(len(tokens) - n + 1):
            if any(idx in consumed for idx in range(start, start + n)):
                continue
            if any(
                tokens[start + k].role is not TokenRole.UNKNOWN for k in range(n)
            ):
                continue
            if upper_texts[start : start + n] == seq_upper:
                tokens[start] = tokens[start].with_role(
                    role, sequence=seq[value_key]
                )
                for k in range(1, n):
                    tokens[start + k] = tokens[start + k].with_role(
                        role, sequence_member="True"
                    )
                consumed.update(range(start, start + n))


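Why the YAML ordering matters ("most-specific first") is easier to see on plain strings. This is a sketch under stated assumptions: `match_sequences`, the `"value"` key and the canonical values are invented for illustration, and the role check on tokens is left out.

```python
from __future__ import annotations


def match_sequences(texts: list[str], sequences: list[dict]) -> dict[int, str]:
    """Return {start_index: canonical_value} for every sequence hit."""
    upper = [t.upper() for t in texts]
    consumed: set[int] = set()
    hits: dict[int, str] = {}
    for seq in sequences:  # assumed to be ordered most-specific first
        pattern = [s.upper() for s in seq["tokens"]]
        n = len(pattern)
        for start in range(len(upper) - n + 1):
            span = range(start, start + n)
            if any(i in consumed for i in span):
                continue  # an earlier (more specific) sequence owns these tokens
            if upper[start : start + n] == pattern:
                hits[start] = seq["value"]
                consumed.update(span)
    return hits


# Illustrative data only; not the contents of the project's YAML files.
sequences = [
    {"tokens": ["DTS", "HD", "MA"], "value": "DTS-HD.MA"},
    {"tokens": ["DTS", "HD"], "value": "DTS-HD"},
]
texts = ["Movie", "2019", "dts", "hd", "ma", "x264"]

specific_first = match_sequences(texts, sequences)
specific_last = match_sequences(texts, sequences[::-1])
print(specific_first, specific_last)
```

With the longer sequence listed first, the three tokens are claimed together. With the order reversed, the shorter sequence consumes `dts` and `hd` first and the longer one can no longer match.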
def _detect_channel_pairs(
    tokens: list[Token], known_channels: set[str]
) -> None:
    """Spot two consecutive numeric tokens that form a channel layout.

    Example: ``["5", "1-KTH"]`` → joined ``"5.1"`` (after stripping the
    ``-GROUP`` suffix on the second). The second token may be the trailing
    codec-GROUP token, in which case it is already tagged CODEC and is
    left untouched, since retagging it would corrupt its role.
    """
    for i in range(len(tokens) - 1):
        first = tokens[i]
        second = tokens[i + 1]
        if first.role is not TokenRole.UNKNOWN:
            continue
        # Strip a "-GROUP" suffix on the second token before joining.
        second_text = second.text.split("-")[0]
        candidate = f"{first.text}.{second_text}"
        if candidate not in known_channels:
            continue
        # Only tag the first token (carries the channel value). The
        # second token may legitimately remain UNKNOWN (or be the
        # codec-GROUP token, already tagged CODEC).
        tokens[i] = first.with_role(
            TokenRole.AUDIO_CHANNELS, sequence=candidate
        )
        if second.role is TokenRole.UNKNOWN:
            tokens[i + 1] = second.with_role(
                TokenRole.AUDIO_CHANNELS, sequence_member="True"
            )


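The pair-joining step can be illustrated on bare strings. This is a simplified sketch, assuming a small hypothetical channel set; `find_channel_pairs` is not part of the module and skips the role bookkeeping entirely.

```python
from __future__ import annotations


def find_channel_pairs(texts: list[str], known_channels: set[str]) -> dict[int, str]:
    """Map the index of the first token of each pair to the joined layout."""
    found: dict[int, str] = {}
    for i in range(len(texts) - 1):
        second = texts[i + 1].split("-")[0]  # drop a trailing "-GROUP" suffix
        candidate = f"{texts[i]}.{second}"
        if candidate in known_channels:
            found[i] = candidate
    return found


# "." is a separator, so "5.1-KTH" arrives as the two tokens "5" and "1-KTH".
texts = "Movie.2019.DDP.5.1-KTH".split(".")
pairs = find_channel_pairs(texts, {"2.0", "5.1", "7.1"})
print(pairs)
```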
# ---------------------------------------------------------------------------
# Stage 2 entry point
# ---------------------------------------------------------------------------


def annotate(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
    """Annotate token roles.

    Dispatch:

    * If a group is detected AND has a known schema, run the EASY
      structural walk. If the schema walk aborts on a mandatory chunk
      mismatch, fall through to SHITTY (the heuristic still does better
      than giving up).
    * Otherwise run SHITTY — schema-less, best-effort, never aborts.

    The enricher pass runs in both cases. The pipeline always returns a
    populated token list; downstream callers don't need to distinguish
    EASY vs SHITTY at this layer (the parse_path is decided in the
    service based on whether a schema matched).
    """
    group_name, group_index = _detect_group(tokens, kb)

    schema = kb.group_schema(group_name) if group_index is not None else None
    if schema is not None and group_index is not None:
        structural = _annotate_structural(tokens, kb, schema, group_index)
        if structural is not None:
            return _annotate_enrichers(structural, kb)

    # SHITTY fallback — heuristic positional pass. ``_annotate_shitty``
    # runs its own enricher pass internally (it has to, so the title
    # scan can skip enricher-tagged tokens).
    return _annotate_shitty(tokens, kb, group_index)


def has_known_schema(tokens: list[Token], kb: ReleaseKnowledge) -> bool:
    """Return True if ``tokens`` qualify for the EASY path in :func:`annotate`.

    This only checks that a group is detected and has a known schema;
    :func:`annotate` may still fall back to SHITTY if the schema walk aborts.
    """
    group_name, group_index = _detect_group(tokens, kb)
    if group_index is None:
        return False
    return kb.group_schema(group_name) is not None


# ---------------------------------------------------------------------------
# Stage 3 — assemble
# ---------------------------------------------------------------------------


def assemble(
    annotated: list[Token],
    site_tag: str | None,
    raw_name: str,
    kb: ReleaseKnowledge,
) -> dict:
    """Fold annotated tokens into a ``ParsedRelease``-compatible dict.

    Returns a dict (not a ``ParsedRelease`` instance) so the caller can
    layer in additional fields (``parse_path``, ``raw``, …) before
    instantiation.
    """
    # Pure-punctuation tokens (e.g. a stray "-" left by ` - ` separators in
    # human-friendly release names) carry no title content and would leak
    # into the joined title as ``"Show.-.Episode"``. Drop them here.
    title_parts = [
        t.text
        for t in annotated
        if t.role is TokenRole.TITLE and any(c.isalnum() for c in t.text)
    ]
    title = ".".join(title_parts) if title_parts else (
        annotated[0].text if annotated else raw_name
    )

    year: int | None = None
    season: int | None = None
    episode: int | None = None
    episode_end: int | None = None
    quality: str | None = None
    source: str | None = None
    codec: str | None = None
    group = "UNKNOWN"
    audio_codec: str | None = None
    audio_channels: str | None = None
    bit_depth: str | None = None
    hdr_format: str | None = None
    edition: str | None = None
    distributor: str | None = None
    languages: list[str] = []
    is_season_range = False

    for tok in annotated:
        # Skip non-primary members of a multi-token sequence.
        if tok.extra.get("sequence_member") == "True":
            continue

        role = tok.role
        if role is TokenRole.YEAR:
            year = int(tok.text)
        elif role is TokenRole.SEASON_EPISODE:
            parsed = _parse_season_episode(tok.text)
            if parsed is not None:
                season, episode, episode_end = parsed
            # Detect Sxx-yy range form to flag it as a multi-season pack.
            upper = tok.text.upper()
            if (
                len(upper) == 6
                and upper[0] == "S"
                and upper[1:3].isdigit()
                and upper[3] == "-"
                and upper[4:6].isdigit()
            ):
                is_season_range = True
        elif role is TokenRole.RESOLUTION:
            quality = tok.text
        elif role is TokenRole.SOURCE:
            source = tok.text
        elif role is TokenRole.CODEC:
            codec = tok.extra.get("codec", tok.text)
            if "group" in tok.extra:
                group = tok.extra["group"] or "UNKNOWN"
        elif role is TokenRole.GROUP:
            group = tok.extra.get("group", tok.text) or "UNKNOWN"
        elif role is TokenRole.AUDIO_CODEC:
            if audio_codec is None:
                audio_codec = tok.extra.get("sequence", tok.text)
        elif role is TokenRole.AUDIO_CHANNELS:
            if audio_channels is None:
                audio_channels = tok.extra.get("sequence", tok.text)
        elif role is TokenRole.BIT_DEPTH:
            if bit_depth is None:
                bit_depth = tok.text.lower()
        elif role is TokenRole.HDR:
            if hdr_format is None:
                hdr_format = tok.extra.get("sequence", tok.text.upper())
        elif role is TokenRole.EDITION:
            if edition is None:
                edition = tok.extra.get("sequence", tok.text.upper())
        elif role is TokenRole.LANGUAGE:
            languages.append(tok.text.upper())
        elif role is TokenRole.DISTRIBUTOR:
            if distributor is None:
                distributor = tok.text.upper()

    # Media type heuristic. Doc/concert/integrale tokens win over the
    # generic tech-based fallback. We look across all tokens (not just
    # annotated ones) because these markers may be tagged UNKNOWN by the
    # structural pass — only the assemble step cares about them.
    upper_tokens = {tok.text.upper() for tok in annotated}
    doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
    concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
    integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}

    if upper_tokens & doc_tokens:
        media_type = MediaTypeToken.DOCUMENTARY
    elif upper_tokens & concert_tokens:
        media_type = MediaTypeToken.CONCERT
    elif is_season_range:
        media_type = MediaTypeToken.TV_COMPLETE
    elif (
        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
        or upper_tokens & integrale_tokens
    ) and season is None:
        media_type = MediaTypeToken.TV_COMPLETE
    elif season is not None:
        media_type = MediaTypeToken.TV_SHOW
    elif any((quality, source, codec, year)):
        media_type = MediaTypeToken.MOVIE
    else:
        media_type = MediaTypeToken.UNKNOWN

    return {
        "title": title,
        "title_sanitized": kb.sanitize_for_fs(title),
        "year": year,
        "season": season,
        "episode": episode,
        "episode_end": episode_end,
        "quality": quality,
        "source": source,
        "codec": codec,
        "group": group,
        "media_type": media_type,
        "site_tag": site_tag,
        "languages": tuple(languages),
        "audio_codec": audio_codec,
        "audio_channels": audio_channels,
        "bit_depth": bit_depth,
        "hdr_format": hdr_format,
        "edition": edition,
        "distributor": distributor,
    }
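The media-type cascade at the end of `assemble` is order-sensitive: marker tokens beat season information, which beats the tech-field fallback. A reduced sketch of that precedence follows. The marker sets, the `classify` helper and every returned string except `"unknown"` are assumptions made for illustration, and the edition check is folded into the marker set.

```python
from __future__ import annotations

# Hypothetical marker tokens; the real ones come from kb.media_type_tokens.
DOC = {"DOC", "DOCU"}
CONCERT = {"CONCERT", "LIVE"}
INTEGRALE = {"INTEGRALE", "COMPLETE"}


def classify(
    tokens: set[str],
    season: int | None = None,
    is_season_range: bool = False,
    has_tech_fields: bool = False,
) -> str:
    upper = {t.upper() for t in tokens}
    if upper & DOC:
        return "documentary"
    if upper & CONCERT:
        return "concert"
    if is_season_range:
        return "tv_complete"
    if upper & INTEGRALE and season is None:
        return "tv_complete"
    if season is not None:
        return "tv_show"
    if has_tech_fields:  # any of quality / source / codec / year was found
        return "movie"
    return "unknown"


print(classify({"Planet", "Earth", "DOC", "S01"}, season=1))
print(classify({"Show", "S01E02"}, season=1))
print(classify({"Film", "1080p"}, has_tech_fields=True))
```

A documentary marker wins even when a season was parsed, which mirrors the "doc/concert/integrale tokens win" comment in the code above.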
@@ -0,0 +1,47 @@
"""Group schema value objects.

A :class:`GroupSchema` describes the canonical chunk layout of releases
from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
contract: when a release ends in ``-<GROUP>`` and we know the group,
the annotator walks the schema instead of running the heuristic SHITTY
matchers.

Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
by an infrastructure adapter and surfaced via the
:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
"""

from __future__ import annotations

from dataclasses import dataclass

from .tokens import TokenRole


@dataclass(frozen=True)
class SchemaChunk:
    """One entry in a group's chunk order.

    ``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
    is True for chunks that may be absent (e.g. ``year`` on TV releases,
    ``source`` on bare ELiTE TV releases).
    """

    role: TokenRole
    optional: bool = False


@dataclass(frozen=True)
class GroupSchema:
    """Schema for a known release group.

    ``chunks`` is the left-to-right canonical order. The annotator walks
    tokens and chunks in lockstep: an optional chunk that doesn't match
    the current token is skipped (the chunk index advances, the token
    index stays), a mandatory chunk that doesn't match aborts the EASY
    path and falls back to SHITTY.
    """

    name: str
    separator: str
    chunks: tuple[SchemaChunk, ...]
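The lockstep walk described in the `GroupSchema` docstring can be sketched as follows. This is not the project's `_annotate_structural`: the `Chunk` class carries a hypothetical `matches` predicate so the example is self-contained, roles are plain strings, and leftover tokens after the last chunk are simply ignored.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Chunk:  # hypothetical stand-in for SchemaChunk, plus a predicate
    role: str
    matches: Callable[[str], bool]
    optional: bool = False


def walk(tokens: list[str], chunks: list[Chunk]) -> Optional[list[tuple[str, str]]]:
    """Return (token, role) pairs, or None when a mandatory chunk mismatches."""
    tagged: list[tuple[str, str]] = []
    ti = 0
    for chunk in chunks:
        if ti < len(tokens) and chunk.matches(tokens[ti]):
            tagged.append((tokens[ti], chunk.role))
            ti += 1
        elif not chunk.optional:
            return None  # abort: the caller would fall back to the heuristic
        # Optional mismatch: the chunk index advances, the token index stays.
    return tagged


schema = [
    Chunk("year", lambda t: len(t) == 4 and t.isdigit(), optional=True),
    Chunk("resolution", lambda t: t.lower() in {"720p", "1080p"}),
    Chunk("codec", lambda t: t.lower() in {"x264", "x265"}),
]

without_year = walk(["1080p", "x264"], schema)
aborted = walk(["2019", "x264"], schema)
print(without_year, aborted)
```

Returning `None` rather than raising keeps the fallback decision in the caller, which matches how `annotate` treats a failed structural walk.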
@@ -0,0 +1,139 @@
"""Parse-confidence scoring.

``parse_release`` returns a :class:`ParseReport` alongside its
:class:`ParsedRelease`. The report carries:

- ``confidence``: integer 0–100 derived from which structural and
  technical fields got populated, minus a penalty per UNKNOWN token
  left in the annotated stream.
- ``road``: which of the three roads the parse took
  (:class:`Road.EASY` / :class:`Road.SHITTY` / :class:`Road.PATH_OF_PAIN`).
- ``unknown_tokens``: textual residue, useful for diagnostics.
- ``missing_critical``: structural fields the score-tally found absent
  (e.g. ``("year", "media_type")``) — the caller can use this to drive
  PoP recovery (questions, LLM call).

All weights, penalties and thresholds come from the injected knowledge
base (``kb.scoring``), itself loaded from
``alfred/knowledge/release/scoring.yaml``. No magic numbers here.

The scoring functions are pure — they consume the annotated token list
and the resulting :class:`ParsedRelease` and return the report. They are
called by ``services.parse_release`` after ``assemble`` has run.
"""

from __future__ import annotations

from enum import Enum

from ..ports.knowledge import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import ParsedRelease
from .tokens import Token, TokenRole


class Road(str, Enum):
    """How the parser handled a given release name.

    Distinct from :class:`~alfred.domain.release.value_objects.TokenizationRoute`,
    which records the tokenization route (DIRECT / SANITIZED / AI). Road
    is about confidence in the *result*, not the *method*.
    """

    EASY = "easy"  # group schema matched — structural annotation
    SHITTY = "shitty"  # no schema, dict-driven annotation, score ≥ threshold
    PATH_OF_PAIN = "path_of_pain"  # score below threshold, needs help


# Critical structural fields — their absence drives the
# ``missing_critical`` list in the report.
_CRITICAL_FIELDS: tuple[str, ...] = ("title", "media_type", "year")


def _is_tv_shaped(parsed: ParsedRelease) -> bool:
    """Season/episode weights only count for releases that *look* like TV."""
    return parsed.season is not None


def compute_score(
    parsed: ParsedRelease,
    annotated: list[Token],
    kb: ReleaseKnowledge,
) -> int:
    """Compute a 0–100 confidence score for the parse.

    Each populated field contributes its weight from
    ``kb.scoring["weights"]``. Season/episode only count when the parse
    looks like TV. ``group == "UNKNOWN"`` is treated as absent.

    Then a penalty is subtracted per residual UNKNOWN token in
    ``annotated``, capped at ``penalties["max_unknown_penalty"]``.

    Result is clamped to ``[0, 100]``.
    """
    weights = kb.scoring["weights"]
    penalties = kb.scoring["penalties"]

    score = 0
    if parsed.title:
        score += weights.get("title", 0)
    if parsed.media_type and parsed.media_type.value != "unknown":
        score += weights.get("media_type", 0)
    if parsed.year is not None:
        score += weights.get("year", 0)
    if _is_tv_shaped(parsed):
        if parsed.season is not None:
            score += weights.get("season", 0)
        if parsed.episode is not None:
            score += weights.get("episode", 0)
    if parsed.quality:
        score += weights.get("resolution", 0)
    if parsed.source:
        score += weights.get("source", 0)
    if parsed.codec:
        score += weights.get("codec", 0)
    if parsed.group and parsed.group != "UNKNOWN":
        score += weights.get("group", 0)

    unknown_count = sum(1 for t in annotated if t.role is TokenRole.UNKNOWN)
    raw_penalty = unknown_count * penalties.get("unknown_token", 0)
    capped_penalty = min(raw_penalty, penalties.get("max_unknown_penalty", 0))
    score -= capped_penalty

    return max(0, min(100, score))


def collect_unknown_tokens(annotated: list[Token]) -> tuple[str, ...]:
    """Return the text of every token still tagged UNKNOWN."""
    return tuple(t.text for t in annotated if t.role is TokenRole.UNKNOWN)


def collect_missing_critical(parsed: ParsedRelease) -> tuple[str, ...]:
    """Return the names of critical structural fields that are absent."""
    missing: list[str] = []
    if not parsed.title:
        missing.append("title")
    if not parsed.media_type or parsed.media_type.value == "unknown":
        missing.append("media_type")
    if parsed.year is None:
        missing.append("year")
    return tuple(missing)


def decide_road(
    score: int,
    has_schema: bool,
    kb: ReleaseKnowledge,
) -> Road:
    """Pick the road the parse took.

    EASY is decided structurally: if a known group schema matched, the
    annotation walked the schema, and that's enough — the score does not
    veto EASY. Otherwise the score decides between SHITTY and
    PATH_OF_PAIN using ``kb.scoring["thresholds"]["shitty_min"]``.
    """
    if has_schema:
        return Road.EASY
    threshold = kb.scoring["thresholds"].get("shitty_min", 60)
    if score >= threshold:
        return Road.SHITTY
    return Road.PATH_OF_PAIN
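To make the arithmetic of the score and the road decision concrete, here is a small worked sketch. The weights, penalties and threshold below are invented numbers, not the contents of `scoring.yaml`, and `score` / `road` are simplified helpers that take a set of populated field names instead of a `ParsedRelease`.

```python
from __future__ import annotations

SCORING = {  # invented values for illustration only
    "weights": {"title": 30, "media_type": 20, "year": 15, "resolution": 10},
    "penalties": {"unknown_token": 5, "max_unknown_penalty": 20},
    "thresholds": {"shitty_min": 60},
}


def score(populated: set[str], unknown_count: int, scoring: dict) -> int:
    weights = scoring["weights"]
    penalties = scoring["penalties"]
    total = sum(weights.get(field, 0) for field in populated)
    penalty = min(
        unknown_count * penalties.get("unknown_token", 0),
        penalties.get("max_unknown_penalty", 0),  # the cap on the total penalty
    )
    return max(0, min(100, total - penalty))


def road(value: int, has_schema: bool, scoring: dict) -> str:
    if has_schema:
        return "easy"  # the score never vetoes a schema match
    threshold = scoring["thresholds"].get("shitty_min", 60)
    return "shitty" if value >= threshold else "path_of_pain"


good = score({"title", "media_type", "year", "resolution"}, 2, SCORING)
poor = score({"title"}, 6, SCORING)
print(good, road(good, False, SCORING))
print(poor, road(poor, False, SCORING))
```

With these numbers the first parse scores 75 - 10 = 65 and the second 30 - 20 = 10: six unknown tokens would cost 30, but the cap limits the penalty to 20.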
@@ -0,0 +1,120 @@
"""Release domain — parsing service.

Thin orchestrator over the annotate-based pipeline in
:mod:`alfred.domain.release.parser.pipeline`. Responsibilities:

* Strip a leading/trailing ``[site.tag]`` and decide ``parse_path``.
* Reject malformed names (forbidden characters) → ``parse_path=AI`` so
  the LLM can clean them up.
* Otherwise call the v2 pipeline (tokenize → annotate → assemble) and
  wrap the result in :class:`ParsedRelease`.
* Score the result and decide the road (EASY / SHITTY / PATH_OF_PAIN)
  via :mod:`alfred.domain.release.parser.scoring`.

The public entry point is :func:`parse_release`, which returns
``(ParsedRelease, ParseReport)``. The report carries the confidence
score, the road, and diagnostic info for downstream callers.
"""

from __future__ import annotations

from alfred.domain.releases_TO_CHECK.parser import scoring as _scoring, pipeline as _v2
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects_old_question_mark import MediaTypeToken, ParsedRelease, ParseReport, TokenizationRoute


def parse_release(
    name: str, kb: ReleaseKnowledge
) -> tuple[ParsedRelease, ParseReport]:
    """Parse a release name.

    Returns a tuple ``(ParsedRelease, ParseReport)``. The structural VO
    is unchanged from the previous single-return contract; the report
    is new and carries the confidence score + road decision.

    Flow:

    1. Strip a leading/trailing ``[site.tag]`` if present (sets
       ``parse_path="sanitized"``).
    2. If the remainder still contains truly forbidden chars (anything
       not in the configured separators), short-circuit to
       ``media_type="unknown"`` / ``parse_path="ai"`` and emit a
       PATH_OF_PAIN report — the LLM handles these.
    3. Otherwise run the v2 pipeline: tokenize → annotate (EASY when a
       group schema is known, SHITTY otherwise) → assemble → score.
    """
    parse_path = TokenizationRoute.DIRECT

    # Apostrophes inside titles ("Don't", "L'avare") are common and should
    # not push the release through the AI fallback. Strip them up front so
    # both strip_site_tag and tokenize see "Dont" / "Lavare", which is good
    # enough for token-level matching. The raw name is preserved on the VO.
    working_name = name
    if "'" in working_name:
        working_name = working_name.replace("'", "")
        parse_path = TokenizationRoute.SANITIZED

    clean, site_tag = _v2.strip_site_tag(working_name)
    if site_tag is not None:
        parse_path = TokenizationRoute.SANITIZED

    if not _is_well_formed(clean, kb):
        parsed = ParsedRelease(
            raw=name,
            clean=clean,
            title=clean,
            title_sanitized=kb.sanitize_for_fs(clean),
            year=None,
            season=None,
            episode=None,
            episode_end=None,
            quality=None,
            source=None,
            codec=None,
            group="UNKNOWN",
            media_type=MediaTypeToken.UNKNOWN,
            site_tag=site_tag,
            parse_path=TokenizationRoute.AI,
        )
        report = ParseReport(
            confidence=0,
            road=_scoring.Road.PATH_OF_PAIN.value,
            unknown_tokens=(clean,),
            missing_critical=("title", "media_type", "year"),
        )
        return parsed, report

    tokens, v2_tag = _v2.tokenize(working_name, kb)
    annotated = _v2.annotate(tokens, kb)
    fields = _v2.assemble(annotated, v2_tag, name, kb)

    parsed = ParsedRelease(
        raw=name,
        clean=clean,
        parse_path=parse_path,
        **fields,
    )

    has_schema = _v2.has_known_schema(tokens, kb)
    score = _scoring.compute_score(parsed, annotated, kb)
    road = _scoring.decide_road(score, has_schema, kb)
    report = ParseReport(
        confidence=score,
        road=road.value,
        unknown_tokens=_scoring.collect_unknown_tokens(annotated),
        missing_critical=_scoring.collect_missing_critical(parsed),
    )
    return parsed, report


def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
    """Return True if ``name`` contains no forbidden characters per scene
    naming rules.

    Characters listed as token separators (spaces, brackets, parens, …)
    are NOT considered malforming — the tokenizer handles them. Only
    truly broken chars like ``@``, ``#``, ``!``, ``%`` make a name
    malformed.
    """
    tokenizable = set(kb.separators)
    return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
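The well-formedness rule (forbidden characters minus separators) is compact enough to check by hand. This is a sketch with made-up character sets; the real ones live on the knowledge base as `kb.separators` and `kb.forbidden_chars`.

```python
from __future__ import annotations

# Assumed values for illustration; not the project's configuration.
SEPARATORS = {".", " ", "-", "_", "[", "]", "(", ")"}
FORBIDDEN = {"@", "#", "!", "%", "[", "]"}


def is_well_formed(name: str, separators: set[str], forbidden: set[str]) -> bool:
    # A forbidden character only counts if the tokenizer cannot split on it.
    return not any(c in name for c in forbidden - separators)


bracketed = is_well_formed("Show.S01E02.[1080p]-GRP", SEPARATORS, FORBIDDEN)
broken = is_well_formed("Sh@w.S01E02.1080p-GRP", SEPARATORS, FORBIDDEN)
print(bracketed, broken)
```

Brackets appear in both sets here, so they do not make the name malformed, whereas `@` does and would route the name to the AI path.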
Some files were not shown because too many files have changed in this diff.