Compare commits: main..de02bdea06 (1 commit)
| Author | SHA1 | Date | |
|---|---|---|---|
| de02bdea06 |
+3 -6
@@ -46,12 +46,9 @@ TMDB_BASE_URL=https://api.themoviedb.org/3

# qBittorrent
# → QBITTORRENT_PASSWORD goes in .env.secrets
QBITTORRENT_URL=https://qb.lan.anustart.top
QBITTORRENT_USERNAME=letmein
QBITTORRENT_URL=http://qbittorrent:16140
QBITTORRENT_USERNAME=admin
QBITTORRENT_PORT=16140
# Path translation: host-side prefix → container-side prefix
QBITTORRENT_HOST_PATH=/mnt/testipool
QBITTORRENT_CONTAINER_PATH=/mnt/data

# Meilisearch
# → MEILI_MASTER_KEY goes in .env.secrets
@@ -63,7 +60,7 @@ MEILI_HOST=http://meilisearch:7700
# --- LLM CONFIGURATION ---
# Providers: local, openai, anthropic, deepseek, google, kimi
# → API keys go in .env.secrets
DEFAULT_LLM_PROVIDER=deepseek
DEFAULT_LLM_PROVIDER=local

# Local LLM (Ollama)
#OLLAMA_BASE_URL=http://ollama:11434

+1 -4
@@ -59,8 +59,6 @@ Thumbs.db

# Backup files
*.backup
*.bak
env_backup/

# Application data dir
data/*
@@ -71,8 +69,7 @@ logs/*
# Documentation folder
docs/

# .md files (project-level Markdown is brol-y; allow-list the ones we track)
# .md files
*.md
!CHANGELOG.md

#

-724
@@ -1,724 +0,0 @@
# Changelog

All notable changes to Alfred are documented here.

The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead
of release numbers. Granularity targets behavioral or API-visible changes; refer
to `git log` for commit-level detail.

Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** /
**Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect
callers).

---

## [Unreleased]

### Fixed

- **Multi-episode chain (e.g. `S14E09E10E11`) now collapses to a full
  range.** The parser previously captured `episode=9, episode_end=10`
  and dropped E11+. It now returns `episode=first, episode_end=last`,
  with intermediate values implied. Fixture
  `shitty/archer_multi_episode/` updated from anti-regression-of-bug
  to anti-regression-of-fix.
- **Apostrophes in titles no longer push the release through the AI
  fallback.** `Honey.Don't.2025.2160p.WEBRip.DSNP.DV.HDR.x265-Amen`
  previously parsed with `parse_path="ai"` and everything UNKNOWN
  because `'` is in the forbidden-chars list. Apostrophes are now
  pre-stripped before the well-formed check, so the parse completes
  normally (`title=Honey.Dont, year=2025, quality=2160p, ...`); only
  the title text loses its apostrophe. `parse_path` becomes
  `sanitized` to surface the cleanup. Side win: PoP fixture
  `the_prodigy_full_chaos/` also moves from total failure to a
  partially-correct parse (year, source, codec extracted).
- **Season-range markers (`Sxx-yy`) are now recognized as
  `tv_complete`.** `Der.Tatortreiniger.S01-06.GERMAN...` previously
  parsed as `media_type=movie` with `S01-06` glued onto the title.
  The parser now recognizes the range, sets `season=first`,
  `media_type=tv_complete`, and removes the marker from the title.
  `is_season_pack` flips to `true`.
- **Pure-punctuation TITLE tokens are dropped at assembly.** Releases
  with surrounding ` - ` separators (`Vinyl - 1x01 - FHD`) previously
  produced `title="Vinyl.-"`. Such tokens (a stray dash, a wide pipe
  `｜`, …) carry no title content and are now filtered out. Side
  effect: PoP fixture `khruangbin_yt_wide_pipe/` also benefits — the
  YouTube wide-pipe no longer leaks into the title.
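
The multi-episode collapse described in the first entry can be sketched as follows. This is an illustration only, not Alfred's parser: `parse_episode_chain` and `EpisodeSpan` are hypothetical names, and a regex is used here for brevity even though the real implementation may work differently.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: one season marker followed by one or more episode markers.
_CHAIN = re.compile(r"S(\d{1,2})((?:E\d{1,3})+)", re.IGNORECASE)


class EpisodeSpan(NamedTuple):
    season: int
    episode: int
    episode_end: Optional[int]  # None when the token names a single episode


def parse_episode_chain(token: str) -> Optional[EpisodeSpan]:
    """Collapse SxxEyyEzz... into (season, first episode, last episode)."""
    match = _CHAIN.fullmatch(token)
    if match is None:
        return None
    episodes = [int(n) for n in re.findall(r"E(\d+)", match.group(2), re.IGNORECASE)]
    first, last = episodes[0], episodes[-1]
    # Intermediate episodes are implied by the first/last pair.
    return EpisodeSpan(int(match.group(1)), first, last if len(episodes) > 1 else None)
```

With this sketch, `parse_episode_chain("S14E09E10E11")` yields season 14, `episode=9`, `episode_end=11`, which is the shape the entry describes.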

### Added

- **Fullwidth vertical bar `｜` (U+FF5C) is now a recognized release-name
  token separator.** Added to `alfred/knowledge/release/separators.yaml`
  so CJK release names (and the occasional decorative YouTube-style use)
  tokenize cleanly instead of leaving the wide pipe glued onto an
  adjacent token. The tokenizer in
  `alfred/domain/release/parser/pipeline.py` already iterates the
  separator list as plain strings (no regex), so a multi-byte UTF-8
  separator works without any code change.

- **`InspectedResult.recommended_action` property** — derived hint that
  collapses the orchestrator's go / wait / skip decision into a single
  value (``"process"`` / ``"ask_user"`` / ``"skip"``). Centralizes the
  exclusion logic that was previously dispersed across road /
  media_type / main_video checks at each call site. Ordering is part of
  the contract: ``skip`` (no main video, or media_type == ``"other"``)
  wins over ``ask_user`` (media_type == ``"unknown"`` or road ==
  ``"path_of_pain"``) which wins over ``process``. Surfaced through the
  ``analyze_release`` tool so the LLM can route on it directly.
  6 new tests in ``tests/application/test_inspect.py`` cover the four
  branches and the precedence rules.
- **`LanguageRepository` port** in `alfred.domain.shared.ports`. Structural
  Protocol covering `from_iso`, `from_any`, `all`, `__contains__`, `__len__`
  — the surface previously coupled to the concrete `LanguageRegistry`.
  Mirrors the `MediaProber` / `FilesystemScanner` pattern: domain code
  depends on the Protocol, infrastructure provides the YAML-backed
  adapter. Tests in `tests/infrastructure/test_language_registry.py`.
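
The precedence contract stated for `recommended_action` (skip, then ask_user, then process) can be sketched like this. `InspectionOutcome` is a hypothetical stand-in holding only the three inputs the entry mentions; the real `InspectedResult` carries more fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InspectionOutcome:
    """Hypothetical stand-in for the fields the decision depends on."""

    media_type: str            # e.g. "movie", "unknown", "other"
    road: str                  # "easy" / "shitty" / "path_of_pain"
    main_video: Optional[str]  # None when no eligible video was found

    @property
    def recommended_action(self) -> str:
        # skip wins first: nothing to process, or not a media release at all.
        if self.main_video is None or self.media_type == "other":
            return "skip"
        # ask_user wins over process: the parser is not confident enough.
        if self.media_type == "unknown" or self.road == "path_of_pain":
            return "ask_user"
        return "process"
```

Evaluating the checks in this fixed order is what makes the ordering part of the contract: a release with `media_type="other"` on `path_of_pain` is skipped rather than escalated to the user.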

### Changed

- **`Movie` and `Episode` are now frozen dataclasses.** Both entities
  hold their track collections as `tuple[AudioTrack, ...]` and
  `tuple[SubtitleTrack, ...]` instead of mutable lists, and are
  `@dataclass(frozen=True, eq=False)` (identity-based equality
  preserved via `__eq__`/`__hash__`). `__post_init__` coercion uses
  `object.__setattr__` for the `imdb_id` / `title` /
  `season_number` / `episode_number` normalizations. To project
  enrichment results (probe output, file metadata) callers now rebuild
  via `dataclasses.replace(...)`. Pattern aligned with the recent
  `ParsedRelease` freeze. `MediaWithTracks` mixin contract updated to
  `tuple` accordingly. `Season` and `TVShow` remain mutable for now —
  freezing the aggregate root would cascade a full reconstruction on
  every `add_episode`, deferred.
- **`SubtitleCandidate` renamed to `SubtitleScanResult`.** The old name
  conflated "this might become a placed subtitle" with "this is what a
  scan pass produced". The class is the output of a scan/identify pass
  — language/format may still be `None`, confidence reflects how sure
  the classifier is, and `raw_tokens` holds the filename fragments
  under analysis. `SubtitleScanResult` says that directly. Pure rename
  with a refreshed docstring in `alfred/domain/subtitles/entities.py`;
  no behavior change. Touches the domain entity + `__init__` export,
  the matcher / identifier / utils services, the manage_subtitles use
  case, the placer, the metadata store, the shared-media cross-ref
  comment, and the seven test modules that imported the type.

- **`ParsedRelease` is now frozen; enrichment passes return new
  instances.** The VO was mutable so `detect_media_type` and
  `enrich_from_probe` could patch fields in place — a code smell in a
  value object whose identity *is* its content. `ParsedRelease` is now
  `@dataclass(frozen=True)`; `languages` is a `tuple[str, ...]`
  instead of a `list[str]`. `enrich_from_probe` returns a new
  `ParsedRelease` via `dataclasses.replace` (only allocates when at
  least one field actually changed). `inspect_release` rebinds
  `parsed` after both `detect_media_type` (wrapped in `MediaTypeToken`
  to satisfy the strict isinstance check that now also runs on
  replace) and `enrich_from_probe`. Parser pipeline now packs
  `languages` as a tuple in the assemble dict. Callers updated:
  `inspect_release`, `testing/recognize_folders_in_downloads.py`, and
  the enrichment tests (22 call sites + language assertions switched
  to tuple literals).
- **`resolve_destination` use cases take `kb` / `prober` as required
  params; module-level singletons gone.** The four
  `resolve_{season,episode,movie,series}_destination` use cases now
  accept `kb: ReleaseKnowledge` and `prober: MediaProber` as required
  arguments, matching the shape of `inspect_release`. The module-level
  `_KB = YamlReleaseKnowledge()` and `_PROBER = FfprobeMediaProber()`
  singletons that previously lived in
  `alfred/application/filesystem/resolve_destination.py` are removed —
  the application layer no longer reaches into infrastructure. The
  singletons now live at the agent-tools frontier
  (`alfred/agent/tools/filesystem.py`), where the LLM-facing wrappers
  instantiate them once and thread them through. `analyze_release` no
  longer needs the dirty `from ... import _KB` indirection. Tests
  inject their own stubs by keyword (`prober=_StubProber(...)`) instead
  of monkeypatching a module attribute.
- **`ParsePath` enum renamed to `TokenizationRoute`.** The old name
  collided with `pathlib.Path` in code-reading mental models, and was
  one letter from `parse_path` (the field that holds the value) — making
  it harder than it needed to be to spot the type vs the attribute.
  ``TokenizationRoute`` says what it actually captures (DIRECT /
  SANITIZED / AI = how the name reached the tokenizer), and the class
  docstring now spells out the orthogonality with ``Road`` (EASY /
  SHITTY / PATH_OF_PAIN, which captures parser confidence on
  ``ParseReport``). The ``parse_path`` field name stays unchanged —
  string values too — so YAML fixtures, the ``analyze_release`` tool
  spec, and any external consumer are untouched.
- **`enrich_from_probe` codec mappings moved to YAML.** The three
  hard-coded module dicts (`_VIDEO_CODEC_MAP`, `_AUDIO_CODEC_MAP`,
  `_CHANNEL_MAP`) translating ffprobe output to scene tokens
  (`hevc → x265`, `eac3 → EAC3`, `8 → "7.1"`, …) now live in
  `alfred/knowledge/release/probe_mappings.yaml` and are loaded into
  `ReleaseKnowledge.probe_mappings` (new port field, populated by
  `YamlReleaseKnowledge`). `enrich_from_probe` gains a third `kb`
  parameter and reads the maps from there. Aligns with the CLAUDE.md
  rule that lookup tables of domain knowledge belong in YAML, not in
  Python — and opens the door to a future "learn new codec" pass.
  Callers updated: `inspect_release`, `testing/recognize_folders_in_downloads.py`,
  and all 22 sites in `tests/application/test_enrich_from_probe.py`.
- **`ParsedRelease.tech_string` is now a derived `@property`**
  (`alfred/domain/release/value_objects.py`). It computes
  `quality.source.codec` joined by dots on every access, so it stays in
  sync with the underlying fields by construction. The stored field is
  gone from the dataclass, the dict returned by `assemble()` no longer
  carries the key, `parse_release`'s malformed-name fallback drops the
  `tech_string=""` kwarg, and `enrich_from_probe` no longer re-derives
  it after filling `quality`/`source`/`codec`. Closes the
  parser/enrichment double-source-of-truth that `e79ca46` had to fix
  reactively. The fixtures runner now injects `tech_string` alongside
  `is_season_pack` since `asdict()` skips properties.
- **`RuleScope.level` is now an enum (`RuleScopeLevel`).** The set of
  valid levels (global, release_group, movie, show, season, episode)
  was documented only in a docstring comment and validated nowhere.
  `RuleScopeLevel(str, Enum)` keeps wire compatibility (YAML
  serialization, `.value` access) while making the closed set explicit
  to type-checkers and IDEs. `to_dict()` emits `.value` strings so
  YAML output is unchanged.
- **`FilePath` VO uses `__post_init__` instead of a hand-rolled
  `__init__`.** Same public API (accepts `str | Path`), same behavior,
  but the dataclass-generated `__init__` is no longer bypassed. One
  less smell in the shared VOs.
- **`Language` VO is strict by default; `Language.from_raw()` factory
  for normalization.** The previous `__post_init__` mutated `iso` and
  `aliases` via `object.__setattr__` on a frozen dataclass — a code
  smell hiding behind the dataclass facade. Split: the direct
  constructor now rejects un-normalized input (uppercase iso,
  whitespace in aliases, etc.), and `Language.from_raw()` handles
  arbitrary YAML/user input. Only one caller (LanguageRegistry loading
  the ISO YAML) needed migration.
- **`ParsedRelease.normalised` renamed to `clean`.** The field name
  promised "dots instead of spaces" but in practice held
  `raw - site_tag - apostrophes` — only used by `season_folder_name()`.
  Renamed and docstring corrected.
- **`ParsedRelease.media_type` / `parse_path` are strict enums.** The
  fields were already typed as `MediaTypeToken` / `ParsePath`, but a
  tolerant `__post_init__` coerced raw strings. With both classes
  being `(str, Enum)`, the coercion served no purpose. Strict
  constructor; `.value` no longer passed at call sites; dropped the
  unused `_VALID_MEDIA_TYPES` / `_VALID_PARSE_PATHS` lookup tables.
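
Several entries above share one pattern: a frozen value object, enrichment that returns a new instance through `dataclasses.replace`, and `tech_string` derived on access. A minimal sketch of that pattern, under stated assumptions: `ReleaseSketch` and `enrich_from_probe_sketch` are hypothetical names with far fewer fields than the real `ParsedRelease`, and skipping empty parts when joining is an assumption the changelog does not confirm.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ReleaseSketch:
    """Hypothetical, reduced stand-in for a frozen parsed-release VO."""

    title: str
    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None
    languages: Tuple[str, ...] = ()  # tuple, not list, so the VO stays immutable

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot drift from the fields.
        # Assumption: missing parts are skipped rather than rendered.
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)


def enrich_from_probe_sketch(parsed: ReleaseSketch, probed_quality: Optional[str]) -> ReleaseSketch:
    """Fill a missing quality from probe data; allocate only when something changes."""
    if parsed.quality is None and probed_quality is not None:
        return replace(parsed, quality=probed_quality)
    return parsed
```

Because the property is computed from the fields, an enriched copy reports an up-to-date `tech_string` without any explicit refresh step, which is the double-source-of-truth problem the `tech_string` entry says it closes.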

### Removed

- **`settings.min_movie_size_bytes`** — orphan Pydantic field +
  validator. Its only consumer (`MovieService.validate_movie_file`)
  had been removed during an earlier refactor. The "real movie vs
  sample" rule now lives in extension-based exclusion
  (`application/release/supported_media.py`) and PoP. If a size
  threshold is ever needed, it'll go in a knowledge YAML, not in
  `settings`.

### Internal

- **Flattened `alfred.domain.shared.media/` package into a single
  `media.py` module.** The 6-file package (audio, video, subtitle,
  info, matching, tracks_mixin + `__init__`) collapsed into one ~250
  LoC module. All 12 import sites continue to resolve unchanged
  (`from alfred.domain.shared.media import AudioTrack, MediaInfo, …`)
  since Python treats `media.py` and `media/__init__.py`
  interchangeably for import paths. Easier to scan when the whole
  bounded-context fits on one screen.
- **`SubtitleKnowledgeBase` types `language_registry` against the
  `LanguageRepository` port** instead of the concrete `LanguageRegistry`
  class. The default constructor still instantiates the concrete adapter
  when no repository is injected — behaviour is unchanged for existing
  callers. Opens the door to in-memory fakes in future tests without
  loading the full ISO 639 YAML.
- **Moved `detect_media_type` and `enrich_from_probe` from
  `alfred.application.filesystem` to `alfred.application.release`**.
  They are inspection-pipeline helpers — their natural home is next to
  `inspect_release`, not next to the filesystem use cases. The move
  also eliminates a circular-import workaround in
  `resolve_destination.py`: `inspect_release` can now be imported at
  module top instead of lazily inside `_resolve_parsed`. Public
  surface is unchanged for callers that imported the helpers from
  their full module paths (the only call sites — `inspect.py`, two
  tests, one testing script — were updated in this commit).
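
The port-plus-fake idea mentioned for `LanguageRepository` might look like the sketch below. The five method names come from the changelog; their signatures, the `Lang` dataclass, `InMemoryLanguages`, and `canonical_code` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol, Tuple


@dataclass(frozen=True)
class Lang:
    """Hypothetical reduced language VO: canonical code plus aliases."""

    iso: str
    aliases: Tuple[str, ...] = ()


class LanguageRepositorySketch(Protocol):
    """Structural port; signatures are assumed, only the names are documented."""

    def from_iso(self, iso: str) -> Optional[Lang]: ...
    def from_any(self, token: str) -> Optional[Lang]: ...
    def all(self) -> Tuple[Lang, ...]: ...
    def __contains__(self, token: object) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguages:
    """In-memory fake: satisfies the Protocol without inheriting from it."""

    def __init__(self, langs: Iterable[Lang]) -> None:
        self._by_iso = {lang.iso: lang for lang in langs}

    def from_iso(self, iso: str) -> Optional[Lang]:
        return self._by_iso.get(iso.lower())

    def from_any(self, token: str) -> Optional[Lang]:
        key = token.lower()
        for lang in self._by_iso.values():
            if key == lang.iso or key in lang.aliases:
                return lang
        return None

    def all(self) -> Tuple[Lang, ...]:
        return tuple(self._by_iso.values())

    def __contains__(self, token: object) -> bool:
        return isinstance(token, str) and self.from_any(token) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


def canonical_code(repo: LanguageRepositorySketch, token: str) -> Optional[str]:
    """Domain-side helper that depends only on the port."""
    lang = repo.from_any(token)
    return lang.iso if lang else None
```

A test can then build `InMemoryLanguages([Lang("fre", ("fr",)), Lang("eng", ("en",))])` and pass it wherever the port is expected, with no YAML loading involved.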

### Added

- **`resolve_*_destination` use cases now consume `inspect_release`**.
  `resolve_episode_destination` and `resolve_movie_destination` reuse
  their existing `source_file` parameter as the inspection target;
  `resolve_season_destination` and `resolve_series_destination` gain
  a new **optional** `source_path` parameter (also threaded through
  the tool wrappers and YAML specs). When the path exists, ffprobe
  data fills tokens missing from the release name (e.g. quality) and
  refreshes `tech_string`, so the destination folder / file names
  end up more accurate. When the path does not exist or is omitted
  (back-compat callers), the use cases fall back to parse-only — same
  behavior as before.

### Fixed

- **`enrich_from_probe` now refreshes `tech_string`** after filling
  `quality` / `source` / `codec`. Previously the field stayed at its
  parser-time value, so filename builders saw stale tech tokens even
  after a successful probe. New `TestTechString` class in
  `tests/application/test_enrich_from_probe.py` locks the behavior.

### Added

- **`inspect_release` orchestrator + `InspectedResult` VO**
  (`alfred/application/release/inspect.py`). Single composition of the
  four inspection layers: `parse_release` → `detect_media_type` (patches
  `parsed.media_type`) → `find_main_video` (top-level scan) →
  `prober.probe` + `enrich_from_probe` when a video exists and the
  refined media type isn't in `{"unknown", "other"}`. Returns a frozen
  `InspectedResult(parsed, report, source_path, main_video, media_info,
  probe_used)` that downstream callers consume directly instead of
  rebuilding the same chain. `kb` and `prober` are injected — no
  module-level singletons. Never raises.
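
The four-layer composition can be sketched with injected callables. Everything below is hypothetical scaffolding (`ParsedSketch`, `InspectionSketch`, `inspect_sketch`, and the callable parameters); the real orchestrator takes `kb` and `prober`, returns more fields, and never raises, none of which this reduced version models.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class ParsedSketch:
    title: str
    media_type: str = "unknown"
    quality: Optional[str] = None


@dataclass(frozen=True)
class InspectionSketch:
    parsed: ParsedSketch
    main_video: Optional[Path]
    probe_used: bool


def inspect_sketch(
    name: str,
    source_path: Path,
    *,
    parse: Callable[[str], ParsedSketch],
    detect: Callable[[ParsedSketch, Path], str],
    find_video: Callable[[Path], Optional[Path]],
    probe: Callable[[Path], Optional[Mapping[str, str]]],
) -> InspectionSketch:
    # Layer 1: parse the release name.
    parsed = parse(name)
    # Layer 2: refine the media type; the VO is frozen, so rebind a copy.
    parsed = replace(parsed, media_type=detect(parsed, source_path))
    # Layer 3: locate the main video file.
    main_video = find_video(source_path)
    # Layer 4: probe and enrich only when it can help.
    probe_used = False
    if main_video is not None and parsed.media_type not in {"unknown", "other"}:
        info = probe(main_video)
        if info is not None:
            if parsed.quality is None and info.get("quality"):
                parsed = replace(parsed, quality=info["quality"])
            probe_used = True
    return InspectionSketch(parsed, main_video, probe_used)
```

Passing the collaborators in as arguments mirrors the injection described in the entry: tests can supply plain lambdas instead of patching module-level state.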

### Changed

- **`analyze_release` tool now delegates to `inspect_release`** — same
  output shape, plus two new fields: `confidence` (0–100) and `road`
  (`"easy"` / `"shitty"` / `"path_of_pain"`) surfaced from the parser's
  `ParseReport`. The tool spec (`specs/analyze_release.yaml`) documents
  both fields so the LLM can route releases by confidence.

- **`MediaProber` port now covers full media probing**: added
  `probe(video) -> MediaInfo | None` alongside the existing
  `list_subtitle_streams`. `FfprobeMediaProber` (in
  `alfred/infrastructure/probe/`) implements both methods and is now
  the single adapter shelling out to `ffprobe`. The standalone
  `alfred/infrastructure/filesystem/ffprobe.py` module was removed —
  all callers (tools, testing scripts) instantiate
  `FfprobeMediaProber` instead. Unblocks the upcoming
  `inspect_release` orchestrator, which depends on the port.

### Removed

- `alfred/infrastructure/filesystem/ffprobe.py` (folded into the
  `FfprobeMediaProber` adapter).

---

## [2026-05-20] — Release parser confidence scoring + exclusion

### Added

- **Pre-pipeline exclusion helpers** (`alfred/application/release/supported_media.py`):
  `is_supported_video(path, kb)` (extension-only check against
  `kb.video_extensions`) and `find_main_video(folder, kb)` (top-level
  scan, lexicographically-first eligible file, returns `None` when no
  video qualifies; accepts a bare file as folder for single-file
  releases). No size threshold, no filename heuristics —
  PATH_OF_PAIN handles the exotic cases. Foundation for the future
  `inspect_release` orchestrator.

- **Release parser — parse-confidence scoring** (`alfred/domain/release/parser/scoring.py`,
  `alfred/knowledge/release/scoring.yaml`). `parse_release` now returns
  `(ParsedRelease, ParseReport)`. The new `ParseReport` frozen VO
  carries a 0–100 `confidence`, a `road` (`"easy"` / `"shitty"` /
  `"path_of_pain"`), the residual UNKNOWN tokens, and the missing
  critical fields. EASY is decided structurally (a group schema
  matched); SHITTY vs PATH_OF_PAIN is decided by score against a
  YAML-configurable cutoff (default 60). Weights and penalties also
  live in `scoring.yaml` — title 30, media_type 20, year 15, season
  10, episode 5, tech 5 each; penalty 5 per UNKNOWN token capped at
  -30. `Road` is a new enum, distinct from `ParsePath` (which records
  the tokenization route, not the confidence tier). `ReleaseKnowledge`
  port gains a `scoring: dict` field.
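
As a worked example of the scoring rule, the sketch below uses the weights, penalty, cap, and cutoff quoted above. Three points are assumptions rather than documented behavior: that "tech" means `quality`, `source`, and `codec`; that a score equal to the cutoff counts as SHITTY; and that the result is clamped to 0–100.

```python
from typing import Iterable

# Values quoted in the changelog entry (they live in scoring.yaml in the project).
WEIGHTS = {
    "title": 30, "media_type": 20, "year": 15, "season": 10, "episode": 5,
    # Assumption: "tech 5 each" covers these three fields.
    "quality": 5, "source": 5, "codec": 5,
}
UNKNOWN_PENALTY = 5
PENALTY_CAP = 30
CUTOFF = 60


def confidence(present_fields: Iterable[str], unknown_tokens: int) -> int:
    """Sum the weights of fields that were found, minus a capped UNKNOWN penalty."""
    present = set(present_fields)
    gained = sum(weight for field, weight in WEIGHTS.items() if field in present)
    penalty = min(UNKNOWN_PENALTY * unknown_tokens, PENALTY_CAP)
    return max(0, min(100, gained - penalty))


def road(score: int, schema_matched: bool) -> str:
    """EASY is structural; otherwise the score decides (>= cutoff is an assumption)."""
    if schema_matched:
        return "easy"
    return "shitty" if score >= CUTOFF else "path_of_pain"
```

For a movie with title, media type, year, and all three tech fields plus one UNKNOWN token, this gives 30 + 20 + 15 + 15 - 5 = 75, which lands on SHITTY when no group schema matched.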

### Changed

- **`parse_release` signature** is now `(name, kb) → tuple[ParsedRelease,
  ParseReport]` instead of returning a bare `ParsedRelease`. Call
  sites updated in `application/filesystem/resolve_destination.py` and
  `agent/tools/filesystem.py`. Tests updated accordingly.

---

## [2026-05-20] — Release parser v2 (EASY + SHITTY)

### Added

- **Release parser v2 — EASY path live** (`alfred/domain/release/parser/`):
  new annotate-based pipeline (tokenize → annotate → assemble) drives
  releases from known groups. Exposes `Token` (frozen VO with `index` +
  `role` + `extra`), `TokenRole` enum (structural/technical/meta families),
  and `GroupSchema` / `SchemaChunk` value objects.
  - `pipeline.tokenize`: string-ops separator split (no regex), strips
    a `[site.tag]` prefix/suffix first.
  - `pipeline.annotate`: detects the trailing group right-to-left
    (priority to `codec-GROUP` shape, fallback to any non-source dashed
    token), looks up its `GroupSchema`, then walks tokens and schema
    chunks in lockstep — optional chunks that don't match are skipped,
    mandatory mismatches abort EASY and return `None` so the caller can
    fall back to SHITTY.
  - `pipeline.assemble`: folds annotated tokens into a
    `ParsedRelease`-compatible dict.
  - `parse_release` (in `release.services`) tries the v2 EASY path first
    and falls through to the legacy SHITTY heuristic on `None`. Legacy
    SHITTY/PATH OF PAIN behavior is unchanged.
  - Knowledge: `alfred/knowledge/release/release_groups/{kontrast,elite,
    rarbg}.yaml` declare the canonical chunk order per group, loaded via
    new `ReleaseKnowledge.group_schema(name)` port method.
  - Tests in `tests/domain/release/test_parser_v2_{scaffolding,easy}.py`
    cover token VOs, site-tag stripping, group detection, schema-driven
    annotation (movie, TV episode, season pack with optional source),
    and field assembly.

- **Release parser v2 — enricher pass** completes the EASY pipeline.
  The structural schema walk now tolerates non-positional tokens
  between chunks (instead of aborting on leftover tokens), and a second
  pass tags them with audio / video-meta / edition / language roles.
  Multi-token sequences from `audio.yaml`, `video.yaml`, `editions.yaml`
  (e.g. `DTS.HD.MA`, `DV.HDR10`, `TrueHD.Atmos`, `DIRECTORS.CUT`) are
  matched before single tokens. Channel layouts like `5.1` and `7.1`
  (split into two tokens by the `.` separator) are detected as
  consecutive pairs. Sequence members carry an `extra["sequence_member"]`
  marker so `assemble` extracts the canonical value only from the
  primary token. KONTRAST releases with audio / HDR / edition / language
  metadata now produce a fully populated `ParsedRelease`.

- **Streaming distributor as a separate dimension** from encoding source.
  New `alfred/knowledge/release/distributors.yaml` (NF, AMZN, DSNP, HMAX,
  ATVP, HULU, PCOK, PMTP, CR) feeds a new `ReleaseKnowledge.distributors`
  port field, a `TokenRole.DISTRIBUTOR` annotation, and a
  `ParsedRelease.distributor` field. `WEB-DL` stays the source; the
  platform that produced the release is now recorded distinctly. The
  five entries (NF, AMZN, DSNP, HMAX, ATVP) were correspondingly removed
  from `sources.yaml`.

- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
  each documenting an expected `ParsedRelease` plus the future `routing`
  (library / torrents / seed_hardlinks) for the upcoming `organize_media`
  refactor. EASY bucket seeded with 5 cases (movie, single-episode, season
  pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15
  anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel),
  French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi),
  multi-episode chain `S14E09E10E11` (Archer, captures E11 loss),
  lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators
  (Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83),
  season-range `S01-06` (Tatortreiniger, captures movie misclassification),
  bare folder name (Jurassic Park, media_type=unknown),
  apostrophe-in-name (Honey Don't, captures full AI-path
  degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands,
  captures group=UNKNOWN), subs-only release (Westworld S04).
  PATH OF PAIN bucket seeded with 10 worst-case fixtures covering:
  UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set
  with double season range and parens-wrapped tech (Deutschland 83-86-89,
  captures `group=S03` misdetection), accented chars in title (Chérie
  BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag
  prefix + XviD (OxTorrent), episode title + air-date silently lost
  (Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i +
  multi-word audio codec (The Prodigy, full AI-path degeneration),
  yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]`
  tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range +
  REPACK + HEVC (Gilmore Girls, the well-behaved exception).
  Parametrized in `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`. Releases like
  `Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep
  alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators
  used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New
  conventions can be added without code changes. The canonical `.` is always
  present even if missing from YAML.
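
A string-ops separator split in the spirit of `pipeline.tokenize` could look like the following. It is a sketch, not the project's tokenizer: the `tokenize` signature and the hard-coded `SEPARATORS` tuple are assumptions (the real list is loaded from `separators.yaml`), and site-tag stripping is left out.

```python
from typing import List, Sequence

# Separators quoted in the changelog, plus the fullwidth bar (U+FF5C) that a
# later entry adds. Hard-coded here only to keep the example self-contained.
SEPARATORS: Sequence[str] = (".", " ", "[", "]", "(", ")", "_", "\uff5c")


def tokenize(name: str, separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Split on each separator in turn using plain str.split, no regex."""
    pieces = [name]
    for sep in separators:
        pieces = [part for piece in pieces for part in piece.split(sep)]
    # Adjacent separators produce empty strings; drop them.
    return [piece for piece in pieces if piece]
```

Since each separator is handled as an ordinary string, a multi-byte character such as the fullwidth bar needs no special treatment, and `tokenize("The Father (2020) [1080p] [WEBRip]")` comes back as five clean tokens.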

### Changed

- **Release parser v2 — SHITTY simplified to dict-driven tagging**.
  The legacy ~480-line heuristic block in `release/services.py` is gone;
  `pipeline._annotate_shitty` does a single pass that looks each token
  up in the kb buckets (resolutions / sources / codecs / distributors /
  year / `SxxExx`) with first-match-wins semantics, and the leftmost
  contiguous UNKNOWN run becomes the title. `annotate()` no longer
  returns `None` — SHITTY is the always-on fallback when no group schema
  matches. `services.py` shrunk from ~525 to ~85 lines. Four fixtures
  (`deutschland_franchise_box`, `sleaford_yt_slug`,
  `super_mario_bilingual`, `predator_space_separators` — the last one
  moved from `shitty/` → `path_of_pain/`) are now marked
  `pytest.mark.xfail(strict=False)` documenting PoP-grade pathologies
  that SHITTY intentionally won't handle. `ReleaseFixture` grows an
  `xfail_reason` field; the parametrized suite wires the xfail mark
  automatically.

- **`parse_release` tokenizer is now data-driven**: it splits on any character
  listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
  This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
  space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and
  underscore-separated names parse correctly via the direct path — no more
  fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first
  (so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`),
  then well-formedness is checked only against truly forbidden chars
  (anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of
  639-1 and 639-2/T):
  - `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was
    `["fr", "en"]`). Old LTM files are not auto-migrated — delete
    `data/memory/ltm.json` to regenerate with the new defaults.
  - Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`,
    `fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly
    (recognized as French via alias) but new files are written canonically.
  - `Language` value object docstring corrected: it has always stored 639-2/B
    (matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via
  `settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor
  accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`**
  rather than duplicating tokens. `subtitles.yaml` now only declares
  subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new
  `language_tokens` section.
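
The dict-driven SHITTY tagging from the first entry of this list reduces to two steps: a first-match-wins lookup per token, then promoting the leftmost contiguous UNKNOWN run to the title. The sketch below is an assumption-laden illustration: `KB`, `tag`, and `annotate_sketch` are hypothetical, the buckets are tiny hand-written samples, and the year check is a guess at how such a bucket might work.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge buckets; dict order defines first-match-wins priority.
KB: Dict[str, Set[str]] = {
    "resolution": {"720p", "1080p", "2160p"},
    "source": {"BluRay", "WEBRip", "WEB-DL", "HDTV"},
    "codec": {"x264", "x265", "HEVC"},
}


def tag(token: str) -> str:
    """Return the first bucket containing the token, else 'year' or 'unknown'."""
    for role, values in KB.items():
        if token in values:
            return role
    # Assumption: a four-digit number in a plausible range is a year.
    if len(token) == 4 and token.isdigit() and 1900 <= int(token) <= 2099:
        return "year"
    return "unknown"


def annotate_sketch(tokens: List[str]) -> List[Tuple[str, str]]:
    roles = [tag(token) for token in tokens]
    # Find the leftmost UNKNOWN token...
    i = 0
    while i < len(roles) and roles[i] != "unknown":
        i += 1
    # ...and promote the contiguous UNKNOWN run starting there to the title.
    while i < len(roles) and roles[i] == "unknown":
        roles[i] = "title"
        i += 1
    return list(zip(tokens, roles))
```

Any UNKNOWN token after the first run stays UNKNOWN, which is the kind of residue a confidence score can then penalize.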

### Removed

- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`**
  deleted entirely. They held fossil parsers (`parse_episode_filename`,
  `extract_movie_metadata`, …) with zero production callers — superseded by
  `parse_release` as the single source of truth for release-name parsing.
  Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`)
  removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py` —
  the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS`
  hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now
  lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in
  `alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by
  `language_tokens:` (subtitle-specific only) since iso_languages.yaml is the
  canonical source.

### Fixed

- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the
  ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and
  `hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while
  `iso_languages.yaml` exposes French as `"fre"` — preferred languages
  defaults now match the canonical form.
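
The `hi` conflict can be illustrated with a small classifier. `classify_subtitle`, `SDH_TOKENS`, and `LANG_ALIASES` are hypothetical names with hand-written sample data; in the project this knowledge is loaded from YAML rather than hard-coded.

```python
from typing import Optional, Tuple

# SDH markers named in the changelog entry; "hi" is deliberately absent.
SDH_TOKENS = frozenset({"sdh", "cc", "hearing"})

# Hypothetical alias sample mapping to ISO 639-2/B codes.
LANG_ALIASES = {"hi": "hin", "fr": "fre", "fre": "fre", "en": "eng", "eng": "eng"}


def classify_subtitle(filename: str) -> Tuple[Optional[str], bool]:
    """Return (language code or None, is_sdh) from dot-separated filename tokens."""
    tokens = filename.lower().split(".")[:-1]  # drop the extension
    language = next((LANG_ALIASES[t] for t in tokens if t in LANG_ALIASES), None)
    is_sdh = any(t in SDH_TOKENS for t in tokens)
    return language, is_sdh
```

Under this sketch `hi.srt` is read as Hindi with no SDH flag, while `fre.sdh.srt` is French with SDH set.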

### Internal

- **Domain I/O extraction** (`refactor/domain-io-extraction`): the domain
  layer no longer performs subprocess calls, filesystem scans, or YAML
  loading. Achieved in a series of focused commits:
  - **Knowledge YAML loaders moved to infrastructure**:
    `alfred/domain/release/knowledge.py`,
    `alfred/domain/shared/knowledge/language_registry.py`, and
    `alfred/domain/subtitles/knowledge/{base,loader}.py` relocated to
    `alfred/infrastructure/knowledge/`. Re-exports were dropped — callers
    import directly from the new location.
  - **`MediaProber` and `FilesystemScanner` Protocol ports** introduced at
    `alfred/domain/shared/ports/` with frozen-dataclass DTOs
    (`SubtitleStreamInfo`, `FileEntry`). `SubtitleIdentifier` and
    `PatternDetector` are now constructor-injected with concrete adapters
    (`FfprobeMediaProber` wrapping `subprocess.run(ffprobe)` and
    `PathlibFilesystemScanner` wrapping `pathlib`). No more direct
    `subprocess`/`pathlib` usage from the subtitle domain services.
  - **Live filesystem methods removed from VOs and entities**:
    `FilePath.exists()` / `.is_file()` / `.is_dir()` deleted —
    `FilePath` is now a pure address VO. `Movie.has_file()` and
    `Episode.is_downloaded()` dropped. Callers either rely on a prior
    detection step or use try/except over pre-checks (eliminates
    TOCTOU races).
  - **`SubtitlePlacer` moved to the application layer** at
    `alfred/application/subtitles/placer.py` — it performs `os.link`
    I/O, which doesn't belong in the domain. Pre-checks replaced with
    try/except for `FileNotFoundError`/`FileExistsError`.
  - **`SubtitleRuleSet.resolve()` no longer reaches into the knowledge
    base**: the implicit `DEFAULT_RULES()` helper is gone, replaced by
    an explicit `default_rules: SubtitleMatchingRules` parameter. The
    `ManageSubtitles` use case loads defaults from the KB once and
    passes them in.
  - **`SubtitleKnowledge` Protocol port** at
    `alfred/domain/subtitles/ports/knowledge.py` declares the read-only
    query surface domain services consume (7 methods:
    `known_extensions`, `format_for_extension`, `language_for_token`,
    `is_known_lang_token`, `type_for_token`, `is_known_type_token`,
    `patterns`). `SubtitleIdentifier` and `PatternDetector` depend on
|
||||
this Protocol instead of the concrete `SubtitleKnowledgeBase` from
|
||||
infrastructure — `domain/subtitles/` now has zero imports from
|
||||
`infrastructure/`. The remaining domain → infra leak
|
||||
(`domain/release/` loading separator YAML at import-time) is
|
||||
documented in tech-debt and scheduled for its own branch.
|
||||
- **`to_dot_folder_name(title)` helper** in
|
||||
`alfred/domain/shared/value_objects.py` — extracts the
|
||||
`re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was
|
||||
duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
|
||||
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of
|
||||
a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
|
||||
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`,
|
||||
`.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping
|
||||
them under `metadata:`. The `_METADATA_EXTENSIONS` set used by
|
||||
`detect_media_type` remains the union of both (same behavior — subtitles
|
||||
are still ignored when deciding the media type of a folder), but a new
|
||||
`load_subtitle_extensions()` loader is now available for the subtitles
|
||||
domain. Semantic clarity, no functional change.
|
||||
- **`tv_shows/entities.py` module docstring** now shows the aggregate
|
||||
ownership as an ASCII tree before the rule text — quicker visual scan
|
||||
of the DDD structure.
|
||||
- Removed backward-compat shims `_sanitise_for_fs` /
|
||||
`_strip_episode_from_normalised` from `domain/release/value_objects.py`
|
||||
(zero callers).
|
||||
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass
|
||||
explicit `check=False` (PLW1510); lazy imports promoted to module top where
|
||||
there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`,
|
||||
`qbittorrent/client.py`, `file_manager.py`); fixed module-level import
|
||||
ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`;
|
||||
removed unused locals (F841 / B007); replaced unnecessary set comprehension
|
||||
with `set()` in `release/knowledge.py` (C416).
|
||||
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches)
|
||||
globally — noisy on parser mappers and orchestrator use-cases where early-return
|
||||
validation is essential complexity. Ignore `PLW0603` for the documented memory
|
||||
singleton (`infrastructure/persistence/context.py`).
|
||||
- **Release-knowledge DDD purification** (`refactor/domain-release-knowledge`):
|
||||
the last domain → infrastructure leak (`domain/release/value_objects.py`
|
||||
loading YAML at import-time) is gone. Achieved via:
|
||||
- **`ReleaseKnowledge` Protocol port** at
|
||||
`alfred/domain/release/ports/knowledge.py` declares the read-only query
|
||||
surface release parsing needs (token sets for resolutions, sources, codecs,
|
||||
languages, hdr extras; structured dicts for audio, video_meta, editions,
|
||||
media_type_tokens; separators list; file-extension sets used by
|
||||
application/infra callers; `sanitize_for_fs(text)` method).
|
||||
- **`YamlReleaseKnowledge` adapter** at
|
||||
`alfred/infrastructure/knowledge/release_kb.py` loads every YAML constant
|
||||
once at construction. Builds an immutable `str.maketrans` translation
|
||||
table for filesystem sanitization.
|
||||
- **`parse_release(name, kb)`** takes the knowledge as an explicit
|
||||
parameter — no more module-level YAML loading inside the domain. Every
|
||||
internal helper (`_tokenize`, `_extract_tech`, `_extract_languages`,
|
||||
`_extract_audio`, `_extract_video_meta`, `_extract_edition`,
|
||||
`_extract_title`, `_infer_media_type`, `_is_well_formed`) takes `kb`.
|
||||
- **`ParsedRelease` Option B**: sanitization happens once at parse time
|
||||
and is stored on a new `title_sanitized: str` field. Builder methods
|
||||
(`show_folder_name`, `season_folder_name`, `episode_filename`,
|
||||
`movie_folder_name`, `movie_filename`) are now pure — they accept
|
||||
already-sanitized `tmdb_title_safe` / `tmdb_episode_title_safe`
|
||||
arguments. Callers at the use-case boundary sanitize TMDB strings
|
||||
via `kb.sanitize_for_fs(...)` before passing them in.
|
||||
- **All domain-knowledge constants removed from `value_objects.py`**:
|
||||
`_RESOLUTIONS`, `_SOURCES`, `_CODECS`, `_AUDIO`, `_VIDEO_META`,
|
||||
`_EDITIONS`, `_HDR_EXTRA`, `_MEDIA_TYPE_TOKENS`, `_LANGUAGE_TOKENS`,
|
||||
`_FORBIDDEN_CHARS`, `_VIDEO_EXTENSIONS`, `_NON_VIDEO_EXTENSIONS`,
|
||||
`_SUBTITLE_EXTENSIONS`, `_METADATA_EXTENSIONS`, `_WIN_FORBIDDEN_TABLE`,
|
||||
and the `_sanitize_for_fs` helper. The domain module is now pure.
|
||||
- **Application-layer KB singleton**: `resolve_destination.py` instantiates
|
||||
a module-level `_KB: ReleaseKnowledge = YamlReleaseKnowledge()` and
|
||||
threads it through every `parse_release(...)` call. The local
|
||||
`_sanitize` helper and `_WIN_FORBIDDEN` regex were dropped in favor of
|
||||
`_KB.sanitize_for_fs(...)`.
|
||||
- **`detect_media_type(parsed, source_path, kb)` and
|
||||
`find_video_file(path, kb)`** now take the knowledge explicitly
|
||||
instead of importing `_*_EXTENSIONS` constants from the domain.
|
||||
`agent/tools/filesystem.py::analyze_release` imports the application
|
||||
KB singleton and passes it through.
|
||||
|
||||
---
|
||||
|
||||
## [2026-05-17] — TVShow & Movie aggregate refactor
|
||||
|
||||
Multi-phase overhaul of the TV show domain into a real DDD aggregate, with
|
||||
matching parity work on `Movie`, a language knowledge system, and the
|
||||
`shared/media` restructure that supports both.
|
||||
|
||||
### Added
|
||||
|
||||
- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml` + 42
|
||||
languages including `und` for undetermined).
|
||||
- `Language` value object (frozen dataclass) with `iso`, `english_name`,
|
||||
`native_name`, `aliases`, and a `matches(raw)` cross-format helper.
|
||||
- `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging
|
||||
builtin + learned YAML. Not a singleton — the application layer
|
||||
instantiates it.
|
||||
- ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English
|
||||
name, native name, and common spellings.
|
||||
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a
|
||||
`resolution` property using width-priority bucket detection (handles
|
||||
cinema/scope crops like 1920×960 → 1080p).
|
||||
- **`shared/media/matching.py`** — `track_lang_matches` helper shared by
|
||||
`Episode` and `Movie`. Implements the **"C+" contract** for language helpers:
|
||||
- `Language` query → cross-format match via `Language.matches()`
|
||||
- `str` query → case-insensitive direct comparison (no normalization)
|
||||
- **TVShow aggregate composition**:
|
||||
- `TVShow.seasons: dict[SeasonNumber, Season]`
|
||||
- `Season.episodes: dict[EpisodeNumber, Episode]`
|
||||
- `Season.expected_episodes` / `Season.aired_episodes` (split so collection
|
||||
state can compare "owned vs aired today" without confusing in-flight
|
||||
seasons with future ones)
|
||||
- **Aggregate methods on `TVShow`**:
|
||||
- `add_episode(ep)` — sole sanctioned mutation entry point (creates the
|
||||
season if missing)
|
||||
- `add_season(season)` — replaces a season wholesale
|
||||
- `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
|
||||
- `is_complete_series()` — true iff `ENDED + COMPLETE`
|
||||
- `missing_episodes()` — flat list of all aired-but-not-owned
|
||||
`(season, episode)` pairs
|
||||
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
|
||||
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`,
|
||||
`has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by
|
||||
`Episode.audio_tracks` / `Episode.subtitle_tracks`.
|
||||
- **Movie aggregate parity** — `Movie` now carries `audio_tracks` /
|
||||
`subtitle_tracks` and exposes the same helpers as `Episode` (same C+
|
||||
contract).
|
||||
- **`CHANGELOG.md`** (this file).
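
The aggregate behavior listed above can be sketched as follows. This is a simplified illustration, not the project's code: plain `int` keys replace the `SeasonNumber` / `EpisodeNumber` value objects, the `Episode.season` attribute is an assumption, and `ShowStatus` handling is omitted.

```python
from dataclasses import dataclass, field
from enum import Enum


class CollectionStatus(Enum):
    EMPTY = "empty"
    PARTIAL = "partial"
    COMPLETE = "complete"


@dataclass
class Episode:
    season: int  # assumption: the episode knows its season number
    number: int


@dataclass
class Season:
    number: int
    aired_episodes: int = 0
    episodes: dict[int, Episode] = field(default_factory=dict)


@dataclass
class TVShow:
    title: str
    seasons: dict[int, Season] = field(default_factory=dict)

    def add_episode(self, ep: Episode) -> None:
        """Mutation entry point; creates the season if it is missing."""
        season = self.seasons.setdefault(ep.season, Season(number=ep.season))
        season.episodes[ep.number] = ep

    def missing_episodes(self) -> list[tuple[int, int]]:
        """Flat list of aired-but-not-owned (season, episode) pairs."""
        missing: list[tuple[int, int]] = []
        for season_number in sorted(self.seasons):
            season = self.seasons[season_number]
            for n in range(1, season.aired_episodes + 1):
                if n not in season.episodes:
                    missing.append((season_number, n))
        return missing

    def collection_status(self) -> CollectionStatus:
        owned = sum(len(s.episodes) for s in self.seasons.values())
        if owned == 0:
            return CollectionStatus.EMPTY
        if self.missing_episodes():
            return CollectionStatus.PARTIAL
        return CollectionStatus.COMPLETE
```

Comparing against `aired_episodes` rather than an expected total is what keeps an in-flight season from being reported as incomplete for episodes that have not aired yet.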
|
||||
|
||||
### Changed
|
||||
|
||||
- **`shared/media_info.py` exploded into `shared/media/{audio,video,subtitle,info,matching}.py`.**
|
||||
`MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat
|
||||
accessors (`width`, `height`, `video_codec`, `resolution`) remain as
|
||||
properties that read the first video track.
|
||||
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to
|
||||
`MediaInfo` (file-level — they come from the ffprobe `format` block, not a
|
||||
stream). Files without a video stream now correctly expose duration.
|
||||
- **`ShowStatus.from_string`** extended to map TMDB strings (`Returning
|
||||
Series`, `In Production`, `Pilot`, `Planned`, `Canceled`, `Cancelled`).
|
||||
Comparison is whitespace-trimmed and case-insensitive.
|
||||
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They
|
||||
are owned by `TVShow` and reached only through it.
|
||||
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed
|
||||
from the dict) instead of stored ints.
|
||||
- **`TVShowService.parse_episode_from_filename`** rewritten in string
|
||||
operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
|
||||
- **`TVShowService.find_next_episode`** now drives off
|
||||
`show.missing_episodes()` instead of the hardcoded "max 50 episodes per
|
||||
season" heuristic.
|
||||
- **`TVShowService` constructor** no longer takes `season_repository` /
|
||||
`episode_repository` — the aggregate persists in one block via
|
||||
`TVShowRepository` only.
|
||||
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to
|
||||
`SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack`
|
||||
ffprobe-view dataclass (different bounded contexts, kept separate
|
||||
intentionally).
|
||||
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from
|
||||
`knowledge/release/file_extensions.yaml` via `load_video_extensions()`
|
||||
(single source of truth).
|
||||
- **`CLAUDE.md`** updated with three new policy sections:
|
||||
- "Tests" — small updates OK during normal work, no mass-update sprees
|
||||
- "Backwards-compatibility shims" — prefer clean migration over shims
|
||||
- "Regex" — not forbidden, use judgment when string ops would be fragile
|
||||
|
||||
### Removed
|
||||
|
||||
- **Legacy `Season N Episode N` filename form** in
|
||||
`TVShowService.parse_episode_from_filename`. It never appears in the release
|
||||
names Alfred handles, and supporting it forced a regex.
|
||||
- **`SeasonRepository` and `EpisodeRepository`** — only the aggregate root has
|
||||
a repository (DDD rule: one repo per aggregate).
|
||||
- **`shared/media_info.py`** compatibility shim — callers updated.
|
||||
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` — callers
|
||||
updated to `SubtitleCandidate`.
|
||||
|
||||
### Fixed
|
||||
|
||||
- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead
|
||||
of crashing through `primary_video.duration_seconds` (see the duration/bitrate
|
||||
move under **Changed**).
|
||||
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer
|
||||
passes the removed `show_imdb_id` / `episode_count` kwargs when constructing
|
||||
a `Season` for folder-name generation.
|
||||
|
||||
### Internal
|
||||
|
||||
- Test suite rewritten where the aggregate redesign broke fixtures:
|
||||
`tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py`
|
||||
(rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py`
|
||||
(helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures),
|
||||
`tests/domain/test_tv_shows_service.py` (find_next_episode driven by real
|
||||
aggregate state).
|
||||
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`,
|
||||
`identifier.py` updated to import `SubtitleCandidate`.
|
||||
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.
|
||||
+8
-145
@@ -3,16 +3,13 @@
|
||||
import json
|
||||
import logging
|
||||
from collections.abc import AsyncGenerator
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.metadata import MetadataStore
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.settings import settings
|
||||
|
||||
from .prompt import PromptBuilder
|
||||
from .prompts import PromptBuilder
|
||||
from .registry import Tool, make_tools
|
||||
from .workflows import WorkflowLoader
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
@@ -36,8 +33,8 @@ class Agent:
|
||||
self.settings = settings
|
||||
self.llm = llm
|
||||
self.tools: dict[str, Tool] = make_tools(settings)
|
||||
self.workflow_loader = WorkflowLoader()
|
||||
self.prompt_builder = PromptBuilder(self.tools, self.workflow_loader)
|
||||
self.prompt_builder = PromptBuilder(self.tools)
|
||||
self.settings = settings
|
||||
self.max_tool_iterations = max_tool_iterations
|
||||
|
||||
def step(self, user_input: str) -> str:
|
||||
@@ -142,7 +139,7 @@ class Agent:
|
||||
memory.save()
|
||||
return final_response
|
||||
|
||||
def _execute_tool_call(self, tool_call: dict[str, Any]) -> dict[str, Any]: # noqa: PLR0911
|
||||
def _execute_tool_call(self, tool_call: dict[str, Any]) -> dict[str, Any]:
|
||||
"""
|
||||
Execute a single tool call.
|
||||
|
||||
@@ -171,163 +168,29 @@ class Agent:
|
||||
"available_tools": available,
|
||||
}
|
||||
|
||||
# Defensive: reject calls to tools that are not currently in scope.
|
||||
visible = set(self.prompt_builder.visible_tool_names())
|
||||
if tool_name not in visible:
|
||||
return {
|
||||
"error": "tool_out_of_scope",
|
||||
"message": (
|
||||
f"Tool '{tool_name}' is not available in the current "
|
||||
"workflow scope. Call end_workflow first or start the "
|
||||
"appropriate workflow."
|
||||
),
|
||||
"available_tools": sorted(visible),
|
||||
}
|
||||
|
||||
tool = self.tools[tool_name]
|
||||
memory = get_memory()
|
||||
|
||||
# Cache lookup — for tools flagged cacheable, short-circuit on hit.
|
||||
cache_key_value = self._cache_key_for(tool, args)
|
||||
if cache_key_value is not None:
|
||||
cached = memory.stm.tool_results.get(tool_name, cache_key_value)
|
||||
if cached is not None:
|
||||
logger.info(f"Tool cache HIT: {tool_name}[{cache_key_value}]")
|
||||
self._post_tool_side_effects(tool_name, args, cached, from_cache=True)
|
||||
return {**cached, "_from_cache": True}
|
||||
|
||||
# Execute tool
|
||||
try:
|
||||
result = tool.func(**args)
|
||||
return result
|
||||
except KeyboardInterrupt:
|
||||
# Don't catch KeyboardInterrupt - let it propagate
|
||||
raise
|
||||
except TypeError as e:
|
||||
# Bad arguments
|
||||
memory = get_memory()
|
||||
memory.episodic.add_error(tool_name, f"bad_args: {e}")
|
||||
return {"error": "bad_args", "message": str(e), "tool": tool_name}
|
||||
except Exception as e:
|
||||
# Other errors
|
||||
memory = get_memory()
|
||||
memory.episodic.add_error(tool_name, str(e))
|
||||
return {"error": "execution_failed", "message": str(e), "tool": tool_name}
|
||||
|
||||
# Persist + side effects only on successful results.
|
||||
if isinstance(result, dict) and result.get("status") == "ok":
|
||||
if cache_key_value is not None:
|
||||
memory.stm.tool_results.put(tool_name, cache_key_value, result)
|
||||
self._post_tool_side_effects(tool_name, args, result, from_cache=False)
|
||||
memory.save()
|
||||
|
||||
return result
|
||||
|
||||
@staticmethod
|
||||
def _cache_key_for(tool: Tool, args: dict[str, Any]) -> str | None:
|
||||
"""Return the cache key value for this call, or None if not cacheable."""
|
||||
if tool.cache_key is None:
|
||||
return None
|
||||
value = args.get(tool.cache_key)
|
||||
if value is None:
|
||||
return None
|
||||
return str(value)
|
||||
|
||||
def _post_tool_side_effects(
|
||||
self,
|
||||
tool_name: str,
|
||||
args: dict[str, Any],
|
||||
result: dict[str, Any],
|
||||
*,
|
||||
from_cache: bool,
|
||||
) -> None:
|
||||
"""
|
||||
Tool-agnostic side effects applied after a successful run or cache hit.
|
||||
|
||||
Today:
|
||||
- Update release_focus when a path-keyed inspector runs.
|
||||
- Persist inspector results into the release's `.alfred/metadata.yaml`.
|
||||
- Refresh episodic.last_search_results on find_torrent cache hits so
|
||||
get_torrent_by_index keeps pointing at the right list.
|
||||
"""
|
||||
memory = get_memory()
|
||||
tool = self.tools.get(tool_name)
|
||||
|
||||
# Release focus: any path-keyed inspector updates current_release_path.
|
||||
if tool is not None and tool.cache_key in {"source_path"}:
|
||||
path = args.get(tool.cache_key)
|
||||
if isinstance(path, str) and path:
|
||||
memory.stm.release_focus.focus(path)
|
||||
|
||||
# Persist inspector results to .alfred/metadata.yaml (skip on cache
|
||||
# hit — the file is already up to date from the original run).
|
||||
if not from_cache:
|
||||
self._maybe_update_alfred(tool_name, args, result)
|
||||
|
||||
# Episodic refresh when find_torrent's cache short-circuits the call.
|
||||
if from_cache and tool_name == "find_torrent":
|
||||
torrents = result.get("torrents") or []
|
||||
query = args.get("media_title") or ""
|
||||
memory.episodic.store_search_results(
|
||||
query=query, results=torrents, search_type="torrent"
|
||||
)
|
||||
|
||||
def _maybe_update_alfred(
|
||||
self,
|
||||
tool_name: str,
|
||||
args: dict[str, Any],
|
||||
result: dict[str, Any],
|
||||
) -> None:
|
||||
"""
|
||||
Persist a successful inspector result into the release's
|
||||
`.alfred/metadata.yaml`. No-op when the release root can't be resolved.
|
||||
"""
|
||||
if tool_name not in {"analyze_release", "probe_media", "find_media_imdb_id"}:
|
||||
return
|
||||
|
||||
release_root = self._resolve_release_root(tool_name, args)
|
||||
if release_root is None:
|
||||
return
|
||||
|
||||
try:
|
||||
store = MetadataStore(release_root)
|
||||
if tool_name == "analyze_release":
|
||||
store.update_parse(result)
|
||||
elif tool_name == "probe_media":
|
||||
store.update_probe(result)
|
||||
elif tool_name == "find_media_imdb_id":
|
||||
store.update_tmdb(result)
|
||||
except Exception as e:
|
||||
logger.warning(
|
||||
f"Failed to update .alfred for {tool_name} at {release_root}: {e}"
|
||||
)
|
||||
|
||||
@staticmethod
|
||||
def _resolve_release_root(
|
||||
tool_name: str,
|
||||
args: dict[str, Any],
|
||||
) -> Path | None:
|
||||
"""
|
||||
Figure out which release folder owns this call.
|
||||
|
||||
- analyze_release / probe_media: derived from source_path
|
||||
(folder kept as-is, file walked up to its parent).
|
||||
- find_media_imdb_id: follow the current release focus in STM.
|
||||
"""
|
||||
if tool_name in {"analyze_release", "probe_media"}:
|
||||
raw = args.get("source_path")
|
||||
if not isinstance(raw, str) or not raw:
|
||||
return None
|
||||
path = Path(raw)
|
||||
return path if path.is_dir() else path.parent
|
||||
|
||||
# find_media_imdb_id has no path arg — rely on release focus.
|
||||
focus = get_memory().stm.release_focus.current_release_path
|
||||
if not focus:
|
||||
return None
|
||||
path = Path(focus)
|
||||
return path if path.is_dir() else path.parent
|
||||
|
||||
async def step_streaming(
|
||||
self, user_input: str, completion_id: str, created_ts: int, model: str
|
||||
) -> AsyncGenerator[dict[str, Any]]:
|
||||
) -> AsyncGenerator[dict[str, Any], None]:
|
||||
"""
|
||||
Execute agent step with streaming support for LibreChat.
|
||||
|
||||
|
||||
@@ -1,79 +0,0 @@
|
||||
"""Expression loader — charge et merge les fichiers YAML d'expressions par user."""
|
||||
|
||||
import random
|
||||
from pathlib import Path
|
||||
|
||||
import yaml
|
||||
|
||||
_USERS_DIR = Path(__file__).parent.parent / "knowledge" / "users"
|
||||
|
||||
|
||||
def _load_yaml(path: Path) -> dict:
|
||||
if not path.exists():
|
||||
return {}
|
||||
return yaml.safe_load(path.read_text(encoding="utf-8")) or {}
|
||||
|
||||
|
||||
def load_expressions(username: str | None) -> dict:
|
||||
"""
|
||||
Loads common.yaml and merges it with {username}.yaml.
|
||||
|
||||
Returns a dict with:
|
||||
- nickname: str (the user's nickname, falling back to the username)
|
||||
- expressions: dict[situation -> list[str]]
|
||||
"""
|
||||
common = _load_yaml(_USERS_DIR / "common.yaml")
|
||||
user_data = _load_yaml(_USERS_DIR / f"{username}.yaml") if username else {}
|
||||
|
||||
# Merge expressions: common + user (user phrases are appended)
|
||||
common_exprs: dict[str, list] = common.get("expressions", {})
|
||||
user_exprs: dict[str, list] = user_data.get("expressions", {})
|
||||
|
||||
merged: dict[str, list] = {}
|
||||
all_situations = set(common_exprs) | set(user_exprs)
|
||||
for situation in all_situations:
|
||||
base = list(common_exprs.get(situation, []))
|
||||
extra = list(user_exprs.get(situation, []))
|
||||
merged[situation] = base + extra
|
||||
|
||||
nickname = user_data.get("user", {}).get("nickname") or username or "mec"
|
||||
|
||||
return {
|
||||
"nickname": nickname,
|
||||
"expressions": merged,
|
||||
}
|
||||
|
||||
|
||||
def pick(expressions: dict, situation: str, nickname: str | None = None) -> str:
|
||||
"""
|
||||
Picks a random expression for a given situation.
|
||||
|
||||
Substitutes {user} with the nickname when one is provided.
|
||||
Returns an empty string if the situation does not exist.
|
||||
"""
|
||||
options = expressions.get("expressions", {}).get(situation, [])
|
||||
if not options:
|
||||
return ""
|
||||
chosen = random.choice(options)
|
||||
if nickname:
|
||||
chosen = chosen.replace("{user}", nickname)
|
||||
return chosen
|
||||
|
||||
|
||||
def build_expressions_context(username: str | None) -> dict:
|
||||
"""
|
||||
Main entry point.
|
||||
|
||||
Returns:
|
||||
- nickname: str
|
||||
- samples: dict[situation -> one resolved phrase] (exactly one per situation)
|
||||
"""
|
||||
data = load_expressions(username)
|
||||
nickname = data["nickname"]
|
||||
samples = {
|
||||
situation: pick(data, situation, nickname) for situation in data["expressions"]
|
||||
}
|
||||
return {
|
||||
"nickname": nickname,
|
||||
"samples": samples,
|
||||
}
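
The merge rule used by `load_expressions` (user phrases appended to the common ones, per situation) can be shown without touching the filesystem. `merge_expressions` is a hypothetical helper extracted for illustration, and the sample phrases in the usage note are invented.

```python
def merge_expressions(
    common: dict[str, list[str]], user: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Concatenate common and user phrases for every known situation."""
    merged: dict[str, list[str]] = {}
    for situation in set(common) | set(user):
        # Common phrases first, then the user's additions.
        merged[situation] = list(common.get(situation, [])) + list(
            user.get(situation, [])
        )
    return merged
```

For example, merging `{"greeting": ["Hello {user}"]}` with `{"greeting": ["Yo"], "error": ["Oops"]}` keeps both greetings and adds the user-only `error` situation.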
|
||||
@@ -6,8 +6,7 @@ from typing import Any
|
||||
import requests
|
||||
from requests.exceptions import HTTPError, RequestException, Timeout
|
||||
|
||||
from alfred.settings import Settings
|
||||
from alfred.settings import settings as default_settings
|
||||
from alfred.settings import Settings, settings
|
||||
|
||||
from .exceptions import LLMAPIError, LLMConfigurationError
|
||||
|
||||
@@ -37,7 +36,6 @@ class DeepSeekClient:
|
||||
Raises:
|
||||
LLMConfigurationError: If API key is missing
|
||||
"""
|
||||
self.settings = settings or default_settings
|
||||
self.api_key = api_key or self.settings.deepseek_api_key
|
||||
self.base_url = base_url or self.settings.deepseek_base_url
|
||||
self.model = model or self.settings.deepseek_model
|
||||
@@ -98,7 +96,7 @@ class DeepSeekClient:
|
||||
payload = {
|
||||
"model": self.model,
|
||||
"messages": messages,
|
||||
"temperature": self.settings.llm_temperature,
|
||||
"temperature": settings.llm_temperature,
|
||||
}
|
||||
|
||||
# Add tools if provided
|
||||
|
||||
@@ -7,7 +7,6 @@ import requests
|
||||
from requests.exceptions import HTTPError, RequestException, Timeout
|
||||
|
||||
from alfred.settings import Settings
|
||||
from alfred.settings import settings as default_settings
|
||||
|
||||
from .exceptions import LLMAPIError, LLMConfigurationError
|
||||
|
||||
@@ -47,12 +46,11 @@ class OllamaClient:
|
||||
Raises:
|
||||
LLMConfigurationError: If configuration is invalid
|
||||
"""
|
||||
self.settings = settings or default_settings
|
||||
self.base_url = base_url or self.settings.ollama_base_url
|
||||
self.model = model or self.settings.ollama_model
|
||||
self.timeout = timeout or self.settings.request_timeout
|
||||
self.base_url = base_url or settings.ollama_base_url
|
||||
self.model = model or settings.ollama_model
|
||||
self.timeout = timeout or settings.request_timeout
|
||||
self.temperature = (
|
||||
temperature if temperature is not None else self.settings.llm_temperature
|
||||
temperature if temperature is not None else settings.llm_temperature
|
||||
)
|
||||
|
||||
if not self.base_url:
|
||||
|
||||
@@ -0,0 +1,101 @@
|
||||
# agent/parameters.py
|
||||
from collections.abc import Callable
|
||||
from dataclasses import dataclass
|
||||
from typing import Any
|
||||
|
||||
|
||||
@dataclass
|
||||
class ParameterSchema:
|
||||
"""Describes a required parameter for the agent."""
|
||||
|
||||
key: str
|
||||
description: str
|
||||
why_needed: str # Explanation for the AI
|
||||
type: str # "string", "number", "object", etc.
|
||||
validator: Callable[[Any], bool] | None = None
|
||||
default: Any = None
|
||||
required: bool = True
|
||||
|
||||
|
||||
# Define all required parameters
|
||||
REQUIRED_PARAMETERS = [
|
||||
ParameterSchema(
|
||||
key="config",
|
||||
description="Configuration object containing all folder paths",
|
||||
why_needed=(
|
||||
"This contains the paths to all important folders:\n"
|
||||
"- download_folder: Where downloaded files arrive before being organized\n"
|
||||
"- tvshow_folder: Where TV show files are organized and stored\n"
|
||||
"- movie_folder: Where movie files are organized and stored\n"
|
||||
"- torrent_folder: Where torrent structures are saved for the torrent client"
|
||||
),
|
||||
type="object",
|
||||
validator=lambda x: isinstance(x, dict),
|
||||
required=True,
|
||||
default={},
|
||||
),
|
||||
ParameterSchema(
|
||||
key="tv_shows",
|
||||
description="List of TV shows the user is following",
|
||||
why_needed=(
|
||||
"This tracks which TV shows you're following. "
|
||||
"Each show includes: IMDB ID, title, number of seasons, and status (ongoing or ended)."
|
||||
),
|
||||
type="array",
|
||||
validator=lambda x: isinstance(x, list),
|
||||
required=False,
|
||||
default=[],
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
def get_parameter_schema(key: str) -> ParameterSchema | None:
|
||||
"""Get schema for a specific parameter."""
|
||||
for param in REQUIRED_PARAMETERS:
|
||||
if param.key == key:
|
||||
return param
|
||||
return None
|
||||
|
||||
|
||||
def get_missing_required_parameters(memory_data: dict) -> list[ParameterSchema]:
|
||||
"""Get list of required parameters that are missing or None."""
|
||||
missing = []
|
||||
for param in REQUIRED_PARAMETERS:
|
||||
if param.required:
|
||||
value = memory_data.get(param.key)
|
||||
if value is None:
|
||||
missing.append(param)
|
||||
return missing
|
||||
|
||||
|
||||
def format_parameters_for_prompt() -> str:
|
||||
"""Format parameter descriptions for the AI system prompt."""
|
||||
lines = ["REQUIRED PARAMETERS:"]
|
||||
for param in REQUIRED_PARAMETERS:
|
||||
status = "REQUIRED" if param.required else "OPTIONAL"
|
||||
lines.append(f"\n- {param.key} ({status}):")
|
||||
lines.append(f" Description: {param.description}")
|
||||
lines.append(f" Why needed: {param.why_needed}")
|
||||
lines.append(f" Type: {param.type}")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def validate_parameter(key: str, value: Any) -> tuple[bool, str | None]:
|
||||
"""
|
||||
Validate a parameter value against its schema.
|
||||
|
||||
Returns:
|
||||
(is_valid, error_message)
|
||||
"""
|
||||
schema = get_parameter_schema(key)
|
||||
if not schema:
|
||||
return True, None # Unknown parameters are allowed
|
||||
|
||||
if schema.validator:
|
||||
try:
|
||||
if not schema.validator(value):
|
||||
return False, f"Validation failed for {key}"
|
||||
except Exception as e:
|
||||
return False, f"Validation error for {key}: {str(e)}"
|
||||
|
||||
return True, None
|
||||
@@ -1,333 +0,0 @@
|
||||
"""Prompt builder for the agent system."""
|
||||
|
||||
import json
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.persistence.memory import MemoryRegistry
|
||||
|
||||
from .expressions import build_expressions_context
|
||||
from .registry import Tool
|
||||
from .workflows import WorkflowLoader
|
||||
|
||||
# Tools that are always available, regardless of workflow scope.
|
||||
# Kept small on purpose: the core set is what the agent uses to either
|
||||
# answer trivially or pivot into a workflow.
|
||||
CORE_TOOLS: tuple[str, ...] = (
|
||||
"set_language",
|
||||
"set_path_for_folder",
|
||||
"list_folder",
|
||||
"read_release_metadata",
|
||||
"query_library",
|
||||
"start_workflow",
|
||||
"end_workflow",
|
||||
)
|
||||
|
||||
|
||||
class PromptBuilder:
|
||||
"""Builds system prompts for the agent with memory context."""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
tools: dict[str, Tool],
|
||||
workflow_loader: WorkflowLoader | None = None,
|
||||
):
|
||||
self.tools = tools
|
||||
self.workflow_loader = workflow_loader or WorkflowLoader()
|
||||
self._memory_registry = MemoryRegistry()
|
||||
|
||||
def _active_workflow(self, memory) -> dict | None:
|
||||
"""Return the YAML definition of the active workflow, or None."""
|
||||
current = memory.stm.workflow.current
|
||||
if current is None:
|
||||
return None
|
||||
return self.workflow_loader.get(current.get("name"))
|
||||
|
||||
def visible_tool_names(self) -> list[str]:
|
||||
"""
|
||||
Return the names of the tools currently in scope.
|
||||
|
||||
- Idle (no workflow): core tools only. The LLM enters a workflow
|
||||
via start_workflow to access more tools.
|
||||
- Workflow active: core tools + the workflow's declared tools.
|
||||
"""
|
||||
memory = get_memory()
|
||||
visible = set(CORE_TOOLS)
|
||||
workflow = self._active_workflow(memory)
|
||||
if workflow is not None:
|
||||
for name in workflow.get("tools", []):
|
||||
visible.add(name)
|
||||
# Only return tools that actually exist in the registry.
|
||||
return [name for name in self.tools if name in visible]
|
||||
|
||||
def _format_identity(self, memory) -> str:
|
||||
"""Build Alfred's identity and personality section."""
|
||||
username = memory.stm.get_entity("username")
|
||||
expr = build_expressions_context(username)
|
||||
nickname = expr["nickname"]
|
||||
samples = expr["samples"]
|
||||
|
||||
# Format expressions as situational guidance for the LLM
|
||||
expr_lines = []
|
||||
situation_labels = {
|
||||
"greeting": "Salutation",
|
||||
"success": "Succès",
|
||||
"working": "En cours",
|
||||
"error": "Erreur",
|
||||
"unclear": "Demande floue",
|
||||
"warning": "Avertissement",
|
||||
"not_found": "Introuvable",
|
||||
}
|
||||
for situation, label in situation_labels.items():
|
||||
phrase = samples.get(situation, "")
|
||||
if phrase:
|
||||
expr_lines.append(f' {label}: "{phrase}"')
|
||||
|
||||
expressions_block = "\n".join(expr_lines)
|
||||
|
||||
return f"""Tu t'appelles Alfred. Tu es un assistant d'organisation de médiathèque — direct, opérationnel, légèrement impertinent.
|
||||
|
||||
PERSONNALITÉ:
|
||||
- Tu parles franglais : français avec des touches d'anglais quand ça colle mieux
|
||||
- Tu es sarcastique et n'as pas peur de te moquer si l'user fait une connerie
|
||||
- Tu envoies chier poliment (mais clairement) quand la demande est trop floue
|
||||
- Tu ne fais pas de blabla inutile. Si "ok c'est fait" suffit, c'est tout ce que tu dis
|
||||
- Tu peux jurer (putain, merde, con, ...) — c'est naturel, pas du remplissage
|
||||
- Jamais de "Great question!" ou de politesse creuse
|
||||
|
||||
USER COURANT: {nickname}
|
||||
|
||||
EXPRESSIONS À UTILISER (une par situation, naturellement intégrées dans ta réponse) :
|
||||
{expressions_block}"""
|
||||
|
||||
def build_tools_spec(self) -> list[dict[str, Any]]:
|
||||
"""Build the tool specification for the LLM API (scope-filtered)."""
|
||||
visible = set(self.visible_tool_names())
|
||||
tool_specs = []
|
||||
for tool in self.tools.values():
|
||||
if tool.name not in visible:
|
||||
continue
|
||||
spec = {
|
||||
"type": "function",
|
||||
"function": {
|
||||
"name": tool.name,
|
||||
"description": tool.description,
|
||||
"parameters": tool.parameters,
|
||||
},
|
||||
}
|
||||
tool_specs.append(spec)
|
||||
return tool_specs
|
||||
|
||||
def _format_tools_description(self) -> str:
|
||||
"""Format the currently-visible tools with description + params."""
|
||||
visible = set(self.visible_tool_names())
|
||||
visible_tools = [t for t in self.tools.values() if t.name in visible]
|
||||
if not visible_tools:
|
||||
return ""
|
||||
return "\n".join(
|
||||
f"- {tool.name}: {tool.description}\n"
|
||||
f" Parameters: {json.dumps(tool.parameters, ensure_ascii=False)}"
|
||||
for tool in visible_tools
|
||||
)
|
||||
|
||||
def _format_workflow_scope(self, memory) -> str:
|
||||
"""Describe the current workflow scope so the LLM has a plan."""
|
||||
workflow = self._active_workflow(memory)
|
||||
if workflow is None:
|
||||
available = self.workflow_loader.names()
|
||||
if not available:
|
||||
return ""
|
||||
lines = ["WORKFLOW SCOPE: idle (broad catalog narrowed to core noyau)."]
|
||||
lines.append(
|
||||
" Call start_workflow(workflow_name, params) to enter a scope."
|
||||
)
|
||||
lines.append(" Available workflows:")
|
||||
for name in available:
|
||||
wf = self.workflow_loader.get(name) or {}
|
||||
desc = (wf.get("description") or "").strip().splitlines()
|
||||
summary = desc[0] if desc else ""
|
||||
lines.append(f" - {name}: {summary}")
|
||||
return "\n".join(lines)
|
||||
|
||||
current = memory.stm.workflow.current or {}
|
||||
lines = [
|
||||
f"WORKFLOW SCOPE: active — {current.get('name')} "
|
||||
f"(stage: {current.get('stage')})",
|
||||
]
|
||||
params = current.get("params")
|
||||
if params:
|
||||
lines.append(f" Params: {params}")
|
||||
wf_desc = (workflow.get("description") or "").strip()
|
||||
if wf_desc:
|
||||
lines.append(f" Goal: {wf_desc}")
|
||||
steps = workflow.get("steps", [])
|
||||
if steps:
|
||||
lines.append(" Steps:")
|
||||
for step in steps:
|
||||
step_id = step.get("id", "?")
|
||||
step_tool = step.get("tool") or (
|
||||
"ask_user" if step.get("ask_user") else "—"
|
||||
)
|
||||
lines.append(f" - {step_id} ({step_tool})")
|
||||
lines.append(" Call end_workflow(reason) when done, cancelled, or off-topic.")
|
||||
return "\n".join(lines)
|
||||
|
||||
def _format_episodic_context(self, memory) -> str:
|
||||
"""Format episodic memory context for the prompt."""
|
||||
lines = []
|
||||
|
||||
if memory.episodic.last_search_results:
|
||||
results = memory.episodic.last_search_results
|
||||
result_list = results.get("results", [])
|
||||
lines.append(
|
||||
f"\nLAST SEARCH: '{results.get('query')}' ({len(result_list)} results)"
|
||||
)
|
||||
# Show first 5 results
|
||||
for i, result in enumerate(result_list[:5]):
|
||||
name = result.get("name", "Unknown")
|
||||
lines.append(f" {i + 1}. {name}")
|
||||
if len(result_list) > 5:
|
||||
lines.append(f" ... and {len(result_list) - 5} more")
|
||||
|
||||
if memory.episodic.pending_question:
|
||||
question = memory.episodic.pending_question
|
||||
lines.append(f"\nPENDING QUESTION: {question.get('question')}")
|
||||
lines.append(f" Type: {question.get('type')}")
|
||||
if question.get("options"):
|
||||
lines.append(f" Options: {len(question.get('options'))}")
|
||||
|
||||
if memory.episodic.active_downloads:
|
||||
lines.append(f"\nACTIVE DOWNLOADS: {len(memory.episodic.active_downloads)}")
|
||||
for dl in memory.episodic.active_downloads[:3]:
|
||||
lines.append(f" - {dl.get('name')}: {dl.get('progress', 0)}%")
|
||||
|
||||
if memory.episodic.recent_errors:
|
||||
lines.append("\nRECENT ERRORS (up to 3):")
|
||||
for error in memory.episodic.recent_errors[-3:]:
|
||||
lines.append(
|
||||
f" - Action '{error.get('action')}' failed: {error.get('error')}"
|
||||
)
|
||||
|
||||
# Unread events
|
||||
unread = [e for e in memory.episodic.background_events if not e.get("read")]
|
||||
if unread:
|
||||
lines.append(f"\nUNREAD EVENTS: {len(unread)}")
|
||||
for event in unread[:3]:
|
||||
lines.append(f" - {event.get('type')}: {event.get('data')}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
def _format_stm_context(self, memory) -> str:
|
||||
"""Format short-term memory context for the prompt."""
|
||||
lines = []
|
||||
|
||||
if memory.stm.current_workflow:
|
||||
workflow = memory.stm.current_workflow
|
||||
lines.append(
|
||||
f"CURRENT WORKFLOW: {workflow.get('name')} (stage: {workflow.get('stage')})"
|
||||
)
|
||||
if workflow.get("params"):
|
||||
lines.append(f" Params: {workflow.get('params')}")
|
||||
|
||||
if memory.stm.current_topic:
|
||||
lines.append(f"CURRENT TOPIC: {memory.stm.current_topic}")
|
||||
|
||||
if memory.stm.extracted_entities:
|
||||
lines.append("EXTRACTED ENTITIES:")
|
||||
for key, value in memory.stm.extracted_entities.items():
|
||||
lines.append(f" - {key}: {value}")
|
||||
|
||||
if memory.stm.language:
|
||||
lines.append(f"CONVERSATION LANGUAGE: {memory.stm.language}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
def _format_memory_schema(self) -> str:
|
||||
"""Describe available memory components so the agent knows what to read/write and when."""
|
||||
schema = self._memory_registry.schema()
|
||||
tier_labels = {
|
||||
"ltm": "LONG-TERM (persisted)",
|
||||
"stm": "SHORT-TERM (session)",
|
||||
"episodic": "EPISODIC (volatile)",
|
||||
}
|
||||
lines = ["MEMORY COMPONENTS:"]
|
||||
|
||||
for tier, components in schema.items():
|
||||
if not components:
|
||||
continue
|
||||
lines.append(f"\n [{tier_labels.get(tier, tier.upper())}]")
|
||||
for c in components:
|
||||
access = c.get("access", "read")
|
||||
lines.append(f" {c['name']} ({access}): {c['description']}")
|
||||
for field_name, field_desc in c.get("fields", {}).items():
|
||||
lines.append(f" · {field_name}: {field_desc}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
def _format_config_context(self, memory) -> str:
|
||||
"""Format configuration context."""
|
||||
lines = ["CURRENT CONFIGURATION:"]
|
||||
folders = {
|
||||
**memory.ltm.workspace.as_dict(),
|
||||
**memory.ltm.library_paths.to_dict(),
|
||||
}
|
||||
if folders:
|
||||
for key, value in folders.items():
|
||||
lines.append(f" - {key}: {value}")
|
||||
else:
|
||||
lines.append(" (no configuration set)")
|
||||
return "\n".join(lines)
|
||||
|
||||
def build_system_prompt(self) -> str:
|
||||
"""Build the complete system prompt."""
|
||||
memory = get_memory()
|
||||
|
||||
# Identity + personality
|
||||
identity = self._format_identity(memory)
|
||||
|
||||
# Language instruction
|
||||
language_instruction = (
|
||||
"Si la langue de l'user est différente de la langue courante en STM, "
|
||||
"appelle `set_language` en premier avant de répondre."
|
||||
)
|
||||
|
||||
# Configuration
|
||||
config_section = self._format_config_context(memory)
|
||||
|
||||
# STM context
|
||||
stm_context = self._format_stm_context(memory)
|
||||
|
||||
# Episodic context
|
||||
episodic_context = self._format_episodic_context(memory)
|
||||
|
||||
# Memory schema
|
||||
memory_schema = self._format_memory_schema()
|
||||
|
||||
# Workflow scope (active workflow plan or list of options)
|
||||
workflow_section = self._format_workflow_scope(memory)
|
||||
|
||||
# Available tools (already filtered by scope)
|
||||
tools_desc = self._format_tools_description()
|
||||
tools_section = f"\nOUTILS DISPONIBLES:\n{tools_desc}" if tools_desc else ""
|
||||
|
||||
rules = """
|
||||
RÈGLES:
|
||||
- Utilise les outils pour accomplir les tâches, pas pour décorer
|
||||
- Si des résultats de recherche sont dispo en mémoire épisodique, référence-les par index
|
||||
- Confirme toujours avant une opération destructive (move, delete, overwrite)
|
||||
- Réponses courtes — si c'est fait, dis-le en une ligne
|
||||
- Si la demande est floue, demande un éclaircissement AVANT de lancer quoi que ce soit
|
||||
"""
|
||||
|
||||
sections = [
|
||||
identity,
|
||||
language_instruction,
|
||||
config_section,
|
||||
stm_context,
|
||||
episodic_context,
|
||||
memory_schema,
|
||||
workflow_section,
|
||||
tools_section,
|
||||
rules,
|
||||
]
|
||||
return "\n\n".join(s for s in sections if s and s.strip())
|
||||
@@ -0,0 +1,206 @@
|
||||
"""Prompt builder for the agent system."""
|
||||
|
||||
import json
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.persistence.memory import MemoryRegistry
|
||||
|
||||
from .registry import Tool
|
||||
|
||||
|
||||
class PromptBuilder:
    """Builds system prompts for the agent with memory context."""

    def __init__(self, tools: dict[str, Tool]):
        self.tools = tools
        self._memory_registry = MemoryRegistry()

    def build_tools_spec(self) -> list[dict[str, Any]]:
        """Build the tool specification for the LLM API."""
        tool_specs = []
        for tool in self.tools.values():
            spec = {
                "type": "function",
                "function": {
                    "name": tool.name,
                    "description": tool.description,
                    "parameters": tool.parameters,
                },
            }
            tool_specs.append(spec)
        return tool_specs
|
||||
|
||||
def _format_tools_description(self) -> str:
|
||||
"""Format tools with their descriptions and parameters."""
|
||||
if not self.tools:
|
||||
return ""
|
||||
return "\n".join(
|
||||
f"- {tool.name}: {tool.description}\n"
|
||||
f" Parameters: {json.dumps(tool.parameters, ensure_ascii=False)}"
|
||||
for tool in self.tools.values()
|
||||
)
|
||||
|
||||
def _format_episodic_context(self, memory) -> str:
|
||||
"""Format episodic memory context for the prompt."""
|
||||
lines = []
|
||||
|
||||
if memory.episodic.last_search_results:
|
||||
results = memory.episodic.last_search_results
|
||||
result_list = results.get("results", [])
|
||||
lines.append(
|
||||
f"\nLAST SEARCH: '{results.get('query')}' ({len(result_list)} results)"
|
||||
)
|
||||
# Show first 5 results
|
||||
for i, result in enumerate(result_list[:5]):
|
||||
name = result.get("name", "Unknown")
|
||||
lines.append(f" {i + 1}. {name}")
|
||||
if len(result_list) > 5:
|
||||
lines.append(f" ... and {len(result_list) - 5} more")
|
||||
|
||||
if memory.episodic.pending_question:
|
||||
question = memory.episodic.pending_question
|
||||
lines.append(f"\nPENDING QUESTION: {question.get('question')}")
|
||||
lines.append(f" Type: {question.get('type')}")
|
||||
if question.get("options"):
|
||||
lines.append(f" Options: {len(question.get('options'))}")
|
||||
|
||||
if memory.episodic.active_downloads:
|
||||
lines.append(f"\nACTIVE DOWNLOADS: {len(memory.episodic.active_downloads)}")
|
||||
for dl in memory.episodic.active_downloads[:3]:
|
||||
lines.append(f" - {dl.get('name')}: {dl.get('progress', 0)}%")
|
||||
|
||||
if memory.episodic.recent_errors:
|
||||
lines.append("\nRECENT ERRORS (up to 3):")
|
||||
for error in memory.episodic.recent_errors[-3:]:
|
||||
lines.append(
|
||||
f" - Action '{error.get('action')}' failed: {error.get('error')}"
|
||||
)
|
||||
|
||||
# Unread events
|
||||
unread = [e for e in memory.episodic.background_events if not e.get("read")]
|
||||
if unread:
|
||||
lines.append(f"\nUNREAD EVENTS: {len(unread)}")
|
||||
for event in unread[:3]:
|
||||
lines.append(f" - {event.get('type')}: {event.get('data')}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
def _format_stm_context(self, memory) -> str:
|
||||
"""Format short-term memory context for the prompt."""
|
||||
lines = []
|
||||
|
||||
if memory.stm.current_workflow:
|
||||
workflow = memory.stm.current_workflow
|
||||
lines.append(
|
||||
f"CURRENT WORKFLOW: {workflow.get('type')} (stage: {workflow.get('stage')})"
|
||||
)
|
||||
if workflow.get("target"):
|
||||
lines.append(f" Target: {workflow.get('target')}")
|
||||
|
||||
if memory.stm.current_topic:
|
||||
lines.append(f"CURRENT TOPIC: {memory.stm.current_topic}")
|
||||
|
||||
if memory.stm.extracted_entities:
|
||||
lines.append("EXTRACTED ENTITIES:")
|
||||
for key, value in memory.stm.extracted_entities.items():
|
||||
lines.append(f" - {key}: {value}")
|
||||
|
||||
if memory.stm.language:
|
||||
lines.append(f"CONVERSATION LANGUAGE: {memory.stm.language}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
def _format_memory_schema(self) -> str:
|
||||
"""Describe available memory components so the agent knows what to read/write and when."""
|
||||
schema = self._memory_registry.schema()
|
||||
tier_labels = {"ltm": "LONG-TERM (persisted)", "stm": "SHORT-TERM (session)", "episodic": "EPISODIC (volatile)"}
|
||||
lines = ["MEMORY COMPONENTS:"]
|
||||
|
||||
for tier, components in schema.items():
|
||||
if not components:
|
||||
continue
|
||||
lines.append(f"\n [{tier_labels.get(tier, tier.upper())}]")
|
||||
for c in components:
|
||||
access = c.get("access", "read")
|
||||
lines.append(f" {c['name']} ({access}): {c['description']}")
|
||||
for field_name, field_desc in c.get("fields", {}).items():
|
||||
lines.append(f" · {field_name}: {field_desc}")
|
||||
|
||||
return "\n".join(lines)
|
||||
|
||||
def _format_config_context(self, memory) -> str:
|
||||
"""Format configuration context."""
|
||||
lines = ["CURRENT CONFIGURATION:"]
|
||||
folders = {**memory.ltm.workspace.as_dict(), **memory.ltm.library_paths.to_dict()}
|
||||
if folders:
|
||||
for key, value in folders.items():
|
||||
lines.append(f" - {key}: {value}")
|
||||
else:
|
||||
lines.append(" (no configuration set)")
|
||||
return "\n".join(lines)
|
||||
|
||||
def build_system_prompt(self) -> str:
|
||||
"""Build the complete system prompt."""
|
||||
# Get memory once for all context formatting
|
||||
memory = get_memory()
|
||||
|
||||
# Base instruction
|
||||
base = "You are a helpful AI assistant for managing a media library."
|
||||
|
||||
# Language instruction
|
||||
language_instruction = (
|
||||
"Your first task is to determine the user's language from their message "
|
||||
"and use the `set_language` tool if it's different from the current one. "
|
||||
"After that, proceed to help the user."
|
||||
)
|
||||
|
||||
# Available tools
|
||||
tools_desc = self._format_tools_description()
|
||||
tools_section = f"\nAVAILABLE TOOLS:\n{tools_desc}" if tools_desc else ""
|
||||
|
||||
# Memory schema
|
||||
memory_schema = self._format_memory_schema()
|
||||
|
||||
# Configuration
|
||||
config_section = self._format_config_context(memory)
|
||||
if config_section:
|
||||
config_section = f"\n{config_section}"
|
||||
|
||||
# STM context
|
||||
stm_context = self._format_stm_context(memory)
|
||||
if stm_context:
|
||||
stm_context = f"\n{stm_context}"
|
||||
|
||||
# Episodic context
|
||||
episodic_context = self._format_episodic_context(memory)
|
||||
|
||||
# Important rules
|
||||
rules = """
|
||||
IMPORTANT RULES:
|
||||
- Use tools to accomplish tasks
|
||||
- When search results are available, reference them by index (e.g., "add_torrent_by_index")
|
||||
- Always confirm actions with the user before executing destructive operations
|
||||
- Provide clear, concise responses
|
||||
"""
|
||||
|
||||
# Examples
|
||||
examples = """
|
||||
EXAMPLES:
|
||||
- User: "Find Inception" → Use find_media_imdb_id, then find_torrent
|
||||
- User: "download the 3rd one" → Use add_torrent_by_index with index=3
|
||||
- User: "List my downloads" → Use list_folder with folder_type="download"
|
||||
"""
|
||||
|
||||
return f"""{base}
|
||||
|
||||
{language_instruction}
|
||||
{tools_section}
|
||||
|
||||
{memory_schema}
|
||||
{config_section}
|
||||
{stm_context}
|
||||
{episodic_context}
|
||||
{rules}
|
||||
{examples}
|
||||
"""
|
||||
+44
-102
@@ -1,4 +1,4 @@
|
||||
"""Tool registry — defines and registers all available tools for the agent."""
|
||||
"""Tool registry - defines and registers all available tools for the agent."""
|
||||
|
||||
import inspect
|
||||
import logging
|
||||
@@ -6,9 +6,6 @@ from collections.abc import Callable
|
||||
from dataclasses import dataclass
|
||||
from typing import Any
|
||||
|
||||
from .tools.spec import ToolSpec, ToolSpecError
|
||||
from .tools.spec_loader import load_tool_specs
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
@@ -20,63 +17,51 @@ class Tool:
|
||||
description: str
|
||||
func: Callable[..., dict[str, Any]]
|
||||
parameters: dict[str, Any]
|
||||
cache_key: str | None = None # Parameter name to use as STM cache key.
|
||||
|
||||
|
||||
_PY_TYPE_TO_JSON = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}


def _json_type_for(annotation) -> str:
    """Map a Python type annotation to a JSON Schema 'type' string."""
    if annotation is inspect.Parameter.empty:
        return "string"
    # Strip Optional[X] / X | None to X.
    args = getattr(annotation, "__args__", None)
    if args:
        non_none = [a for a in args if a is not type(None)]
        if len(non_none) == 1:
            annotation = non_none[0]
    return _PY_TYPE_TO_JSON.get(annotation, "string")
|
||||
|
||||
|
||||
def _create_tool_from_function(func: Callable, spec: ToolSpec | None = None) -> Tool:
|
||||
def _create_tool_from_function(func: Callable) -> Tool:
|
||||
"""
|
||||
Create a Tool object from a function, optionally enriched with a spec.
|
||||
Create a Tool object from a function.
|
||||
|
||||
Types and required-ness always come from the Python signature (source of
|
||||
truth for the API contract). When a spec is provided, the description
|
||||
and per-parameter docs come from the YAML spec instead of the docstring.
|
||||
Args:
|
||||
func: Function to convert to a tool
|
||||
|
||||
Returns:
|
||||
Tool object with metadata extracted from function
|
||||
"""
|
||||
sig = inspect.signature(func)
|
||||
sig_params = {name: p for name, p in sig.parameters.items() if name != "self"}
|
||||
doc = inspect.getdoc(func)
|
||||
|
||||
if spec is not None:
|
||||
_validate_spec_matches_signature(func.__name__, sig_params, spec)
|
||||
description = spec.compile_description()
|
||||
param_descriptions = {
|
||||
name: spec.compile_parameter_description(name) for name in sig_params
|
||||
}
|
||||
else:
|
||||
doc = inspect.getdoc(func)
|
||||
description = doc.strip().split("\n")[0] if doc else func.__name__
|
||||
param_descriptions = {name: f"Parameter {name}" for name in sig_params}
|
||||
# Extract description from docstring (first line)
|
||||
description = doc.strip().split("\n")[0] if doc else func.__name__
|
||||
|
||||
properties: dict[str, dict[str, Any]] = {}
|
||||
required: list[str] = []
|
||||
# Build JSON schema from function signature
|
||||
properties = {}
|
||||
required = []
|
||||
|
||||
for param_name, param in sig.parameters.items():
|
||||
if param_name == "self":
|
||||
continue
|
||||
|
||||
# Map Python types to JSON schema types
|
||||
param_type = "string" # default
|
||||
if param.annotation != inspect.Parameter.empty:
|
||||
if param.annotation is str:
|
||||
param_type = "string"
|
||||
elif param.annotation is int:
|
||||
param_type = "integer"
|
||||
elif param.annotation is float:
|
||||
param_type = "number"
|
||||
elif param.annotation is bool:
|
||||
param_type = "boolean"
|
||||
|
||||
for param_name, param in sig_params.items():
|
||||
properties[param_name] = {
|
||||
"type": _json_type_for(param.annotation),
|
||||
"description": param_descriptions[param_name],
|
||||
"type": param_type,
|
||||
"description": f"Parameter {param_name}",
|
||||
}
|
||||
if param.default is inspect.Parameter.empty:
|
||||
|
||||
# Add to required if no default value
|
||||
if param.default == inspect.Parameter.empty:
|
||||
required.append(param_name)
|
||||
|
||||
parameters = {
|
||||
@@ -85,69 +70,35 @@ def _create_tool_from_function(func: Callable, spec: ToolSpec | None = None) ->
|
||||
"required": required,
|
||||
}
|
||||
|
||||
cache_key = spec.cache.key if spec is not None and spec.cache is not None else None
|
||||
|
||||
return Tool(
|
||||
name=func.__name__,
|
||||
description=description,
|
||||
func=func,
|
||||
parameters=parameters,
|
||||
cache_key=cache_key,
|
||||
)
|
||||
|
||||
|
||||
def _validate_spec_matches_signature(
    func_name: str,
    sig_params: dict[str, inspect.Parameter],
    spec: ToolSpec,
) -> None:
    """Ensure every signature param has a spec entry and vice versa."""
    sig_names = set(sig_params.keys())
    spec_names = set(spec.parameters.keys())

    missing_in_spec = sig_names - spec_names
    if missing_in_spec:
        raise ToolSpecError(
            f"tool '{func_name}': spec is missing entries for parameter(s) "
            f"{sorted(missing_in_spec)}"
        )

    extra_in_spec = spec_names - sig_names
    if extra_in_spec:
        raise ToolSpecError(
            f"tool '{func_name}': spec has entries for unknown parameter(s) "
            f"{sorted(extra_in_spec)} (not in function signature)"
        )
|
||||
|
||||
|
||||
def make_tools(settings) -> dict[str, Tool]:
|
||||
"""
|
||||
Create and register all available tools.
|
||||
|
||||
Args:
|
||||
settings: Application settings instance.
|
||||
settings: Application settings instance
|
||||
|
||||
Returns:
|
||||
Dictionary mapping tool names to Tool objects.
|
||||
Dictionary mapping tool names to Tool objects
|
||||
"""
|
||||
# Import tools here to avoid circular dependencies
|
||||
from .tools import api as api_tools # noqa: PLC0415
|
||||
from .tools import filesystem as fs_tools # noqa: PLC0415
|
||||
from .tools import language as lang_tools # noqa: PLC0415
|
||||
from .tools import workflow as wf_tools # noqa: PLC0415
|
||||
|
||||
# List of all tool functions
|
||||
tool_functions = [
|
||||
fs_tools.set_path_for_folder,
|
||||
fs_tools.list_folder,
|
||||
fs_tools.read_release_metadata,
|
||||
fs_tools.query_library,
|
||||
fs_tools.analyze_release,
|
||||
fs_tools.probe_media,
|
||||
fs_tools.resolve_season_destination,
|
||||
fs_tools.resolve_episode_destination,
|
||||
fs_tools.resolve_movie_destination,
|
||||
fs_tools.resolve_series_destination,
|
||||
fs_tools.resolve_destination,
|
||||
fs_tools.move_media,
|
||||
fs_tools.move_to_destination,
|
||||
fs_tools.manage_subtitles,
|
||||
fs_tools.create_seed_links,
|
||||
fs_tools.learn,
|
||||
@@ -157,22 +108,13 @@ def make_tools(settings) -> dict[str, Tool]:
|
||||
api_tools.add_torrent_to_qbittorrent,
|
||||
api_tools.get_torrent_by_index,
|
||||
lang_tools.set_language,
|
||||
wf_tools.start_workflow,
|
||||
wf_tools.end_workflow,
|
||||
]
|
||||
|
||||
specs = load_tool_specs()
|
||||
|
||||
tools: dict[str, Tool] = {}
|
||||
# Create Tool objects from functions
|
||||
tools = {}
|
||||
for func in tool_functions:
|
||||
spec = specs.get(func.__name__)
|
||||
tool = _create_tool_from_function(func, spec=spec)
|
||||
tool = _create_tool_from_function(func)
|
||||
tools[tool.name] = tool
|
||||
|
||||
with_spec = sum(1 for fn in tool_functions if fn.__name__ in specs)
|
||||
logger.info(
|
||||
f"Registered {len(tools)} tools "
|
||||
f"({with_spec} with YAML spec, {len(tools) - with_spec} doc-only): "
|
||||
f"{list(tools.keys())}"
|
||||
)
|
||||
logger.info(f"Registered {len(tools)} tools: {list(tools.keys())}")
|
||||
return tools
|
||||
|
||||
@@ -14,7 +14,15 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def find_media_imdb_id(media_title: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_media_imdb_id.yaml."""
|
||||
"""
|
||||
Find the IMDb ID for a given media title using TMDB API.
|
||||
|
||||
Args:
|
||||
media_title: Title of the media to search for.
|
||||
|
||||
Returns:
|
||||
Dict with IMDb ID and media info, or error details.
|
||||
"""
|
||||
use_case = SearchMovieUseCase(tmdb_client)
|
||||
response = use_case.execute(media_title)
|
||||
result = response.to_dict()
|
||||
@@ -37,7 +45,18 @@ def find_media_imdb_id(media_title: str) -> dict[str, Any]:
|
||||
|
||||
|
||||
def find_torrent(media_title: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_torrent.yaml."""
|
||||
"""
|
||||
Find torrents for a given media title using Knaben API.
|
||||
|
||||
Results are stored in episodic memory so the user can reference them
|
||||
by index (e.g., "download the 3rd one").
|
||||
|
||||
Args:
|
||||
media_title: Title of the media to search for.
|
||||
|
||||
Returns:
|
||||
Dict with torrent list or error details.
|
||||
"""
|
||||
logger.info(f"Searching torrents for: {media_title}")
|
||||
|
||||
use_case = SearchTorrentsUseCase(knaben_client)
|
||||
@@ -57,7 +76,17 @@ def find_torrent(media_title: str) -> dict[str, Any]:
|
||||
|
||||
|
||||
def get_torrent_by_index(index: int) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/get_torrent_by_index.yaml."""
|
||||
"""
|
||||
Get a torrent from the last search results by its index.
|
||||
|
||||
Allows the user to reference results by number after a search.
|
||||
|
||||
Args:
|
||||
index: 1-based index of the torrent in the search results.
|
||||
|
||||
Returns:
|
||||
Dict with torrent data or error if not found.
|
||||
"""
|
||||
logger.info(f"Getting torrent at index: {index}")
|
||||
|
||||
memory = get_memory()
|
||||
@@ -84,7 +113,15 @@ def get_torrent_by_index(index: int) -> dict[str, Any]:
|
||||
|
||||
|
||||
def add_torrent_to_qbittorrent(magnet_link: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/add_torrent_to_qbittorrent.yaml."""
|
||||
"""
|
||||
Add a torrent to qBittorrent using a magnet link.
|
||||
|
||||
Args:
|
||||
magnet_link: Magnet link of the torrent to add.
|
||||
|
||||
Returns:
|
||||
Dict with success status or error details.
|
||||
"""
|
||||
logger.info("Adding torrent to qBittorrent")
|
||||
|
||||
use_case = AddTorrentUseCase(qbittorrent_client)
|
||||
@@ -120,7 +157,17 @@ def add_torrent_to_qbittorrent(magnet_link: str) -> dict[str, Any]:
|
||||
|
||||
|
||||
def add_torrent_by_index(index: int) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/add_torrent_by_index.yaml."""
|
||||
"""
|
||||
Add a torrent from the last search results by its index.
|
||||
|
||||
Combines get_torrent_by_index and add_torrent_to_qbittorrent.
|
||||
|
||||
Args:
|
||||
index: 1-based index of the torrent in the search results.
|
||||
|
||||
Returns:
|
||||
Dict with success status or error details.
|
||||
"""
|
||||
logger.info(f"Adding torrent by index: {index}")
|
||||
|
||||
torrent_result = get_torrent_by_index(index)
|
||||
|
||||
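The 1-based index convention described in the `get_torrent_by_index` and `add_torrent_by_index` docstrings can be sketched as follows. This is an illustration only: `get_by_index`, the `results` list and the returned dict keys are hypothetical, and the real tools read their results from episodic memory rather than from an argument.

```python
from typing import Any


def get_by_index(results: list[dict[str, Any]], index: int) -> dict[str, Any]:
    """Look up a search result by its 1-based position."""
    if not 1 <= index <= len(results):
        return {
            "status": "error",
            "error": "not_found",
            "message": f"Index {index} is out of range (1..{len(results)})",
        }
    # Convert the user-facing 1-based index to Python's 0-based indexing.
    return {"status": "ok", "torrent": results[index - 1]}


results = [{"name": "first"}, {"name": "second"}, {"name": "third"}]
third = get_by_index(results, 3)
missing = get_by_index(results, 0)
```

Returning an error dict instead of raising keeps the result serializable, so it can be passed back to the LLM as a tool response.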
+126
-273
@@ -3,79 +3,42 @@
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import alfred as _alfred_pkg
|
||||
import yaml
|
||||
|
||||
import alfred as _alfred_pkg
|
||||
from alfred.application.filesystem import (
|
||||
CreateSeedLinksUseCase,
|
||||
ListFolderUseCase,
|
||||
ManageSubtitlesUseCase,
|
||||
MoveMediaUseCase,
|
||||
ResolveDestinationUseCase,
|
||||
SetFolderPathUseCase,
|
||||
)
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_episode_destination as _resolve_episode_destination,
|
||||
)
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_movie_destination as _resolve_movie_destination,
|
||||
)
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_season_destination as _resolve_season_destination,
|
||||
)
|
||||
from alfred.application.filesystem.resolve_destination import (
|
||||
resolve_series_destination as _resolve_series_destination,
|
||||
)
|
||||
from alfred.infrastructure.filesystem import FileManager, create_folder, move
|
||||
from alfred.infrastructure.knowledge.release_kb import YamlReleaseKnowledge
|
||||
from alfred.infrastructure.metadata import MetadataStore
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
from alfred.infrastructure.probe import FfprobeMediaProber
|
||||
|
||||
# Agent-tools frontier: this is the legitimate home for the singletons that
|
||||
# back every LLM-exposed wrapper. The use cases below take ``kb`` / ``prober``
|
||||
# as required params; tests inject their own stubs.
|
||||
_KB = YamlReleaseKnowledge()
|
||||
_PROBER = FfprobeMediaProber()
|
||||
from alfred.infrastructure.filesystem import FileManager
|
||||
|
||||
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
|
||||
|
||||
|
||||
def move_media(source: str, destination: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml."""
|
||||
"""
|
||||
Move a media file to a destination path.
|
||||
|
||||
Copies the file safely first (with integrity check), then deletes the source.
|
||||
Use this to organise a downloaded file into the media library.
|
||||
|
||||
Args:
|
||||
source: Absolute path to the source file.
|
||||
destination: Absolute path to the destination file (must not already exist).
|
||||
|
||||
Returns:
|
||||
Dict with status, source, destination, filename, and size — or error details.
|
||||
"""
|
||||
file_manager = FileManager()
|
||||
use_case = MoveMediaUseCase(file_manager)
|
||||
return use_case.execute(source, destination).to_dict()
|
||||
|
||||
|
||||
def move_to_destination(source: str, destination: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml."""
|
||||
parent = str(Path(destination).parent)
|
||||
result = create_folder(parent)
|
||||
if result["status"] != "ok":
|
||||
return result
|
||||
return move(source, destination)
|
||||
|
||||
|
||||
def resolve_season_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
confirmed_folder: str | None = None,
|
||||
source_path: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_season_destination.yaml."""
|
||||
return _resolve_season_destination(
|
||||
release_name,
|
||||
tmdb_title,
|
||||
tmdb_year,
|
||||
_KB,
|
||||
_PROBER,
|
||||
confirmed_folder,
|
||||
source_path,
|
||||
).to_dict()
|
||||
|
||||
|
||||
def resolve_episode_destination(
|
||||
def resolve_destination(
|
||||
release_name: str,
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
@@ -83,84 +46,119 @@ def resolve_episode_destination(
|
||||
tmdb_episode_title: str | None = None,
|
||||
confirmed_folder: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_episode_destination.yaml."""
|
||||
return _resolve_episode_destination(
|
||||
release_name,
|
||||
source_file,
|
||||
tmdb_title,
|
||||
tmdb_year,
|
||||
_KB,
|
||||
_PROBER,
|
||||
tmdb_episode_title,
|
||||
confirmed_folder,
|
||||
"""
|
||||
Compute the destination path in the media library for a release.
|
||||
|
||||
Call this before move_media to get the correct library path. Handles:
|
||||
- Parsing the release name (quality, codec, group, season/episode)
|
||||
- Looking up any existing series folder in the library
|
||||
- Applying group-conflict rules (asks user if ambiguous)
|
||||
- Building the full destination path with correct naming conventions
|
||||
|
||||
Args:
|
||||
release_name: Raw release folder or file name
|
||||
(e.g. "Oz.S03.1080p.WEBRip.x265-KONTRAST").
|
||||
source_file: Absolute path to the source video file (used for extension).
|
||||
tmdb_title: Canonical show/movie title from TMDB (e.g. "Oz").
|
||||
tmdb_year: Release/start year from TMDB (e.g. 1997).
|
||||
tmdb_episode_title: Episode title from TMDB for single-episode releases
|
||||
(e.g. "The Routine"). Omit for season packs and movies.
|
||||
confirmed_folder: If a previous call returned needs_clarification, pass
|
||||
the user-chosen folder name here to proceed.
|
||||
|
||||
Returns:
|
||||
On success: dict with status, library_file, series_folder, season_folder,
|
||||
series_folder_name, season_folder_name, filename,
|
||||
is_new_series_folder.
|
||||
On ambiguity: dict with status="needs_clarification", question, options.
|
||||
On error: dict with status="error", error, message.
|
||||
"""
|
||||
use_case = ResolveDestinationUseCase()
|
||||
return use_case.execute(
|
||||
release_name=release_name,
|
||||
source_file=source_file,
|
||||
tmdb_title=tmdb_title,
|
||||
tmdb_year=tmdb_year,
|
||||
tmdb_episode_title=tmdb_episode_title,
|
||||
confirmed_folder=confirmed_folder,
|
||||
).to_dict()
|
||||
|
||||
|
||||
def resolve_movie_destination(
|
||||
release_name: str,
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_movie_destination.yaml."""
|
||||
return _resolve_movie_destination(
|
||||
release_name, source_file, tmdb_title, tmdb_year, _KB, _PROBER
|
||||
).to_dict()
|
||||
def create_seed_links(library_file: str, original_download_folder: str) -> dict[str, Any]:
|
||||
"""
|
||||
Prepare a torrent subfolder so qBittorrent can keep seeding after a move.
|
||||
|
||||
Hard-links the video file from the library into torrents/<original_folder_name>/,
|
||||
then copies all remaining files from the original download folder (subtitles,
|
||||
.nfo, .jpg, .txt, …) so the torrent data is complete.
|
||||
|
||||
def resolve_series_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
confirmed_folder: str | None = None,
|
||||
source_path: str | None = None,
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_series_destination.yaml."""
|
||||
return _resolve_series_destination(
|
||||
release_name,
|
||||
tmdb_title,
|
||||
tmdb_year,
|
||||
_KB,
|
||||
_PROBER,
|
||||
confirmed_folder,
|
||||
source_path,
|
||||
).to_dict()
|
||||
Call this after move_media when the user wants to keep seeding.
|
||||
|
||||
Args:
|
||||
library_file: Absolute path to the video file now in the library.
|
||||
original_download_folder: Absolute path to the original download folder
|
||||
(may still contain subs, nfo, and other release files).
|
||||
|
||||
def create_seed_links(
|
||||
library_file: str, original_download_folder: str
|
||||
) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_seed_links.yaml."""
|
||||
Returns:
|
||||
Dict with status, torrent_subfolder, linked_file, copied_files,
|
||||
copied_count, skipped — or error details.
|
||||
"""
|
||||
file_manager = FileManager()
|
||||
use_case = CreateSeedLinksUseCase(file_manager)
|
||||
return use_case.execute(library_file, original_download_folder).to_dict()
|
||||
|
||||
|
||||
def manage_subtitles(source_video: str, destination_video: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/manage_subtitles.yaml."""
|
||||
"""
|
||||
Place subtitle files alongside an organised video file.
|
||||
|
||||
Scans for subtitle files (.srt, .ass, .ssa, .vtt, .sub) next to the source
|
||||
video, filters them according to the user's SubtitlePreferences (languages,
|
||||
min size, SDH, forced), and hard-links the passing files next to the
|
||||
destination video with the correct naming convention:
|
||||
fr.srt / fr.sdh.srt / fr.forced.srt / en.srt …
|
||||
|
||||
Call this right after move_media or copy_media, passing the same source and
|
||||
destination paths. If no subtitles are found, returns ok with placed_count=0.
|
||||
|
||||
Args:
|
||||
source_video: Absolute path to the original video file (in the download folder).
|
||||
destination_video: Absolute path to the placed video file (in the library).
|
||||
|
||||
Returns:
|
||||
Dict with status, placed list (source, destination, filename), placed_count,
|
||||
skipped_count — or error details.
|
||||
"""
|
||||
file_manager = FileManager()
|
||||
use_case = ManageSubtitlesUseCase(file_manager)
|
||||
return use_case.execute(source_video, destination_video).to_dict()
|
||||
|
||||
|
||||
def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/learn.yaml."""
|
||||
"""
|
||||
Teach Alfred a new token mapping and persist it to the learned knowledge pack.
|
||||
|
||||
Use this when a subtitle file contains an unrecognised token — after confirming
|
||||
with the user what the token means, call learn() to persist it so Alfred
|
||||
recognises it in future scans.
|
||||
|
||||
Args:
|
||||
pack: Knowledge pack name. Currently only "subtitles" is supported.
|
||||
category: Category within the pack: "languages", "types", or "formats".
|
||||
key: The entry key — e.g. ISO 639-1 language code ("es"), type id ("sdh").
|
||||
values: List of tokens to add — e.g. ["spanish", "espanol", "spa"].
|
||||
|
||||
Returns:
|
||||
Dict with status, added_count, and the updated token list.
|
||||
"""
|
||||
_VALID_PACKS = {"subtitles"}
|
||||
_VALID_CATEGORIES = {"languages", "types", "formats"}
|
||||
|
||||
if pack not in _VALID_PACKS:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "unknown_pack",
|
||||
"message": f"Unknown pack '{pack}'. Valid: {sorted(_VALID_PACKS)}",
|
||||
}
|
||||
return {"status": "error", "error": "unknown_pack", "message": f"Unknown pack '{pack}'. Valid: {sorted(_VALID_PACKS)}"}
|
||||
|
||||
if category not in _VALID_CATEGORIES:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "unknown_category",
|
||||
"message": f"Unknown category '{category}'. Valid: {sorted(_VALID_CATEGORIES)}",
|
||||
}
|
||||
return {"status": "error", "error": "unknown_category", "message": f"Unknown category '{category}'. Valid: {sorted(_VALID_CATEGORIES)}"}
|
||||
|
||||
learned_path = _LEARNED_ROOT / "subtitles_learned.yaml"
|
||||
_LEARNED_ROOT.mkdir(parents=True, exist_ok=True)
|
||||
@@ -182,9 +180,7 @@ def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, An
|
||||
tmp = learned_path.with_suffix(".yaml.tmp")
|
||||
try:
|
||||
with open(tmp, "w", encoding="utf-8") as f:
|
||||
yaml.safe_dump(
|
||||
data, f, allow_unicode=True, default_flow_style=False, sort_keys=False
|
||||
)
|
||||
yaml.safe_dump(data, f, allow_unicode=True, default_flow_style=False, sort_keys=False)
|
||||
tmp.rename(learned_path)
|
||||
except Exception as e:
|
||||
tmp.unlink(missing_ok=True)
|
||||
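The hunk above writes the learned pack to a temporary file and renames it over the target, so a failed dump should not leave a half-written YAML file behind. A minimal stdlib-only sketch of that pattern follows. `atomic_write_text` is a hypothetical helper, not part of Alfred, and it uses `os.replace` instead of `Path.rename` as an assumption, so that the overwrite also works on Windows.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(target: Path, text: str) -> None:
    """Write text to a sibling temp file, then swap it over the target."""
    # Hypothetical helper: mirrors the tmp-then-rename shape of the hunk above.
    tmp = target.with_suffix(target.suffix + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as f:
            f.write(text)
        # os.replace overwrites the target; atomic when both paths share a filesystem.
        os.replace(tmp, target)
    except Exception:
        # Clean up the temp file so a failed write leaves no debris, then re-raise.
        tmp.unlink(missing_ok=True)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "subtitles_learned.yaml"
        atomic_write_text(path, "languages:\n  es: [spanish]\n")
        print(path.read_text(encoding="utf-8"))
```

Readers of the target file see either the old content or the new content, never a partial write.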
@@ -201,177 +197,34 @@ def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, An
|
||||
|
||||
|
||||
def set_path_for_folder(folder_name: str, path_value: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_path_for_folder.yaml."""
|
||||
"""
|
||||
Set a folder path in the configuration.
|
||||
|
||||
Args:
|
||||
folder_name: Name of folder to set (download, tvshow, movie, torrent).
|
||||
path_value: Absolute path to the folder.
|
||||
|
||||
Returns:
|
||||
Dict with status or error information.
|
||||
"""
|
||||
file_manager = FileManager()
|
||||
use_case = SetFolderPathUseCase(file_manager)
|
||||
response = use_case.execute(folder_name, path_value)
|
||||
return response.to_dict()
|
||||
|
||||
|
||||
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
|
||||
from alfred.application.release import inspect_release # noqa: PLC0415
|
||||
|
||||
result = inspect_release(release_name, Path(source_path), _KB, _PROBER)
|
||||
parsed = result.parsed
|
||||
return {
|
||||
"status": "ok",
|
||||
"media_type": parsed.media_type,
|
||||
"parse_path": parsed.parse_path,
|
||||
"title": parsed.title,
|
||||
"year": parsed.year,
|
||||
"season": parsed.season,
|
||||
"episode": parsed.episode,
|
||||
"episode_end": parsed.episode_end,
|
||||
"quality": parsed.quality,
|
||||
"source": parsed.source,
|
||||
"codec": parsed.codec,
|
||||
"group": parsed.group,
|
||||
"languages": parsed.languages,
|
||||
"audio_codec": parsed.audio_codec,
|
||||
"audio_channels": parsed.audio_channels,
|
||||
"bit_depth": parsed.bit_depth,
|
||||
"hdr_format": parsed.hdr_format,
|
||||
"edition": parsed.edition,
|
||||
"site_tag": parsed.site_tag,
|
||||
"is_season_pack": parsed.is_season_pack,
|
||||
"probe_used": result.probe_used,
|
||||
"confidence": result.report.confidence,
|
||||
"road": result.report.road,
|
||||
"recommended_action": result.recommended_action,
|
||||
}
|
||||
|
||||
|
||||
def probe_media(source_path: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/probe_media.yaml."""
|
||||
path = Path(source_path)
|
||||
if not path.exists():
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "not_found",
|
||||
"message": f"{source_path} does not exist",
|
||||
}
|
||||
|
||||
media_info = _PROBER.probe(path)
|
||||
if media_info is None:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "probe_failed",
|
||||
"message": "ffprobe failed to read the file",
|
||||
}
|
||||
|
||||
return {
|
||||
"status": "ok",
|
||||
"video": {
|
||||
"codec": media_info.video_codec,
|
||||
"resolution": media_info.resolution,
|
||||
"width": media_info.width,
|
||||
"height": media_info.height,
|
||||
"duration_seconds": media_info.duration_seconds,
|
||||
"bitrate_kbps": media_info.bitrate_kbps,
|
||||
},
|
||||
"audio_tracks": [
|
||||
{
|
||||
"index": t.index,
|
||||
"codec": t.codec,
|
||||
"channels": t.channels,
|
||||
"channel_layout": t.channel_layout,
|
||||
"language": t.language,
|
||||
"is_default": t.is_default,
|
||||
}
|
||||
for t in media_info.audio_tracks
|
||||
],
|
||||
"subtitle_tracks": [
|
||||
{
|
||||
"index": t.index,
|
||||
"codec": t.codec,
|
||||
"language": t.language,
|
||||
"is_default": t.is_default,
|
||||
"is_forced": t.is_forced,
|
||||
}
|
||||
for t in media_info.subtitle_tracks
|
||||
],
|
||||
"audio_languages": media_info.audio_languages,
|
||||
"is_multi_audio": media_info.is_multi_audio,
|
||||
}
|
||||
|
||||
|
||||
def list_folder(folder_type: str, path: str = ".") -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
|
||||
"""
|
||||
List contents of a configured folder.
|
||||
|
||||
Args:
|
||||
folder_type: Type of folder to list (download, tvshow, movie, torrent).
|
||||
path: Relative path within the folder (default: root).
|
||||
|
||||
Returns:
|
||||
Dict with folder contents or error information.
|
||||
"""
|
||||
file_manager = FileManager()
|
||||
use_case = ListFolderUseCase(file_manager)
|
||||
response = use_case.execute(folder_type, path)
|
||||
return response.to_dict()
|
||||
|
||||
|
||||
def read_release_metadata(release_path: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/read_release_metadata.yaml."""
|
||||
path = Path(release_path)
|
||||
if not path.exists():
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "not_found",
|
||||
"message": f"{release_path} does not exist",
|
||||
}
|
||||
root = path if path.is_dir() else path.parent
|
||||
store = MetadataStore(root)
|
||||
if not store.exists():
|
||||
return {
|
||||
"status": "ok",
|
||||
"release_path": str(root),
|
||||
"has_metadata": False,
|
||||
"metadata": {},
|
||||
}
|
||||
return {
|
||||
"status": "ok",
|
||||
"release_path": str(root),
|
||||
"has_metadata": True,
|
||||
"metadata": store.load(),
|
||||
}
|
||||
|
||||
|
||||
def query_library(name: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/query_library.yaml."""
|
||||
needle = name.strip().lower()
|
||||
if not needle:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "empty_name",
|
||||
"message": "name must be a non-empty string",
|
||||
}
|
||||
|
||||
memory = get_memory()
|
||||
roots = memory.ltm.library_paths.to_dict() or {}
|
||||
if not roots:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "no_libraries",
|
||||
"message": "No library paths configured — call set_path_for_folder first.",
|
||||
}
|
||||
|
||||
matches: list[dict[str, Any]] = []
|
||||
for collection, root in roots.items():
|
||||
root_path = Path(root)
|
||||
if not root_path.is_dir():
|
||||
continue
|
||||
for entry in root_path.iterdir():
|
||||
if not entry.is_dir():
|
||||
continue
|
||||
if needle not in entry.name.lower():
|
||||
continue
|
||||
store = MetadataStore(entry)
|
||||
matches.append(
|
||||
{
|
||||
"collection": collection,
|
||||
"name": entry.name,
|
||||
"path": str(entry),
|
||||
"has_metadata": store.exists(),
|
||||
}
|
||||
)
|
||||
|
||||
return {
|
||||
"status": "ok",
|
||||
"query": name,
|
||||
"match_count": len(matches),
|
||||
"matches": matches,
|
||||
}
|
||||
|
||||
@@ -9,7 +9,15 @@ logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def set_language(language: str) -> dict[str, Any]:
|
||||
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_language.yaml."""
|
||||
"""
|
||||
Set the conversation language.
|
||||
|
||||
Args:
|
||||
language: Language code (e.g., 'en', 'fr', 'es', 'de')
|
||||
|
||||
Returns:
|
||||
Status dictionary
|
||||
"""
|
||||
try:
|
||||
memory = get_memory()
|
||||
memory.stm.set_language(language)
|
||||
|
||||
@@ -1,221 +0,0 @@
"""
ToolSpec — semantic description of a tool, loaded from YAML.

Each tool exposed to the agent has a matching YAML spec under
alfred/agent/tools/specs/{tool_name}.yaml. The spec carries everything the
LLM needs to decide *when* and *why* to call the tool — separated from the
Python signature, which remains the source of truth for *how* (types,
required-ness).

The YAML structure is documented in the dataclasses below. Loading a spec
validates its shape; missing or unexpected fields raise ToolSpecError.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path

import yaml


class ToolSpecError(ValueError):
    """Raised when a YAML tool spec is malformed or inconsistent."""


@dataclass(frozen=True)
class ParameterSpec:
    """Semantic description of a single tool parameter."""

    description: str  # Short: what the value represents.
    why_needed: str  # Why the tool needs this — drives LLM reasoning.
    example: str | None = None  # Concrete example value, shown to the LLM.

    @classmethod
    def from_dict(cls, name: str, data: dict) -> ParameterSpec:
        _require(data, "description", f"parameter '{name}'")
        _require(data, "why_needed", f"parameter '{name}'")
        return cls(
            description=str(data["description"]).strip(),
            why_needed=str(data["why_needed"]).strip(),
            example=str(data["example"]).strip()
            if data.get("example") is not None
            else None,
        )


@dataclass(frozen=True)
class ReturnsSpec:
    """Description of one possible return shape (ok / needs_clarification / error / ...)."""

    description: str
    fields: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, key: str, data: dict) -> ReturnsSpec:
        _require(data, "description", f"returns.{key}")
        fields = data.get("fields") or {}
        if not isinstance(fields, dict):
            raise ToolSpecError(
                f"returns.{key}.fields must be a dict, got {type(fields).__name__}"
            )
        return cls(
            description=str(data["description"]).strip(),
            fields={str(k): str(v).strip() for k, v in fields.items()},
        )


@dataclass(frozen=True)
class CacheSpec:
    """Marks a tool as cacheable in STM.tool_results, keyed by one of its parameters."""

    key: str  # Name of the parameter whose value is the cache key.

    @classmethod
    def from_dict(cls, data: dict) -> CacheSpec:
        _require(data, "key", "cache")
        return cls(key=str(data["key"]).strip())


@dataclass(frozen=True)
class ToolSpec:
    """Full semantic spec for one tool."""

    name: str
    summary: str  # One-liner — becomes Tool.description.
    description: str  # Longer paragraph.
    when_to_use: str
    when_not_to_use: str | None
    next_steps: str | None
    parameters: dict[str, ParameterSpec]  # name -> ParameterSpec
    returns: dict[str, ReturnsSpec]  # status_key -> ReturnsSpec
    cache: CacheSpec | None = None  # If present, tool is cached.

    @classmethod
    def from_yaml_path(cls, path: Path) -> ToolSpec:
        with open(path, encoding="utf-8") as f:
            data = yaml.safe_load(f) or {}
        if not isinstance(data, dict):
            raise ToolSpecError(f"{path}: top-level must be a mapping")
        try:
            return cls.from_dict(data)
        except ToolSpecError as e:
            raise ToolSpecError(f"{path}: {e}") from e

    @classmethod
    def from_dict(cls, data: dict) -> ToolSpec:
        _require(data, "name", "spec")
        _require(data, "summary", "spec")
        _require(data, "description", "spec")
        _require(data, "when_to_use", "spec")

        params_raw = data.get("parameters") or {}
        if not isinstance(params_raw, dict):
            raise ToolSpecError("parameters must be a mapping")
        parameters = {
            pname: ParameterSpec.from_dict(pname, pdata or {})
            for pname, pdata in params_raw.items()
        }

        returns_raw = data.get("returns") or {}
        if not isinstance(returns_raw, dict):
            raise ToolSpecError("returns must be a mapping")
        returns = {
            rkey: ReturnsSpec.from_dict(rkey, rdata or {})
            for rkey, rdata in returns_raw.items()
        }

        cache_raw = data.get("cache")
        if cache_raw is not None and not isinstance(cache_raw, dict):
            raise ToolSpecError("cache must be a mapping")
        cache = CacheSpec.from_dict(cache_raw) if cache_raw else None

        spec = cls(
            name=str(data["name"]).strip(),
            summary=str(data["summary"]).strip(),
            description=str(data["description"]).strip(),
            when_to_use=str(data["when_to_use"]).strip(),
            when_not_to_use=_strip_or_none(data.get("when_not_to_use")),
            next_steps=_strip_or_none(data.get("next_steps")),
            parameters=parameters,
            returns=returns,
            cache=cache,
        )
        if cache is not None and cache.key not in parameters:
            raise ToolSpecError(
                f"cache.key '{cache.key}' is not a declared parameter "
                f"(declared: {sorted(parameters)})"
            )
        return spec

    def compile_description(self) -> str:
        """
        Build the long description text passed to the LLM as Tool.description.

        Layout:
            <summary>

            <description>

            When to use:
            <when_to_use>

            When NOT to use: (if present)
            <when_not_to_use>

            Next steps: (if present)
            <next_steps>

            Returns:
            <status>: <description>
            · <field>: <desc>
        """
        parts = [self.summary, "", self.description]

        parts += ["", "When to use:", _indent(self.when_to_use)]

        if self.when_not_to_use:
            parts += ["", "When NOT to use:", _indent(self.when_not_to_use)]

        if self.next_steps:
            parts += ["", "Next steps:", _indent(self.next_steps)]

        if self.returns:
            parts += ["", "Returns:"]
            for status, ret in self.returns.items():
                parts.append(f" {status}: {ret.description}")
                for fname, fdesc in ret.fields.items():
                    parts.append(f" · {fname}: {fdesc}")

        return "\n".join(parts)

    def compile_parameter_description(self, name: str) -> str:
        """Build the JSON Schema 'description' field for one parameter."""
        p = self.parameters.get(name)
        if p is None:
            raise ToolSpecError(f"tool '{self.name}': no spec for parameter '{name}'")
        text = f"{p.description} (Why: {p.why_needed})"
        if p.example:
            text += f" Example: {p.example}"
        return text


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _require(data: dict, key: str, where: str) -> None:
    if data.get(key) is None or (isinstance(data[key], str) and not data[key].strip()):
        raise ToolSpecError(f"{where}: missing required field '{key}'")


def _strip_or_none(value) -> str | None:
    if value is None:
        return None
    s = str(value).strip()
    return s or None


def _indent(text: str, prefix: str = " ") -> str:
    return "\n".join(prefix + line for line in text.splitlines())
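The removed module compiles each parameter's spec into one description string of the form `<description> (Why: <why_needed>) Example: <example>`. The following standalone sketch reproduces only that string-building step. `ParamSketch` and `describe` are hypothetical names introduced for illustration, and the sample values are taken from the `add_torrent_by_index` spec further down in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParamSketch:
    """Hypothetical stand-in for ParameterSpec, reduced to its three fields."""

    description: str
    why_needed: str
    example: Optional[str] = None


def describe(p: ParamSketch) -> str:
    """Join description, rationale, and optional example into one string."""
    text = f"{p.description} (Why: {p.why_needed})"
    if p.example:
        # The example is appended only when the spec provides one.
        text += f" Example: {p.example}"
    return text


if __name__ == "__main__":
    index = ParamSketch(
        description="1-based position of the torrent in the last find_torrent results.",
        why_needed="Identifies which torrent to add.",
        example="3",
    )
    print(describe(index))
```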
@@ -1,53 +0,0 @@
"""
ToolSpecLoader — discover and load all YAML tool specs from a directory.

Convention: one YAML file per tool, named exactly like the Python function
that implements it (e.g. resolve_season_destination.yaml).
"""

from __future__ import annotations

import logging
from pathlib import Path

from .spec import ToolSpec, ToolSpecError

logger = logging.getLogger(__name__)

_DEFAULT_SPECS_DIR = Path(__file__).parent / "specs"


def load_tool_specs(specs_dir: Path | None = None) -> dict[str, ToolSpec]:
    """
    Load every {tool}.yaml under specs_dir into a {name -> ToolSpec} mapping.

    Args:
        specs_dir: Directory to scan. Defaults to alfred/agent/tools/specs/.

    Returns:
        Mapping from tool name to its parsed ToolSpec.

    Raises:
        ToolSpecError: if a spec is malformed, or if the filename doesn't
            match the 'name' field inside the YAML.
    """
    root = specs_dir or _DEFAULT_SPECS_DIR
    if not root.exists():
        logger.warning(f"Tool specs directory not found: {root}")
        return {}

    specs: dict[str, ToolSpec] = {}
    for path in sorted(root.glob("*.yaml")):
        spec = ToolSpec.from_yaml_path(path)
        expected_name = path.stem
        if spec.name != expected_name:
            raise ToolSpecError(
                f"{path}: filename stem '{expected_name}' "
                f"does not match spec.name '{spec.name}'"
            )
        if spec.name in specs:
            raise ToolSpecError(f"duplicate tool spec name: '{spec.name}'")
        specs[spec.name] = spec

    logger.info(f"Loaded {len(specs)} tool spec(s) from {root}")
    return specs
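The loader above enforces one convention: the YAML filename stem must equal the `name` field inside the file. The sketch below isolates that check and works on already-parsed mappings, so it needs no YAML library. `SpecNameError` and `index_by_name` are hypothetical names, and the input shape (path to parsed dict) is an assumption made to keep the example self-contained.

```python
from pathlib import Path
from typing import Dict


class SpecNameError(ValueError):
    """Hypothetical stand-in for ToolSpecError."""


def index_by_name(parsed: Dict[Path, dict]) -> Dict[str, dict]:
    """Index parsed specs by name, rejecting filename/name mismatches."""
    specs: Dict[str, dict] = {}
    for path in sorted(parsed):  # sorted for a deterministic load order
        name = str(parsed[path].get("name", "")).strip()
        if name != path.stem:
            raise SpecNameError(
                f"{path}: filename stem '{path.stem}' "
                f"does not match spec.name '{name}'"
            )
        specs[name] = parsed[path]
    return specs


if __name__ == "__main__":
    good = {Path("specs/learn.yaml"): {"name": "learn"}}
    print(sorted(index_by_name(good)))
    try:
        index_by_name({Path("specs/learn.yaml"): {"name": "lern"}})
    except SpecNameError as e:
        print(f"rejected: {e}")
```

Failing at load time means a renamed file or a typo in `name` surfaces at startup instead of as a missing tool description later.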
@@ -1,53 +0,0 @@
name: add_torrent_by_index

summary: >
  Pick a torrent from the last find_torrent results by index and add
  it to qBittorrent in one call.

description: |
  Convenience wrapper that combines get_torrent_by_index +
  add_torrent_to_qbittorrent. Looks up the torrent at the given
  1-based index, extracts its magnet link, and sends it to
  qBittorrent. The result mirrors add_torrent_to_qbittorrent's, with
  the chosen torrent's name appended on success.

when_to_use: |
  The default action after find_torrent when the user picks a hit by
  number ("download the second one"). One call, two side effects:
  episodic memory updated + download started.

when_not_to_use: |
  - When the user only wants to inspect, not download — use
    get_torrent_by_index.
  - When the magnet comes from outside the search results — use
    add_torrent_to_qbittorrent directly.

next_steps: |
  - On status=ok: confirm the download started and end the workflow
    if not already ended.
  - On status=error (not_found): the index is out of range; show the
    available count from episodic memory.
  - On status=error (no_magnet): the search result was malformed —
    suggest re-running find_torrent.

parameters:
  index:
    description: 1-based position of the torrent in the last find_torrent results.
    why_needed: |
      Identifies which torrent to add. Out-of-range indices return
      not_found.
    example: 3

returns:
  ok:
    description: Torrent was added to qBittorrent.
    fields:
      status: "'ok'"
      message: Confirmation message.
      torrent_name: Name of the torrent that was added.

  error:
    description: Failed to add.
    fields:
      error: Short error code (not_found, no_magnet, ...).
      message: Human-readable explanation.
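The spec above describes a 1-based lookup with two error codes, `not_found` and `no_magnet`. The following sketch shows one way that lookup could behave. `pick_by_index` and the `name` / `magnet` keys on each search result are assumptions for illustration; the real result shape is not shown in this diff.

```python
from typing import Any, Dict, List


def pick_by_index(results: List[Dict[str, Any]], index: int) -> Dict[str, Any]:
    """Return the torrent at a 1-based index, or an error dict."""
    if not 1 <= index <= len(results):
        # Out-of-range indices map to the spec's not_found code.
        return {
            "status": "error",
            "error": "not_found",
            "message": f"Index {index} is out of range (1..{len(results)}).",
        }
    torrent = results[index - 1]  # convert the 1-based index to 0-based
    magnet = torrent.get("magnet")  # assumed key name
    if not magnet:
        return {
            "status": "error",
            "error": "no_magnet",
            "message": "Search result has no magnet link.",
        }
    return {"status": "ok", "magnet_link": magnet, "torrent_name": torrent.get("name")}


if __name__ == "__main__":
    hits = [
        {"name": "first", "magnet": "magnet:?xt=urn:btih:aaa"},
        {"name": "second"},
    ]
    print(pick_by_index(hits, 1))
    print(pick_by_index(hits, 2))
    print(pick_by_index(hits, 3))
```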
@@ -1,48 +0,0 @@
name: add_torrent_to_qbittorrent

summary: >
  Send a magnet link to qBittorrent and start the download.

description: |
  Adds a torrent to qBittorrent using its WebUI API. On success, the
  download is also recorded in episodic memory as an active_download
  so the agent can track its progress later, the STM topic is set to
  "downloading", and the current workflow is ended (the user typically
  leaves the find-and-download scope at this point).

when_to_use: |
  When the user provides a raw magnet link, or when chaining manually
  after get_torrent_by_index. For the common "user picked search hit
  N" case, prefer add_torrent_by_index — one call instead of two.

when_not_to_use: |
  - For .torrent files (not supported by this tool — magnet only).
  - When qBittorrent is not configured / reachable — the call will
    fail and the user has to fix the config first.

next_steps: |
  - On status=ok: the workflow is already ended; confirm to the user
    that the download has started.
  - On status=error: surface the message; common causes are auth
    failure or qBittorrent being unreachable.

parameters:
  magnet_link:
    description: Magnet URI of the torrent to add (magnet:?xt=urn:btih:...).
    why_needed: |
      The actual payload sent to qBittorrent. Must be a full magnet
      URI, not a hash alone.
    example: "magnet:?xt=urn:btih:abc123..."

returns:
  ok:
    description: Torrent accepted by qBittorrent.
    fields:
      status: "'ok'"
      message: Confirmation message.

  error:
    description: qBittorrent rejected the request or is unreachable.
    fields:
      error: Short error code.
      message: Human-readable explanation.
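The spec above requires a full magnet URI rather than a bare info-hash. A pre-flight check along these lines could catch the difference before anything is sent to qBittorrent. `is_full_magnet_uri` is a hypothetical helper, and accepting only `urn:btih:` topics is an assumption based on the example in the spec.

```python
from urllib.parse import parse_qs, urlsplit


def is_full_magnet_uri(value: str) -> bool:
    """True if value looks like magnet:?xt=urn:btih:<hash>, not a bare hash."""
    parts = urlsplit(value.strip())
    if parts.scheme != "magnet":
        return False
    # A magnet URI carries its info-hash in one or more "xt" (exact topic) params.
    topics = parse_qs(parts.query).get("xt", [])
    prefix = "urn:btih:"
    return any(t.lower().startswith(prefix) and len(t) > len(prefix) for t in topics)


if __name__ == "__main__":
    print(is_full_magnet_uri("magnet:?xt=urn:btih:abc123"))
    print(is_full_magnet_uri("abc123"))
```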
@@ -1,85 +0,0 @@
|
||||
name: analyze_release
|
||||
|
||||
summary: >
|
||||
One-shot analyzer that parses a release name, detects its media type
|
||||
from the folder layout, and enriches the result with ffprobe data.
|
||||
|
||||
description: |
|
||||
Combines three steps in a single call so the agent gets a complete
|
||||
picture before routing:
|
||||
1. parse_release(release_name) — extracts title, year, season,
|
||||
episode, quality, source, codec, group, languages, audio info,
|
||||
HDR, edition, site tag.
|
||||
2. detect_media_type(parsed, path) — uses the on-disk layout
|
||||
(single file vs. folder, presence of S01 dirs, episode count)
|
||||
to choose: movie / tv_episode / tv_season / tv_complete /
|
||||
other / unknown.
|
||||
3. ffprobe enrichment — when the media type is recognised, runs
|
||||
ffprobe on the first video file found and fills in audio
|
||||
codec/channels, bit depth, HDR format. Sets probe_used=true.
|
||||
|
||||
when_to_use: |
|
||||
As the very first step of any organize workflow, right after
|
||||
list_folder, on each release the user wants to handle. The output
|
||||
drives which resolve_*_destination to call next.
|
||||
|
||||
when_not_to_use: |
|
||||
- When you only need codec/audio info on a specific video file:
|
||||
use probe_media (no parsing, no media-type detection).
|
||||
- For releases the user has already analyzed earlier in the same
|
||||
workflow — the parse is deterministic, no need to re-run.
|
||||
|
||||
next_steps: |
|
||||
- media_type == movie → resolve_movie_destination
|
||||
- media_type == tv_season → resolve_season_destination
|
||||
- media_type == tv_episode → resolve_episode_destination
|
||||
- media_type == tv_complete → resolve_series_destination
|
||||
- media_type in (other, unknown) → ask the user what to do; do not
|
||||
auto-route.
|
||||
|
||||
cache:
|
||||
key: source_path
|
||||
|
||||
parameters:
|
||||
release_name:
|
||||
description: Raw release folder or file name as it appears on disk.
|
||||
why_needed: |
|
||||
Source of all the parsed tokens (quality, codec, group, ...).
|
||||
Don't sanitise it — the parser relies on the exact spelling.
|
||||
example: Breaking.Bad.S01.1080p.BluRay.x265-GROUP
|
||||
|
||||
source_path:
|
||||
description: Absolute path to the release folder or file on disk.
|
||||
why_needed: |
|
||||
Required for layout-based media-type detection and for ffprobe
|
||||
to find a video file inside the release.
|
||||
example: /downloads/Breaking.Bad.S01.1080p.BluRay.x265-GROUP
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Release analyzed.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
media_type: "One of: movie, tv_episode, tv_season, tv_complete, other, unknown."
|
||||
parse_path: "Which parser branch was taken (debug)."
|
||||
title: Parsed title.
|
||||
year: Parsed year (int) or null.
|
||||
season: Season number (int) or null.
|
||||
episode: Episode number (int) or null.
|
||||
episode_end: Range end episode (multi-episode releases) or null.
|
||||
quality: Resolution token (e.g. 1080p, 2160p).
|
||||
source: Source token (BluRay, WEB-DL, ...).
|
||||
codec: Video codec token (x264, x265, ...).
|
||||
group: Release group name or null.
|
||||
languages: List of detected language tokens.
|
||||
audio_codec: Audio codec from ffprobe (when probe_used=true).
|
||||
audio_channels: Audio channel count from ffprobe.
|
||||
bit_depth: Bit depth from ffprobe.
|
||||
hdr_format: HDR format from ffprobe (HDR10, DV, ...) or null.
|
||||
edition: Edition tag (Extended, Director's Cut, ...) or null.
|
||||
site_tag: Source-site tag if present.
|
||||
is_season_pack: True when the folder contains a full season.
|
||||
probe_used: True when ffprobe successfully enriched the result.
|
||||
confidence: Parser confidence score, 0–100 (higher = more reliable).
|
||||
road: "Parser road: 'easy' (group schema matched), 'shitty' (heuristic but acceptable), or 'path_of_pain' (low confidence — ask the user before auto-routing)."
|
||||
recommended_action: "Orchestrator hint: 'process' (go straight to resolve_*_destination), 'ask_user' (media_type unknown or road=path_of_pain — confirm with the user first), or 'skip' (no main video, or media_type=other — nothing to organize)."
|
||||
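The next_steps list of analyze_release amounts to a lookup from media_type to a resolver tool. A minimal sketch of that lookup, assuming a hypothetical helper name (`pick_resolver`) and table name (`RESOLVER_BY_MEDIA_TYPE`) that do not appear in the spec:

```python
from typing import Optional

# Hypothetical routing table mirroring the next_steps list of analyze_release.
RESOLVER_BY_MEDIA_TYPE = {
    "movie": "resolve_movie_destination",
    "tv_season": "resolve_season_destination",
    "tv_episode": "resolve_episode_destination",
    "tv_complete": "resolve_series_destination",
}


def pick_resolver(media_type: str) -> Optional[str]:
    """Return the resolver tool name, or None when the user must decide.

    'other' and 'unknown' are deliberately absent from the table, so they
    fall through to None instead of being auto-routed.
    """
    return RESOLVER_BY_MEDIA_TYPE.get(media_type)
```

A None result corresponds to the "ask the user what to do" branch in the spec.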
@@ -1,59 +0,0 @@
|
||||
name: create_seed_links
|
||||
|
||||
summary: >
|
||||
Recreate the original torrent folder structure with hard-links so
|
||||
qBittorrent can keep seeding after the library move.
|
||||
|
||||
description: |
|
||||
Hard-links the library video file back into torrents/<original_folder_name>/
|
||||
and copies all remaining files from the original download folder
|
||||
(subtitles, .nfo, .jpg, .txt, …) so the torrent data is complete on
|
||||
disk. qBittorrent then sees the same content at the location it
|
||||
expects and can keep seeding without rehashing the whole torrent.
|
||||
|
||||
when_to_use: |
|
||||
Only when the user has confirmed they want to keep seeding after a
|
||||
move. Call right after manage_subtitles (or after move_media if there
|
||||
are no subs).
|
||||
|
||||
when_not_to_use: |
|
||||
- When the user explicitly answered "no" to "keep seeding?".
|
||||
- When the download was not from a torrent (e.g. direct download).
|
||||
- Before the library file is in place — this tool reads it.
|
||||
|
||||
next_steps: |
|
||||
- After success: optionally call qBittorrent to update the torrent's
|
||||
save path / force a recheck (not yet covered by a tool).
|
||||
- End the workflow.
|
||||
|
||||
parameters:
|
||||
library_file:
|
||||
description: Absolute path to the video file now in the library.
|
||||
why_needed: |
|
||||
The source for the hard-link — same inode means qBittorrent sees
|
||||
identical bytes at the seeding path.
|
||||
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Season 03/Oz.S03E01.mkv
|
||||
|
||||
original_download_folder:
|
||||
description: Absolute path to the original download folder.
|
||||
why_needed: |
|
||||
Provides the folder name to recreate under torrents/ and the
|
||||
auxiliary files (subs, nfo, ...) to copy over.
|
||||
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Seeding folder rebuilt.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
torrent_subfolder: Absolute path of the recreated folder under torrents/.
|
||||
linked_file: Absolute path of the hard-linked video.
|
||||
copied_files: List of auxiliary files that were copied.
|
||||
copied_count: Number of auxiliary files copied.
|
||||
skipped: List of files skipped (already present, unreadable, ...).
|
||||
|
||||
error:
|
||||
description: Failed to rebuild the seeding folder.
|
||||
fields:
|
||||
error: Short error code.
|
||||
message: Human-readable explanation.
|
||||
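The hard-link plus copy behaviour described for create_seed_links could look roughly like the sketch below. The function name, the `torrents_root` argument, and the choice to reuse the library file's basename for the link are assumptions made for illustration; the spec does not say how the real tool names the linked file.

```python
import os
import shutil
from pathlib import Path


def rebuild_seed_folder(library_file: str, original_download_folder: str,
                        torrents_root: str) -> dict:
    """Hypothetical sketch: hard-link the video, copy the auxiliary files."""
    src_folder = Path(original_download_folder)
    target = Path(torrents_root) / src_folder.name
    target.mkdir(parents=True, exist_ok=True)

    # Hard link: both names point at the same inode, so no data is duplicated.
    # This only works when both paths live on the same filesystem.
    linked = target / Path(library_file).name
    if not linked.exists():
        os.link(library_file, linked)

    copied, skipped = [], []
    for item in sorted(src_folder.rglob("*")):
        if not item.is_file():
            continue
        dest = target / item.relative_to(src_folder)
        if dest.exists():
            skipped.append(str(dest))  # already present, leave it alone
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(item, dest)  # copy2 keeps timestamps
        copied.append(str(dest))

    return {
        "status": "ok",
        "torrent_subfolder": str(target),
        "linked_file": str(linked),
        "copied_files": copied,
        "copied_count": len(copied),
        "skipped": skipped,
    }
```

The returned keys follow the `ok` fields listed in the spec; error handling is left out to keep the sketch short.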
@@ -1,48 +0,0 @@
|
||||
name: end_workflow
|
||||
|
||||
summary: >
|
||||
Leave the current workflow scope and return to broad-catalog mode.
|
||||
|
||||
description: |
|
||||
Clears the active workflow from STM. After this call the visible tool
|
||||
catalog returns to the core tool set plus start_workflow, so the agent is
|
||||
ready to handle a different request.
|
||||
|
||||
when_to_use: |
|
||||
- When all the workflow's steps have completed successfully.
|
||||
- When the user explicitly cancels the current task.
|
||||
- When the user changes subject mid-conversation and the active
|
||||
workflow is no longer relevant.
|
||||
- When an unrecoverable error makes continuing pointless — explain
|
||||
in 'reason'.
|
||||
|
||||
when_not_to_use: |
|
||||
- Do not call when there is no active workflow — it will return an
|
||||
error. Just call start_workflow for the new request instead.
|
||||
- Do not call mid-step just to "free up tools"; finish the step
|
||||
or fail it explicitly first.
|
||||
|
||||
next_steps: |
|
||||
- After ending, you can either call start_workflow for a new task or
|
||||
answer the user directly from the broad catalog.
|
||||
|
||||
parameters:
|
||||
reason:
|
||||
description: Short reason for ending — completed, cancelled, changed_subject, error, ...
|
||||
why_needed: |
|
||||
Recorded in episodic memory for debugging and future audits. A
|
||||
structured short string is more useful than a long sentence.
|
||||
example: completed
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Workflow ended; catalog is back to the broad core tool set.
|
||||
fields:
|
||||
workflow: Name of the workflow that just ended.
|
||||
reason: The reason that was passed in.
|
||||
|
||||
error:
|
||||
description: Could not end — typically because nothing was active.
|
||||
fields:
|
||||
error: Short error code (no_active_workflow).
|
||||
message: Human-readable explanation.
|
||||
@@ -1,56 +0,0 @@
|
||||
name: find_media_imdb_id
|
||||
|
||||
summary: >
|
||||
Search TMDB for a media title and return its canonical title, year,
|
||||
IMDb id, and TMDB id.
|
||||
|
||||
description: |
|
||||
Looks up a title on TMDB and returns the canonical metadata needed by
|
||||
the resolve_*_destination tools. On success, the result is also
|
||||
stashed in short-term memory under "last_media_search" so later steps
|
||||
in the workflow can read it without re-calling TMDB. The STM topic
|
||||
is set to "searching_media".
|
||||
|
||||
when_to_use: |
|
||||
Right after analyze_release, before calling resolve_*_destination —
|
||||
the resolvers need the canonical title + year and refuse to guess
|
||||
them from the raw release name.
|
||||
|
||||
when_not_to_use: |
|
||||
- When you already have the IMDb id in STM from an earlier step in
|
||||
the same workflow.
|
||||
- For torrent search — use find_torrent instead.
|
||||
|
||||
next_steps: |
|
||||
- On status=ok: call the appropriate resolve_*_destination with
|
||||
tmdb_title and tmdb_year from the result.
|
||||
- On status=error (not_found): show the error and ask the user for
|
||||
a more precise title.
|
||||
|
||||
cache:
|
||||
key: media_title
|
||||
|
||||
parameters:
|
||||
media_title:
|
||||
description: Title to search for. Free-form — TMDB does the matching.
|
||||
why_needed: |
|
||||
Drives the TMDB query. Pass a sanitized version (no resolution
|
||||
tokens, no group name) for best results.
|
||||
example: Breaking Bad
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Match found.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
title: Canonical title as returned by TMDB.
|
||||
year: Release year (movies) or first-air year (series).
|
||||
media_type: "'movie' or 'tv'."
|
||||
imdb_id: IMDb identifier (ttXXXXXXX) or null.
|
||||
tmdb_id: TMDB numeric id.
|
||||
|
||||
error:
|
||||
description: No match or API failure.
|
||||
fields:
|
||||
error: Short error code (not_found, api_error, ...).
|
||||
message: Human-readable explanation.
|
||||
@@ -1,52 +0,0 @@
|
||||
name: find_torrent
|
||||
|
||||
summary: >
|
||||
Search Knaben for torrents matching a media title; cache results in
|
||||
episodic memory.
|
||||
|
||||
description: |
|
||||
Queries the Knaben aggregator for up to 10 torrents matching the
|
||||
given title, then stores the result list in episodic memory under
|
||||
"last_search_results". The user can then refer to a torrent by
|
||||
1-based index ("download the 3rd one") via get_torrent_by_index or
|
||||
add_torrent_by_index. The STM topic is set to "selecting_torrent".
|
||||
|
||||
when_to_use: |
|
||||
When the user wants to download something new — typically the first
|
||||
step of a "find + download" sub-task. The agent should usually
|
||||
pre-filter the title (canonical name + year) before searching for
|
||||
cleaner results.
|
||||
|
||||
when_not_to_use: |
|
||||
- For TMDB metadata lookup — use find_media_imdb_id.
|
||||
- When a search was already performed in the same session and the
|
||||
user is just picking from the existing list.
|
||||
|
||||
next_steps: |
|
||||
- Present the indexed results to the user.
|
||||
- Once chosen: call add_torrent_by_index(N) — that wraps
|
||||
get_torrent_by_index + add_torrent_to_qbittorrent.
|
||||
|
||||
cache:
|
||||
key: media_title
|
||||
|
||||
parameters:
|
||||
media_title:
|
||||
description: Title to search for on Knaben. Free-form.
|
||||
why_needed: |
|
||||
Drives the search query. Use the canonical title (from
|
||||
find_media_imdb_id) plus quality preferences for better hits.
|
||||
example: Inception 2010 1080p
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Search returned a list of torrents.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
torrents: "List of {name, size, seeders, leechers, magnet, ...}, up to 10."
|
||||
|
||||
error:
|
||||
description: Search failed.
|
||||
fields:
|
||||
error: Short error code.
|
||||
message: Human-readable explanation.
|
||||
@@ -1,48 +0,0 @@
|
||||
name: get_torrent_by_index
|
||||
|
||||
summary: >
|
||||
Retrieve a torrent from the last find_torrent search by its 1-based
|
||||
index.
|
||||
|
||||
description: |
|
||||
Reads episodic memory's last_search_results and returns the entry at
|
||||
the given 1-based position. Pure lookup — does not start a download.
|
||||
Fails when the search results are missing or the index is out of
|
||||
range.
|
||||
|
||||
when_to_use: |
|
||||
When the user references a search hit by number ("show me the second
|
||||
one") but doesn't yet want to download — e.g. inspection, sharing
|
||||
the magnet, ...
|
||||
|
||||
when_not_to_use: |
|
||||
- When the user wants to start downloading: use add_torrent_by_index
|
||||
instead (one call instead of two).
|
||||
- When no search has been performed yet — the result will be
|
||||
not_found.
|
||||
|
||||
next_steps: |
|
||||
- Display the torrent to the user.
|
||||
- If they then say "add it", call add_torrent_to_qbittorrent with the
|
||||
magnet, or add_torrent_by_index with the same index.
|
||||
|
||||
parameters:
|
||||
index:
|
||||
description: 1-based position in the last find_torrent result list.
|
||||
why_needed: |
|
||||
Maps to a specific torrent entry. Out-of-range values return an
|
||||
error, not a wraparound.
|
||||
example: 3
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Torrent found at that index.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
torrent: "Full torrent dict (name, size, seeders, leechers, magnet, ...)."
|
||||
|
||||
error:
|
||||
description: No torrent at that index.
|
||||
fields:
|
||||
error: Short error code (not_found).
|
||||
message: Human-readable explanation, e.g. "Search for torrents first."
|
||||
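The 1-based lookup with no wraparound can be illustrated in a few lines. This is a sketch only: `get_by_index` and the in-memory `results` list stand in for the real tool and the episodic-memory store, and the `status: "error"` key is an assumption.

```python
def get_by_index(results, index):
    """Hypothetical 1-based lookup into the last search results."""
    if not results:
        return {"status": "error", "error": "not_found",
                "message": "Search for torrents first."}
    # Reject 0, negatives, and anything past the end: Python's negative
    # indexing would otherwise silently wrap around.
    if not isinstance(index, int) or index < 1 or index > len(results):
        return {"status": "error", "error": "not_found",
                "message": f"No torrent at index {index}."}
    return {"status": "ok", "torrent": results[index - 1]}
```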
@@ -1,76 +0,0 @@
|
||||
name: learn
|
||||
|
||||
summary: >
|
||||
Teach Alfred a new token mapping and persist it to the learned
|
||||
knowledge pack so future scans recognise it.
|
||||
|
||||
description: |
|
||||
Appends a new token (or list of tokens) to a key inside a knowledge
|
||||
pack and writes the result to `data/knowledge/<pack>_learned.yaml`.
|
||||
The change is persisted atomically (write-tmp + rename) so a crash
|
||||
cannot corrupt the file. Currently only the `subtitles` pack is
|
||||
supported.
|
||||
|
||||
when_to_use: |
|
||||
When manage_subtitles returns needs_clarification with unresolved
|
||||
tokens, after confirming with the user what the tokens mean. Call
|
||||
once per (category, key) — multiple values can be added in a single
|
||||
call.
|
||||
|
||||
when_not_to_use: |
|
||||
- Without explicit user confirmation of what the token means.
|
||||
- For knowledge that belongs in the static pack
|
||||
(alfred/knowledge/<pack>.yaml) — that's editor territory, not
|
||||
runtime learning.
|
||||
|
||||
next_steps: |
|
||||
- After success: re-run the workflow step that triggered the
|
||||
clarification (typically manage_subtitles) so the new mapping is
|
||||
applied.
|
||||
|
||||
parameters:
|
||||
pack:
|
||||
description: Knowledge pack name. Currently only "subtitles" is supported.
|
||||
why_needed: |
|
||||
Decides which `*_learned.yaml` file under data/knowledge/ gets
|
||||
written. The pack name is namespaced to avoid collisions across
|
||||
domains.
|
||||
example: subtitles
|
||||
|
||||
category:
|
||||
description: Category within the pack — "languages", "types", or "formats".
|
||||
why_needed: |
|
||||
Different categories use different lookup tables at scan time.
|
||||
A wrong category silently has no effect.
|
||||
example: languages
|
||||
|
||||
key:
|
||||
description: Canonical entry id — ISO 639-1 code, type name, format name.
|
||||
why_needed: |
|
||||
The destination bucket for the new tokens. Existing tokens under
|
||||
this key are kept; only new values are appended.
|
||||
example: es
|
||||
|
||||
values:
|
||||
description: List of token spellings to add.
|
||||
why_needed: |
|
||||
Release groups use many spellings for the same language/type;
|
||||
pass them all in one call instead of multiple round-trips.
|
||||
example: '["spanish", "espanol", "spa"]'
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Mapping saved.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
pack: Name of the pack that was written to.
|
||||
category: Category that was updated.
|
||||
key: Key that was updated.
|
||||
added_count: Number of values that were actually new (deduplicated).
|
||||
tokens: Full updated token list for that key.
|
||||
|
||||
error:
|
||||
description: Save failed.
|
||||
fields:
|
||||
error: Short error code (unknown_pack, unknown_category, read_failed, write_failed).
|
||||
message: Human-readable explanation.
|
||||
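The two behaviours the learn spec relies on, deduplicated append and an atomic write-tmp + rename, can be sketched with the standard library alone. The helper names below are hypothetical, and the sketch writes plain text rather than going through a YAML library, which the real tool presumably uses.

```python
import os
import tempfile
from pathlib import Path


def merge_tokens(existing, new_values):
    """Append only unseen values, keeping order; return (tokens, added_count)."""
    tokens = list(existing)
    added = 0
    for value in new_values:
        if value not in tokens:
            tokens.append(value)
            added += 1
    return tokens, added


def atomic_write_text(path, text):
    """Write to a temp file in the same directory, then rename over the target.

    os.replace is atomic on POSIX when source and target share a filesystem,
    so readers see either the old file or the new one, never a partial write.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent),
                                    prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Creating the temp file next to the target (rather than in the system temp directory) is what keeps the rename on a single filesystem.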
@@ -1,63 +0,0 @@
|
||||
name: list_folder
|
||||
|
||||
summary: >
|
||||
List the contents of a configured folder, optionally below a
|
||||
relative subpath.
|
||||
|
||||
description: |
|
||||
Reads a folder previously configured via set_path_for_folder and
|
||||
returns its entries (files + directories). A relative `path` lets you
|
||||
drill down without re-specifying the absolute root each time. Path
|
||||
traversal is rejected (no `..`, no absolute paths) so the agent
|
||||
cannot escape the configured root.
|
||||
|
||||
when_to_use: |
|
||||
- At the start of an organize workflow to discover what's available
|
||||
in the download folder.
|
||||
- To browse a library collection ("what tv shows do I have?").
|
||||
- As a sanity check before any move to confirm the target exists.
|
||||
|
||||
when_not_to_use: |
|
||||
- For folders that are not configured — call set_path_for_folder
|
||||
first.
|
||||
- To list arbitrary system paths — this tool is intentionally scoped
|
||||
to the known roots.
|
||||
|
||||
next_steps: |
|
||||
- After listing the download folder: typically call analyze_release
|
||||
on a specific entry.
|
||||
- After listing a library folder: use the result to disambiguate a
|
||||
destination during resolve_*_destination.
|
||||
|
||||
cache:
|
||||
key: path
|
||||
|
||||
parameters:
|
||||
folder_type:
|
||||
description: Logical folder key (download, torrent, movie, tv_show, ...).
|
||||
why_needed: |
|
||||
Resolves to an absolute root through LTM. Must have been set via
|
||||
set_path_for_folder beforehand.
|
||||
example: download
|
||||
|
||||
path:
|
||||
description: Relative subpath inside the root (default ".").
|
||||
why_needed: |
|
||||
Lets you drill into a subfolder without expanding the root. No
|
||||
".." or absolute path is allowed.
|
||||
example: Breaking.Bad.S01.1080p.BluRay.x265-GROUP
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Listing returned.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
folder_type: The key that was listed.
|
||||
path: The relative path that was listed.
|
||||
entries: List of {name, type, size?} for each entry.
|
||||
|
||||
error:
|
||||
description: Could not list the folder.
|
||||
fields:
|
||||
error: Short error code (folder_not_configured, path_not_found, path_traversal, ...).
|
||||
message: Human-readable explanation.
|
||||
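The traversal guard described for list_folder (no `..`, no absolute paths) might be implemented along these lines. The function name is hypothetical, and the extra symlink check is an assumption that goes beyond what the spec states.

```python
from pathlib import Path, PurePosixPath


def resolve_inside_root(root: str, relative: str = ".") -> Path:
    """Hypothetical guard: resolve `relative` under `root` or raise ValueError."""
    rel = PurePosixPath(relative)
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("path_traversal")

    root_path = Path(root).resolve()
    candidate = root_path.joinpath(*rel.parts).resolve()

    # Assumption: also reject symlinks that lead outside the configured root.
    if candidate != root_path and root_path not in candidate.parents:
        raise ValueError("path_traversal")
    return candidate
```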
@@ -1,67 +0,0 @@
|
||||
name: manage_subtitles
|
||||
|
||||
summary: >
|
||||
Detect, filter, and place subtitle tracks next to a video that has just
|
||||
been organised into the library.
|
||||
|
||||
description: |
|
||||
Scans the source video's surroundings for subtitle files
|
||||
(.srt, .ass, .ssa, .vtt, .sub), classifies them by language and type
|
||||
(standard / SDH / forced), filters by the user's SubtitlePreferences
|
||||
(languages, min size, keep_sdh, keep_forced), and hard-links the
|
||||
passing files next to the destination video using the convention
|
||||
`<lang>.<ext>`, `<lang>.sdh.<ext>`, `<lang>.forced.<ext>`.
|
||||
If no subtitles are found, returns status=ok with placed_count=0 — not
|
||||
an error.
|
||||
|
||||
when_to_use: |
|
||||
Always after a successful move_media / move_to_destination, before
|
||||
closing the workflow. Pass the original source path (where subs live)
|
||||
and the new library path (where they should land).
|
||||
|
||||
when_not_to_use: |
|
||||
- Do not call before the video itself has been moved — the destination
|
||||
must exist for hard-links to make sense.
|
||||
- Skip when the user explicitly asks not to handle subtitles.
|
||||
|
||||
next_steps: |
|
||||
- On status=ok: continue with create_seed_links (if seeding) or end
|
||||
the workflow.
|
||||
- On status=needs_clarification: ask the user about the unresolved
|
||||
tokens, then optionally call learn() to teach the new mapping.
|
||||
|
||||
parameters:
|
||||
source_video:
|
||||
description: Absolute path to the original video file (in the download folder).
|
||||
why_needed: |
|
||||
Subtitles typically live next to the source, either as siblings or
|
||||
in a Subs/ subfolder. The scanner walks from this path.
|
||||
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST/Oz.S03E01.mkv
|
||||
|
||||
destination_video:
|
||||
description: Absolute path to the video file in its library location.
|
||||
why_needed: |
|
||||
Subtitles are hard-linked next to this file so media players pick
|
||||
them up automatically.
|
||||
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Season 03/Oz.S03E01.mkv
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Subtitles scanned (and possibly placed).
|
||||
fields:
|
||||
status: "'ok'"
|
||||
placed: List of {source, destination, filename} for each linked file.
|
||||
placed_count: Number of subtitle files placed.
|
||||
skipped_count: Number of subtitle files filtered out.
|
||||
|
||||
needs_clarification:
|
||||
description: One or more tokens could not be classified.
|
||||
fields:
|
||||
unresolved: List of unrecognised tokens with their context.
|
||||
question: Human-readable question to relay to the user.
|
||||
|
||||
error:
|
||||
description: Scan or placement failed.
|
||||
fields:
|
||||
error: Short error code.
|
||||
message: Human-readable explanation.
|
||||
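The naming convention quoted in the manage_subtitles description (`<lang>.<ext>`, `<lang>.sdh.<ext>`, `<lang>.forced.<ext>`) reduces to a small formatting function. The name `subtitle_filename` and the `kind` parameter are illustrative, not taken from the codebase.

```python
def subtitle_filename(lang: str, ext: str, kind: str = "standard") -> str:
    """Build a destination subtitle name following the spec's convention."""
    ext = ext.lstrip(".")  # accept both "srt" and ".srt"
    if kind == "standard":
        return f"{lang}.{ext}"
    if kind in ("sdh", "forced"):
        return f"{lang}.{kind}.{ext}"
    raise ValueError(f"unknown subtitle kind: {kind}")
```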
@@ -1,58 +0,0 @@
|
||||
name: move_media
|
||||
|
||||
summary: >
|
||||
Safely move a media file with copy + integrity check + delete source.
|
||||
|
||||
description: |
|
||||
Copies the source file to the destination with an integrity check,
|
||||
then deletes the source. Slower than move_to_destination (which is a
|
||||
plain rename) but safer across filesystems where rename is not atomic
|
||||
or when you want a checksum verification.
|
||||
|
||||
when_to_use: |
|
||||
Use to move a single file across filesystems or when paranoia about
|
||||
data integrity is justified — e.g. moving a finished download from a
|
||||
scratch disk to the main library array.
|
||||
|
||||
when_not_to_use: |
|
||||
- For same-filesystem moves where speed matters: use move_to_destination
|
||||
(instant rename on ZFS/ext4 within the same dataset).
|
||||
- For folder-level moves of complete packs: use move_to_destination —
|
||||
move_media is a single-file operation.
|
||||
|
||||
next_steps: |
|
||||
- After a successful move: call manage_subtitles to place any subtitle
|
||||
tracks, then create_seed_links if the user wants to keep seeding.
|
||||
- On error: surface the error code (file_not_found, destination_exists,
|
||||
integrity_check_failed) and ask the user how to proceed.
|
||||
|
||||
parameters:
|
||||
source:
|
||||
description: Absolute path to the source video file.
|
||||
why_needed: |
|
||||
The file being moved. Typically lives under the downloads folder
|
||||
after a torrent completes.
|
||||
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
|
||||
|
||||
destination:
|
||||
description: Absolute path of the destination file — must not already exist.
|
||||
why_needed: |
|
||||
Where the file lands in the library. Comes from a resolve_*_destination
|
||||
call so the naming convention is respected.
|
||||
example: /movies/Inception.2010.1080p.BluRay.x265-GROUP/Inception.2010.1080p.BluRay.x265-GROUP.mkv
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Move succeeded.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
source: Absolute path of the source (now gone).
|
||||
destination: Absolute path of the destination (now in place).
|
||||
filename: Basename of the destination file.
|
||||
size: Size in bytes.
|
||||
|
||||
error:
|
||||
description: Move failed.
|
||||
fields:
|
||||
error: Short error code (file_not_found, destination_exists, integrity_check_failed, ...).
|
||||
message: Human-readable explanation.
|
||||
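A copy + integrity check + delete sequence like the one move_media describes could be sketched as follows. SHA-256 is an assumed choice of checksum (the spec only says "integrity check"), and `safe_move` and `sha256_of` are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_move(source: str, destination: str) -> dict:
    src, dst = Path(source), Path(destination)
    if not src.is_file():
        return {"status": "error", "error": "file_not_found",
                "message": f"{src} is not a file"}
    if dst.exists():
        return {"status": "error", "error": "destination_exists",
                "message": f"{dst} already exists"}

    dst.parent.mkdir(parents=True, exist_ok=True)  # assumption: create parents
    expected = sha256_of(src)
    shutil.copy2(src, dst)
    if sha256_of(dst) != expected:
        dst.unlink()  # never leave a corrupt copy in the library
        return {"status": "error", "error": "integrity_check_failed",
                "message": "checksum mismatch after copy"}

    size = dst.stat().st_size
    src.unlink()  # the source is deleted only after the copy is verified
    return {"status": "ok", "source": str(src), "destination": str(dst),
            "filename": dst.name, "size": size}
```

Deleting the source last means a failure at any earlier step leaves the original download untouched.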
@@ -1,55 +0,0 @@
|
||||
name: move_to_destination
|
||||
|
||||
summary: >
|
||||
Move a file or folder to a destination, creating parent directories as needed.
|
||||
|
||||
description: |
|
||||
Performs an actual move on disk. Uses the system 'mv' command, so on the
|
||||
same filesystem (e.g. ZFS) this is an instant rename. Creates the parent
|
||||
directory of the destination if it doesn't exist yet, then moves. Returns
|
||||
before/after paths on success, or an error if the destination already
|
||||
exists or the source can't be moved.
|
||||
|
||||
when_to_use: |
|
||||
Use after one of the resolve_*_destination tools returned status=ok, to
|
||||
perform the move it described. The 'source' and 'destination' arguments
|
||||
come directly from the resolved paths.
|
||||
|
||||
when_not_to_use: |
|
||||
- Never move when status was not 'ok' (clarification still pending or
|
||||
error happened) — that would leave the library in a half-broken state.
|
||||
- Don't use this for the seed-link step; use create_seed_links for that.
|
||||
|
||||
next_steps: |
|
||||
- After a successful move: call manage_subtitles to place any subtitle
|
||||
tracks, then create_seed_links to keep qBittorrent seeding.
|
||||
- On error: surface the message; do not retry blindly — check whether
|
||||
the destination already exists or the source path is correct.
|
||||
|
||||
parameters:
|
||||
source:
|
||||
description: Absolute path to the source file or folder to move.
|
||||
why_needed: |
|
||||
The thing being moved. Comes from the user's download folder or from
|
||||
a previous tool's output.
|
||||
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
destination:
|
||||
description: Absolute path of the destination — must not already exist.
|
||||
why_needed: |
|
||||
Where to put the source. Comes from a resolve_*_destination call so
|
||||
that the path matches the library's naming convention.
|
||||
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Move succeeded.
|
||||
fields:
|
||||
source: Absolute path of the source (now gone).
|
||||
destination: Absolute path of the destination (now in place).
|
||||
|
||||
error:
|
||||
description: Move failed.
|
||||
fields:
|
||||
error: Short error code (source_not_found, destination_exists, mkdir_failed, move_failed).
|
||||
message: Human-readable explanation of what went wrong.
|
||||
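For comparison with move_media, the rename-style move could be approximated as below. The spec says the real tool shells out to the system `mv`; this sketch substitutes `shutil.move`, which also renames in place on the same filesystem, so treat it as an analogy, not the tool's implementation.

```python
import shutil
from pathlib import Path


def move_to_destination(source: str, destination: str) -> dict:
    """Hypothetical sketch using shutil.move in place of the system mv."""
    src, dst = Path(source), Path(destination)
    if not src.exists():
        return {"error": "source_not_found", "message": f"{src} does not exist"}
    if dst.exists():
        return {"error": "destination_exists", "message": f"{dst} already exists"}
    try:
        dst.parent.mkdir(parents=True, exist_ok=True)
    except OSError as exc:
        return {"error": "mkdir_failed", "message": str(exc)}
    try:
        shutil.move(str(src), str(dst))
    except (OSError, shutil.Error) as exc:
        return {"error": "move_failed", "message": str(exc)}
    return {"source": str(src), "destination": str(dst)}
```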
@@ -1,56 +0,0 @@
|
||||
name: probe_media
|
||||
|
||||
summary: >
|
||||
Run ffprobe on a single video file and return its technical details.
|
||||
|
||||
description: |
|
||||
Inspects a specific video file with ffprobe and returns codec,
|
||||
resolution, duration, bitrate, the list of audio tracks (with
|
||||
language and channel layout), and the list of embedded subtitle
|
||||
tracks. Independent of any release-name parsing — works on any file
|
||||
you can point at.
|
||||
|
||||
when_to_use: |
|
||||
- To inspect a file's audio/subtitle tracks before deciding what to
|
||||
do (e.g. choose a default audio language).
|
||||
- To verify a video's resolution / codec when the release name is
|
||||
unreliable.
|
||||
- As a building block when analyze_release is overkill.
|
||||
|
||||
when_not_to_use: |
|
||||
- For full release routing — analyze_release does parsing + media
|
||||
type detection + probe in one call.
|
||||
- On non-video files — ffprobe will return probe_failed.
|
||||
|
||||
next_steps: |
|
||||
- The returned info typically feeds a user-facing decision (e.g.
|
||||
"this is 7.1 DTS, want to keep it?"); rarely chained directly to
|
||||
another tool.
|
||||
|
||||
cache:
|
||||
key: source_path
|
||||
|
||||
parameters:
|
||||
source_path:
|
||||
description: Absolute path to the video file to probe.
|
||||
why_needed: |
|
||||
ffprobe needs the exact file (not a folder). For releases use
|
||||
analyze_release; for a known file path, pass it here.
|
||||
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Probe succeeded.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
video: "Dict with codec, resolution, width, height, duration_seconds, bitrate_kbps."
|
||||
audio_tracks: "List of {index, codec, channels, channel_layout, language, is_default}."
|
||||
subtitle_tracks: "List of {index, codec, language, is_default, is_forced}."
|
||||
audio_languages: List of language codes present in audio tracks.
|
||||
is_multi_audio: True when more than one audio language is present.
|
||||
|
||||
error:
|
||||
description: Probe failed.
|
||||
fields:
|
||||
error: Short error code (not_found, probe_failed).
|
||||
message: Human-readable explanation.
|
||||
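The `audio_languages` and `is_multi_audio` fields can be derived from ffprobe's JSON output. The sketch below assumes the tool invokes ffprobe with JSON output and reads the `language` tag of each audio stream; the helper names are hypothetical and the real field extraction may differ.

```python
import json
import subprocess


def ffprobe_command(source_path: str) -> list:
    """Build an ffprobe invocation that prints streams and format as JSON."""
    return ["ffprobe", "-v", "error", "-print_format", "json",
            "-show_format", "-show_streams", source_path]


def run_ffprobe(source_path: str) -> str:
    """Run ffprobe and return its raw JSON output (requires ffprobe on PATH)."""
    result = subprocess.run(ffprobe_command(source_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


def summarize_audio(ffprobe_json: str) -> dict:
    """Collect distinct audio languages from ffprobe JSON, in stream order."""
    data = json.loads(ffprobe_json)
    languages = []
    for stream in data.get("streams", []):
        if stream.get("codec_type") != "audio":
            continue
        lang = stream.get("tags", {}).get("language")
        if lang and lang not in languages:
            languages.append(lang)
    return {"audio_languages": languages,
            "is_multi_audio": len(languages) > 1}
```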
@@ -1,54 +0,0 @@
|
||||
name: query_library
|
||||
|
||||
summary: >
|
||||
Find release folders across all configured library roots whose name
|
||||
contains a substring (case-insensitive).
|
||||
|
||||
description: |
|
||||
Scans every configured library root (movies, tv_shows, …) at depth 1
|
||||
and returns folders whose name contains the query. For each match,
|
||||
reports whether a `.alfred/metadata.yaml` exists — handy to spot
|
||||
releases that have not been inspected yet. Does not recurse into
|
||||
seasons / episodes; one entry per release folder.
|
||||
|
||||
when_to_use: |
|
||||
- To answer "do I already have X?" without listing whole library
|
||||
roots one by one.
|
||||
- To pick the release_path to feed read_release_metadata or any
|
||||
inspector tool.
|
||||
|
||||
when_not_to_use: |
|
||||
- To list the *whole* library — that scan should live behind a
|
||||
dedicated tool (not implemented yet).
|
||||
- To browse a single root — use list_folder instead, it's cheaper
|
||||
and doesn't open every library.
|
||||
|
||||
next_steps: |
|
||||
- When one match is found: feed its path to read_release_metadata or
|
||||
analyze_release.
|
||||
- When several match: surface the indexed list to the user and ask
|
||||
which one they mean.
|
||||
|
||||
parameters:
|
||||
name:
|
||||
description: Case-insensitive substring of the release name to look for.
|
||||
why_needed: |
|
||||
Library folders are named after the release (Title.Year.... or
|
||||
Title (Year)). A substring is enough to catch typical user
|
||||
phrasings ("foundation", "inception 2010").
|
||||
example: foundation
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Scan completed (possibly zero matches).
|
||||
fields:
|
||||
status: "'ok'"
|
||||
query: The query string as received.
|
||||
match_count: Number of matching folders.
|
||||
matches: "List of {collection, name, path, has_metadata}."
|
||||
|
||||
error:
|
||||
description: Scan could not run.
|
||||
fields:
|
||||
error: Short error code (no_libraries, empty_name).
|
||||
message: Human-readable explanation.
|
||||
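The depth-1, case-insensitive scan that query_library describes is sketched below. Passing the library roots as a plain dict is an assumption; the real tool presumably reads them from its configured folders.

```python
from pathlib import Path


def query_library(roots: dict, name: str) -> dict:
    """Hypothetical sketch: substring match on depth-1 folders of each root."""
    needle = name.strip().lower()
    if not needle:
        return {"status": "error", "error": "empty_name",
                "message": "name must not be empty"}
    if not roots:
        return {"status": "error", "error": "no_libraries",
                "message": "no library root is configured"}

    matches = []
    for collection, root in roots.items():
        root_path = Path(root)
        if not root_path.is_dir():
            continue
        # iterdir() stays at depth 1: seasons and episodes are never visited.
        for entry in sorted(root_path.iterdir()):
            if entry.is_dir() and needle in entry.name.lower():
                metadata = entry / ".alfred" / "metadata.yaml"
                matches.append({
                    "collection": collection,
                    "name": entry.name,
                    "path": str(entry),
                    "has_metadata": metadata.is_file(),
                })
    return {"status": "ok", "query": name,
            "match_count": len(matches), "matches": matches}
```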
@@ -1,55 +0,0 @@
|
||||
name: read_release_metadata
|
||||
|
||||
summary: >
|
||||
Read the `.alfred/metadata.yaml` file for a release folder.
|
||||
|
||||
description: |
|
||||
Returns whatever has been previously persisted by inspector tools
|
||||
(analyze_release, probe_media, find_media_imdb_id) and by the subtitle
|
||||
pipeline. Works for any folder — download or library — as long as the
|
||||
release has been touched at least once. Missing metadata is not an
|
||||
error: the tool returns `has_metadata=false` with an empty dict.
|
||||
|
||||
when_to_use: |
|
||||
- Before re-running analyze_release / probe_media on a release you
|
||||
might have already seen — saves a full re-inspection.
|
||||
- To answer "what do we know about X?" without scanning.
|
||||
- To list which releases in a library have no `.alfred` yet (loop +
|
||||
`has_metadata`).
|
||||
|
||||
when_not_to_use: |
|
||||
- To search a library by name — use query_library.
|
||||
- When you need a fresh probe/parse — call the inspector directly,
|
||||
the result will be persisted automatically.
|
||||
|
||||
next_steps: |
|
||||
- If `has_metadata=false`, decide whether to inspect now
|
||||
(analyze_release / probe_media).
|
||||
- If `has_metadata=true`, read `metadata.parse`, `metadata.probe`,
|
||||
`metadata.tmdb` blocks before deciding next actions.
|
||||
|
||||
cache:
|
||||
key: release_path
|
||||
|
||||
parameters:
|
||||
release_path:
|
||||
description: Absolute path to the release folder (or any file inside it).
|
||||
why_needed: |
|
||||
The store lives at `<release_root>/.alfred/metadata.yaml`. A file
|
||||
path is auto-resolved to its parent folder.
|
||||
example: /mnt/library/tv_shows/Foundation.2021.1080p.WEBRip.x265-RARBG
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Release inspected (file may or may not exist).
|
||||
fields:
|
||||
status: "'ok'"
|
||||
release_path: Absolute path of the release folder.
|
||||
has_metadata: True if `.alfred/metadata.yaml` exists.
|
||||
metadata: Full content of the file, or empty dict.
|
||||
|
||||
error:
|
||||
description: Path does not exist on disk.
|
||||
fields:
|
||||
error: Short error code (not_found).
|
||||
message: Human-readable explanation.
|
||||
@@ -1,93 +0,0 @@
|
||||
name: resolve_episode_destination
|
||||
|
||||
summary: >
|
||||
Compute destination paths for a single TV episode file (file move).
|
||||
|
||||
description: |
|
||||
Resolves the target series folder, season subfolder, and full destination
|
||||
filename for a single-episode release. Returns paths only — does not move
|
||||
anything. If a series folder with a different name already exists, returns
|
||||
needs_clarification.
|
||||
|
||||
when_to_use: |
|
||||
Use after analyze_release has identified the release as a single episode
|
||||
(media_type=tv_show, season AND episode both set). TMDB must already be
|
||||
queried for the canonical title/year, and optionally the episode title.
|
||||
|
||||
when_not_to_use: |
|
||||
- Season packs (folder containing many episodes): use resolve_season_destination.
|
||||
- Multi-season packs: use resolve_series_destination.
|
||||
- Movies: use resolve_movie_destination.
|
||||
|
||||
next_steps: |
|
||||
- On status=ok: call move_to_destination with the source video file and
|
||||
destination=library_file.
|
||||
- On status=needs_clarification: present question/options to the user,
|
||||
then re-call with confirmed_folder set.
|
||||
- On status=error: surface the message; do not move.
|
||||
|
||||
parameters:
|
||||
release_name:
|
||||
description: Raw release file name (with extension).
|
||||
why_needed: |
|
||||
Drives extraction of quality/source/codec/group, which become part of
|
||||
the destination filename so each file is self-describing.
|
||||
example: Oz.S03E01.1080p.WEBRip.x265-KONTRAST.mkv
|
||||
|
||||
source_file:
|
||||
description: Absolute path to the source video file on disk.
|
||||
why_needed: |
|
||||
Used to read the source file extension (.mkv, .mp4, .avi…) for the
|
||||
destination filename — release names don't always carry the extension.
|
||||
example: /downloads/Oz.S03E01.1080p.WEBRip.x265-KONTRAST/file.mkv
|
||||
|
||||
tmdb_title:
|
||||
description: Canonical show title from TMDB.
|
||||
why_needed: |
|
||||
Title prefix for both the series folder and the destination filename;
|
||||
ensures consistent naming across all episodes of the show.
|
||||
example: Oz
|
||||
|
||||
tmdb_year:
|
||||
description: Show start year from TMDB.
|
||||
why_needed: |
|
||||
Disambiguates remakes/reboots sharing a title; year is part of the
|
||||
series folder identity.
|
||||
example: "1997"
|
||||
|
||||
tmdb_episode_title:
|
||||
description: Episode title from TMDB. Optional.
|
||||
why_needed: |
|
||||
When present, the destination filename embeds the episode title for
|
||||
human-readability (e.g. Oz.S01E01.The.Routine...).
|
||||
example: The Routine
|
||||
|
||||
confirmed_folder:
|
||||
description: Folder name the user picked after needs_clarification.
|
||||
why_needed: |
|
||||
Forces the use case to skip detection and use this exact folder name.
|
||||
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Paths resolved; ready to move the episode file.
|
||||
fields:
|
||||
series_folder: Absolute path to the series root folder.
|
||||
season_folder: Absolute path to the season subfolder.
|
||||
library_file: Absolute path to the destination video file (move target).
|
||||
series_folder_name: Series folder name for display.
|
||||
season_folder_name: Season folder name for display.
|
||||
filename: Destination filename for display.
|
||||
is_new_series_folder: True if the series folder doesn't exist yet.
|
||||
|
||||
needs_clarification:
|
||||
description: A folder exists with a different name; user must choose.
|
||||
fields:
|
||||
question: Human-readable question.
|
||||
options: List of folder names to pick from.
|
||||
|
||||
error:
|
||||
description: Resolution failed.
|
||||
fields:
|
||||
error: Short error code.
|
||||
message: Human-readable explanation.
|
||||
@@ -1,72 +0,0 @@
|
||||
name: resolve_movie_destination
|
||||
|
||||
summary: >
|
||||
Compute destination paths for a movie file (file move).
|
||||
|
||||
description: |
|
||||
Resolves the target movie folder and full destination filename for a movie
|
||||
release. Returns paths only — does not move anything. Movies do not have
|
||||
the existing-folder disambiguation problem that TV shows have (each
|
||||
release lands in its own folder named after the canonical title + year +
|
||||
tech).
|
||||
|
||||
when_to_use: |
|
||||
Use after analyze_release has identified the release as a movie
|
||||
(media_type=movie). TMDB must already be queried for the canonical title
|
||||
and release year.
|
||||
|
||||
when_not_to_use: |
|
||||
- TV shows in any form: use resolve_season_destination /
|
||||
resolve_episode_destination / resolve_series_destination.
|
||||
- Documentaries when they're treated as series rather than standalone
|
||||
films: route them through the TV-show resolvers.
|
||||
|
||||
next_steps: |
|
||||
- On status=ok: call move_to_destination with the source video file and
|
||||
destination=library_file.
|
||||
- On status=error: surface the message; do not move.
|
||||
|
||||
parameters:
|
||||
release_name:
|
||||
description: Raw release folder or file name.
|
||||
why_needed: |
|
||||
Drives extraction of quality/source/codec/group/edition tokens, which
|
||||
become part of both the movie folder and filename so each release is
|
||||
self-describing on disk.
|
||||
example: Inception.2010.1080p.BluRay.x265-GROUP
|
||||
|
||||
source_file:
|
||||
description: Absolute path to the source video file on disk.
|
||||
why_needed: |
|
||||
Used to read the file extension for the destination filename.
|
||||
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
|
||||
|
||||
tmdb_title:
|
||||
description: Canonical movie title from TMDB.
|
||||
why_needed: |
|
||||
Title prefix for the destination folder/file; ensures the library
|
||||
uses the canonical title and not a sanitized release-name title.
|
||||
example: Inception
|
||||
|
||||
tmdb_year:
|
||||
description: Movie release year from TMDB.
|
||||
why_needed: |
|
||||
Disambiguates remakes that share a title (Dune 1984 vs Dune 2021)
|
||||
and locks the folder identity in time.
|
||||
example: "2010"
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Paths resolved; ready to move.
|
||||
fields:
|
||||
movie_folder: Absolute path to the movie folder.
|
||||
library_file: Absolute path to the destination video file (move target).
|
||||
movie_folder_name: Folder name for display.
|
||||
filename: Destination filename for display.
|
||||
is_new_folder: True if the movie folder doesn't exist yet.
|
||||
|
||||
error:
|
||||
description: Resolution failed.
|
||||
fields:
|
||||
error: Short error code (e.g. library_not_set).
|
||||
message: Human-readable explanation.
|
||||
@@ -1,95 +0,0 @@
|
||||
name: resolve_season_destination
|
||||
|
||||
summary: >
|
||||
Compute destination paths for a season pack (folder move) in the TV library.
|
||||
|
||||
description: |
|
||||
Resolves the target series folder and season subfolder for a complete-season
|
||||
download. Returns the paths only — does not perform any move. If a series
|
||||
folder for this show already exists in the library with a different name
|
||||
(different group/quality/source), returns needs_clarification so the user
|
||||
can decide whether to merge into the existing folder or create a new one.
|
||||
|
||||
when_to_use: |
|
||||
Use after analyze_release has identified the release as a season pack
|
||||
(media_type=tv_show, season set, episode unset). TMDB must already be
|
||||
queried so tmdb_title and tmdb_year are canonical values, not raw tokens
|
||||
from the release name.
|
||||
|
||||
when_not_to_use: |
|
||||
- Single-episode files: use resolve_episode_destination instead.
|
||||
- Multi-season packs (S01-S05 etc.): use resolve_series_destination.
|
||||
- Movies: use resolve_movie_destination.
|
||||
|
||||
next_steps: |
|
||||
- On status=ok: call move_to_destination with source=<download folder> and
|
||||
destination=season_folder.
|
||||
- On status=needs_clarification: present the question and options to the
|
||||
user, then re-call this tool with confirmed_folder set to the user's pick.
|
||||
- On status=error: surface the message to the user; do not move anything.
|
||||
|
||||
parameters:
|
||||
release_name:
|
||||
description: Raw release folder name as it appears on disk.
|
||||
why_needed: |
|
||||
Drives extraction of quality/source/codec/group tokens — these are
|
||||
embedded in the target folder name (Title.Year.Quality.Source.Codec-GROUP)
|
||||
to make releases self-describing on the filesystem.
|
||||
example: Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
tmdb_title:
|
||||
description: Canonical show title from TMDB.
|
||||
why_needed: |
|
||||
Builds the title prefix of the folder name. Must come from TMDB to
|
||||
avoid typos and variant spellings present in the raw release name.
|
||||
example: Oz
|
||||
|
||||
tmdb_year:
|
||||
description: Show start year from TMDB.
|
||||
why_needed: |
|
||||
Disambiguates shows that share a title across decades (e.g. multiple
|
||||
remakes of "The Office") and locks the folder identity.
|
||||
example: "1997"
|
||||
|
||||
confirmed_folder:
|
||||
description: |
|
||||
Folder name chosen by the user after a previous needs_clarification
|
||||
response.
|
||||
why_needed: |
|
||||
Short-circuits the existing-folder detection and forces the use case
|
||||
to use this exact folder name, even if it doesn't match the computed
|
||||
one.
|
||||
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
source_path:
|
||||
description: |
|
||||
Absolute path to the release folder on disk. Optional.
|
||||
why_needed: |
|
||||
When provided, the tool runs ffprobe on the main video inside the
|
||||
folder and uses the probe data to fill quality/codec tokens that
|
||||
may be missing from the release name. The enriched tech tokens
|
||||
end up in the destination folder name, so providing source_path
|
||||
gives more accurate names for releases with sparse metadata.
|
||||
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Paths resolved unambiguously; ready to move.
|
||||
fields:
|
||||
series_folder: Absolute path to the series root folder.
|
||||
season_folder: Absolute path to the season subfolder (move target).
|
||||
series_folder_name: Just the series folder name, for display.
|
||||
season_folder_name: Just the season folder name, for display.
|
||||
is_new_series_folder: True if the series folder doesn't exist yet.
|
||||
|
||||
needs_clarification:
|
||||
description: A folder already exists with a different name; ask the user.
|
||||
fields:
|
||||
question: Human-readable question for the user.
|
||||
options: List of folder names the user can pick from.
|
||||
|
||||
error:
|
||||
description: Resolution failed (config missing, invalid release name, etc.).
|
||||
fields:
|
||||
error: Short error code (e.g. library_not_set).
|
||||
message: Human-readable explanation.
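The needs_clarification round trip in `next_steps` can be sketched as a small wrapper. This is an illustration under assumptions: `resolve_with_clarification` and the `ask_user` callback are hypothetical names, and the resolver is assumed to return the dict shapes listed under `returns`.

```python
from typing import Any, Callable

Result = dict[str, Any]


def resolve_with_clarification(
    resolver: Callable[..., Result],
    ask_user: Callable[[str, list[str]], str],
    **kwargs: Any,
) -> Result:
    """Call a resolver once; on needs_clarification, ask and re-call."""
    result = resolver(**kwargs)
    if result.get("status") == "needs_clarification":
        # Present the question and options, then pass the pick back in.
        choice = ask_user(result["question"], result["options"])
        result = resolver(**{**kwargs, "confirmed_folder": choice})
    # status is now ok or error; the caller decides whether to move.
    return result
```

The wrapper never moves anything itself, which matches the resolvers' paths-only contract.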
|
||||
@@ -1,87 +0,0 @@
|
||||
name: resolve_series_destination
|
||||
|
||||
summary: >
|
||||
Compute the destination path for a complete multi-season series pack (folder move).
|
||||
|
||||
description: |
|
||||
Resolves the target series folder for a pack that contains multiple seasons
|
||||
(e.g. S01-S05 in a single release). Returns only the series folder — the
|
||||
whole source folder is moved as-is into the library, no per-season
|
||||
restructuring. If a folder with a different name already exists for this
|
||||
show, returns needs_clarification.
|
||||
|
||||
when_to_use: |
|
||||
Use after analyze_release has identified the release as a complete-series
|
||||
pack (media_type=tv_complete, or multi-season indicators). TMDB must
|
||||
already be queried for canonical title/year.
|
||||
|
||||
when_not_to_use: |
|
||||
- Single-season packs: use resolve_season_destination.
|
||||
- Single episodes: use resolve_episode_destination.
|
||||
- Movies: use resolve_movie_destination.
|
||||
|
||||
next_steps: |
|
||||
- On status=ok: call move_to_destination with source=<download folder> and
|
||||
destination=series_folder.
|
||||
- On status=needs_clarification: ask the user, re-call with
|
||||
confirmed_folder set.
|
||||
- On status=error: surface the message; do not move.
|
||||
|
||||
parameters:
|
||||
release_name:
|
||||
description: Raw release folder name as it appears on disk.
|
||||
why_needed: |
|
||||
Drives extraction of quality/source/codec/group tokens for the target
|
||||
folder name, even though the multi-season structure inside is kept
|
||||
as-is.
|
||||
example: The.Wire.S01-S05.1080p.BluRay.x265-GROUP
|
||||
|
||||
tmdb_title:
|
||||
description: Canonical show title from TMDB.
|
||||
why_needed: |
|
||||
Title prefix of the series folder; comes from TMDB to avoid raw
|
||||
release-name spellings.
|
||||
example: The Wire
|
||||
|
||||
tmdb_year:
|
||||
description: Show start year from TMDB.
|
||||
why_needed: |
|
||||
Disambiguates shows that share a title across eras and locks the
|
||||
folder identity.
|
||||
example: "2002"
|
||||
|
||||
confirmed_folder:
|
||||
description: Folder name chosen by the user after needs_clarification.
|
||||
why_needed: |
|
||||
Forces the use case to use this exact folder name and skip detection.
|
||||
example: The.Wire.2002.1080p.BluRay.x265-GROUP
|
||||
|
||||
source_path:
|
||||
description: |
|
||||
Absolute path to the release folder on disk. Optional.
|
||||
why_needed: |
|
||||
When provided, the tool runs ffprobe on the main video inside the
|
||||
folder and uses probe data to fill quality/codec tokens that may
|
||||
be missing from the release name, producing a more accurate
|
||||
destination folder name.
|
||||
example: /downloads/The.Wire.S01-S05.1080p.BluRay.x265-GROUP
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Path resolved; ready to move the pack.
|
||||
fields:
|
||||
series_folder: Absolute path to the destination series folder.
|
||||
series_folder_name: Folder name for display.
|
||||
is_new_series_folder: True if the folder doesn't exist yet.
|
||||
|
||||
needs_clarification:
|
||||
description: A folder exists with a different name; ask the user.
|
||||
fields:
|
||||
question: Human-readable question.
|
||||
options: List of folder names to pick from.
|
||||
|
||||
error:
|
||||
description: Resolution failed.
|
||||
fields:
|
||||
error: Short error code.
|
||||
message: Human-readable explanation.
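Taken together, the `when_to_use` blocks of the four resolver specs imply a routing rule. A minimal sketch follows; `pick_resolver` is a hypothetical helper, and the `media_type` values are the ones quoted in those blocks (`movie`, `tv_show`, `tv_complete`).

```python
from typing import Optional


def pick_resolver(
    media_type: str,
    season: Optional[int] = None,
    episode: Optional[int] = None,
) -> str:
    """Return the name of the resolver tool matching an analyzed release."""
    if media_type == "movie":
        return "resolve_movie_destination"
    if media_type == "tv_complete":
        return "resolve_series_destination"
    if media_type == "tv_show" and season is not None:
        # Season and episode both set: single episode. Season only: season pack.
        if episode is not None:
            return "resolve_episode_destination"
        return "resolve_season_destination"
    raise ValueError(
        f"No resolver for media_type={media_type!r}, season={season!r}, episode={episode!r}"
    )
```

Raising on an unmatched combination is a design choice of this sketch; the real agent may instead surface an error result.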
|
||||
@@ -1,47 +0,0 @@
|
||||
name: set_language
|
||||
|
||||
summary: >
|
||||
Set the conversation language so all subsequent assistant messages
|
||||
match it.
|
||||
|
||||
description: |
|
||||
Persists an ISO 639-1 language code in short-term memory under
|
||||
conversation.language. Read by the prompt builder and any tool that
|
||||
needs to localise output. Does not validate the code against an ISO
|
||||
list — the LLM is trusted to pass a sensible value.
|
||||
|
||||
when_to_use: |
|
||||
As the very first call when the user writes in a language different
|
||||
from the current STM language. Doing it before answering avoids a
|
||||
mid-reply switch.
|
||||
|
||||
when_not_to_use: |
|
||||
- On every turn — only when the language actually changes.
|
||||
- To pick a subtitle language — that lives in SubtitlePreferences,
|
||||
not the conversation language.
|
||||
|
||||
next_steps: |
|
||||
- After success: continue the user's request in the newly set
|
||||
language.
|
||||
|
||||
parameters:
|
||||
language:
|
||||
description: ISO 639-1 language code (en, fr, es, de, ...).
|
||||
why_needed: |
|
||||
Identifies the target language unambiguously across the UI and
|
||||
any localisation logic.
|
||||
example: fr
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Language saved.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
message: Confirmation message.
|
||||
language: The language code that was saved.
|
||||
|
||||
error:
|
||||
description: Could not save the language.
|
||||
fields:
|
||||
status: "'error'"
|
||||
error: Short error code or exception message.
|
||||
@@ -1,58 +0,0 @@
|
||||
name: set_path_for_folder
|
||||
|
||||
summary: >
|
||||
Configure where a known folder lives on disk (download, torrent, or
|
||||
any library collection).
|
||||
|
||||
description: |
|
||||
Stores an absolute path in long-term memory under a folder key. Two
|
||||
classes of folders exist:
|
||||
- Workspace paths: "download", "torrent" — single-valued each, used
|
||||
by the organize workflows.
|
||||
- Library paths: any other key (e.g. "movie", "tv_show",
|
||||
"documentary") — these are the collections you organise into.
|
||||
The path must exist and be a directory; otherwise the call fails
|
||||
without changing memory.
|
||||
|
||||
when_to_use: |
|
||||
On first run, or when the user moves a folder, or when introducing a
|
||||
new library collection (e.g. "set the documentaries folder to ...").
|
||||
|
||||
when_not_to_use: |
|
||||
- For one-off listings: use list_folder, which needs no further
configuration once the folder is already set.
|
||||
- To rename or delete an existing folder — this only sets paths.
|
||||
|
||||
next_steps: |
|
||||
- After success: typical follow-ups are list_folder on the same key,
|
||||
or starting a workflow that needs the path.
|
||||
|
||||
parameters:
|
||||
folder_name:
|
||||
description: Logical name of the folder (download, torrent, movie, tv_show, ...).
|
||||
why_needed: |
|
||||
The key the agent uses everywhere afterwards. "download" and
|
||||
"torrent" are reserved for workspace; anything else becomes a
|
||||
library collection.
|
||||
example: tv_show
|
||||
|
||||
path_value:
|
||||
description: Absolute path to the folder on disk.
|
||||
why_needed: |
|
||||
Must exist and be readable. Stored verbatim in LTM — relative
|
||||
paths are rejected.
|
||||
example: /tank/library/tv_shows
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Path saved to long-term memory.
|
||||
fields:
|
||||
status: "'ok'"
|
||||
folder_name: The logical name that was set.
|
||||
path_value: The absolute path that was saved.
|
||||
|
||||
error:
|
||||
description: Could not set the path.
|
||||
fields:
|
||||
error: Short error code (path_not_found, not_a_directory, invalid_path, ...).
|
||||
message: Human-readable explanation.
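The validation rules in this spec (absolute path, must exist, must be a directory) can be sketched as below. The function name `check_folder_path` is hypothetical, and mapping relative paths to the `invalid_path` code is an assumption; the spec lists the codes but not which condition triggers each.

```python
from pathlib import Path
from typing import Any


def _error(code: str, message: str) -> dict[str, Any]:
    return {"status": "error", "error": code, "message": message}


def check_folder_path(folder_name: str, path_value: str) -> dict[str, Any]:
    """Validate a folder path before it would be stored; memory is untouched."""
    path = Path(path_value)
    if not path.is_absolute():
        # Assumption: relative paths are reported as invalid_path.
        return _error("invalid_path", f"Path must be absolute: {path_value}")
    if not path.exists():
        return _error("path_not_found", f"Path does not exist: {path_value}")
    if not path.is_dir():
        return _error("not_a_directory", f"Not a directory: {path_value}")
    return {"status": "ok", "folder_name": folder_name, "path_value": str(path)}
```

Only an ok result would then be written to long-term memory, so a failed call leaves the stored paths unchanged.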
|
||||
@@ -1,64 +0,0 @@
|
||||
name: start_workflow
|
||||
|
||||
summary: >
|
||||
Enter a workflow scope — narrows the visible tool catalog and gives the
|
||||
agent a clear multi-step plan to follow.
|
||||
|
||||
description: |
|
||||
Activates a named workflow defined in YAML under agent/workflows/.
|
||||
Once active, only the workflow's declared tools (plus the core noyau)
|
||||
are exposed to the LLM, which keeps the decision space small and
|
||||
focused. The returned plan (description + steps) is the script the
|
||||
agent should execute until end_workflow is called.
|
||||
|
||||
when_to_use: |
|
||||
Use as the very first action whenever the user request maps to a
|
||||
known workflow (e.g. "organize Breaking Bad" → media.organize_media).
|
||||
Pass any parameters you already know (release name, target media,
|
||||
flags) in 'params' so later steps can read them from STM.
|
||||
|
||||
when_not_to_use: |
|
||||
- Do not start a workflow for purely conversational replies or
|
||||
one-shot lookups that need a single tool call.
|
||||
- Do not start a new workflow while one is already active — call
|
||||
end_workflow first.
|
||||
|
||||
next_steps: |
|
||||
- On status=ok: follow the returned 'steps' list, calling the tools
|
||||
in order. The visible tool catalog has already been narrowed.
|
||||
- On status=error (unknown_workflow): surface the available list to
|
||||
the user and ask which one they meant.
|
||||
- On status=error (workflow_already_active): either continue the
|
||||
active workflow or call end_workflow first.
|
||||
|
||||
parameters:
|
||||
workflow_name:
|
||||
description: Fully-qualified name of the workflow to start (e.g. media.organize_media).
|
||||
why_needed: |
|
||||
Identifies which YAML definition to load. Names use the
|
||||
'domain.action' convention (media.*, mail.*, ...).
|
||||
example: media.organize_media
|
||||
|
||||
params:
|
||||
description: Initial parameters to seed the workflow with (release name, target, flags).
|
||||
why_needed: |
|
||||
Later steps read these from STM instead of asking the user again.
|
||||
Pass whatever you already extracted from the user's message.
|
||||
example: '{"release_name": "Breaking.Bad.S01.1080p.BluRay.x265-GROUP", "keep_seeding": true}'
|
||||
|
||||
returns:
|
||||
ok:
|
||||
description: Workflow activated; catalog has been narrowed.
|
||||
fields:
|
||||
workflow: Name of the activated workflow.
|
||||
description: Human-readable description of what the workflow does.
|
||||
steps: Ordered list of steps to execute.
|
||||
tools: Tools that are now visible (in addition to the core noyau).
|
||||
|
||||
error:
|
||||
description: Could not activate the workflow.
|
||||
fields:
|
||||
error: Short error code (unknown_workflow, workflow_already_active).
|
||||
message: Human-readable explanation.
|
||||
available_workflows: List of valid workflow names (only on unknown_workflow).
|
||||
active_workflow: Name of the currently active workflow (only on workflow_already_active).
|
||||
@@ -1,86 +0,0 @@
|
||||
"""Workflow scoping tools — start_workflow / end_workflow meta-tools.
|
||||
|
||||
These tools let the agent enter and leave a workflow scope. While a
|
||||
workflow is active, the PromptBuilder narrows the visible tool catalog
|
||||
to the noyau + the workflow's declared tools, so the LLM doesn't have
|
||||
to reason over the full set.
|
||||
"""
|
||||
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
|
||||
from ..workflows import WorkflowLoader
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
_loader_cache: list[WorkflowLoader] = []
|
||||
|
||||
|
||||
def _get_loader() -> WorkflowLoader:
|
||||
"""Lazily build the module-level WorkflowLoader."""
|
||||
if not _loader_cache:
|
||||
_loader_cache.append(WorkflowLoader())
|
||||
return _loader_cache[0]
|
||||
|
||||
|
||||
def start_workflow(workflow_name: str, params: dict) -> dict[str, Any]:
|
||||
"""See specs/start_workflow.yaml for full description."""
|
||||
loader = _get_loader()
|
||||
workflow = loader.get(workflow_name)
|
||||
if workflow is None:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "unknown_workflow",
|
||||
"message": f"Workflow '{workflow_name}' not found",
|
||||
"available_workflows": loader.names(),
|
||||
}
|
||||
|
||||
memory = get_memory()
|
||||
current = memory.stm.workflow.current
|
||||
if current is not None:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "workflow_already_active",
|
||||
"message": (
|
||||
f"Workflow '{current.get('name')}' is already active. "
|
||||
"Call end_workflow before starting a new one."
|
||||
),
|
||||
"active_workflow": current.get("name"),
|
||||
}
|
||||
|
||||
memory.stm.start_workflow(workflow_name, params or {})
|
||||
memory.save()
|
||||
logger.info(f"start_workflow: '{workflow_name}' with params={params}")
|
||||
|
||||
return {
|
||||
"status": "ok",
|
||||
"workflow": workflow_name,
|
||||
"description": workflow.get("description", ""),
|
||||
"steps": workflow.get("steps", []),
|
||||
"tools": workflow.get("tools", []),
|
||||
}
|
||||
|
||||
|
||||
def end_workflow(reason: str) -> dict[str, Any]:
|
||||
"""See specs/end_workflow.yaml for full description."""
|
||||
memory = get_memory()
|
||||
current = memory.stm.workflow.current
|
||||
if current is None:
|
||||
return {
|
||||
"status": "error",
|
||||
"error": "no_active_workflow",
|
||||
"message": "No workflow is currently active.",
|
||||
}
|
||||
|
||||
workflow_name = current.get("name")
|
||||
memory.stm.end_workflow()
|
||||
memory.save()
|
||||
logger.info(f"end_workflow: '{workflow_name}' reason={reason!r}")
|
||||
|
||||
return {
|
||||
"status": "ok",
|
||||
"workflow": workflow_name,
|
||||
"reason": reason,
|
||||
}
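The guard order in the removed module (unknown workflow first, then already active) can be exercised in isolation. The sketch below replaces `get_memory()` and `WorkflowLoader` with in-memory stand-ins; `WorkflowState`, `start` and `end` are hypothetical names, not the project's API.

```python
from typing import Any, Optional


class WorkflowState:
    """Stand-in for the short-term memory slot that holds the active workflow."""

    def __init__(self) -> None:
        self.current: Optional[dict[str, Any]] = None


def start(state: WorkflowState, known: set[str], name: str,
          params: Optional[dict] = None) -> dict[str, Any]:
    if name not in known:
        return {"status": "error", "error": "unknown_workflow",
                "available_workflows": sorted(known)}
    if state.current is not None:
        return {"status": "error", "error": "workflow_already_active",
                "active_workflow": state.current["name"]}
    state.current = {"name": name, "params": params or {}}
    return {"status": "ok", "workflow": name}


def end(state: WorkflowState, reason: str) -> dict[str, Any]:
    if state.current is None:
        return {"status": "error", "error": "no_active_workflow"}
    name = state.current["name"]
    state.current = None  # leaving the scope restores the full tool catalog
    return {"status": "ok", "workflow": name, "reason": reason}
```

Keeping the state in one object makes the single-active-workflow rule easy to test without touching persistence.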
|
||||
@@ -22,7 +22,7 @@ class WorkflowLoader:
|
||||
Usage:
|
||||
loader = WorkflowLoader()
|
||||
all_workflows = loader.all()
|
||||
- workflow = loader.get("media.organize_media")
+ workflow = loader.get("organize_media")
|
||||
"""
|
||||
|
||||
def __init__(self):
|
||||
|
||||
+1
-1
@@ -1,4 +1,4 @@
|
||||
- name: media.manage_subtitles
+ name: manage_subtitles
|
||||
description: >
|
||||
Place subtitle files alongside a video that has just been organised into the library.
|
||||
Detects the release pattern automatically, identifies and classifies all tracks,
|
||||
+16
-26
@@ -1,4 +1,4 @@
|
||||
- name: media.organize_media
+ name: organize_media
|
||||
description: >
|
||||
Organise a downloaded series or movie into the media library.
|
||||
Triggered when the user asks to move/organize a specific title.
|
||||
@@ -14,14 +14,9 @@ trigger:
|
||||
|
||||
tools:
|
||||
  - list_folder
- - analyze_release
- - probe_media
  - find_media_imdb_id
- - resolve_season_destination
- - resolve_episode_destination
- - resolve_movie_destination
- - resolve_series_destination
- - move_to_destination
+ - resolve_destination
+ - move_media
  - manage_subtitles
  - create_seed_links
|
||||
|
||||
@@ -39,31 +34,22 @@ steps:
|
||||
  params:
  folder_type: download

- - id: analyze
- tool: analyze_release
- description: >
- Parse the release name to detect media_type (movie / tv_season /
- tv_episode / tv_complete) and extract season/episode info.
-
  - id: identify_media
  tool: find_media_imdb_id
- description: Confirm canonical title and year via TMDB.
+ description: Confirm title, type (series/movie), and metadata via TMDB.

  - id: resolve_destination
+ tool: resolve_destination
  description: >
- Call the resolver that matches media_type from analyze_release:
- movie → resolve_movie_destination
- tv_season → resolve_season_destination
- tv_episode → resolve_episode_destination
- tv_complete → resolve_series_destination
- If the resolver returns needs_clarification, ask the user and
- re-call with confirmed_folder.
+ Compute the correct destination path in the library.
+ Uses the release name + TMDB metadata to build folder and file names.
+ If multiple series folders exist for this title, returns
+ needs_clarification and the user must pick one (re-call with confirmed_folder).

  - id: move_file
- tool: move_to_destination
+ tool: move_media
  description: >
- Move the video file/folder to the destination returned by the
- resolver above.
+ Move the video file to library_file returned by resolve_destination.

  - id: handle_subtitles
  tool: manage_subtitles
|
||||
@@ -77,7 +63,7 @@ steps:
|
||||
  question: "Do you want to keep seeding this torrent?"
  answers:
  "yes": { next_step: create_seed_links }
- "no": { next_step: end }
+ "no": { next_step: update_library }

  - id: create_seed_links
  tool: create_seed_links
|
||||
@@ -86,6 +72,10 @@ steps:
|
||||
and copy all remaining files from the original download folder
|
||||
(subs, nfo, jpg, …) so the torrent stays complete for seeding.
|
||||
|
||||
+ - id: update_library
+ memory_write: Library
+ description: Add the entry to the LTM library after a successful move.
+
naming_convention:
|
||||
# Resolved by domain entities (Movie, Episode) — not hardcoded here
|
||||
tv_show: "{title}/Season {season:02d}/{title}.S{season:02d}E{episode:02d}.{ext}"
|
||||
+1
-19
@@ -37,21 +37,6 @@ logger.info(f"Memory context initialized (path: {memory_path})")
|
||||
llm_provider = settings.default_llm_provider.lower()
|
||||
|
||||
|
||||
- class _UnconfiguredLLM:
-     """Placeholder LLM used when no provider could be configured at import time.
-
-     Importing the FastAPI app must not fail just because credentials are
-     absent (e.g. during test collection). Any actual call surfaces a clear
-     503 error at request time via the handlers below.
-     """
-
-     def __init__(self, reason: str):
-         self.reason = reason
-
-     def complete(self, *args, **kwargs):
-         raise LLMAPIError(f"LLM is not configured: {self.reason}")
-
-
try:
|
||||
if llm_provider == "local":
|
||||
logger.info("Using local Ollama LLM")
|
||||
@@ -64,11 +49,8 @@ try:
|
||||
else:
|
||||
raise ValueError(f"Unknown LLM provider: {llm_provider}")
|
||||
except LLMConfigurationError as e:
|
||||
-     # Degrade gracefully: keep the app importable so tests can patch agent.step
-     # and so missing credentials surface as a 503 at the endpoint, not as an
-     # import error.
      logger.error(f"Failed to initialize LLM: {e}")
-     llm = _UnconfiguredLLM(str(e))
+     raise
|
||||
|
||||
# Initialize agent
|
||||
agent = Agent(
|
||||
|
||||
@@ -12,16 +12,7 @@ from .dto import (
|
||||
from .list_folder import ListFolderUseCase
|
||||
from .manage_subtitles import ManageSubtitlesUseCase
|
||||
from .move_media import MoveMediaUseCase
|
||||
- from .resolve_destination import (
-     ResolvedEpisodeDestination,
-     ResolvedMovieDestination,
-     ResolvedSeasonDestination,
-     ResolvedSeriesDestination,
-     resolve_episode_destination,
-     resolve_movie_destination,
-     resolve_season_destination,
-     resolve_series_destination,
- )
+ from .resolve_destination import ResolveDestinationUseCase, ResolvedDestination
|
||||
from .set_folder_path import SetFolderPathUseCase
|
||||
|
||||
__all__ = [
|
||||
@@ -30,14 +21,8 @@ __all__ = [
|
||||
"CreateSeedLinksUseCase",
|
||||
"MoveMediaUseCase",
|
||||
"ManageSubtitlesUseCase",
|
- "ResolvedSeasonDestination",
- "ResolvedEpisodeDestination",
- "ResolvedMovieDestination",
- "ResolvedSeriesDestination",
- "resolve_season_destination",
- "resolve_episode_destination",
- "resolve_movie_destination",
- "resolve_series_destination",
+ "ResolveDestinationUseCase",
+ "ResolvedDestination",
|
||||
"SetFolderPathResponse",
|
||||
"ListFolderResponse",
|
||||
"CreateSeedLinksResponse",
|
||||
|
||||
@@ -2,7 +2,7 @@
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
- from dataclasses import dataclass
+ from dataclasses import dataclass, field
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -88,11 +88,7 @@ class PlacedSubtitle:
|
||||
filename: str
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
- return {
-     "source": self.source,
-     "destination": self.destination,
-     "filename": self.filename,
- }
+ return {"source": self.source, "destination": self.destination, "filename": self.filename}
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -102,7 +98,7 @@ class UnresolvedTrack:
|
||||
raw_tokens: list[str]
|
||||
file_path: str | None = None
|
||||
file_size_kb: float | None = None
|
||||
- reason: str = "" # "unknown_language" | "low_confidence"
+ reason: str = "" # "unknown_language" | "low_confidence"
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {
|
||||
@@ -117,8 +113,8 @@ class UnresolvedTrack:
|
||||
class AvailableSubtitle:
|
||||
"""One subtitle track available on an embedded media item."""
|
||||
|
||||
- language: str # ISO 639-2 code
- subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
+ language: str # ISO 639-2 code
+ subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return {"language": self.language, "type": self.subtitle_type}
|
||||
@@ -128,12 +124,12 @@ class AvailableSubtitle:
|
||||
class ManageSubtitlesResponse:
|
||||
"""Response from the manage_subtitles use case."""
|
||||
|
||||
- status: str # "ok" | "needs_clarification" | "error"
+ status: str # "ok" | "needs_clarification" | "error"
|
||||
video_path: str | None = None
|
||||
placed: list[PlacedSubtitle] | None = None
|
||||
skipped_count: int = 0
|
||||
unresolved: list[UnresolvedTrack] | None = None
|
||||
- available: list[AvailableSubtitle] | None = None # embedded tracks summary
+ available: list[AvailableSubtitle] | None = None # embedded tracks summary
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
|
||||
@@ -4,31 +4,20 @@ import logging
 from pathlib import Path

 from alfred.domain.shared.value_objects import ImdbId
-from alfred.domain.subtitles.entities import SubtitleScanResult
+from alfred.domain.subtitles.entities import SubtitleTrack
+from alfred.domain.subtitles.knowledge.base import SubtitleKnowledgeBase
+from alfred.domain.subtitles.knowledge.loader import KnowledgeLoader
 from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
 from alfred.domain.subtitles.services.matcher import SubtitleMatcher
 from alfred.domain.subtitles.services.pattern_detector import PatternDetector
-from alfred.application.subtitles.placer import (
-    PlacedTrack,
-    SubtitlePlacer,
-    _build_dest_name,
-)
+from alfred.domain.subtitles.services.placer import PlacedTrack, SubtitlePlacer
 from alfred.domain.subtitles.services.utils import available_subtitles
 from alfred.domain.subtitles.value_objects import ScanStrategy
-from alfred.infrastructure.filesystem.scanner import PathlibFilesystemScanner
-from alfred.infrastructure.knowledge.subtitles.base import SubtitleKnowledgeBase
-from alfred.infrastructure.knowledge.subtitles.loader import KnowledgeLoader
 from alfred.infrastructure.persistence.context import get_memory
-from alfred.infrastructure.probe.ffprobe_prober import FfprobeMediaProber
 from alfred.infrastructure.subtitle.metadata_store import SubtitleMetadataStore
 from alfred.infrastructure.subtitle.rule_repository import RuleSetRepository

-from .dto import (
-    AvailableSubtitle,
-    ManageSubtitlesResponse,
-    PlacedSubtitle,
-    UnresolvedTrack,
-)
+from .dto import AvailableSubtitle, ManageSubtitlesResponse, PlacedSubtitle, UnresolvedTrack

 logger = logging.getLogger(__name__)

@@ -80,12 +69,11 @@ class ManageSubtitlesUseCase:
         season: int | None = None,
         episode: int | None = None,
         confirmed_pattern_id: str | None = None,
-        dry_run: bool = False,
     ) -> ManageSubtitlesResponse:
         source_path = Path(source_video)
         dest_path = Path(destination_video)

-        if not source_path.exists() and not source_path.parent.exists():
+        if not source_path.exists():
             return ManageSubtitlesResponse(
                 status="error",
                 error="source_not_found",
@@ -93,21 +81,13 @@ class ManageSubtitlesUseCase:
             )

         kb = SubtitleKnowledgeBase(KnowledgeLoader())
-        prober = FfprobeMediaProber()
-        scanner = PathlibFilesystemScanner()
         library_root = _infer_library_root(dest_path, media_type)
         store = SubtitleMetadataStore(library_root)
         repo = RuleSetRepository(library_root)

         # --- Pattern resolution ---
         pattern = self._resolve_pattern(
-            kb,
-            prober,
-            scanner,
-            store,
-            source_path,
-            confirmed_pattern_id,
-            release_group,
+            kb, store, source_path, confirmed_pattern_id, release_group
         )
         if pattern is None:
             return ManageSubtitlesResponse(
@@ -118,7 +98,7 @@ class ManageSubtitlesUseCase:

         # --- Identify ---
         media_id = _to_imdb_id(imdb_id)
-        identifier = SubtitleIdentifier(kb, prober, scanner)
+        identifier = SubtitleIdentifier(kb)
         metadata = identifier.identify(
             video_path=source_path,
             pattern=pattern,
@@ -128,9 +108,7 @@ class ManageSubtitlesUseCase:
         )

         if metadata.total_count == 0:
-            logger.info(
-                f"ManageSubtitles: no subtitle tracks found for {source_path.name}"
-            )
+            logger.info(f"ManageSubtitles: no subtitle tracks found for {source_path.name}")
             return ManageSubtitlesResponse(
                 status="ok",
                 video_path=destination_video,
@@ -163,7 +141,7 @@ class ManageSubtitlesUseCase:
             subtitle_prefs = memory.ltm.subtitle_preferences
         except Exception:
             pass
-        rules = repo.load(release_group, subtitle_prefs).resolve(kb.default_rules())
+        rules = repo.load(release_group, subtitle_prefs).resolve()
         matcher = SubtitleMatcher()
         matched, unresolved = matcher.match(metadata.external_tracks, rules)

@@ -186,30 +164,6 @@ class ManageSubtitlesUseCase:
                 skipped_count=metadata.total_count,
             )

-        # --- Dry run: skip placement ---
-        if dry_run:
-            placed_dtos = []
-            for t in matched:
-                if not t.file_path:
-                    continue
-                try:
-                    filename = _build_dest_name(t, dest_path.stem)
-                except ValueError:
-                    continue
-                placed_dtos.append(
-                    PlacedSubtitle(
-                        source=str(t.file_path),
-                        destination=str(dest_path.parent / filename),
-                        filename=filename,
-                    )
-                )
-            return ManageSubtitlesResponse(
-                status="ok",
-                video_path=destination_video,
-                placed=placed_dtos,
-                skipped_count=0,
-            )
-
         # --- Place ---
         placer = SubtitlePlacer()
         place_result = placer.place(matched, dest_path)
@@ -238,8 +192,6 @@ class ManageSubtitlesUseCase:
     def _resolve_pattern(
         self,
         kb: SubtitleKnowledgeBase,
-        prober: FfprobeMediaProber,
-        scanner: PathlibFilesystemScanner,
         store: SubtitleMetadataStore,
         source_path: Path,
         confirmed_pattern_id: str | None,
@@ -262,7 +214,7 @@ class ManageSubtitlesUseCase:

         # 3. Auto-detect
         release_root = source_path.parent
-        detector = PatternDetector(kb, prober, scanner)
+        detector = PatternDetector(kb)
         result = detector.detect(release_root, source_path)

         if result["detected"] and result["confidence"] >= 0.6:
@@ -277,9 +229,7 @@ class ManageSubtitlesUseCase:
         return kb.pattern("adjacent")


-def _to_unresolved_dto(
-    track: SubtitleScanResult, min_confidence: float = 0.7
-) -> UnresolvedTrack:
+def _to_unresolved_dto(track: SubtitleTrack, min_confidence: float = 0.7) -> UnresolvedTrack:
     reason = "unknown_language" if track.language is None else "low_confidence"
     return UnresolvedTrack(
         raw_tokens=track.raw_tokens,
@@ -291,10 +241,10 @@ def _to_unresolved_dto(

 def _pair_placed_with_tracks(
     placed: list[PlacedTrack],
-    tracks: list[SubtitleScanResult],
-) -> list[tuple[PlacedTrack, SubtitleScanResult]]:
+    tracks: list[SubtitleTrack],
+) -> list[tuple[PlacedTrack, SubtitleTrack]]:
     """
-    Pair each PlacedTrack with its originating SubtitleScanResult by source path.
+    Pair each PlacedTrack with its originating SubtitleTrack by source path.
     Falls back to positional matching if paths don't align.
     """
     track_by_path = {t.file_path: t for t in tracks if t.file_path}
@@ -302,7 +252,7 @@ def _pair_placed_with_tracks(
|
||||
for p in placed:
|
||||
track = track_by_path.get(p.source)
|
||||
if track is None and tracks:
|
||||
track = tracks[0] # positional fallback
|
||||
track = tracks[0] # positional fallback
|
||||
if track:
|
||||
pairs.append((p, track))
|
||||
return pairs
|
||||
|
||||
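The pairing helper in the hunks above matches each placed subtitle to its originating track by source path and, when no path matches, falls back to the first track in the list. A minimal, self-contained sketch of that lookup follows. `Track` and `Placed` are hypothetical stand-ins for illustration, not Alfred's real `SubtitleTrack` and `PlacedTrack` classes, and the function mirrors the logic visible in the diff rather than the full module.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Track:  # hypothetical stand-in for the subtitle track entity
    file_path: str | None
    language: str


@dataclass
class Placed:  # hypothetical stand-in for PlacedTrack
    source: str
    destination: str


def pair_placed_with_tracks(
    placed: list[Placed], tracks: list[Track]
) -> list[tuple[Placed, Track]]:
    """Pair each placed file with its track by source path.

    When no track has a matching path, fall back to the first track,
    which is what the diff's "positional fallback" line does.
    """
    # Index tracks by path once so each lookup is O(1).
    track_by_path = {t.file_path: t for t in tracks if t.file_path}
    pairs: list[tuple[Placed, Track]] = []
    for p in placed:
        track = track_by_path.get(p.source)
        if track is None and tracks:
            track = tracks[0]  # fallback: always the first track
        if track:
            pairs.append((p, track))
    return pairs
```

Note that the fallback always selects `tracks[0]`, so several unmatched placed files would all be paired with the same track. Whether that is the intended meaning of "positional" cannot be determined from the diff alone.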
@@ -1,156 +1,62 @@
|
||||
"""
|
||||
Destination resolution — compute library paths for releases.
|
||||
ResolveDestinationUseCase — compute the library destination path for a release.
|
||||
|
||||
Four distinct use cases, one per release type:
|
||||
- resolve_season_destination : season pack (folder move)
|
||||
- resolve_episode_destination : single episode (file move)
|
||||
- resolve_movie_destination : movie (file move)
|
||||
- resolve_series_destination : complete series multi-season pack (folder move)
|
||||
|
||||
Each returns a dedicated DTO with only the fields that make sense for that type.
|
||||
|
||||
These use cases follow Option B of the snapshot-VO design: ``ParsedRelease``
|
||||
arrives with ``title_sanitized`` already computed, and TMDB-supplied strings
|
||||
are sanitized **at the use-case boundary** (here) before being passed into
|
||||
``ParsedRelease`` builder methods. The builders themselves perform no I/O and
|
||||
no sanitization.
|
||||
Steps:
|
||||
1. Parse the release name
|
||||
2. Look up TMDB for title + year (+ episode title if single episode)
|
||||
3. Scan the library for an existing series folder
|
||||
4. Apply group-conflict rules
|
||||
5. Return the computed paths (or needs_clarification if ambiguous)
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import logging
|
||||
from dataclasses import dataclass
|
||||
import re
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.application.release import inspect_release
|
||||
from alfred.domain.release import parse_release
|
||||
from alfred.domain.release.ports import ReleaseKnowledge
|
||||
from alfred.domain.release.value_objects import ParsedRelease
|
||||
from alfred.domain.shared.ports import MediaProber
|
||||
from alfred.domain.media.release_parser import ParsedRelease, parse_release
|
||||
from alfred.infrastructure.persistence import get_memory
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _resolve_parsed(
|
||||
release_name: str,
|
||||
source_path: str | None,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
) -> ParsedRelease:
|
||||
"""Pick the right entry point depending on whether we have a path.
|
||||
|
||||
When ``source_path`` is provided and points to something that exists,
|
||||
we run the full inspection pipeline so probe data can refresh tech
|
||||
fields (which feed every filename builder). Otherwise we fall back
|
||||
to a parse-only path — same behavior as before.
|
||||
"""
|
||||
if source_path:
|
||||
path = Path(source_path)
|
||||
if path.exists():
|
||||
return inspect_release(release_name, path, kb, prober).parsed
|
||||
parsed, _ = parse_release(release_name, kb)
|
||||
return parsed
|
||||
# Characters forbidden on Windows filesystems (served via NFS)
|
||||
_WIN_FORBIDDEN = re.compile(r'[?:*"<>|\\]')
|
||||
|
||||
|
||||
def _find_existing_tvshow_folders(
|
||||
tv_root: Path, tmdb_title_safe: str, tmdb_year: int
|
||||
) -> list[str]:
|
||||
"""Return folder names in tv_root that match title + year prefix."""
|
||||
if not tv_root.exists():
|
||||
return []
|
||||
clean_title = tmdb_title_safe.replace(" ", ".")
|
||||
prefix = f"{clean_title}.{tmdb_year}".lower()
|
||||
return sorted(
|
||||
entry.name
|
||||
for entry in tv_root.iterdir()
|
||||
if entry.is_dir() and entry.name.lower().startswith(prefix)
|
||||
)
|
||||
|
||||
|
||||
def _get_tv_root() -> Path | None:
|
||||
memory = get_memory()
|
||||
tv_root = memory.ltm.library_paths.get("tv_show")
|
||||
return Path(tv_root) if tv_root else None
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Internal sentinel + series-folder resolver (shared by the 3 TV use cases)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@dataclass
|
||||
class _Clarification:
|
||||
"""Module-private sentinel signalling that user input is needed."""
|
||||
|
||||
question: str
|
||||
options: list[str]
|
||||
|
||||
|
||||
def _resolve_series_folder(
|
||||
tv_root: Path,
|
||||
tmdb_title: str,
|
||||
tmdb_title_safe: str,
|
||||
tmdb_year: int,
|
||||
computed_name: str,
|
||||
confirmed_folder: str | None,
|
||||
) -> tuple[str, bool] | _Clarification:
|
||||
"""
|
||||
Resolve which series folder to use.
|
||||
|
||||
Returns:
|
||||
(folder_name, is_new) if resolved unambiguously,
|
||||
_Clarification(question, options) if the caller must ask the user.
|
||||
"""
|
||||
if confirmed_folder:
|
||||
return confirmed_folder, not (tv_root / confirmed_folder).exists()
|
||||
|
||||
existing = _find_existing_tvshow_folders(tv_root, tmdb_title_safe, tmdb_year)
|
||||
|
||||
if not existing:
|
||||
return computed_name, True
|
||||
|
||||
if len(existing) == 1 and existing[0] == computed_name:
|
||||
return existing[0], False
|
||||
|
||||
options = existing + ([computed_name] if computed_name not in existing else [])
|
||||
return _Clarification(
|
||||
question=(
|
||||
f"Un dossier série existe déjà pour '{tmdb_title}' "
|
||||
f"mais son nom diffère du nom calculé ({computed_name}). "
|
||||
f"Lequel utiliser ?"
|
||||
),
|
||||
options=options,
|
||||
)
|
||||
def _sanitise(text: str) -> str:
|
||||
return _WIN_FORBIDDEN.sub("", text)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# DTOs
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@dataclass
|
||||
class _ResolvedDestinationBase:
|
||||
"""
|
||||
Shared shape across all resolution DTOs.
|
||||
class ResolvedDestination:
|
||||
"""All computed paths for a release, ready to hand to move_media."""
|
||||
|
||||
Holds the status flag and the fields used in non-ok states
|
||||
(error / needs_clarification). Subclasses add their own ok-state fields
|
||||
and a to_dict() that delegates the non-ok cases via _base_dict().
|
||||
"""
|
||||
status: str # "ok" | "needs_clarification" | "error"
|
||||
|
||||
status: str # "ok" | "needs_clarification" | "error"
|
||||
# Populated on "ok"
|
||||
library_file: str | None = None # absolute path of the destination video file
|
||||
series_folder: str | None = None # absolute path of the series root folder
|
||||
season_folder: str | None = None # absolute path of the season subfolder
|
||||
series_folder_name: str | None = None # just the folder name (for display)
|
||||
season_folder_name: str | None = None
|
||||
filename: str | None = None
|
||||
is_new_series_folder: bool = False # True if we're creating the folder
|
||||
|
||||
# needs_clarification
|
||||
# Populated on "needs_clarification"
|
||||
question: str | None = None
|
||||
options: list[str] | None = None
|
||||
options: list[str] | None = None # existing group folder names to pick from
|
||||
|
||||
# error
|
||||
# Populated on "error"
|
||||
error: str | None = None
|
||||
message: str | None = None
|
||||
|
||||
def _base_dict(self) -> dict | None:
|
||||
"""Return the dict for error/needs_clarification, or None for ok."""
|
||||
def to_dict(self) -> dict:
|
||||
if self.status == "error":
|
||||
return {"status": self.status, "error": self.error, "message": self.message}
|
||||
if self.status == "needs_clarification":
|
||||
@@ -159,48 +65,11 @@ class _ResolvedDestinationBase:
                 "question": self.question,
                 "options": self.options or [],
             }
-        return None
-
-
-@dataclass
-class ResolvedSeasonDestination(_ResolvedDestinationBase):
-    """Paths for a season pack — folder move, no individual file paths."""
-
-    series_folder: str | None = None
-    season_folder: str | None = None
-    series_folder_name: str | None = None
-    season_folder_name: str | None = None
-    is_new_series_folder: bool = False
-
-    def to_dict(self) -> dict:
-        return self._base_dict() or {
+        return {
             "status": self.status,
-            "series_folder": self.series_folder,
-            "season_folder": self.season_folder,
-            "series_folder_name": self.series_folder_name,
-            "season_folder_name": self.season_folder_name,
-            "is_new_series_folder": self.is_new_series_folder,
-        }
-
-
-@dataclass
-class ResolvedEpisodeDestination(_ResolvedDestinationBase):
-    """Paths for a single episode — file move."""
-
-    series_folder: str | None = None
-    season_folder: str | None = None
-    library_file: str | None = None  # full path to destination .mkv
-    series_folder_name: str | None = None
-    season_folder_name: str | None = None
-    filename: str | None = None
-    is_new_series_folder: bool = False
-
-    def to_dict(self) -> dict:
-        return self._base_dict() or {
-            "status": self.status,
-            "series_folder": self.series_folder,
-            "season_folder": self.season_folder,
             "library_file": self.library_file,
+            "series_folder": self.series_folder,
+            "season_folder": self.season_folder,
             "series_folder_name": self.series_folder_name,
             "season_folder_name": self.season_folder_name,
             "filename": self.filename,
@@ -208,257 +77,170 @@ class ResolvedEpisodeDestination(_ResolvedDestinationBase):
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class ResolvedMovieDestination(_ResolvedDestinationBase):
|
||||
"""Paths for a movie — file move."""
|
||||
# ---------------------------------------------------------------------------
|
||||
# Use case
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
movie_folder: str | None = None
|
||||
library_file: str | None = None
|
||||
movie_folder_name: str | None = None
|
||||
filename: str | None = None
|
||||
is_new_folder: bool = False
|
||||
class ResolveDestinationUseCase:
|
||||
"""
|
||||
Compute the full destination path for a media file being organised.
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return self._base_dict() or {
|
||||
"status": self.status,
|
||||
"movie_folder": self.movie_folder,
|
||||
"library_file": self.library_file,
|
||||
"movie_folder_name": self.movie_folder_name,
|
||||
"filename": self.filename,
|
||||
"is_new_folder": self.is_new_folder,
|
||||
}
|
||||
The caller provides:
|
||||
- release_name: the raw release folder/file name
|
||||
- source_file: path to the actual video file (to get extension)
|
||||
- tmdb_title: canonical title from TMDB
|
||||
- tmdb_year: release year from TMDB
|
||||
- tmdb_episode_title: episode title from TMDB (None for movies / season packs)
|
||||
- confirmed_folder: if the user already answered needs_clarification, pass
|
||||
the chosen folder name here to skip the check
|
||||
|
||||
Returns a ResolvedDestination.
|
||||
"""
|
||||
|
||||
@dataclass
|
||||
class ResolvedSeriesDestination(_ResolvedDestinationBase):
|
||||
"""Paths for a complete multi-season series pack — folder move."""
|
||||
def execute(
|
||||
self,
|
||||
release_name: str,
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
tmdb_episode_title: str | None = None,
|
||||
confirmed_folder: str | None = None,
|
||||
) -> ResolvedDestination:
|
||||
parsed = parse_release(release_name)
|
||||
ext = Path(source_file).suffix # ".mkv"
|
||||
|
||||
series_folder: str | None = None
|
||||
series_folder_name: str | None = None
|
||||
is_new_series_folder: bool = False
|
||||
if parsed.is_movie:
|
||||
return self._resolve_movie(parsed, tmdb_title, tmdb_year, ext)
|
||||
return self._resolve_tvshow(
|
||||
parsed, tmdb_title, tmdb_year, tmdb_episode_title, ext, confirmed_folder
|
||||
)
|
||||
|
||||
def to_dict(self) -> dict:
|
||||
return self._base_dict() or {
|
||||
"status": self.status,
|
||||
"series_folder": self.series_folder,
|
||||
"series_folder_name": self.series_folder_name,
|
||||
"is_new_series_folder": self.is_new_series_folder,
|
||||
}
|
||||
# ------------------------------------------------------------------
|
||||
# Movie
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _resolve_movie(
|
||||
self, parsed: ParsedRelease, tmdb_title: str, tmdb_year: int, ext: str
|
||||
) -> ResolvedDestination:
|
||||
memory = get_memory()
|
||||
movies_root = memory.ltm.library_paths.get("movie")
|
||||
if not movies_root:
|
||||
return ResolvedDestination(
|
||||
status="error",
|
||||
error="library_not_set",
|
||||
message="Movie library path is not configured.",
|
||||
)
|
||||
|
||||
folder_name = _sanitise(parsed.movie_folder_name(tmdb_title, tmdb_year))
|
||||
filename = _sanitise(parsed.movie_filename(tmdb_title, tmdb_year, ext))
|
||||
|
||||
folder_path = Path(movies_root) / folder_name
|
||||
file_path = folder_path / filename
|
||||
|
||||
return ResolvedDestination(
|
||||
status="ok",
|
||||
library_file=str(file_path),
|
||||
series_folder=str(folder_path),
|
||||
series_folder_name=folder_name,
|
||||
filename=filename,
|
||||
is_new_series_folder=not folder_path.exists(),
|
||||
)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# TV show
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _resolve_tvshow(
|
||||
self,
|
||||
parsed: ParsedRelease,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
tmdb_episode_title: str | None,
|
||||
ext: str,
|
||||
confirmed_folder: str | None,
|
||||
) -> ResolvedDestination:
|
||||
memory = get_memory()
|
||||
tv_root = memory.ltm.library_paths.get("tv_show")
|
||||
if not tv_root:
|
||||
return ResolvedDestination(
|
||||
status="error",
|
||||
error="library_not_set",
|
||||
message="TV show library path is not configured.",
|
||||
)
|
||||
|
||||
tv_root_path = Path(tv_root)
|
||||
|
||||
# --- Find existing series folders for this title ---
|
||||
existing = _find_existing_series_folders(tv_root_path, tmdb_title, tmdb_year)
|
||||
|
||||
# --- Determine series folder name ---
|
||||
if confirmed_folder:
|
||||
series_folder_name = confirmed_folder
|
||||
is_new = not (tv_root_path / confirmed_folder).exists()
|
||||
elif len(existing) == 0:
|
||||
# No existing folder — create with release group
|
||||
series_folder_name = _sanitise(parsed.show_folder_name(tmdb_title, tmdb_year))
|
||||
is_new = True
|
||||
elif len(existing) == 1:
|
||||
# Exactly one match — use it regardless of group
|
||||
series_folder_name = existing[0]
|
||||
is_new = False
|
||||
else:
|
||||
# Multiple folders — ask user
|
||||
return ResolvedDestination(
|
||||
status="needs_clarification",
|
||||
question=(
|
||||
f"Multiple folders found for '{tmdb_title}' in your library. "
|
||||
f"Which one should I use for this release ({parsed.group})?"
|
||||
),
|
||||
options=existing,
|
||||
)
|
||||
|
||||
# --- Build paths ---
|
||||
season_folder_name = parsed.season_folder_name()
|
||||
filename = _sanitise(
|
||||
parsed.episode_filename(tmdb_episode_title, ext)
|
||||
if not parsed.is_season_pack
|
||||
else parsed.season_folder_name() + ext
|
||||
)
|
||||
|
||||
series_path = tv_root_path / series_folder_name
|
||||
season_path = series_path / season_folder_name
|
||||
file_path = season_path / filename
|
||||
|
||||
return ResolvedDestination(
|
||||
status="ok",
|
||||
library_file=str(file_path),
|
||||
series_folder=str(series_path),
|
||||
season_folder=str(season_path),
|
||||
series_folder_name=series_folder_name,
|
||||
season_folder_name=season_folder_name,
|
||||
filename=filename,
|
||||
is_new_series_folder=is_new,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Use cases
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def resolve_season_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
confirmed_folder: str | None = None,
|
||||
source_path: str | None = None,
|
||||
) -> ResolvedSeasonDestination:
|
||||
def _find_existing_series_folders(tv_root: Path, tmdb_title: str, tmdb_year: int) -> list[str]:
|
||||
"""
|
||||
Compute destination paths for a season pack.
|
||||
Return names of folders in tv_root that match the given title + year.
|
||||
|
||||
Returns series_folder + season_folder. No file paths — the whole
|
||||
source folder is moved as-is into season_folder.
|
||||
|
||||
When ``source_path`` points to the release on disk, the parser is
|
||||
augmented with ffprobe data so tech tokens missing from the release
|
||||
name (quality / codec) end up in the folder names.
|
||||
Matching is loose: normalised title (dots, no special chars) + year must
|
||||
appear at the start of the folder name.
|
||||
"""
|
||||
tv_root = _get_tv_root()
|
||||
if not tv_root:
|
||||
return ResolvedSeasonDestination(
|
||||
status="error",
|
||||
error="library_not_set",
|
||||
message="TV show library path is not configured.",
|
||||
)
|
||||
if not tv_root.exists():
|
||||
return []
|
||||
|
||||
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||
# Build a normalised prefix to match against: "Oz.1997"
|
||||
clean_title = _sanitise(tmdb_title).replace(" ", ".")
|
||||
prefix = f"{clean_title}.{tmdb_year}".lower()
|
||||
|
||||
resolved = _resolve_series_folder(
|
||||
tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
|
||||
)
|
||||
if isinstance(resolved, _Clarification):
|
||||
return ResolvedSeasonDestination(
|
||||
status="needs_clarification",
|
||||
question=resolved.question,
|
||||
options=resolved.options,
|
||||
)
|
||||
matches = []
|
||||
for entry in tv_root.iterdir():
|
||||
if entry.is_dir() and entry.name.lower().startswith(prefix):
|
||||
matches.append(entry.name)
|
||||
|
||||
series_folder_name, is_new = resolved
|
||||
season_folder_name = parsed.season_folder_name()
|
||||
series_path = tv_root / series_folder_name
|
||||
season_path = series_path / season_folder_name
|
||||
|
||||
return ResolvedSeasonDestination(
|
||||
status="ok",
|
||||
series_folder=str(series_path),
|
||||
season_folder=str(season_path),
|
||||
series_folder_name=series_folder_name,
|
||||
season_folder_name=season_folder_name,
|
||||
is_new_series_folder=is_new,
|
||||
)
|
||||
|
||||
|
||||
def resolve_episode_destination(
|
||||
release_name: str,
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
tmdb_episode_title: str | None = None,
|
||||
confirmed_folder: str | None = None,
|
||||
) -> ResolvedEpisodeDestination:
|
||||
"""
|
||||
Compute destination paths for a single episode file.
|
||||
|
||||
Returns series_folder + season_folder + library_file (full path to .mkv).
|
||||
``source_file`` doubles as the inspection target — when it exists,
|
||||
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||
"""
|
||||
tv_root = _get_tv_root()
|
||||
if not tv_root:
|
||||
return ResolvedEpisodeDestination(
|
||||
status="error",
|
||||
error="library_not_set",
|
||||
message="TV show library path is not configured.",
|
||||
)
|
||||
|
||||
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||
ext = Path(source_file).suffix
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
tmdb_episode_title_safe = (
|
||||
kb.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
|
||||
)
|
||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||
|
||||
resolved = _resolve_series_folder(
|
||||
tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
|
||||
)
|
||||
if isinstance(resolved, _Clarification):
|
||||
return ResolvedEpisodeDestination(
|
||||
status="needs_clarification",
|
||||
question=resolved.question,
|
||||
options=resolved.options,
|
||||
)
|
||||
|
||||
series_folder_name, is_new = resolved
|
||||
season_folder_name = parsed.season_folder_name()
|
||||
filename = parsed.episode_filename(tmdb_episode_title_safe, ext)
|
||||
|
||||
series_path = tv_root / series_folder_name
|
||||
season_path = series_path / season_folder_name
|
||||
file_path = season_path / filename
|
||||
|
||||
return ResolvedEpisodeDestination(
|
||||
status="ok",
|
||||
series_folder=str(series_path),
|
||||
season_folder=str(season_path),
|
||||
library_file=str(file_path),
|
||||
series_folder_name=series_folder_name,
|
||||
season_folder_name=season_folder_name,
|
||||
filename=filename,
|
||||
is_new_series_folder=is_new,
|
||||
)
|
||||
|
||||
|
||||
def resolve_movie_destination(
|
||||
release_name: str,
|
||||
source_file: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
) -> ResolvedMovieDestination:
|
||||
"""
|
||||
Compute destination paths for a movie file.
|
||||
|
||||
Returns movie_folder + library_file (full path to .mkv).
|
||||
``source_file`` doubles as the inspection target — when it exists,
|
||||
ffprobe enrichment refreshes tech tokens missing from the release name.
|
||||
"""
|
||||
memory = get_memory()
|
||||
movies_root = memory.ltm.library_paths.get("movie")
|
||||
if not movies_root:
|
||||
return ResolvedMovieDestination(
|
||||
status="error",
|
||||
error="library_not_set",
|
||||
message="Movie library path is not configured.",
|
||||
)
|
||||
|
||||
parsed = _resolve_parsed(release_name, source_file, kb, prober)
|
||||
ext = Path(source_file).suffix
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
|
||||
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
|
||||
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
|
||||
|
||||
folder_path = Path(movies_root) / folder_name
|
||||
file_path = folder_path / filename
|
||||
|
||||
return ResolvedMovieDestination(
|
||||
status="ok",
|
||||
movie_folder=str(folder_path),
|
||||
library_file=str(file_path),
|
||||
movie_folder_name=folder_name,
|
||||
filename=filename,
|
||||
is_new_folder=not folder_path.exists(),
|
||||
)
|
||||
|
||||
|
||||
def resolve_series_destination(
|
||||
release_name: str,
|
||||
tmdb_title: str,
|
||||
tmdb_year: int,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
confirmed_folder: str | None = None,
|
||||
source_path: str | None = None,
|
||||
) -> ResolvedSeriesDestination:
|
||||
"""
|
||||
Compute destination path for a complete multi-season series pack.
|
||||
|
||||
Returns only series_folder — the whole pack lands directly inside it.
|
||||
|
||||
When ``source_path`` points to the release on disk, ffprobe
|
||||
enrichment refreshes tech tokens missing from the release name.
|
||||
"""
|
||||
tv_root = _get_tv_root()
|
||||
if not tv_root:
|
||||
return ResolvedSeriesDestination(
|
||||
status="error",
|
||||
error="library_not_set",
|
||||
message="TV show library path is not configured.",
|
||||
)
|
||||
|
||||
parsed = _resolve_parsed(release_name, source_path, kb, prober)
|
||||
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
|
||||
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||
|
||||
resolved = _resolve_series_folder(
|
||||
tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
|
||||
)
|
||||
if isinstance(resolved, _Clarification):
|
||||
return ResolvedSeriesDestination(
|
||||
status="needs_clarification",
|
||||
question=resolved.question,
|
||||
options=resolved.options,
|
||||
)
|
||||
|
||||
series_folder_name, is_new = resolved
|
||||
series_path = tv_root / series_folder_name
|
||||
|
||||
return ResolvedSeriesDestination(
|
||||
status="ok",
|
||||
series_folder=str(series_path),
|
||||
series_folder_name=series_folder_name,
|
||||
is_new_series_folder=is_new,
|
||||
)
|
||||
return sorted(matches)
|
||||
|
||||
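The folder lookup at the end of the file above builds a normalised `Title.Year` prefix and keeps every directory in the TV root whose lowercased name starts with it. The sketch below reproduces that idea as a standalone function so the matching rule can be tried in isolation. The function name and the folder names used when calling it are illustrative assumptions. The character class is copied from the diff's `_WIN_FORBIDDEN` pattern.

```python
import re
from pathlib import Path

# Same character class as the diff's _WIN_FORBIDDEN pattern.
WIN_FORBIDDEN = re.compile(r'[?:*"<>|\\]')


def find_existing_series_folders(tv_root: Path, title: str, year: int) -> list[str]:
    """Return sorted names of directories in tv_root that start with
    '<Title.With.Dots>.<year>', compared case-insensitively."""
    if not tv_root.exists():
        return []

    # Strip Windows-forbidden characters, then turn spaces into dots: "Oz.1997".
    clean_title = WIN_FORBIDDEN.sub("", title).replace(" ", ".")
    prefix = f"{clean_title}.{year}".lower()

    return sorted(
        entry.name
        for entry in tv_root.iterdir()
        if entry.is_dir() and entry.name.lower().startswith(prefix)
    )
```

Calling `find_existing_series_folders(Path("/mnt/media/tv"), "Oz", 1997)` (a hypothetical library path) would return every release-group variant of the show's folder. Because the match is a plain prefix test, it is deliberately loose: any folder whose name merely begins with the same title and year is returned, which is why the calling code asks the user to choose when more than one candidate comes back.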
@@ -1,20 +0,0 @@
-"""Release application layer — orchestrators sitting between domain
-parsing and infrastructure I/O.
-
-Public surface:
-
-- :func:`is_supported_video` / :func:`find_main_video` — pre-pipeline
-  filesystem helpers (extension-only filtering, top-level video pick).
-- :func:`inspect_release` / :class:`InspectedResult` — full inspection
-  pipeline combining parse + filesystem refinement + probe enrichment.
-"""
-
-from .inspect import InspectedResult, inspect_release
-from .supported_media import find_main_video, is_supported_video
-
-__all__ = [
-    "InspectedResult",
-    "find_main_video",
-    "inspect_release",
-    "is_supported_video",
-]
@@ -1,67 +0,0 @@
|
||||
"""
|
||||
detect_media_type — filesystem-based media type refinement.
|
||||
|
||||
Enriches a ParsedRelease.media_type with evidence from the actual source path
|
||||
(file or folder). Called after parse_release() to produce a final classification.
|
||||
|
||||
Classification logic:
  1. If source_path is a file: check its extension directly.
  2. If source_path is a folder: collect the extensions of the files at its
     first level only (subdirectories are not scanned).
  3. Decision, in the order the checks run:
     - Mixed (video + non_video) → "unknown"
     - Any non_video extension AND no video extension → "other"
     - Any video extension → keep parsed media_type ("movie" | "tv_show" | "unknown")
     - No conclusive extension found → keep parsed media_type as-is
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.domain.release.ports import ReleaseKnowledge
|
||||
from alfred.domain.release.value_objects import ParsedRelease
|
||||
|
||||
|
||||
def detect_media_type(
|
||||
parsed: ParsedRelease, source_path: Path, kb: ReleaseKnowledge
|
||||
) -> str:
|
||||
"""
|
||||
Return a refined media_type string for the given source_path.
|
||||
|
||||
Does not mutate parsed — returns the new media_type value only.
|
||||
The caller is responsible for updating the ParsedRelease if needed.
|
||||
"""
|
||||
extensions = _collect_extensions(source_path)
|
||||
# Metadata extensions (.nfo, .srt, …) are always present alongside releases
|
||||
# and must not influence the type decision.
|
||||
conclusive = extensions - kb.metadata_extensions
|
||||
|
||||
has_video = bool(conclusive & kb.video_extensions)
|
||||
has_non_video = bool(conclusive & kb.non_video_extensions)
|
||||
|
||||
    if has_video and has_non_video:
        return "unknown"
    if has_non_video:
        return "other"
    # Video only, or no conclusive extension: trust token-level inference.
    return parsed.media_type
|
||||
|
||||
|
||||
def _collect_extensions(path: Path) -> set[str]:
|
||||
"""Return the set of lowercase extensions found at path (file or folder)."""
|
||||
if not path.exists():
|
||||
return set()
|
||||
|
||||
if path.is_file():
|
||||
return {path.suffix.lower()}
|
||||
|
||||
# Folder — scan first level only
|
||||
exts: set[str] = set()
|
||||
for child in path.iterdir():
|
||||
if child.is_file():
|
||||
exts.add(child.suffix.lower())
|
||||
|
||||
return exts
|
||||
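The decision order in `detect_media_type` can be exercised without touching the filesystem. The following is a minimal sketch, not the project's code: `classify` and the three extension sets are hypothetical stand-ins for `kb.video_extensions`, `kb.non_video_extensions` and `kb.metadata_extensions`, whose real contents are not visible in this diff.

```python
# Hypothetical extension sets; the real ones come from ReleaseKnowledge.
VIDEO = frozenset({".mkv", ".mp4"})
NON_VIDEO = frozenset({".iso", ".exe"})
METADATA = frozenset({".nfo", ".srt"})


def classify(extensions: set, parsed_media_type: str) -> str:
    """Mirror the decision order of detect_media_type on a plain set."""
    # Metadata files sit next to almost every release; ignore them.
    conclusive = extensions - METADATA
    has_video = bool(conclusive & VIDEO)
    has_non_video = bool(conclusive & NON_VIDEO)

    if has_video and has_non_video:
        return "unknown"  # mixed content: let the user decide
    if has_non_video:
        return "other"  # e.g. a bare ISO or installer
    # Video only, or nothing conclusive: keep the token-level guess.
    return parsed_media_type


for exts, guess in [
    ({".mkv", ".nfo"}, "movie"),
    ({".iso"}, "movie"),
    ({".mkv", ".exe"}, "tv_show"),
    ({".nfo"}, "tv_show"),
]:
    print(sorted(exts), guess, "->", classify(exts, guess))
```

Checking the mixed case first is what makes the later branches simple: once it is ruled out, "has non-video" already implies "no video".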
@@ -1,74 +0,0 @@
|
||||
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import replace
|
||||
|
||||
from alfred.domain.release.ports import ReleaseKnowledge
|
||||
from alfred.domain.release.value_objects import ParsedRelease
|
||||
from alfred.domain.shared.media import MediaInfo
|
||||
|
||||
|
||||
def enrich_from_probe(
|
||||
parsed: ParsedRelease, info: MediaInfo, kb: ReleaseKnowledge
|
||||
) -> ParsedRelease:
|
||||
"""
|
||||
Return a new ParsedRelease with None fields filled from ffprobe MediaInfo.
|
||||
|
||||
Only overwrites fields that are currently None — token-level values
|
||||
from the release name always take priority. ``ParsedRelease`` is
|
||||
frozen; this returns a new instance via :func:`dataclasses.replace`.
|
||||
|
||||
Translation tables (ffprobe codec name → scene token, channel count
|
||||
→ layout) live in ``kb.probe_mappings`` (loaded from
|
||||
``alfred/knowledge/release/probe_mappings.yaml``). When ffprobe
|
||||
reports a value with no mapping entry, the fallback is the uppercase
|
||||
raw value so unknown codecs still surface in a predictable form.
|
||||
"""
|
||||
mappings = kb.probe_mappings
|
||||
video_codec_map: dict[str, str] = mappings.get("video_codec", {})
|
||||
audio_codec_map: dict[str, str] = mappings.get("audio_codec", {})
|
||||
channel_map: dict[int, str] = mappings.get("audio_channels", {})
|
||||
|
||||
updates: dict[str, object] = {}
|
||||
|
||||
if parsed.quality is None and info.resolution:
|
||||
updates["quality"] = info.resolution
|
||||
|
||||
if parsed.codec is None and info.video_codec:
|
||||
updates["codec"] = video_codec_map.get(
|
||||
info.video_codec.lower(), info.video_codec.upper()
|
||||
)
|
||||
|
||||
# bit_depth: ffprobe exposes it via pix_fmt — not in MediaInfo yet, skip.
|
||||
|
||||
# Audio — use the default track, fallback to first
|
||||
default_track = next((t for t in info.audio_tracks if t.is_default), None)
|
||||
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
|
||||
|
||||
if track:
|
||||
if parsed.audio_codec is None and track.codec:
|
||||
updates["audio_codec"] = audio_codec_map.get(
|
||||
track.codec.lower(), track.codec.upper()
|
||||
)
|
||||
|
||||
if parsed.audio_channels is None and track.channels:
|
||||
updates["audio_channels"] = channel_map.get(
|
||||
track.channels, f"{track.channels}ch"
|
||||
)
|
||||
|
||||
# Languages — merge ffprobe languages with token-level ones
|
||||
# "und" = undetermined, not useful
|
||||
if info.audio_languages:
|
||||
existing_upper = {lang.upper() for lang in parsed.languages}
|
||||
new_languages = list(parsed.languages)
|
||||
for lang in info.audio_languages:
|
||||
if lang.lower() != "und" and lang.upper() not in existing_upper:
|
||||
new_languages.append(lang)
|
||||
existing_upper.add(lang.upper())
|
||||
if len(new_languages) != len(parsed.languages):
|
||||
updates["languages"] = tuple(new_languages)
|
||||
|
||||
if not updates:
|
||||
return parsed
|
||||
return replace(parsed, **updates)
|
||||
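The "fill only the `None` fields, return a new frozen instance" contract of `enrich_from_probe` can be illustrated in isolation. This is a sketch under stated assumptions: `Release`, `enrich`, and both mapping tables are hypothetical and much smaller than `ParsedRelease` and `probe_mappings.yaml`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical stand-in for ParsedRelease (two fields only)."""

    codec: Optional[str] = None
    audio_channels: Optional[str] = None


# Hypothetical mapping tables; the real ones live in probe_mappings.yaml.
VIDEO_CODEC_MAP = {"hevc": "x265", "h264": "x264"}
CHANNEL_MAP = {2: "2.0", 6: "5.1"}


def enrich(
    parsed: Release,
    probed_codec: Optional[str],
    probed_channels: Optional[int],
) -> Release:
    """Fill only the None fields and return a new instance."""
    updates = {}
    if parsed.codec is None and probed_codec:
        # Unmapped codecs fall back to the uppercase raw value.
        updates["codec"] = VIDEO_CODEC_MAP.get(
            probed_codec.lower(), probed_codec.upper()
        )
    if parsed.audio_channels is None and probed_channels:
        updates["audio_channels"] = CHANNEL_MAP.get(
            probed_channels, f"{probed_channels}ch"
        )
    # Return the original object untouched when there is nothing to add.
    return replace(parsed, **updates) if updates else parsed


print(enrich(Release(), "hevc", 6))
print(enrich(Release(codec="x264"), "vp9", 8))
```

Values parsed from the release name always win: in the second call the probed `vp9` is ignored because `codec` was already set, while the missing channel layout is filled with the `"8ch"` fallback.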
@@ -1,193 +0,0 @@
|
||||
"""Release inspection orchestrator — the canonical "look at this thing"
|
||||
entry point.
|
||||
|
||||
``inspect_release`` is the single composition of the four layers we
|
||||
care about for a freshly-arrived release:
|
||||
|
||||
1. **Parse the name** — :func:`alfred.domain.release.services.parse_release`
|
||||
gives a ``ParsedRelease`` plus a ``ParseReport`` (confidence + road).
|
||||
2. **Pick the main video** — :func:`find_main_video` runs a top-level
|
||||
scan over the source path. If nothing qualifies the result still
|
||||
completes; downstream callers decide what to do with a videoless
|
||||
release.
|
||||
3. **Refine the media type**: :func:`detect_media_type` uses the
   on-disk extension mix to override any token-level guess (e.g. a
   bare ``.iso`` folder becomes ``"other"``). ``ParsedRelease`` is
   frozen, so the refined value is applied by rebinding ``parsed``
   through :func:`dataclasses.replace`.
|
||||
4. **Probe the video** — the injected :class:`MediaProber` fills in
|
||||
missing technical fields via :func:`enrich_from_probe`. Skipped
|
||||
when there is no main video or when ``media_type`` ended up in
|
||||
``{"unknown", "other"}`` (the probe would tell us nothing useful).
|
||||
|
||||
The return type is :class:`InspectedResult`, a frozen VO that bundles
|
||||
everything downstream callers need (``analyze_release`` tool,
|
||||
``resolve_destination``, future workflow stages) without forcing them
|
||||
to redo the same four calls.
|
||||
|
||||
Design notes:
|
||||
|
||||
- **Application layer.** This module touches both domain
|
||||
(``parse_release``) and infrastructure (``MediaProber`` port). That
|
||||
is exactly application's job — orchestrate.
|
||||
- **Knowledge base is injected.** ``inspect_release`` takes ``kb`` and
|
||||
``prober`` as parameters; no module-level singletons here. Callers
|
||||
(the tool wrapper, tests) decide what to plug in.
|
||||
- **No in-place mutation.** ``ParsedRelease`` is frozen: the
  ``media_type`` refinement and ``enrich_from_probe`` each return a
  new instance via :func:`dataclasses.replace`. The outer
  ``InspectedResult`` is frozen as well, so the *bundle* is immutable
  from the caller's perspective.
|
||||
- **Never raises.** Filesystem / probe errors surface as ``None``
|
||||
fields on the result, never as exceptions — same contract as the
|
||||
underlying adapters.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, replace
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.application.release.detect_media_type import detect_media_type
|
||||
from alfred.application.release.enrich_from_probe import enrich_from_probe
|
||||
from alfred.application.release.supported_media import find_main_video
|
||||
from alfred.domain.release.ports import ReleaseKnowledge
|
||||
from alfred.domain.release.services import parse_release
|
||||
from alfred.domain.release.value_objects import (
|
||||
MediaTypeToken,
|
||||
ParsedRelease,
|
||||
ParseReport,
|
||||
)
|
||||
from alfred.domain.shared.media import MediaInfo
|
||||
from alfred.domain.shared.ports import MediaProber
|
||||
|
||||
|
||||
# Media types for which a probe carries no useful information.
|
||||
_NON_PROBABLE_MEDIA_TYPES = frozenset({"unknown", "other"})
|
||||
|
||||
# Media types for which there's nothing for the organizer to do.
|
||||
# ``other`` covers things like games / ISOs / archives sitting on the
|
||||
# downloads folder. ``unknown`` does NOT belong here — those need a
|
||||
# user decision, not a skip.
|
||||
_SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
|
||||
|
||||
# Roads that signal the parser couldn't reach a confident answer on its
|
||||
# own. ``Road`` values are kept as strings on the report to avoid a
|
||||
# cross-package import here.
|
||||
_ASK_USER_ROADS = frozenset({"path_of_pain"})
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class InspectedResult:
|
||||
"""The full picture of a release: parsed name + filesystem reality.
|
||||
|
||||
Bundles everything the downstream pipeline needs after a single
|
||||
inspection pass:
|
||||
|
||||
- ``parsed`` — :class:`ParsedRelease`, with ``media_type`` already
|
||||
refined by :func:`detect_media_type` and ``None`` tech fields
|
||||
filled in by :func:`enrich_from_probe` when a probe ran.
|
||||
- ``report`` — :class:`ParseReport` from the parser (confidence +
|
||||
road, untouched by inspection).
|
||||
- ``source_path`` — the path the inspector was pointed at (file or
|
||||
folder), as supplied by the caller.
|
||||
- ``main_video`` — the canonical video file inside ``source_path``,
|
||||
or ``None`` if no eligible file was found.
|
||||
- ``media_info`` — the :class:`MediaInfo` snapshot when a probe
|
||||
succeeded; ``None`` when no video was probed (no main video, or
|
||||
``media_type`` in ``{"unknown", "other"}``) or when ffprobe
|
||||
failed.
|
||||
- ``probe_used`` — ``True`` iff ``media_info`` is non-``None`` and
|
||||
``enrich_from_probe`` actually ran. Explicit flag so callers
|
||||
don't have to re-derive the condition.
|
||||
- ``recommended_action`` — derived hint for the orchestrator (see
|
||||
property docstring). Encodes the exclusion / clarification /
|
||||
go-ahead decision in one place so downstream callers don't
|
||||
re-implement the same checks.
|
||||
"""
|
||||
|
||||
parsed: ParsedRelease
|
||||
report: ParseReport
|
||||
source_path: Path
|
||||
main_video: Path | None
|
||||
media_info: MediaInfo | None
|
||||
probe_used: bool
|
||||
|
||||
@property
|
||||
def recommended_action(self) -> str:
|
||||
"""Return one of ``"skip"`` / ``"ask_user"`` / ``"process"``.
|
||||
|
||||
- ``"skip"`` — nothing to organize:
|
||||
* the source has no main video file, **or**
|
||||
* ``media_type`` is ``"other"`` (games / ISOs / archives).
|
||||
- ``"ask_user"`` — a decision is required before any action:
|
||||
* ``media_type`` is ``"unknown"`` (parser couldn't classify), **or**
|
||||
* the parse landed on ``Road.PATH_OF_PAIN``
|
||||
(low-confidence, malformed name, etc.).
|
||||
- ``"process"`` — everything else: a confident parse with a
|
||||
usable media type and a main video on disk. The orchestrator
|
||||
can move straight to the planning step.
|
||||
|
||||
The check ordering matters: ``"skip"`` wins over ``"ask_user"``
|
||||
because if there's no video to organize, no question to the
|
||||
user can change that. ``"ask_user"`` then wins over
|
||||
``"process"`` because a confident parse alone isn't enough if
|
||||
the type or road still flag uncertainty.
|
||||
"""
|
||||
if self.main_video is None:
|
||||
return "skip"
|
||||
if self.parsed.media_type.value in _SKIPPABLE_MEDIA_TYPES:
|
||||
return "skip"
|
||||
if self.parsed.media_type.value == "unknown":
|
||||
return "ask_user"
|
||||
if self.report.road in _ASK_USER_ROADS:
|
||||
return "ask_user"
|
||||
return "process"
|
||||
|
||||
|
||||
def inspect_release(
|
||||
release_name: str,
|
||||
source_path: Path,
|
||||
kb: ReleaseKnowledge,
|
||||
prober: MediaProber,
|
||||
) -> InspectedResult:
|
||||
"""Run the full inspection pipeline on ``release_name`` /
|
||||
``source_path``.
|
||||
|
||||
See module docstring for the four-step flow. ``kb`` and ``prober``
|
||||
are injected so the caller controls the knowledge base layering
|
||||
and the probe adapter (real ffprobe in production, stubs in tests).
|
||||
|
||||
Never raises. A missing or unreadable ``source_path`` simply
|
||||
results in ``main_video=None`` and ``media_info=None``.
|
||||
"""
|
||||
parsed, report = parse_release(release_name, kb)
|
||||
|
||||
    # Step 3 of the module docstring: refine media_type from the on-disk
    # extension mix. It runs before the main-video pick; the two steps do
    # not depend on each other. detect_media_type tolerates non-existent
    # paths (returns parsed.media_type untouched), so no need to guard here.
    # ParsedRelease is frozen, so use dataclasses.replace to rebind with
    # the refined value.
    refined_media_type = MediaTypeToken(detect_media_type(parsed, source_path, kb))
    if refined_media_type != parsed.media_type:
        parsed = replace(parsed, media_type=refined_media_type)

    # Step 2 of the module docstring: pick the canonical main video (top-level scan only).
|
||||
main_video = find_main_video(source_path, kb)
|
||||
|
||||
# Step 4: probe + enrich, when it makes sense.
|
||||
media_info: MediaInfo | None = None
|
||||
probe_used = False
|
||||
if main_video is not None and parsed.media_type not in _NON_PROBABLE_MEDIA_TYPES:
|
||||
media_info = prober.probe(main_video)
|
||||
if media_info is not None:
|
||||
parsed = enrich_from_probe(parsed, media_info, kb)
|
||||
probe_used = True
|
||||
|
||||
return InspectedResult(
|
||||
parsed=parsed,
|
||||
report=report,
|
||||
source_path=source_path,
|
||||
main_video=main_video,
|
||||
media_info=media_info,
|
||||
probe_used=probe_used,
|
||||
)
|
||||
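The check ordering documented on `recommended_action` is easier to see as a plain function. The sketch below is an approximation with hypothetical names: it takes bare values instead of an `InspectedResult`, and the road label `"easy"` used in the calls is invented for illustration (only `"path_of_pain"` appears in the source).

```python
SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
ASK_USER_ROADS = frozenset({"path_of_pain"})


def recommended_action(has_main_video: bool, media_type: str, road: str) -> str:
    """Return "skip", "ask_user" or "process", in that priority order."""
    # "skip" wins first: with no video, no user answer can help.
    if not has_main_video:
        return "skip"
    if media_type in SKIPPABLE_MEDIA_TYPES:
        return "skip"
    # "ask_user" comes next: the type or the parse road is still uncertain.
    if media_type == "unknown":
        return "ask_user"
    if road in ASK_USER_ROADS:
        return "ask_user"
    return "process"


# "easy" is a made-up road name standing in for any confident parse.
print(recommended_action(False, "unknown", "path_of_pain"))
print(recommended_action(True, "movie", "path_of_pain"))
print(recommended_action(True, "movie", "easy"))
```

The first call shows the precedence rule: even though both "ask" conditions hold, the missing video decides the outcome.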
@@ -1,74 +0,0 @@
|
||||
"""Pre-pipeline exclusion — decide which files are worth parsing.
|
||||
|
||||
These helpers live one notch above the domain: they touch the
|
||||
filesystem (``Path.iterdir``, ``Path.suffix``) but carry no parsing
|
||||
logic of their own. The goal is to filter out non-video files and pick
|
||||
the canonical "main video" from a release folder *before* anything
|
||||
hits :func:`~alfred.domain.release.parse_release`.
|
||||
|
||||
Design notes (Phase A bis, 2026-05-20):
|
||||
|
||||
- **Extension is the sole eligibility criterion.** A file is supported
|
||||
iff its suffix is in ``kb.video_extensions``. No size threshold, no
|
||||
filename heuristics ("sample", "trailer", …). If a release packs a
|
||||
bloated featurette or names its sample alphabetically before the
|
||||
main feature, that's PATH_OF_PAIN territory — not this layer's job.
|
||||
|
||||
- **Top-level scan only.** ``find_main_video`` does not descend into
|
||||
subdirectories. Releases that wrap the main video in ``Sample/`` or
|
||||
similar are non-scene-standard and handled by the orchestrator
|
||||
upstream.
|
||||
|
||||
- **Lexicographic tie-break.** When several candidates qualify
|
||||
(legitimate for season packs), we return the first by alphabetical
|
||||
order. Deterministic, no size-based ranking.
|
||||
|
||||
- **Direct ``Path`` I/O.** No ``FilesystemScanner`` port — this layer
|
||||
is application, not domain. If isolation becomes necessary for
|
||||
testing scale, we'll introduce a port then.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
from alfred.domain.release.ports.knowledge import ReleaseKnowledge
|
||||
|
||||
|
||||
def is_supported_video(path: Path, kb: ReleaseKnowledge) -> bool:
|
||||
"""Return True when ``path`` is a video file the parser should
|
||||
consider.
|
||||
|
||||
The check is purely extension-based: ``path.suffix.lower()`` must
|
||||
belong to ``kb.video_extensions``. ``path`` must also be a regular
|
||||
file — directories and broken symlinks return False.
|
||||
"""
|
||||
if not path.is_file():
|
||||
return False
|
||||
return path.suffix.lower() in kb.video_extensions
|
||||
|
||||
|
||||
def find_main_video(folder: Path, kb: ReleaseKnowledge) -> Path | None:
|
||||
"""Return the canonical main video file inside ``folder``, or
|
||||
``None`` if there isn't one.
|
||||
|
||||
Behavior:
|
||||
|
||||
- Top-level scan only — subdirectories are ignored.
|
||||
- Eligibility is :func:`is_supported_video`.
|
||||
- When several files qualify, the lexicographically first one wins.
|
||||
- When ``folder`` itself is a video file, it is returned as-is
|
||||
(single-file releases are valid).
|
||||
- When ``folder`` doesn't exist or isn't a directory (and isn't a
|
||||
video file either), returns ``None``.
|
||||
"""
|
||||
if folder.is_file():
|
||||
return folder if is_supported_video(folder, kb) else None
|
||||
|
||||
if not folder.is_dir():
|
||||
return None
|
||||
|
||||
candidates = sorted(
|
||||
child for child in folder.iterdir() if is_supported_video(child, kb)
|
||||
)
|
||||
return candidates[0] if candidates else None
|
||||
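To see the top-level scan and the lexicographic tie-break in action, the two helpers can be reproduced against a temporary directory. This is a self-contained sketch: `VIDEO_EXTENSIONS` is a hypothetical replacement for `kb.video_extensions`, and the file names are invented.

```python
import tempfile
from pathlib import Path
from typing import Optional

VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4"})  # hypothetical set


def is_supported_video(path: Path) -> bool:
    """Extension-only check; directories and missing paths are rejected."""
    return path.is_file() and path.suffix.lower() in VIDEO_EXTENSIONS


def find_main_video(folder: Path) -> Optional[Path]:
    """Top-level scan; the lexicographically first video wins."""
    if folder.is_file():
        return folder if is_supported_video(folder) else None
    if not folder.is_dir():
        return None
    candidates = sorted(c for c in folder.iterdir() if is_supported_video(c))
    return candidates[0] if candidates else None


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("Show.S01E02.mkv", "Show.S01E01.MKV", "Show.nfo"):
            (root / name).touch()
        # A video inside a subdirectory must not be picked.
        (root / "Sample").mkdir()
        (root / "Sample" / "A.sample.mkv").touch()

        picked = find_main_video(root)
        missing = find_main_video(root / "does-not-exist")
        return [picked.name if picked else None, missing]


print(demo())
```

The uppercase `.MKV` file still qualifies because the suffix is lowercased before the lookup, and the file under `Sample/` is never considered.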
@@ -0,0 +1,5 @@
"""Media domain — shared naming and release parsing."""

from .release_parser import ParsedRelease, parse_release

__all__ = ["ParsedRelease", "parse_release"]
|
||||
@@ -0,0 +1,306 @@
|
||||
"""
|
||||
release_parser.py — Parse a release name into structured components.
|
||||
|
||||
Handles both dot-separated and space-separated release names:
|
||||
Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
Oz S03 1080p WEBRip x265-KONTRAST
|
||||
Inception.2010.1080p.BluRay.x265-GROUP
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
|
||||
# Known quality tokens
|
||||
_QUALITIES = {"2160p", "1080p", "720p", "480p", "576p", "4k", "8k"}
|
||||
|
||||
# Known source tokens (case-insensitive match)
|
||||
_SOURCES = {
|
||||
"bluray", "blu-ray", "bdrip", "brrip",
|
||||
"webrip", "web-rip", "webdl", "web-dl", "web",
|
||||
"hdtv", "hdrip", "dvdrip", "dvd", "vodrip",
|
||||
"amzn", "nf", "dsnp", "hmax", "atvp",
|
||||
}
|
||||
|
||||
# Known codec tokens
|
||||
_CODECS = {
|
||||
"x264", "x265", "h264", "h265", "hevc", "avc",
|
||||
"xvid", "divx", "av1", "vp9",
|
||||
"h.264", "h.265",
|
||||
}
|
||||
|
||||
# Windows-forbidden characters (we strip these from display names)
|
||||
_WIN_FORBIDDEN = re.compile(r'[?:*"<>|\\]')
|
||||
|
||||
# Episode/season pattern: S01, S01E02, S01E02E03 (the 1x02 form is not matched)
|
||||
_SEASON_EP_RE = re.compile(
|
||||
r"S(\d{1,2})(?:E(\d{2})(?:E(\d{2}))?)?",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# Year pattern
|
||||
_YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")
|
||||
|
||||
|
||||
@dataclass
|
||||
class ParsedRelease:
|
||||
"""Structured representation of a parsed release name."""
|
||||
|
||||
raw: str # original release name (untouched)
|
||||
normalised: str # dots instead of spaces
|
||||
title: str # show/movie title (dots, no year/season/tech)
|
||||
year: int | None # year token found after the title in the release name (None if absent)
|
||||
season: int | None # season number (None for movies)
|
||||
episode: int | None # first episode number (None if season-pack)
|
||||
episode_end: int | None # last episode for multi-ep (None otherwise)
|
||||
quality: str | None # 1080p, 2160p, …
|
||||
source: str | None # WEBRip, BluRay, …
|
||||
codec: str | None # x265, HEVC, …
|
||||
group: str # release group, "UNKNOWN" if missing
|
||||
tech_string: str # quality.source.codec joined with dots
|
||||
|
||||
# -------------------------------------------------------------------------
|
||||
# Derived helpers
|
||||
# -------------------------------------------------------------------------
|
||||
|
||||
@property
|
||||
def is_movie(self) -> bool:
|
||||
return self.season is None
|
||||
|
||||
@property
|
||||
def is_season_pack(self) -> bool:
|
||||
return self.season is not None and self.episode is None
|
||||
|
||||
def show_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
|
||||
"""
|
||||
Build the series root folder name.
|
||||
|
||||
Format: {Title}.{Year}.{Tech}-{Group}
|
||||
Example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
||||
"""
|
||||
title_part = _sanitise_for_fs(tmdb_title).replace(" ", ".")
|
||||
tech = self.tech_string or "Unknown"
|
||||
return f"{title_part}.{tmdb_year}.{tech}-{self.group}"
|
||||
|
||||
def season_folder_name(self) -> str:
|
||||
"""
|
||||
Build the season subfolder name = normalised release name (no episode).
|
||||
|
||||
Example: Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
For a single-episode release we still strip the episode token so the
|
||||
folder can hold the whole season.
|
||||
"""
|
||||
return _strip_episode_from_normalised(self.normalised)
|
||||
|
||||
def episode_filename(self, tmdb_episode_title: str | None, ext: str) -> str:
|
||||
"""
|
||||
Build the episode filename.
|
||||
|
||||
Format: {Title}.{SxxExx}.{EpisodeTitle}.{Tech}-{Group}.{ext}
|
||||
Example: Oz.S01E01.The.Routine.1080p.WEBRip.x265-KONTRAST.mkv
|
||||
|
||||
If tmdb_episode_title is None, omits the episode title segment.
|
||||
"""
|
||||
title_part = _sanitise_for_fs(self.title) # already dotted from normalised
|
||||
s = f"S{self.season:02d}" if self.season is not None else ""
|
||||
e = f"E{self.episode:02d}" if self.episode is not None else ""
|
||||
se = s + e
|
||||
|
||||
ep_title = ""
|
||||
if tmdb_episode_title:
|
||||
ep_title = "." + _sanitise_for_fs(tmdb_episode_title).replace(" ", ".")
|
||||
|
||||
tech = self.tech_string or "Unknown"
|
||||
ext_clean = ext.lstrip(".")
|
||||
return f"{title_part}.{se}{ep_title}.{tech}-{self.group}.{ext_clean}"
|
||||
|
||||
def movie_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
|
||||
"""
|
||||
Build the movie folder name.
|
||||
|
||||
Format: {Title}.{Year}.{Tech}-{Group}
|
||||
Example: Inception.2010.1080p.BluRay.x265-GROUP
|
||||
"""
|
||||
return self.show_folder_name(tmdb_title, tmdb_year)
|
||||
|
||||
def movie_filename(self, tmdb_title: str, tmdb_year: int, ext: str) -> str:
|
||||
"""
|
||||
Build the movie filename (same as folder name + extension).
|
||||
|
||||
Example: Inception.2010.1080p.BluRay.x265-GROUP.mkv
|
||||
"""
|
||||
ext_clean = ext.lstrip(".")
|
||||
return f"{self.movie_folder_name(tmdb_title, tmdb_year)}.{ext_clean}"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public API
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def parse_release(name: str) -> ParsedRelease:
|
||||
"""
|
||||
Parse a release name and return a ParsedRelease.
|
||||
|
||||
Accepts both dot-separated and space-separated names.
|
||||
"""
|
||||
normalised = _normalise(name)
|
||||
tokens = normalised.split(".")
|
||||
|
||||
season, episode, episode_end = _extract_season_episode(tokens)
|
||||
quality, source, codec, group, tech_tokens = _extract_tech(tokens)
|
||||
title = _extract_title(tokens, season, episode, tech_tokens)
|
||||
year = _extract_year(tokens, title)
|
||||
|
||||
tech_parts = [p for p in [quality, source, codec] if p]
|
||||
tech_string = ".".join(tech_parts)
|
||||
|
||||
return ParsedRelease(
|
||||
raw=name,
|
||||
normalised=normalised,
|
||||
title=title,
|
||||
year=year,
|
||||
season=season,
|
||||
episode=episode,
|
||||
episode_end=episode_end,
|
||||
quality=quality,
|
||||
source=source,
|
||||
codec=codec,
|
||||
group=group,
|
||||
tech_string=tech_string,
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Internal helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _normalise(name: str) -> str:
|
||||
"""Replace spaces with dots, collapse multiple dots."""
|
||||
s = name.replace(" ", ".")
|
||||
s = re.sub(r"\.{2,}", ".", s)
|
||||
return s.strip(".")
|
||||
|
||||
|
||||
def _sanitise_for_fs(text: str) -> str:
|
||||
"""Remove Windows-forbidden characters from a string."""
|
||||
return _WIN_FORBIDDEN.sub("", text)
|
||||
|
||||
|
||||
def _extract_season_episode(tokens: list[str]) -> tuple[int | None, int | None, int | None]:
|
||||
joined = ".".join(tokens)
|
||||
m = _SEASON_EP_RE.search(joined)
|
||||
if not m:
|
||||
return None, None, None
|
||||
season = int(m.group(1))
|
||||
episode = int(m.group(2)) if m.group(2) else None
|
||||
episode_end = int(m.group(3)) if m.group(3) else None
|
||||
return season, episode, episode_end
|
||||
|
||||
|
||||
def _extract_tech(
|
||||
tokens: list[str],
|
||||
) -> tuple[str | None, str | None, str | None, str, set[str]]:
|
||||
"""
|
||||
Extract quality, source, codec, group from tokens.
|
||||
|
||||
Returns (quality, source, codec, group, tech_token_set).
|
||||
|
||||
Group extraction strategy (in priority order):
|
||||
1. Token where prefix is a known codec: x265-GROUP
|
||||
2. Last token in the list that contains a dash (fallback for 10bit-GROUP, AAC5.1-GROUP, etc.)
|
||||
"""
|
||||
quality: str | None = None
|
||||
source: str | None = None
|
||||
codec: str | None = None
|
||||
group = "UNKNOWN"
|
||||
tech_tokens: set[str] = set()
|
||||
|
||||
for tok in tokens:
|
||||
tl = tok.lower()
|
||||
|
||||
if tl in _QUALITIES:
|
||||
quality = tok
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
|
||||
if tl in _SOURCES:
|
||||
source = tok
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
|
||||
if "-" in tok:
|
||||
parts = tok.rsplit("-", 1)
|
||||
# codec-GROUP (highest priority for group)
|
||||
if parts[0].lower() in _CODECS:
|
||||
codec = parts[0]
|
||||
group = parts[1] if parts[1] else "UNKNOWN"
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
# source with dash: Web-DL, WEB-DL, etc.
|
||||
if parts[0].lower() in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
|
||||
source = tok
|
||||
tech_tokens.add(tok)
|
||||
continue
|
||||
|
||||
if tl in _CODECS:
|
||||
codec = tok
|
||||
tech_tokens.add(tok)
|
||||
|
||||
# Fallback: if group still UNKNOWN, use the rightmost token with a dash
|
||||
# that isn't a known source (handles "10bit-Protozoan", "AAC5.1-YTS", etc.)
|
||||
if group == "UNKNOWN":
|
||||
for tok in reversed(tokens):
|
||||
if "-" in tok:
|
||||
parts = tok.rsplit("-", 1)
|
||||
tl = tok.lower()
|
||||
if tl in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
|
||||
continue
|
||||
if parts[1]: # non-empty group part
|
||||
group = parts[1]
|
||||
break
|
||||
|
||||
return quality, source, codec, group, tech_tokens
|
||||
|
||||
|
||||
def _extract_title(tokens: list[str], season: int | None, episode: int | None, tech_tokens: set[str]) -> str:
|
||||
"""
|
||||
Extract the title portion: everything before the first season/year/tech token.
|
||||
"""
|
||||
title_parts = []
|
||||
for tok in tokens:
|
||||
# Stop at season token
|
||||
if _SEASON_EP_RE.match(tok):
|
||||
break
|
||||
# Stop at year
|
||||
if _YEAR_RE.fullmatch(tok):
|
||||
break
|
||||
# Stop at tech tokens
|
||||
if tok in tech_tokens or tok.lower() in _QUALITIES | _SOURCES | _CODECS:
|
||||
break
|
||||
# Stop if token contains a dash (likely codec-GROUP)
|
||||
if "-" in tok and any(p.lower() in _CODECS | _SOURCES for p in tok.split("-")):
|
||||
break
|
||||
title_parts.append(tok)
|
||||
|
||||
return ".".join(title_parts) if title_parts else tokens[0]
|
||||
|
||||
|
||||
def _extract_year(tokens: list[str], title: str) -> int | None:
|
||||
"""Extract a 4-digit year from tokens (only after the title)."""
|
||||
title_len = len(title.split("."))
|
||||
for tok in tokens[title_len:]:
|
||||
m = _YEAR_RE.fullmatch(tok)
|
||||
if m:
|
||||
return int(m.group(1))
|
||||
return None
|
||||
|
||||
|
||||
def _strip_episode_from_normalised(normalised: str) -> str:
|
||||
"""
|
||||
Remove all episode parts (Exx) from a normalised release name, keeping Sxx.
|
||||
|
||||
Oz.S03E01.1080p... → Oz.S03.1080p...
|
||||
Archer.S14E09E10E11.1080p... → Archer.S14.1080p...
|
||||
"""
|
||||
return re.sub(r"(S\d{2})(E\d{2})+", r"\1", normalised, flags=re.IGNORECASE)
|
||||
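The season/episode handling in this module rests on two regular expressions, and their behaviour is easy to check on the sample names from the docstrings. The sketch below copies both patterns from the file; the wrapper names `normalise`, `season_episode` and `strip_episode` are simplified, hypothetical versions of the private helpers.

```python
import re

# Same pattern as _SEASON_EP_RE in the module.
SEASON_EP_RE = re.compile(r"S(\d{1,2})(?:E(\d{2})(?:E(\d{2}))?)?", re.IGNORECASE)


def normalise(name: str) -> str:
    """Spaces become dots, repeated dots collapse, outer dots are trimmed."""
    collapsed = re.sub(r"\.{2,}", ".", name.replace(" ", "."))
    return collapsed.strip(".")


def season_episode(name: str) -> tuple:
    """Return (season, episode, episode_end), each an int or None."""
    match = SEASON_EP_RE.search(normalise(name))
    if not match:
        return None, None, None
    return tuple(int(g) if g else None for g in match.groups())


def strip_episode(normalised: str) -> str:
    """Drop every Exx part that follows Sxx, keeping the season token."""
    return re.sub(r"(S\d{2})(E\d{2})+", r"\1", normalised, flags=re.IGNORECASE)


for name in (
    "Oz S03 1080p WEBRip x265-KONTRAST",
    "Oz.S01E01.1080p.WEBRip.x265-KONTRAST",
    "Archer.S14E09E10E11.1080p",
    "Inception.2010.1080p.BluRay.x265-GROUP",
):
    print(name, "->", season_episode(name))

print(strip_episode("Archer.S14E09E10E11.1080p"))
```

One detail worth knowing: the capture pattern keeps at most two episode numbers, so a triple such as `S14E09E10E11` yields an `episode_end` of 10, while the stripping pattern removes all three `Exx` parts.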
@@ -2,6 +2,7 @@
|
||||
|
||||
from .entities import Movie
|
||||
from .exceptions import InvalidMovieData, MovieNotFound
|
||||
from .services import MovieService
|
||||
from .value_objects import MovieTitle, Quality, ReleaseYear
|
||||
|
||||
__all__ = [
|
||||
@@ -11,4 +12,5 @@ __all__ = [
|
||||
"Quality",
|
||||
"MovieNotFound",
|
||||
"InvalidMovieData",
|
||||
"MovieService",
|
||||
]
|
||||
|
||||
@@ -3,30 +3,16 @@
|
||||
from dataclasses import dataclass, field
|
||||
from datetime import datetime
|
||||
|
||||
from ..shared.media import AudioTrack, MediaWithTracks, SubtitleTrack
|
||||
from ..shared.value_objects import FilePath, FileSize, ImdbId
|
||||
from .value_objects import MovieTitle, Quality, ReleaseYear
|
||||
|
||||
|
||||
@dataclass(frozen=True, eq=False)
|
||||
class Movie(MediaWithTracks):
|
||||
@dataclass
|
||||
class Movie:
|
||||
"""
|
||||
Movie aggregate root for the movies domain.
|
||||
Movie entity representing a movie in the media library.
|
||||
|
||||
Carries file metadata (path, size) and the tracks discovered by the
|
||||
ffprobe + subtitle scan pipeline. The track tuples may be empty when the
|
||||
movie is known but not yet scanned, or when no file is downloaded.
|
||||
|
||||
Track helpers follow the same "C+" contract as ``Episode``: pass a
|
||||
``Language`` for cross-format matching, or a ``str`` for case-insensitive
|
||||
direct comparison.
|
||||
|
||||
Frozen: rebuild via ``dataclasses.replace`` to project enrichment results
|
||||
(audio/subtitle tracks, file metadata) onto a new instance.
|
||||
|
||||
Equality is identity-based: two ``Movie`` instances are equal iff they
|
||||
share the same ``imdb_id``, regardless of file/track contents. This is
|
||||
the DDD aggregate invariant — the aggregate is identified by its root id.
|
||||
This is the main aggregate root for the movies domain.
|
||||
"""
|
||||
|
||||
imdb_id: ImdbId
|
||||
@@ -37,8 +23,6 @@ class Movie(MediaWithTracks):
|
||||
file_size: FileSize | None = None
|
||||
tmdb_id: int | None = None
|
||||
added_at: datetime = field(default_factory=datetime.now)
|
||||
audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
|
||||
subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
|
||||
|
||||
def __post_init__(self):
|
||||
"""Validate movie entity."""
|
||||
@@ -60,16 +44,13 @@ class Movie(MediaWithTracks):
                 f"title must be MovieTitle or str, got {type(self.title)}"
             )

-    def __eq__(self, other: object) -> bool:
-        if not isinstance(other, Movie):
-            return NotImplemented
-        return self.imdb_id == other.imdb_id
+    def has_file(self) -> bool:
+        """Check if the movie has an associated file."""
+        return self.file_path is not None and self.file_path.exists()

-    def __hash__(self) -> int:
-        return hash(self.imdb_id)
-
-    # Track helpers (has_audio_in / audio_languages / has_subtitles_in /
-    # has_forced_subs / subtitle_languages) come from MediaWithTracks.
+    def is_downloaded(self) -> bool:
+        """Check if the movie is downloaded (has a file)."""
+        return self.has_file()

     def get_folder_name(self) -> str:
         """
||||
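Review note: the docstring text in this file's hunks describes a frozen aggregate whose equality depends only on `imdb_id`, rebuilt with `dataclasses.replace`. The following is a minimal sketch of that contract, assuming a hypothetical `MovieSketch` class with a plain `str` id. It is not the project's `Movie` entity, and the field names are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True, eq=False)
class MovieSketch:
    """Hypothetical stand-in for the aggregate root (not the project's class)."""

    imdb_id: str
    file_size: int | None = None

    def __eq__(self, other: object) -> bool:
        # Equality looks only at the root id, never at file or track data.
        if not isinstance(other, MovieSketch):
            return NotImplemented
        return self.imdb_id == other.imdb_id

    def __hash__(self) -> int:
        # Hash must agree with __eq__, so it also uses only the id.
        return hash(self.imdb_id)


bare = MovieSketch(imdb_id="tt1375666")
# Frozen instances are never mutated; enrichment builds a new instance.
enriched = replace(bare, file_size=2_000_000_000)
```

With this shape, `bare == enriched` holds even though `file_size` differs, and both collapse to a single entry in a set or dict keyed by the entity.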
@@ -0,0 +1,192 @@
"""Movie domain services - Business logic."""

import logging
import re

from ..shared.value_objects import FilePath, ImdbId
from .entities import Movie
from .exceptions import MovieAlreadyExists, MovieNotFound
from .repositories import MovieRepository
from .value_objects import Quality

logger = logging.getLogger(__name__)


class MovieService:
    """
    Domain service for movie-related business logic.

    This service contains business rules that don't naturally fit
    within a single entity.
    """

    def __init__(self, repository: MovieRepository):
        """
        Initialize movie service.

        Args:
            repository: Movie repository for persistence
        """
        self.repository = repository

    def add_movie(self, movie: Movie) -> None:
        """
        Add a new movie to the library.

        Args:
            movie: Movie entity to add

        Raises:
            MovieAlreadyExists: If movie with same IMDb ID already exists
        """
        if self.repository.exists(movie.imdb_id):
            raise MovieAlreadyExists(
                f"Movie with IMDb ID {movie.imdb_id} already exists"
            )

        self.repository.save(movie)
        logger.info(f"Added movie: {movie.title.value} ({movie.imdb_id})")

    def get_movie(self, imdb_id: ImdbId) -> Movie:
        """
        Get a movie by IMDb ID.

        Args:
            imdb_id: IMDb ID of the movie

        Returns:
            Movie entity

        Raises:
            MovieNotFound: If movie not found
        """
        movie = self.repository.find_by_imdb_id(imdb_id)
        if not movie:
            raise MovieNotFound(f"Movie with IMDb ID {imdb_id} not found")
        return movie

    def get_all_movies(self) -> list[Movie]:
        """
        Get all movies in the library.

        Returns:
            List of all movies
        """
        return self.repository.find_all()

    def update_movie(self, movie: Movie) -> None:
        """
        Update an existing movie.

        Args:
            movie: Movie entity with updated data

        Raises:
            MovieNotFound: If movie doesn't exist
        """
        if not self.repository.exists(movie.imdb_id):
            raise MovieNotFound(f"Movie with IMDb ID {movie.imdb_id} not found")

        self.repository.save(movie)
        logger.info(f"Updated movie: {movie.title.value} ({movie.imdb_id})")

    def remove_movie(self, imdb_id: ImdbId) -> None:
        """
        Remove a movie from the library.

        Args:
            imdb_id: IMDb ID of the movie to remove

        Raises:
            MovieNotFound: If movie not found
        """
        if not self.repository.delete(imdb_id):
            raise MovieNotFound(f"Movie with IMDb ID {imdb_id} not found")

        logger.info(f"Removed movie with IMDb ID: {imdb_id}")

    def detect_quality_from_filename(self, filename: str) -> Quality:
        """
        Detect video quality from filename.

        Args:
            filename: Filename to analyze

        Returns:
            Detected quality or UNKNOWN
        """
        filename_lower = filename.lower()

        # Check for quality indicators
        if "2160p" in filename_lower or "4k" in filename_lower:
            return Quality.UHD_4K
        elif "1080p" in filename_lower:
            return Quality.FULL_HD
        elif "720p" in filename_lower:
            return Quality.HD
        elif "480p" in filename_lower:
            return Quality.SD

        return Quality.UNKNOWN

    def extract_year_from_filename(self, filename: str) -> int | None:
        """
        Extract release year from filename.

        Args:
            filename: Filename to analyze

        Returns:
            Year if found, None otherwise
        """
        # Look for 4-digit year in parentheses or standalone
        # Examples: "Movie (2010)", "Movie.2010.1080p"
        patterns = [
            r"\((\d{4})\)",  # (2010)
            r"\.(\d{4})\.",  # .2010.
            r"\s(\d{4})\s",  # 2010
        ]

        for pattern in patterns:
            match = re.search(pattern, filename)
            if match:
                year = int(match.group(1))
                # Validate year is reasonable
                if 1888 <= year <= 2100:
                    return year

        return None

    def validate_movie_file(self, file_path: FilePath) -> bool:
        """
        Validate that a file is a valid movie file.

        Args:
            file_path: Path to the file

        Returns:
            True if valid movie file, False otherwise
        """
        if not file_path.exists():
            logger.warning(f"File does not exist: {file_path}")
            return False

        if not file_path.is_file():
            logger.warning(f"Path is not a file: {file_path}")
            return False

        # Check file extension
        valid_extensions = {".mkv", ".mp4", ".avi", ".mov", ".wmv", ".flv", ".webm"}
        if file_path.value.suffix.lower() not in valid_extensions:
            logger.warning(f"Invalid file extension: {file_path.value.suffix}")
            return False

        # Check file size (should be at least 100 MB for a movie)
        min_size = 100 * 1024 * 1024  # 100 MB
        if file_path.value.stat().st_size < min_size:
            logger.warning(
                f"File too small to be a movie: {file_path.value.stat().st_size} bytes"
            )
            return False

        return True
||||
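Review note: `extract_year_from_filename` tries three patterns in order and accepts the first match that falls in the 1888 to 2100 range. The standalone sketch below lifts the same patterns and bounds out of the service so the behaviour can be checked in isolation. The function name `extract_year` and the sample filenames are hypothetical.

```python
from __future__ import annotations

import re

# Same three patterns as in the service method, tried in this order.
YEAR_PATTERNS = [
    r"\((\d{4})\)",  # (2010)
    r"\.(\d{4})\.",  # .2010.
    r"\s(\d{4})\s",  # 2010 surrounded by whitespace
]


def extract_year(filename: str) -> int | None:
    """Return the first plausible 4-digit year, or None."""
    for pattern in YEAR_PATTERNS:
        match = re.search(pattern, filename)
        if match:
            year = int(match.group(1))
            # Same sanity bounds as the service method.
            if 1888 <= year <= 2100:
                return year
    return None


paren = extract_year("Inception (2010).mkv")
dotted = extract_year("Inception.2010.1080p.mkv")
no_year = extract_year("Inception.1080p.mkv")
# A title that itself contains a plausible year is matched first.
ambiguous = extract_year("Blade.Runner.2049.2017.1080p.mkv")
```

One thing the sketch makes visible: because `re.search` returns only the leftmost match per pattern, a number inside the title (`2049` above) is returned ahead of the actual release year. Whether that matters depends on how the caller cross-checks the result.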
@@ -1,10 +1,10 @@
 """Movie domain value objects."""

+import re
 from dataclasses import dataclass
 from enum import Enum

 from ..shared.exceptions import ValidationError
-from ..shared.value_objects import to_dot_folder_name


 class Quality(Enum):
@@ -17,7 +17,7 @@ class Quality(Enum):
     UNKNOWN = "unknown"

     @classmethod
-    def from_string(cls, quality_str: str) -> Quality:
+    def from_string(cls, quality_str: str) -> "Quality":
         """
         Parse quality from string.

@@ -67,7 +67,11 @@ class MovieTitle:

         Removes special characters and replaces spaces with dots.
         """
-        return to_dot_folder_name(self.value)
+        # Remove special characters except spaces, dots, and hyphens
+        cleaned = re.sub(r"[^\w\s\.\-]", "", self.value)
+        # Replace spaces with dots
+        normalized = cleaned.replace(" ", ".")
+        return normalized

     def __str__(self) -> str:
         return self.value
||||
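Review note: the last hunk above replaces the shared `to_dot_folder_name` helper with an inline regex in `MovieTitle`. The sketch below reproduces just those two steps as a free function so the output for a few titles can be inspected. The name `to_folder_name` and the sample titles are hypothetical, and the behaviour of the removed helper is not shown in this diff, so no comparison with it is implied.

```python
import re


def to_folder_name(title: str) -> str:
    """Drop special characters, then turn spaces into dots."""
    # Keep word characters, whitespace, dots and hyphens; drop the rest.
    cleaned = re.sub(r"[^\w\s\.\-]", "", title)
    # Each space becomes one dot; runs of spaces are not collapsed.
    return cleaned.replace(" ", ".")


simple = to_folder_name("Spider-Man: No Way Home")
accented = to_folder_name("Amélie")
double_dot = to_folder_name("Fast & Furious")
```

Removing a character that sits between two spaces (the `&` above) leaves two adjacent spaces, which then become two adjacent dots. If that is undesirable, collapsing whitespace before the replacement would be one option.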
@@ -1,6 +0,0 @@
"""Release domain — release name parsing and naming conventions."""

from .services import parse_release
from .value_objects import ParsedRelease, ParseReport

__all__ = ["ParsedRelease", "ParseReport", "parse_release"]
@@ -1,31 +0,0 @@
"""Release parser v2 — annotate-based pipeline.

This package is the future home of ``parse_release``. It restructures the
parsing logic around a **tokenize → annotate → assemble** pipeline:

1. **tokenize**: split the release name into atomic tokens.
2. **annotate**: walk tokens left-to-right, assigning each one a
   :class:`TokenRole` (TITLE, YEAR, SEASON, RESOLUTION, …) using the
   injected :class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge`.
3. **assemble**: fold the annotated tokens into a :class:`ParsedRelease`.

The pipeline has three internal paths driven by the detected release group:

- **EASY**: known group (KONTRAST, RARBG, …) with a schema-driven layout
  declared in ``knowledge/release/release_groups/<group>.yaml``.
- **SHITTY**: unknown group, best-effort matching against the global
  knowledge sets, with a 0-100 confidence score.
- **PATH OF PAIN**: score below threshold OR critical chunks missing —
  signaled to the caller, who decides whether to involve the LLM/user.

Today the package exposes scaffolding only (token VOs and a thin pipeline
stub). The legacy ``parse_release`` in ``release.services`` keeps serving
production until each piece of the v2 pipeline is wired in.
"""

from __future__ import annotations

from .schema import GroupSchema, SchemaChunk
from .tokens import Token, TokenRole

__all__ = ["GroupSchema", "SchemaChunk", "Token", "TokenRole"]
||||
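Review note: the package docstring above describes a tokenize → annotate → assemble pipeline. The following is a heavily reduced sketch of that shape, meant only to show how the three stages hand data to each other. `Role`, `Tok`, `SEPARATORS` and `RESOLUTIONS` are hypothetical stand-ins. Group detection, schemas, enrichers and the `ReleaseKnowledge` port are all left out.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import Enum


class Role(Enum):
    UNKNOWN = "unknown"
    TITLE = "title"
    YEAR = "year"
    RESOLUTION = "resolution"


@dataclass(frozen=True)
class Tok:
    text: str
    role: Role = Role.UNKNOWN


# Hypothetical stand-ins for what an injected knowledge base would provide.
SEPARATORS = (".", " ", "_")
RESOLUTIONS = {"720p", "1080p", "2160p"}


def tokenize(name: str) -> list[Tok]:
    # Replace every separator with a NUL sentinel, then split on it.
    buf = name
    for sep in SEPARATORS:
        buf = buf.replace(sep, "\x00")
    return [Tok(p) for p in buf.split("\x00") if p]


def annotate(tokens: list[Tok]) -> list[Tok]:
    out: list[Tok] = []
    for tok in tokens:
        if len(tok.text) == 4 and tok.text.isdigit():
            out.append(replace(tok, role=Role.YEAR))
        elif tok.text.lower() in RESOLUTIONS:
            out.append(replace(tok, role=Role.RESOLUTION))
        else:
            out.append(tok)
    # Title = leftmost run of tokens that are still UNKNOWN.
    for i, tok in enumerate(out):
        if tok.role is not Role.UNKNOWN:
            break
        out[i] = replace(tok, role=Role.TITLE)
    return out


def assemble(tokens: list[Tok]) -> dict:
    title = ".".join(t.text for t in tokens if t.role is Role.TITLE)
    year = next((int(t.text) for t in tokens if t.role is Role.YEAR), None)
    quality = next((t.text for t in tokens if t.role is Role.RESOLUTION), None)
    return {"title": title, "year": year, "quality": quality}


parsed = assemble(annotate(tokenize("Some.Movie.2010.1080p.x264-GRP")))
```

Tokens are immutable here, so each stage returns new `Tok` instances rather than mutating its input, which keeps every stage a pure function of its arguments.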
@@ -1,763 +0,0 @@
"""Annotate-based pipeline.

Three stages:

1. :func:`tokenize` — release name → ``list[Token]`` (all UNKNOWN), plus
   a separately-returned site tag (e.g. ``[YTS.MX]``) that is never
   tokenized.
2. :func:`annotate` — promote each token's :class:`TokenRole` using the
   injected knowledge base. Two sub-passes:

   a. **Structural** (schema-driven, EASY only). Detects the group at
      the right end, looks up its :class:`GroupSchema`, then matches
      the schema's chunk sequence against the token stream. Between
      two structural chunks, any number of unmatched tokens may
      remain — they are left UNKNOWN for the enricher pass to handle.
   b. **Enrichers** (non-positional). Walks UNKNOWN tokens and tags
      audio / video-meta / edition / language roles. Multi-token
      sequences (``DTS.HD.MA``, ``DV.HDR10``, ``DIRECTORS.CUT``) are
      matched first, single tokens after.

3. :func:`assemble` — fold annotated tokens into a
   :class:`~alfred.domain.release.value_objects.ParsedRelease`-compatible
   dict.

The pipeline is **pure**: no I/O, no TMDB, no probe. All knowledge
arrives through ``kb: ReleaseKnowledge``.
"""

from __future__ import annotations

from ..ports.knowledge import ReleaseKnowledge
from ..value_objects import MediaTypeToken
from .schema import GroupSchema
from .tokens import Token, TokenRole


# ---------------------------------------------------------------------------
# Stage 1 — tokenize
# ---------------------------------------------------------------------------


def strip_site_tag(name: str) -> tuple[str, str | None]:
    """Split off a ``[site.tag]`` prefix or suffix.

    Returns ``(clean_name, tag)``. If no tag is found, returns
    ``(name.strip(), None)``.
    """
    s = name.strip()

    if s.startswith("["):
        close = s.find("]")
        if close != -1:
            tag = s[1:close].strip()
            remainder = s[close + 1 :].strip()
            if tag and remainder:
                return remainder, tag

    if s.endswith("]"):
        open_bracket = s.rfind("[")
        if open_bracket != -1:
            tag = s[open_bracket + 1 : -1].strip()
            remainder = s[:open_bracket].strip()
            if tag and remainder:
                return remainder, tag

    return s, None


def tokenize(name: str, kb: ReleaseKnowledge) -> tuple[list[Token], str | None]:
    """Split ``name`` into tokens after stripping any site tag.

    String-ops style: replace every configured separator with a single
    NUL byte then split. NUL cannot legally appear in a release name, so
    it's a safe sentinel.
    """
    clean, site_tag = strip_site_tag(name)

    DELIM = "\x00"
    buf = clean
    for sep in kb.separators:
        if sep != DELIM:
            buf = buf.replace(sep, DELIM)

    pieces = [p for p in buf.split(DELIM) if p]
    tokens = [Token(text=p, index=i) for i, p in enumerate(pieces)]
    return tokens, site_tag


# ---------------------------------------------------------------------------
# Helpers shared across passes
# ---------------------------------------------------------------------------


def _parse_season_episode(text: str) -> tuple[int, int | None, int | None] | None:
    """Parse a single token as ``SxxExx`` / ``SxxExxExx`` / ``Sxx`` /
    ``Sxx-yy`` (season range) / ``NxNN``.

    Returns ``(season, episode, episode_end)`` or ``None`` if the token
    is not a season/episode marker. For ``Sxx-yy``, returns the first
    season with no episode info — the caller is expected to detect the
    range form and promote ``media_type`` to ``tv_complete`` separately.
    """
    upper = text.upper()

    # SxxExx form (and Sxx, Sxx-yy)
    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
        season = int(upper[1:3])
        rest = upper[3:]

        if not rest:
            return season, None, None

        # Sxx-yy season-range form: capture the first season, treat as a
        # complete-series marker (no episode info).
        if (
            len(rest) == 3
            and rest[0] == "-"
            and rest[1:3].isdigit()
        ):
            return season, None, None

        episodes: list[int] = []
        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
            episodes.append(int(rest[1:3]))
            rest = rest[3:]

        if not episodes:
            return None
        # For chained multi-episode markers (E09E10E11), the range is the
        # first → last episode. Intermediate values are implied.
        return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None

    # NxNN form
    if "X" in upper:
        parts = upper.split("X")
        if len(parts) >= 2 and all(p.isdigit() and p for p in parts):
            season = int(parts[0])
            episode = int(parts[1])
            episode_end = int(parts[2]) if len(parts) >= 3 else None
            return season, episode, episode_end

    return None


def _is_year(text: str) -> bool:
    """Return True if ``text`` is a 4-digit year in [1900, 2099]."""
    return len(text) == 4 and text.isdigit() and 1900 <= int(text) <= 2099


def _split_codec_group(text: str, kb: ReleaseKnowledge) -> tuple[str, str] | None:
    """Split a ``codec-GROUP`` token into ``(codec, group)`` if it fits.

    Returns ``None`` if the token doesn't match the ``codec-GROUP``
    shape. Handles the empty-group case (``x265-``) as ``(codec, "")``.
    """
    if "-" not in text:
        return None
    head, _, tail = text.rpartition("-")
    if head.lower() in kb.codecs:
        return head, tail
    return None


def _match_role(text: str, role: TokenRole, kb: ReleaseKnowledge) -> TokenRole | None:
    """Return ``role`` if ``text`` matches it under ``kb``, else ``None``."""
    lower = text.lower()

    if role is TokenRole.YEAR:
        return TokenRole.YEAR if _is_year(text) else None

    if role is TokenRole.SEASON_EPISODE:
        return (
            TokenRole.SEASON_EPISODE
            if _parse_season_episode(text) is not None
            else None
        )

    if role is TokenRole.RESOLUTION:
        return TokenRole.RESOLUTION if lower in kb.resolutions else None

    if role is TokenRole.SOURCE:
        return TokenRole.SOURCE if lower in kb.sources else None

    if role is TokenRole.CODEC:
        return TokenRole.CODEC if lower in kb.codecs else None

    return None


# ---------------------------------------------------------------------------
# Stage 2a — group detection
# ---------------------------------------------------------------------------


def _detect_group(tokens: list[Token], kb: ReleaseKnowledge) -> tuple[str, int | None]:
    """Identify the release group by walking tokens right-to-left.

    Returns ``(group_name, token_index_carrying_group)``. ``index`` is
    ``None`` when the group is absent (no trailing ``-`` in the stream).
    """
    # Priority 1: codec-GROUP shape (clearest signal).
    for tok in reversed(tokens):
        split = _split_codec_group(tok.text, kb)
        if split is not None:
            _, group = split
            return (group or "UNKNOWN"), tok.index

    # Priority 2: rightmost dash, excluding dashed sources (Web-DL, etc.).
    for tok in reversed(tokens):
        if "-" not in tok.text:
            continue
        head, _, tail = tok.text.rpartition("-")
        if (
            head.lower() in kb.sources
            or tok.text.lower().replace("-", "") in kb.sources
        ):
            continue
        if tail:
            return tail, tok.index

    return "UNKNOWN", None


# ---------------------------------------------------------------------------
# Stage 2b — structural annotation (schema-driven)
# ---------------------------------------------------------------------------


def _annotate_structural(
    tokens: list[Token],
    kb: ReleaseKnowledge,
    schema: GroupSchema,
    group_token_index: int,
) -> list[Token] | None:
    """Annotate structural tokens following a known group schema.

    Walks the schema's chunks against the body (tokens up to the group
    token). For each chunk, scans forward in the body for a matching
    token — tokens passed over without match are left UNKNOWN (the
    enricher pass will handle them).

    Returns ``None`` if any mandatory chunk fails to find a match.
    """
    result = list(tokens)

    # The codec-GROUP token carries CODEC + GROUP. Split it now so the
    # schema walk knows the codec is "pre-consumed" at the end.
    group_token = result[group_token_index]
    cg_split = _split_codec_group(group_token.text, kb)
    codec_pre_consumed = False
    if cg_split is not None:
        codec, group = cg_split
        result[group_token_index] = group_token.with_role(
            TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
        )
        codec_pre_consumed = True
    else:
        head, _, tail = group_token.text.rpartition("-")
        result[group_token_index] = group_token.with_role(
            TokenRole.GROUP, group=tail or "UNKNOWN", prefix=head
        )

    body_end = group_token_index  # exclusive
    tok_idx = 0
    chunk_idx = 0

    # 1) TITLE — leftmost contiguous tokens up to the first structural
    # boundary. Title is special because it can be multi-token.
    while (
        chunk_idx < len(schema.chunks)
        and schema.chunks[chunk_idx].role is TokenRole.TITLE
    ):
        title_end = _find_title_end(result, body_end, kb)
        for i in range(tok_idx, title_end):
            result[i] = result[i].with_role(TokenRole.TITLE)
        tok_idx = title_end
        chunk_idx += 1

    # 2) Remaining structural chunks. For each, scan forward in the body
    # for a matching token; tokens passed over remain UNKNOWN.
    for chunk in schema.chunks[chunk_idx:]:
        if chunk.role is TokenRole.GROUP:
            continue
        if chunk.role is TokenRole.CODEC and codec_pre_consumed:
            continue

        match_idx = _find_chunk(result, tok_idx, body_end, chunk.role, kb)
        if match_idx is None:
            if chunk.optional:
                continue
            return None

        result[match_idx] = result[match_idx].with_role(chunk.role)
        tok_idx = match_idx + 1

    return result


def _find_title_end(
    tokens: list[Token], body_end: int, kb: ReleaseKnowledge
) -> int:
    """Return the exclusive index where the title ends.

    The title is the leftmost run of tokens whose text does not match
    any structural role (year, season/episode, resolution, source,
    codec). Enricher tokens (audio, HDR, language) are *not* boundaries
    because they can appear in the middle of the structural sequence;
    however, in canonical scene names they don't appear inside the title
    itself, so this heuristic holds in practice.
    """
    for i in range(body_end):
        text = tokens[i].text
        if _parse_season_episode(text) is not None:
            return i
        if _is_year(text):
            return i
        lower = text.lower()
        if lower in kb.resolutions:
            return i
        if lower in kb.sources:
            return i
        if lower in kb.codecs:
            return i
        # codec-GROUP token (e.g. "x265-KONTRAST") or dashed source (Web-DL).
        if "-" in text:
            head, _, _ = text.rpartition("-")
            if (
                head.lower() in kb.codecs
                or head.lower() in kb.sources
                or text.lower().replace("-", "") in kb.sources
            ):
                return i
    return body_end


def _find_chunk(
    tokens: list[Token],
    start: int,
    end: int,
    role: TokenRole,
    kb: ReleaseKnowledge,
) -> int | None:
    """Return the first index in ``[start, end)`` whose token matches ``role``.

    Returns ``None`` if no token in the range matches. Tokens already
    annotated (non-UNKNOWN) are skipped — they belong to another chunk.
    """
    for i in range(start, end):
        if tokens[i].role is not TokenRole.UNKNOWN:
            continue
        if _match_role(tokens[i].text, role, kb) is not None:
            return i
    return None


# ---------------------------------------------------------------------------
# Stage 2b' — SHITTY annotation (schema-less heuristic)
# ---------------------------------------------------------------------------


def _annotate_shitty(
    tokens: list[Token],
    kb: ReleaseKnowledge,
    group_index: int | None,
) -> list[Token]:
    """Schema-less, dictionary-driven annotation.

    SHITTY's job is narrow: for releases that *look* like scene names
    but don't have a registered group schema, tag every token whose text
    falls into a known YAML bucket (resolutions, codecs, sources, …).
    Anything we can't classify stays UNKNOWN. The leftmost run of
    UNKNOWN tokens becomes the title. Done.

    Anything that requires more reasoning (parenthesized tech blocks,
    bare-dashed title fragments, year-disguised slug suffixes, …) is
    PATH OF PAIN territory and stays out of here on purpose.
    """
    result = list(tokens)

    # 1) Group token — split codec-GROUP or tag GROUP. Same logic as EASY.
    if group_index is not None:
        gt = result[group_index]
        cg_split = _split_codec_group(gt.text, kb)
        if cg_split is not None:
            codec, group = cg_split
            result[group_index] = gt.with_role(
                TokenRole.CODEC, codec=codec, group=group or "UNKNOWN"
            )
        else:
            _, _, tail = gt.text.rpartition("-")
            result[group_index] = gt.with_role(
                TokenRole.GROUP, group=tail or "UNKNOWN"
            )

    # 2) Enrichers (audio / video-meta / edition / language).
    result = _annotate_enrichers(result, kb)

    # 3) Single pass: tag each UNKNOWN token by looking it up in the kb
    # buckets. First match wins per token, first occurrence wins per
    # role (we don't overwrite an already-tagged role).
    matchers: list[tuple[TokenRole, callable]] = [
        (TokenRole.SEASON_EPISODE, lambda t: _parse_season_episode(t) is not None),
        (TokenRole.YEAR, _is_year),
        (TokenRole.RESOLUTION, lambda t: t.lower() in kb.resolutions),
        (TokenRole.DISTRIBUTOR, lambda t: t.upper() in kb.distributors),
        (TokenRole.SOURCE, lambda t: t.lower() in kb.sources),
        (TokenRole.CODEC, lambda t: t.lower() in kb.codecs),
    ]
    seen: set[TokenRole] = set()

    for i, tok in enumerate(result):
        if tok.role is not TokenRole.UNKNOWN:
            continue
        for role, matches in matchers:
            if role in seen:
                continue
            if matches(tok.text):
                result[i] = tok.with_role(role)
                seen.add(role)
                break

    # 4) Title = leftmost contiguous UNKNOWN tokens.
    for i, tok in enumerate(result):
        if tok.role is not TokenRole.UNKNOWN:
            break
        result[i] = tok.with_role(TokenRole.TITLE)

    return result


# ---------------------------------------------------------------------------
# Stage 2c — enricher pass (non-positional roles)
# ---------------------------------------------------------------------------


def _annotate_enrichers(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
    """Tag the remaining UNKNOWN tokens with non-positional roles.

    Multi-token sequences are matched first (so ``DTS.HD.MA`` wins over
    a single-token ``DTS``). For each sequence match, the first token
    receives the role + ``extra["sequence"]`` (the canonical joined
    value), and the trailing members are marked with the same role +
    ``extra["sequence_member"]=True`` so :func:`assemble` extracts the
    value only from the primary.
    """
    result = list(tokens)

    # Multi-token sequences first.
    _apply_sequences(
        result, kb.audio.get("sequences", []), "codec", TokenRole.AUDIO_CODEC
    )
    _apply_sequences(
        result, kb.video_meta.get("sequences", []), "hdr", TokenRole.HDR
    )
    _apply_sequences(
        result, kb.editions.get("sequences", []), "edition", TokenRole.EDITION
    )

    # Single tokens.
    known_audio_codecs = {c.upper() for c in kb.audio.get("codecs", [])}
    known_audio_channels = set(kb.audio.get("channels", []))
    known_hdr = {h.upper() for h in kb.video_meta.get("hdr", [])} | kb.hdr_extra
    known_bit_depth = {d.lower() for d in kb.video_meta.get("bit_depth", [])}
    known_editions = {t.upper() for t in kb.editions.get("tokens", [])}

    # Channel layouts like "5.1" are tokenized as two tokens ("5", "1")
    # because "." is a separator. Detect consecutive pairs whose joined
    # value (without any trailing "-GROUP") is in the channel set.
    _detect_channel_pairs(result, known_audio_channels)

    for i, tok in enumerate(result):
        if tok.role is not TokenRole.UNKNOWN:
            continue
        text = tok.text
        upper = text.upper()
        lower = text.lower()

        if upper in known_audio_codecs:
            result[i] = tok.with_role(TokenRole.AUDIO_CODEC)
            continue
        if text in known_audio_channels:
            result[i] = tok.with_role(TokenRole.AUDIO_CHANNELS)
            continue
        if upper in known_hdr:
            result[i] = tok.with_role(TokenRole.HDR)
            continue
        if lower in known_bit_depth:
            result[i] = tok.with_role(TokenRole.BIT_DEPTH)
            continue
        if upper in known_editions:
            result[i] = tok.with_role(TokenRole.EDITION)
            continue
        if upper in kb.language_tokens:
            result[i] = tok.with_role(TokenRole.LANGUAGE)
            continue
        if upper in kb.distributors:
            result[i] = tok.with_role(TokenRole.DISTRIBUTOR)
            continue

    return result


def _apply_sequences(
    tokens: list[Token],
    sequences: list[dict],
    value_key: str,
    role: TokenRole,
) -> None:
    """Mark the first occurrence of each sequence in place.

    Mutates ``tokens`` (replacing entries with new role-tagged Token
    instances). Sequences in the YAML must be ordered most-specific
    first; the first match wins per starting position.
    """
    if not sequences:
        return

    upper_texts = [t.text.upper() for t in tokens]
    consumed: set[int] = set()

    for seq in sequences:
        seq_upper = [s.upper() for s in seq["tokens"]]
        n = len(seq_upper)
        for start in range(len(tokens) - n + 1):
            if any(idx in consumed for idx in range(start, start + n)):
                continue
            if any(
                tokens[start + k].role is not TokenRole.UNKNOWN for k in range(n)
            ):
                continue
            if upper_texts[start : start + n] == seq_upper:
                tokens[start] = tokens[start].with_role(
                    role, sequence=seq[value_key]
                )
                for k in range(1, n):
                    tokens[start + k] = tokens[start + k].with_role(
                        role, sequence_member="True"
                    )
                consumed.update(range(start, start + n))


def _detect_channel_pairs(
    tokens: list[Token], known_channels: set[str]
) -> None:
    """Spot two consecutive numeric tokens that form a channel layout.

    Example: ``["5", "1-KTH"]`` → joined ``"5.1"`` (after stripping the
    ``-GROUP`` suffix on the second). The second token may be the trailing
    codec-GROUP token, in which case it's already tagged CODEC and we
    skip — we'd corrupt its role.
    """
    for i in range(len(tokens) - 1):
        first = tokens[i]
        second = tokens[i + 1]
        if first.role is not TokenRole.UNKNOWN:
            continue
        # Strip a "-GROUP" suffix on the second token before joining.
        second_text = second.text.split("-")[0]
        candidate = f"{first.text}.{second_text}"
        if candidate not in known_channels:
            continue
        # Only tag the first token (carries the channel value). The
        # second token may legitimately remain UNKNOWN (or be the
        # codec-GROUP token, already tagged CODEC).
        tokens[i] = first.with_role(
            TokenRole.AUDIO_CHANNELS, sequence=candidate
        )
        if second.role is TokenRole.UNKNOWN:
            tokens[i + 1] = second.with_role(
                TokenRole.AUDIO_CHANNELS, sequence_member="True"
            )


# ---------------------------------------------------------------------------
# Stage 2 entry point
# ---------------------------------------------------------------------------


def annotate(tokens: list[Token], kb: ReleaseKnowledge) -> list[Token]:
    """Annotate token roles.

    Dispatch:

    * If a group is detected AND has a known schema, run the EASY
      structural walk. If the schema walk aborts on a mandatory chunk
      mismatch, fall through to SHITTY (the heuristic still does better
      than giving up).
    * Otherwise run SHITTY — schema-less, best-effort, never aborts.

    The enricher pass runs in both cases. The pipeline always returns a
    populated token list; downstream callers don't need to distinguish
    EASY vs SHITTY at this layer (the parse_path is decided in the
    service based on whether a schema matched).
    """
    group_name, group_index = _detect_group(tokens, kb)

    schema = kb.group_schema(group_name) if group_index is not None else None
    if schema is not None and group_index is not None:
        structural = _annotate_structural(tokens, kb, schema, group_index)
        if structural is not None:
            return _annotate_enrichers(structural, kb)

    # SHITTY fallback — heuristic positional pass. ``_annotate_shitty``
    # runs its own enricher pass internally (it has to, so the title
    # scan can skip enricher-tagged tokens).
    return _annotate_shitty(tokens, kb, group_index)


def has_known_schema(tokens: list[Token], kb: ReleaseKnowledge) -> bool:
    """Return True if ``tokens`` would take the EASY path in :func:`annotate`."""
    group_name, group_index = _detect_group(tokens, kb)
    if group_index is None:
        return False
    return kb.group_schema(group_name) is not None


# ---------------------------------------------------------------------------
# Stage 3 — assemble
# ---------------------------------------------------------------------------


def assemble(
    annotated: list[Token],
    site_tag: str | None,
    raw_name: str,
    kb: ReleaseKnowledge,
) -> dict:
    """Fold annotated tokens into a ``ParsedRelease``-compatible dict.

    Returns a dict (not a ``ParsedRelease`` instance) so the caller can
    layer in additional fields (``parse_path``, ``raw``, …) before
    instantiation.
    """
    # Pure-punctuation tokens (e.g. a stray "-" left by ` - ` separators in
    # human-friendly release names) carry no title content and would leak
    # into the joined title as ``"Show.-.Episode"``. Drop them here.
    title_parts = [
        t.text
        for t in annotated
        if t.role is TokenRole.TITLE and any(c.isalnum() for c in t.text)
    ]
    title = ".".join(title_parts) if title_parts else (
        annotated[0].text if annotated else raw_name
    )

    year: int | None = None
    season: int | None = None
    episode: int | None = None
    episode_end: int | None = None
    quality: str | None = None
    source: str | None = None
    codec: str | None = None
    group = "UNKNOWN"
    audio_codec: str | None = None
    audio_channels: str | None = None
    bit_depth: str | None = None
    hdr_format: str | None = None
    edition: str | None = None
    distributor: str | None = None
    languages: list[str] = []
    is_season_range = False

    for tok in annotated:
        # Skip non-primary members of a multi-token sequence.
        if tok.extra.get("sequence_member") == "True":
            continue

        role = tok.role
        if role is TokenRole.YEAR:
            year = int(tok.text)
        elif role is TokenRole.SEASON_EPISODE:
            parsed = _parse_season_episode(tok.text)
            if parsed is not None:
                season, episode, episode_end = parsed
            # Detect Sxx-yy range form to flag it as a multi-season pack.
            upper = tok.text.upper()
            if (
                len(upper) == 6
                and upper[0] == "S"
                and upper[1:3].isdigit()
                and upper[3] == "-"
                and upper[4:6].isdigit()
            ):
                is_season_range = True
        elif role is TokenRole.RESOLUTION:
            quality = tok.text
        elif role is TokenRole.SOURCE:
            source = tok.text
        elif role is TokenRole.CODEC:
            codec = tok.extra.get("codec", tok.text)
            if "group" in tok.extra:
                group = tok.extra["group"] or "UNKNOWN"
        elif role is TokenRole.GROUP:
            group = tok.extra.get("group", tok.text) or "UNKNOWN"
        elif role is TokenRole.AUDIO_CODEC:
            if audio_codec is None:
                audio_codec = tok.extra.get("sequence", tok.text)
        elif role is TokenRole.AUDIO_CHANNELS:
            if audio_channels is None:
                audio_channels = tok.extra.get("sequence", tok.text)
        elif role is TokenRole.BIT_DEPTH:
            if bit_depth is None:
                bit_depth = tok.text.lower()
        elif role is TokenRole.HDR:
            if hdr_format is None:
                hdr_format = tok.extra.get("sequence", tok.text.upper())
        elif role is TokenRole.EDITION:
            if edition is None:
                edition = tok.extra.get("sequence", tok.text.upper())
        elif role is TokenRole.LANGUAGE:
            languages.append(tok.text.upper())
        elif role is TokenRole.DISTRIBUTOR:
            if distributor is None:
                distributor = tok.text.upper()

    # Media type heuristic. Doc/concert/integrale tokens win over the
    # generic tech-based fallback. We look across all tokens (not just
    # annotated ones) because these markers may be tagged UNKNOWN by the
    # structural pass — only the assemble step cares about them.
    upper_tokens = {tok.text.upper() for tok in annotated}
    doc_tokens = {t.upper() for t in kb.media_type_tokens.get("doc", [])}
    concert_tokens = {t.upper() for t in kb.media_type_tokens.get("concert", [])}
    integrale_tokens = {t.upper() for t in kb.media_type_tokens.get("integrale", [])}

    if upper_tokens & doc_tokens:
        media_type = MediaTypeToken.DOCUMENTARY
    elif upper_tokens & concert_tokens:
        media_type = MediaTypeToken.CONCERT
    elif is_season_range:
        media_type = MediaTypeToken.TV_COMPLETE
    elif (
        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
        or upper_tokens & integrale_tokens
    ) and season is None:
        media_type = MediaTypeToken.TV_COMPLETE
    elif season is not None:
        media_type = MediaTypeToken.TV_SHOW
    elif any((quality, source, codec, year)):
        media_type = MediaTypeToken.MOVIE
    else:
        media_type = MediaTypeToken.UNKNOWN

    return {
        "title": title,
        "title_sanitized": kb.sanitize_for_fs(title),
        "year": year,
        "season": season,
        "episode": episode,
        "episode_end": episode_end,
        "quality": quality,
        "source": source,
        "codec": codec,
        "group": group,
        "media_type": media_type,
        "site_tag": site_tag,
        "languages": tuple(languages),
        "audio_codec": audio_codec,
        "audio_channels": audio_channels,
        "bit_depth": bit_depth,
        "hdr_format": hdr_format,
        "edition": edition,
        "distributor": distributor,
    }
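The media-type precedence in `assemble` can be hard to follow from the branch chain alone, so here is a minimal standalone sketch of the same ordering. `MediaType`, `pick_media_type` and the three marker sets are hypothetical stand-ins invented for illustration; the real marker sets come from `kb.media_type_tokens` and are not shown in this diff.

```python
from enum import Enum


class MediaType(str, Enum):
    """Hypothetical stand-in for ``MediaTypeToken``."""

    MOVIE = "movie"
    TV_SHOW = "tv_show"
    TV_COMPLETE = "tv_complete"
    DOCUMENTARY = "documentary"
    CONCERT = "concert"
    UNKNOWN = "unknown"


# Assumed marker sets; the real values live in the knowledge base.
DOC = {"DOC", "DOCU"}
CONCERT = {"CONCERT"}
INTEGRALE = {"INTEGRALE"}


def pick_media_type(texts, season=None, edition=None, has_tech=False,
                    is_season_range=False):
    """Apply the same precedence as the branch chain in ``assemble``."""
    upper = {t.upper() for t in texts}
    if upper & DOC:                      # documentary markers win first
        return MediaType.DOCUMENTARY
    if upper & CONCERT:
        return MediaType.CONCERT
    if is_season_range:                  # Sxx-yy form means a multi-season pack
        return MediaType.TV_COMPLETE
    complete = (
        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
        or bool(upper & INTEGRALE)
    )
    if complete and season is None:
        return MediaType.TV_COMPLETE
    if season is not None:
        return MediaType.TV_SHOW
    if has_tech:                         # any of quality/source/codec/year
        return MediaType.MOVIE
    return MediaType.UNKNOWN
```

Note that a completeness marker only yields `TV_COMPLETE` when no season was parsed; with a season present the release is classified as a regular `TV_SHOW`.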
@@ -1,47 +0,0 @@
"""Group schema value objects.

A :class:`GroupSchema` describes the canonical chunk layout of releases
from a known group (KONTRAST, RARBG, ELiTE, …). It is the EASY-road
contract: when a release ends in ``-<GROUP>`` and we know the group,
the annotator walks the schema instead of running the heuristic SHITTY
matchers.

Schemas are loaded from ``knowledge/release/release_groups/<group>.yaml``
by an infrastructure adapter and surfaced via the
:class:`~alfred.domain.release.ports.knowledge.ReleaseKnowledge` port.
"""

from __future__ import annotations

from dataclasses import dataclass

from .tokens import TokenRole


@dataclass(frozen=True)
class SchemaChunk:
    """One entry in a group's chunk order.

    ``role`` is the :class:`TokenRole` the chunk maps to. ``optional``
    is True for chunks that may be absent (e.g. ``year`` on TV releases,
    ``source`` on bare ELiTE TV releases).
    """

    role: TokenRole
    optional: bool = False


@dataclass(frozen=True)
class GroupSchema:
    """Schema for a known release group.

    ``chunks`` is the left-to-right canonical order. The annotator walks
    tokens and chunks in lockstep: an optional chunk that doesn't match
    the current token is skipped (the chunk index advances, the token
    index stays), a mandatory chunk that doesn't match aborts the EASY
    path and falls back to SHITTY.
    """

    name: str
    separator: str
    chunks: tuple[SchemaChunk, ...]
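The lockstep walk described in the `GroupSchema` docstring is implemented in `_annotate_structural`, which is not part of this hunk. The sketch below is only an illustration of the skip/abort rule under stated assumptions: `Chunk`, `MATCHERS` and `walk` are hypothetical names, roles are plain strings, title handling is left out, and the treatment of leftover chunks and leftover tokens is a guess rather than the project's behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Simplified stand-in for ``SchemaChunk`` (role as a plain string)."""

    role: str
    optional: bool = False


# Hypothetical per-role matchers; the real ones consult the knowledge base.
MATCHERS = {
    "year": lambda t: t.isdigit() and len(t) == 4,
    "resolution": lambda t: t.lower() in {"720p", "1080p", "2160p"},
    "source": lambda t: t.lower() in {"webrip", "bluray"},
    "codec": lambda t: t.lower() in {"x264", "x265"},
}


def walk(tokens, chunks):
    """Return the matched roles in order, or None if the walk aborts."""
    roles = []
    ci = 0
    for tok in tokens:
        matched = False
        while ci < len(chunks) and not matched:
            chunk = chunks[ci]
            ci += 1  # the chunk index advances; the token stays put
            if MATCHERS[chunk.role](tok):
                roles.append(chunk.role)
                matched = True
            elif not chunk.optional:
                return None  # mandatory mismatch aborts the walk
        if not matched:
            return None  # assumption: an unmatched token also aborts
    if any(not c.optional for c in chunks[ci:]):
        return None  # assumption: an unmet mandatory chunk also aborts
    return roles


SCHEMA = (
    Chunk("year", optional=True),
    Chunk("resolution"),
    Chunk("source", optional=True),
    Chunk("codec"),
)
```

With `SCHEMA` above, `walk(["1080p", "x265"], SCHEMA)` skips the optional `year` and `source` chunks, while `walk(["WEBRip", "x265"], SCHEMA)` aborts because the mandatory `resolution` chunk does not match.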
@@ -1,139 +0,0 @@
"""Parse-confidence scoring.

``parse_release`` returns a :class:`ParseReport` alongside its
:class:`ParsedRelease`. The report carries:

- ``confidence``: integer 0–100 derived from which structural and
  technical fields got populated, minus a penalty per UNKNOWN token
  left in the annotated stream.
- ``road``: which of the three roads the parse took
  (:class:`Road.EASY` / :class:`Road.SHITTY` / :class:`Road.PATH_OF_PAIN`).
- ``unknown_tokens``: textual residue, useful for diagnostics.
- ``missing_critical``: structural fields the score-tally found absent
  (e.g. ``("year", "media_type")``) — the caller can use this to drive
  PoP recovery (questions, LLM call).

All weights, penalties and thresholds come from the injected knowledge
base (``kb.scoring``), itself loaded from
``alfred/knowledge/release/scoring.yaml``. No magic numbers here.

The scoring functions are pure — they consume the annotated token list
and the resulting :class:`ParsedRelease` and return the report. They are
called by ``services.parse_release`` after ``assemble`` has run.
"""

from __future__ import annotations

from enum import Enum

from ..ports.knowledge import ReleaseKnowledge
from ..value_objects import ParsedRelease
from .tokens import Token, TokenRole


class Road(str, Enum):
    """How the parser handled a given release name.

    Distinct from :class:`~alfred.domain.release.value_objects.TokenizationRoute`,
    which records the tokenization route (DIRECT / SANITIZED / AI). Road
    is about confidence in the *result*, not the *method*.
    """

    EASY = "easy"  # group schema matched — structural annotation
    SHITTY = "shitty"  # no schema, dict-driven annotation, score ≥ threshold
    PATH_OF_PAIN = "path_of_pain"  # score below threshold, needs help


# Critical structural fields — their absence drives the
# ``missing_critical`` list in the report.
_CRITICAL_FIELDS: tuple[str, ...] = ("title", "media_type", "year")


def _is_tv_shaped(parsed: ParsedRelease) -> bool:
    """Season/episode weights only count for releases that *look* like TV."""
    return parsed.season is not None


def compute_score(
    parsed: ParsedRelease,
    annotated: list[Token],
    kb: ReleaseKnowledge,
) -> int:
    """Compute a 0–100 confidence score for the parse.

    Each populated field contributes its weight from
    ``kb.scoring["weights"]``. Season/episode only count when the parse
    looks like TV. ``group == "UNKNOWN"`` is treated as absent.

    Then a penalty is subtracted per residual UNKNOWN token in
    ``annotated``, capped at ``penalties["max_unknown_penalty"]``.

    Result is clamped to ``[0, 100]``.
    """
    weights = kb.scoring["weights"]
    penalties = kb.scoring["penalties"]

    score = 0
    if parsed.title:
        score += weights.get("title", 0)
    if parsed.media_type and parsed.media_type.value != "unknown":
        score += weights.get("media_type", 0)
    if parsed.year is not None:
        score += weights.get("year", 0)
    if _is_tv_shaped(parsed):
        if parsed.season is not None:
            score += weights.get("season", 0)
        if parsed.episode is not None:
            score += weights.get("episode", 0)
    if parsed.quality:
        score += weights.get("resolution", 0)
    if parsed.source:
        score += weights.get("source", 0)
    if parsed.codec:
        score += weights.get("codec", 0)
    if parsed.group and parsed.group != "UNKNOWN":
        score += weights.get("group", 0)

    unknown_count = sum(1 for t in annotated if t.role is TokenRole.UNKNOWN)
    raw_penalty = unknown_count * penalties.get("unknown_token", 0)
    capped_penalty = min(raw_penalty, penalties.get("max_unknown_penalty", 0))
    score -= capped_penalty

    return max(0, min(100, score))


def collect_unknown_tokens(annotated: list[Token]) -> tuple[str, ...]:
    """Return the text of every token still tagged UNKNOWN."""
    return tuple(t.text for t in annotated if t.role is TokenRole.UNKNOWN)


def collect_missing_critical(parsed: ParsedRelease) -> tuple[str, ...]:
    """Return the names of critical structural fields that are absent."""
    missing: list[str] = []
    if not parsed.title:
        missing.append("title")
    if not parsed.media_type or parsed.media_type.value == "unknown":
        missing.append("media_type")
    if parsed.year is None:
        missing.append("year")
    return tuple(missing)


def decide_road(
    score: int,
    has_schema: bool,
    kb: ReleaseKnowledge,
) -> Road:
    """Pick the road the parse took.

    EASY is decided structurally: if a known group schema matched, the
    annotation walked the schema, and that's enough — the score does not
    veto EASY. Otherwise the score decides between SHITTY and
    PATH_OF_PAIN using ``kb.scoring["thresholds"]["shitty_min"]``.
    """
    if has_schema:
        return Road.EASY
    threshold = kb.scoring["thresholds"].get("shitty_min", 60)
    if score >= threshold:
        return Road.SHITTY
    return Road.PATH_OF_PAIN
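To make the scoring arithmetic concrete, here is a small worked sketch of the weight tally, the capped UNKNOWN penalty, and the road decision. The `WEIGHTS` and `PENALTIES` values below are invented for illustration (the real ones live in `scoring.yaml`, which is not shown in this diff), and `tally` and `road_for` are hypothetical helper names. Only the `shitty_min` default of 60 is taken from `decide_road`.

```python
# Invented example values; the project's real weights are not in this diff.
WEIGHTS = {
    "title": 30, "media_type": 20, "year": 15,
    "resolution": 10, "source": 10, "codec": 10, "group": 5,
}
PENALTIES = {"unknown_token": 5, "max_unknown_penalty": 30}


def tally(populated, unknown_count, weights=WEIGHTS, penalties=PENALTIES):
    """Sum the weights of populated fields, subtract the capped penalty, clamp."""
    score = sum(weights.get(name, 0) for name in populated)
    raw_penalty = unknown_count * penalties.get("unknown_token", 0)
    score -= min(raw_penalty, penalties.get("max_unknown_penalty", 0))
    return max(0, min(100, score))


def road_for(score, has_schema, shitty_min=60):
    """A known schema short-circuits to EASY; otherwise the score decides."""
    if has_schema:
        return "easy"
    return "shitty" if score >= shitty_min else "path_of_pain"


fields = {"title", "media_type", "resolution", "codec"}  # 30 + 20 + 10 + 10 = 70
few_unknowns = tally(fields, unknown_count=3)    # 70 - min(15, 30) = 55
many_unknowns = tally(fields, unknown_count=10)  # 70 - min(50, 30) = 40
```

Under these assumed numbers, a score of 55 without a schema lands on `path_of_pain`, while the same score with a known schema is still `easy`, because the score never vetoes EASY.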
@@ -1,90 +0,0 @@
"""Token value objects for the annotate-based parser.

A :class:`Token` carries both the original substring and its position in
the original release name's token stream. A :class:`TokenRole` is the
semantic tag assigned by the annotator.

Why VOs instead of bare ``str``: the annotate step needs to flag tokens
without consuming them (a token may carry residual info — e.g. a
``codec-GROUP`` token contributes both a CODEC and a GROUP role). Tracking
the index also lets later stages reason about *order* (year must come
after title, group must be rightmost, etc.) without re-scanning the list.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class TokenRole(str, Enum):
    """Semantic role a token can take after annotation.

    A token starts as ``UNKNOWN`` and may be promoted by the annotator.
    ``str``-backed for cheap comparisons and YAML/JSON interop.

    Roles split into three families:

    - **structural**: TITLE / YEAR / SEASON_EPISODE / GROUP — drive folder
      and filename naming.
    - **technical**: RESOLUTION / SOURCE / CODEC / AUDIO_CODEC /
      AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION / LANGUAGE /
      DISTRIBUTOR — feed ``tech_string`` and metadata fields.
    - **meta**: SITE_TAG (stripped pre-tokenize), UNKNOWN (residual,
      contributes to the SHITTY score penalty).
    """

    UNKNOWN = "unknown"

    # Structural
    TITLE = "title"
    YEAR = "year"
    SEASON_EPISODE = "season_episode"
    GROUP = "group"

    # Technical
    RESOLUTION = "resolution"
    SOURCE = "source"
    CODEC = "codec"
    AUDIO_CODEC = "audio_codec"
    AUDIO_CHANNELS = "audio_channels"
    BIT_DEPTH = "bit_depth"
    HDR = "hdr"
    EDITION = "edition"
    LANGUAGE = "language"
    DISTRIBUTOR = "distributor"

    # Meta
    SITE_TAG = "site_tag"


@dataclass(frozen=True)
class Token:
    """An atomic token from a release name.

    ``text`` is the substring exactly as it appeared after tokenization
    (case preserved — uppercase comparisons happen at match time).
    ``index`` is the 0-based position in the tokenized stream, used by
    downstream stages to enforce ordering invariants.

    ``role`` defaults to :attr:`TokenRole.UNKNOWN`. The annotator returns
    new :class:`Token` instances with the role set rather than mutating
    (the dataclass is frozen). ``extra`` carries role-specific payload
    when the token text alone isn't enough (e.g. a ``codec-GROUP`` token
    annotated as CODEC may record the group name in ``extra["group"]``).
    """

    text: str
    index: int
    role: TokenRole = TokenRole.UNKNOWN
    extra: dict[str, str] = field(default_factory=dict)

    def with_role(self, role: TokenRole, **extra: str) -> Token:
        """Return a copy of this token with ``role`` (and optional ``extra``)."""
        merged = {**self.extra, **extra} if extra else self.extra
        return Token(text=self.text, index=self.index, role=role, extra=merged)

    @property
    def is_annotated(self) -> bool:
        return self.role is not TokenRole.UNKNOWN
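The copy-on-annotate behavior of `Token.with_role` is easiest to see on the `codec-GROUP` case mentioned in the docstrings. The sketch below re-declares a trimmed `Token` with string roles so that it runs on its own; the payload keys `codec` and `group` mirror what `assemble` reads from `tok.extra`, but the exact values the annotator writes are an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    """Trimmed copy of the value object above, with roles as plain strings."""

    text: str
    index: int
    role: str = "unknown"
    extra: dict[str, str] = field(default_factory=dict)

    def with_role(self, role: str, **extra: str) -> "Token":
        # Merge payloads only when new keys are supplied; otherwise reuse the dict.
        merged = {**self.extra, **extra} if extra else self.extra
        return Token(text=self.text, index=self.index, role=role, extra=merged)


raw = Token(text="x265-KONTRAST", index=5)
# Assumed payload: one token carrying both the codec and the group name.
tagged = raw.with_role("codec", codec="x265", group="KONTRAST")
retagged = tagged.with_role("codec")
```

Here `raw` keeps its `unknown` role and empty payload, while `tagged` is a new instance. One detail that may be worth keeping in mind: when `with_role` is called without extra keys, the new token shares the same `extra` dict object as the old one, so mutating that dict would affect both despite the frozen dataclass.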
@@ -1,10 +0,0 @@
"""Domain ports for the release domain.

Protocol-based abstractions that decouple ``parse_release`` and
``ParsedRelease`` from any concrete knowledge-base loader. The
infrastructure layer provides the adapter that satisfies this contract.
"""

from .knowledge import ReleaseKnowledge

__all__ = ["ReleaseKnowledge"]
@@ -1,91 +0,0 @@
"""ReleaseKnowledge port — the read-only query surface that
``parse_release`` and ``ParsedRelease`` need from the release knowledge
base, expressed as a structural Protocol so the domain never imports any
concrete loader.

The concrete YAML-backed implementation lives in
``alfred/infrastructure/knowledge/release_kb.py``. Tests can supply any
object that satisfies this shape (e.g. a simple dataclass).
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Protocol

if TYPE_CHECKING:
    from ..parser.schema import GroupSchema


class ReleaseKnowledge(Protocol):
    """Read-only snapshot of release-name parsing knowledge."""

    # --- Token sets used by the tokenizer / matchers ---

    resolutions: set[str]
    sources: set[str]
    codecs: set[str]
    distributors: set[str]
    language_tokens: set[str]
    forbidden_chars: set[str]
    hdr_extra: set[str]

    # --- Structured knowledge (loaded from YAML as dicts) ---

    audio: dict
    video_meta: dict
    editions: dict
    media_type_tokens: dict

    # --- Tokenizer separators ---

    separators: list[str]

    # --- Parse scoring (Phase A) ---
    #
    # ``scoring`` is a dict with three keys:
    #   - ``weights``: dict[field_name, int]  field weight contribution
    #   - ``penalties``: {"unknown_token": int, "max_unknown_penalty": int}
    #   - ``thresholds``: {"shitty_min": int}  SHITTY vs PATH_OF_PAIN cutoff
    #
    # Concrete values come from ``alfred/knowledge/release/scoring.yaml``.
    # The loader fills in safe defaults so this dict is always populated.

    scoring: dict

    # --- ffprobe → scene-token translation tables (consumed by
    # ``application.release.enrich_from_probe``). Domain parsing itself
    # doesn't touch these — exposed on the same KB to keep release
    # knowledge in a single ownership point.
    #
    # Shape:
    #   - ``video_codec``: dict[str, str]  ffprobe lower → scene token
    #   - ``audio_codec``: dict[str, str]  ffprobe lower → scene token
    #   - ``audio_channels``: dict[int, str]  channel count → layout ---

    probe_mappings: dict

    # --- File-extension sets (used by application/infra modules that work
    # directly with filesystem paths, e.g. media-type detection, video
    # lookup). Domain parsing itself doesn't touch these. ---

    video_extensions: set[str]
    non_video_extensions: set[str]
    subtitle_extensions: set[str]
    metadata_extensions: set[str]

    # --- Filesystem sanitization (Option B: pre-sanitize at parse time) ---

    def sanitize_for_fs(self, text: str) -> str:
        """Strip filesystem-forbidden characters from ``text``."""
        ...

    # --- Release group schemas (EASY path) ---

    def group_schema(self, name: str) -> GroupSchema | None:
        """Return the parsing schema for the named release group, or
        ``None`` if the group is unknown (caller falls back to SHITTY).

        Lookup is case-insensitive: ``"KONTRAST"``, ``"kontrast"`` and
        ``"Kontrast"`` all resolve to the same schema.
        """
        ...
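The module docstring notes that tests can supply any object matching the port's shape, for example a simple dataclass. A minimal sketch of that idea follows, using a deliberately reduced protocol. `MiniKnowledge`, `FakeKB` and `safe_title` are hypothetical names, and the default separator and forbidden-character values are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Protocol


class MiniKnowledge(Protocol):
    """Reduced, hypothetical slice of the ``ReleaseKnowledge`` port."""

    separators: list[str]
    forbidden_chars: set[str]

    def sanitize_for_fs(self, text: str) -> str: ...


@dataclass
class FakeKB:
    """Test double: satisfies the protocol structurally, with no inheritance."""

    separators: list[str] = field(default_factory=lambda: [".", " "])
    forbidden_chars: set[str] = field(default_factory=lambda: {"?", ":"})

    def sanitize_for_fs(self, text: str) -> str:
        # Drop every character listed as forbidden.
        return "".join(c for c in text if c not in self.forbidden_chars)


def safe_title(title: str, kb: MiniKnowledge) -> str:
    """Consumer code depends on the protocol only, never on a concrete loader."""
    return kb.sanitize_for_fs(title)
```

Because `Protocol` typing is structural, `FakeKB` never has to import or subclass the port; a type checker accepts it wherever `MiniKnowledge` is expected as long as the attributes and method signatures line up.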
@@ -1,121 +0,0 @@
"""Release domain — parsing service.

Thin orchestrator over the annotate-based pipeline in
:mod:`alfred.domain.release.parser.pipeline`. Responsibilities:

* Strip a leading/trailing ``[site.tag]`` and decide ``parse_path``.
* Reject malformed names (forbidden characters) → ``parse_path=AI`` so
  the LLM can clean them up.
* Otherwise call the v2 pipeline (tokenize → annotate → assemble) and
  wrap the result in :class:`ParsedRelease`.
* Score the result and decide the road (EASY / SHITTY / PATH_OF_PAIN)
  via :mod:`alfred.domain.release.parser.scoring`.

The public entry point is :func:`parse_release`, which returns
``(ParsedRelease, ParseReport)``. The report carries the confidence
score, the road, and diagnostic info for downstream callers.
"""

from __future__ import annotations

from .parser import pipeline as _v2
from .parser import scoring as _scoring
from .ports import ReleaseKnowledge
from .value_objects import MediaTypeToken, ParsedRelease, ParseReport, TokenizationRoute


def parse_release(
    name: str, kb: ReleaseKnowledge
) -> tuple[ParsedRelease, ParseReport]:
    """Parse a release name.

    Returns a tuple ``(ParsedRelease, ParseReport)``. The structural VO
    is unchanged from the previous single-return contract; the report
    is new and carries the confidence score + road decision.

    Flow:

    1. Strip a leading/trailing ``[site.tag]`` if present (sets
       ``parse_path="sanitized"``).
    2. If the remainder still contains truly forbidden chars (anything
       not in the configured separators), short-circuit to
       ``media_type="unknown"`` / ``parse_path="ai"`` and emit a
       PATH_OF_PAIN report — the LLM handles these.
    3. Otherwise run the v2 pipeline: tokenize → annotate (EASY when a
       group schema is known, SHITTY otherwise) → assemble → score.
    """
    parse_path = TokenizationRoute.DIRECT

    # Apostrophes inside titles ("Don't", "L'avare") are common and should
    # not push the release through the AI fallback. Strip them up front so
    # both strip_site_tag and tokenize see "Dont" / "Lavare", which is good
    # enough for token-level matching. The raw name is preserved on the VO.
    working_name = name
    if "'" in working_name:
        working_name = working_name.replace("'", "")
        parse_path = TokenizationRoute.SANITIZED

    clean, site_tag = _v2.strip_site_tag(working_name)
    if site_tag is not None:
        parse_path = TokenizationRoute.SANITIZED

    if not _is_well_formed(clean, kb):
        parsed = ParsedRelease(
            raw=name,
            clean=clean,
            title=clean,
            title_sanitized=kb.sanitize_for_fs(clean),
            year=None,
            season=None,
            episode=None,
            episode_end=None,
            quality=None,
            source=None,
            codec=None,
            group="UNKNOWN",
            media_type=MediaTypeToken.UNKNOWN,
            site_tag=site_tag,
            parse_path=TokenizationRoute.AI,
        )
        report = ParseReport(
            confidence=0,
            road=_scoring.Road.PATH_OF_PAIN.value,
            unknown_tokens=(clean,),
            missing_critical=("title", "media_type", "year"),
        )
        return parsed, report

    tokens, v2_tag = _v2.tokenize(working_name, kb)
    annotated = _v2.annotate(tokens, kb)
    fields = _v2.assemble(annotated, v2_tag, name, kb)

    parsed = ParsedRelease(
        raw=name,
        clean=clean,
        parse_path=parse_path,
        **fields,
    )

    has_schema = _v2.has_known_schema(tokens, kb)
    score = _scoring.compute_score(parsed, annotated, kb)
    road = _scoring.decide_road(score, has_schema, kb)
    report = ParseReport(
        confidence=score,
        road=road.value,
        unknown_tokens=_scoring.collect_unknown_tokens(annotated),
        missing_critical=_scoring.collect_missing_critical(parsed),
    )
    return parsed, report


def _is_well_formed(name: str, kb: ReleaseKnowledge) -> bool:
    """Return True if ``name`` contains no forbidden characters per scene
    naming rules.

    Characters listed as token separators (spaces, brackets, parens, …)
    are NOT considered malforming — the tokenizer handles them. Only
    truly broken chars like ``@``, ``#``, ``!``, ``%`` make a name
    malformed.
    """
    tokenizable = set(kb.separators)
    return not any(c in name for c in kb.forbidden_chars if c not in tokenizable)
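The well-formedness gate combines two steps from the service: apostrophes are stripped first, then any remaining forbidden character that is not also a separator routes the name to the AI path. The sketch below reproduces that logic in isolation. `SEPARATORS`, `FORBIDDEN` and `precheck` are illustrative assumptions; the real sets come from the knowledge base, and the real service runs the check on the name after site-tag stripping.

```python
def is_well_formed(name, separators, forbidden_chars):
    """Same check as ``_is_well_formed``, with the KB fields passed explicitly."""
    tokenizable = set(separators)
    return not any(c in name for c in forbidden_chars if c not in tokenizable)


# Assumed example values, not the project's actual configuration.
SEPARATORS = [".", " ", "-", "[", "]", "(", ")"]
FORBIDDEN = {"@", "#", "!", "%", "[", "]", "'"}


def precheck(name):
    """Strip apostrophes up front, then test what is left."""
    working = name.replace("'", "")
    return is_well_formed(working, SEPARATORS, FORBIDDEN)
```

With these assumed sets, `precheck("[TGx] Oz.S03E01.1080p")` passes because brackets are tokenizable, `precheck("Don't.Look.Up.2021")` passes because the apostrophe is removed beforehand, and `precheck("Oz@S03E01")` fails on the `@`.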
@@ -1,271 +0,0 @@
"""Release domain — value objects.

This module is **pure**: no I/O, no YAML loading, no knowledge-base
imports. All knowledge that the parser consumes is injected at runtime
via the ``ReleaseKnowledge`` port (see ``ports/knowledge.py``).

``ParsedRelease`` follows Option B of the snapshot-VO design: filesystem
sanitization is performed once at parse time and stored in
``title_sanitized``. The builder methods (``show_folder_name``,
``episode_filename``, etc.) are therefore pure string-formatting and do
**not** need access to any knowledge base — but they require the caller
to pass already-sanitized TMDB strings. The use case is responsible for
calling ``kb.sanitize_for_fs(tmdb_title)`` before invoking the builders.
"""

from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

from ..shared.exceptions import ValidationError


class MediaTypeToken(str, Enum):
    """
    Canonical values for ``ParsedRelease.media_type``.

    Inherits from ``str`` so existing string-based comparisons (``== "movie"``,
    JSON serialization, TMDB DTO interop) keep working unchanged. The enum
    serves both as documentation and as the set of valid values for
    ``__post_init__`` validation.
    """

    MOVIE = "movie"
    TV_SHOW = "tv_show"
    TV_COMPLETE = "tv_complete"
    DOCUMENTARY = "documentary"
    CONCERT = "concert"
    OTHER = "other"
    UNKNOWN = "unknown"


class TokenizationRoute(str, Enum):
    """How a ``ParsedRelease`` was produced.

    Records the **tokenization route** — i.e. whether the release name
    was tokenized as-is (``DIRECT``), after a sanitization pass like
    site-tag stripping or apostrophe removal (``SANITIZED``), or whether
    structural parsing failed and an LLM rebuild is needed (``AI``).

    This is **orthogonal** to :class:`~alfred.domain.release.parser.scoring.Road`
    (EASY / SHITTY / PATH_OF_PAIN), which captures parser confidence and
    is recorded on :class:`ParseReport`. Both can vary independently —
    a SANITIZED name can still land on the EASY road if a group schema
    matches the tokens after stripping.

    ``str``-backed for the same reasons as :class:`MediaTypeToken`."""

    DIRECT = "direct"
    SANITIZED = "sanitized"
    AI = "ai"


def _strip_episode_from_normalized(normalized: str) -> str:
    """
    Remove all episode parts (Exx) from a normalized release name, keeping Sxx.

    Oz.S03E01.1080p... → Oz.S03.1080p...
    Archer.S14E09E10E11.1080p... → Archer.S14.1080p...
    """
    tokens = normalized.split(".")
    result = []
    for tok in tokens:
        upper = tok.upper()
        # Token is SxxExx... — keep only the Sxx part
        if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
            result.append(tok[:3])  # "S" + two digits
        else:
            result.append(tok)
    return ".".join(result)


@dataclass(frozen=True)
class ParseReport:
    """Diagnostic report attached to a :class:`ParsedRelease`.

    ``parse_release`` returns ``(ParsedRelease, ParseReport)``. The
    report describes *how confident* the parser is in the result and
    *which road* produced it. It is intentionally separate from
    ``ParsedRelease`` so the structural VO stays free of meta-concerns
    about its own quality.

    Fields:

    - ``confidence``: integer 0–100 (see :func:`parser.scoring.compute_score`).
    - ``road``: ``"easy"`` / ``"shitty"`` / ``"path_of_pain"`` — distinct
      from ``ParsedRelease.parse_path`` (which describes the
      tokenization route, not the confidence tier).
    - ``unknown_tokens``: tokens that finished annotation with role
      UNKNOWN, in order of appearance.
    - ``missing_critical``: names of critical structural fields the
      parser couldn't fill (subset of ``{"title", "media_type", "year"}``).
    """

    confidence: int
    road: str  # one of parser.scoring.Road values
    unknown_tokens: tuple[str, ...] = ()
    missing_critical: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if not (0 <= self.confidence <= 100):
            raise ValidationError(
                f"ParseReport.confidence out of range: {self.confidence}"
            )


@dataclass(frozen=True)
class ParsedRelease:
    """Structured representation of a parsed release name.

    ``title_sanitized`` carries the filesystem-safe form of ``title`` (computed
    by the parser at construction time using the injected knowledge base).
    Builder methods rely on it being already-sanitized — see module docstring.

    Frozen: enrichment passes (``detect_media_type``, ``enrich_from_probe``)
    return a **new** ``ParsedRelease`` via ``dataclasses.replace`` rather
    than mutating in place. ``languages`` is a tuple for the same reason.
    """

    raw: str  # original release name (untouched)
    clean: str  # raw minus site_tag and apostrophes — used by season_folder_name()
    title: str  # show/movie title (dots, no year/season/tech)
    title_sanitized: str  # title with filesystem-forbidden chars stripped
    year: int | None  # movie year or show start year (from TMDB)
    season: int | None  # season number (None for movies)
    episode: int | None  # first episode number (None if season-pack)
    episode_end: int | None  # last episode for multi-ep (None otherwise)
    quality: str | None  # 1080p, 2160p, …
    source: str | None  # WEBRip, BluRay, …
    codec: str | None  # x265, HEVC, …
    group: str  # release group, "UNKNOWN" if missing
    media_type: MediaTypeToken = MediaTypeToken.UNKNOWN
    site_tag: str | None = (
        None  # site watermark stripped from name, e.g. "TGx", "OxTorrent.vc"
    )
    parse_path: TokenizationRoute = TokenizationRoute.DIRECT
    languages: tuple[str, ...] = ()  # ("MULTI", "VFF"), ("FRENCH",), …
    audio_codec: str | None = None  # "DTS-HD.MA", "DDP", "EAC3", …
    audio_channels: str | None = None  # "5.1", "7.1", "2.0", …
    bit_depth: str | None = None  # "10bit", "8bit", …
    hdr_format: str | None = None  # "DV", "HDR10", "DV.HDR10", …
    edition: str | None = None  # "UNRATED", "EXTENDED", "DIRECTORS.CUT", …
    distributor: str | None = None  # "NF", "AMZN", "DSNP", … (streaming origin)

    def __post_init__(self) -> None:
        if not self.raw:
            raise ValidationError("ParsedRelease.raw cannot be empty")
        if not self.group:
            raise ValidationError("ParsedRelease.group cannot be empty")
        if self.year is not None and not (1888 <= self.year <= 2100):
            raise ValidationError(
                f"ParsedRelease.year out of range: {self.year}"
            )
        if self.season is not None and not (0 <= self.season <= 100):
            raise ValidationError(
                f"ParsedRelease.season out of range: {self.season}"
            )
        if self.episode is not None and not (0 <= self.episode <= 9999):
            raise ValidationError(
                f"ParsedRelease.episode out of range: {self.episode}"
            )
        if self.episode_end is not None:
            if not (0 <= self.episode_end <= 9999):
                raise ValidationError(
                    f"ParsedRelease.episode_end out of range: {self.episode_end}"
                )
            if self.episode is not None and self.episode_end < self.episode:
                raise ValidationError(
                    f"ParsedRelease.episode_end ({self.episode_end}) < "
                    f"episode ({self.episode})"
                )
        if not isinstance(self.media_type, MediaTypeToken):
            raise ValidationError(
                f"ParsedRelease.media_type must be a MediaTypeToken, "
                f"got {type(self.media_type).__name__}: {self.media_type!r}"
            )
        if not isinstance(self.parse_path, TokenizationRoute):
            raise ValidationError(
                f"ParsedRelease.parse_path must be a TokenizationRoute, "
                f"got {type(self.parse_path).__name__}: {self.parse_path!r}"
            )

    @property
    def is_season_pack(self) -> bool:
        return self.season is not None and self.episode is None

    @property
    def tech_string(self) -> str:
        """``quality.source.codec`` joined by dots, skipping ``None`` parts.

        Derived on every access so it stays in sync with the underlying
        fields — no manual refresh needed after enrichment.
        """
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)

    def show_folder_name(self, tmdb_title_safe: str, tmdb_year: int) -> str:
        """
        Build the series root folder name.

        Format: {Title}.{Year}.{Tech}-{Group}
        Example: Oz.1997.1080p.WEBRip.x265-KONTRAST
|
||||
|
||||
``tmdb_title_safe`` must already be filesystem-safe (the caller is
|
||||
expected to have run it through ``kb.sanitize_for_fs``).
|
||||
"""
|
||||
title_part = tmdb_title_safe.replace(" ", ".")
|
||||
tech = self.tech_string or "Unknown"
|
||||
return f"{title_part}.{tmdb_year}.{tech}-{self.group}"
|
||||
|
||||
def season_folder_name(self) -> str:
|
||||
"""
|
||||
Build the season subfolder name: the normalized release name with the episode token removed.
|
||||
|
||||
Example: Oz.S03.1080p.WEBRip.x265-KONTRAST
|
||||
For a single-episode release we still strip the episode token so the
|
||||
folder can hold the whole season.
|
||||
"""
|
||||
return _strip_episode_from_normalized(self.clean)
|
||||
|
||||
def episode_filename(self, tmdb_episode_title_safe: str | None, ext: str) -> str:
|
||||
"""
|
||||
Build the episode filename.
|
||||
|
||||
Format: {Title}.{SxxExx}.{EpisodeTitle}.{Tech}-{Group}.{ext}
|
||||
Example: Oz.S01E01.The.Routine.1080p.WEBRip.x265-KONTRAST.mkv
|
||||
|
||||
``tmdb_episode_title_safe`` must already be filesystem-safe; pass
|
||||
``None`` to omit the episode title segment.
|
||||
"""
|
||||
title_part = self.title_sanitized
|
||||
s = f"S{self.season:02d}" if self.season is not None else ""
|
||||
e = f"E{self.episode:02d}" if self.episode is not None else ""
|
||||
se = s + e
|
||||
|
||||
ep_title = ""
|
||||
if tmdb_episode_title_safe:
|
||||
ep_title = "." + tmdb_episode_title_safe.replace(" ", ".")
|
||||
|
||||
tech = self.tech_string or "Unknown"
|
||||
ext_clean = ext.lstrip(".")
|
||||
return f"{title_part}.{se}{ep_title}.{tech}-{self.group}.{ext_clean}"
|
||||
|
||||
def movie_folder_name(self, tmdb_title_safe: str, tmdb_year: int) -> str:
|
||||
"""
|
||||
Build the movie folder name.
|
||||
|
||||
Format: {Title}.{Year}.{Tech}-{Group}
|
||||
Example: Inception.2010.1080p.BluRay.x265-GROUP
|
||||
"""
|
||||
return self.show_folder_name(tmdb_title_safe, tmdb_year)
|
||||
|
||||
def movie_filename(
|
||||
self, tmdb_title_safe: str, tmdb_year: int, ext: str
|
||||
) -> str:
|
||||
"""
|
||||
Build the movie filename (same as folder name + extension).
|
||||
|
||||
Example: Inception.2010.1080p.BluRay.x265-GROUP.mkv
|
||||
"""
|
||||
ext_clean = ext.lstrip(".")
|
||||
return f"{self.movie_folder_name(tmdb_title_safe, tmdb_year)}.{ext_clean}"
|
||||
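The frozen-dataclass approach described in the `ParsedRelease` docstring can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `MiniRelease` and `enrich_quality` are hypothetical names, and only `dataclasses.replace` and the derived `tech_string` property mirror what the diff shows.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniRelease:
    """Hypothetical, trimmed-down stand-in for ParsedRelease."""

    title: str
    group: str
    quality: str | None = None
    source: str | None = None
    codec: str | None = None

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot go stale after enrichment.
        return ".".join(p for p in (self.quality, self.source, self.codec) if p)

    def folder_name(self, year: int) -> str:
        tech = self.tech_string or "Unknown"
        return f"{self.title.replace(' ', '.')}.{year}.{tech}-{self.group}"


def enrich_quality(release: MiniRelease, quality: str) -> MiniRelease:
    """Hypothetical enrichment pass: returns a new object, never mutates."""
    return replace(release, quality=quality)


before = MiniRelease(title="Oz", group="KONTRAST", source="WEBRip", codec="x265")
after = enrich_quality(before, "1080p")

print(before.tech_string)       # WEBRip.x265
print(after.tech_string)        # 1080p.WEBRip.x265
print(after.folder_name(1997))  # Oz.1997.1080p.WEBRip.x265-KONTRAST
```

Because `tech_string` is computed rather than stored, the enriched copy picks up the new quality without any refresh step, and the original object stays unchanged.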
@@ -1,7 +1,7 @@
|
||||
"""Shared kernel - Common domain concepts used across subdomains."""
|
||||
|
||||
from .exceptions import DomainException, ValidationError
|
||||
from .value_objects import FilePath, FileSize, ImdbId, Language
|
||||
from .value_objects import FilePath, FileSize, ImdbId
|
||||
|
||||
__all__ = [
|
||||
"DomainException",
|
||||
@@ -9,5 +9,4 @@ __all__ = [
|
||||
"ImdbId",
|
||||
"FilePath",
|
||||
"FileSize",
|
||||
"Language",
|
||||
]
|
||||
|
||||
@@ -1,267 +0,0 @@
|
||||
"""Media — file-level track types (video/audio/subtitle) and MediaInfo container.
|
||||
|
||||
These are the **container-view** dataclasses, populated from ffprobe output and
|
||||
used across the project to describe the content of a media file.
|
||||
|
||||
Not to be confused with ``alfred.domain.subtitles.entities.SubtitleScanResult``
|
||||
which models a subtitle being **scanned/matched** (with confidence, raw tokens,
|
||||
file path, etc.). The two coexist by design — they describe the same real-world
|
||||
concept seen from two different bounded contexts.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from .value_objects import Language
|
||||
|
||||
__all__ = [
|
||||
"AudioTrack",
|
||||
"MediaInfo",
|
||||
"MediaWithTracks",
|
||||
"SubtitleTrack",
|
||||
"VideoTrack",
|
||||
"track_lang_matches",
|
||||
]
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Track types — one frozen dataclass per stream kind
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class AudioTrack:
|
||||
"""A single audio track as reported by ffprobe."""
|
||||
|
||||
index: int
|
||||
codec: str | None # aac, ac3, eac3, dts, truehd, flac, …
|
||||
channels: int | None # 2, 6 (5.1), 8 (7.1), …
|
||||
channel_layout: str | None # stereo, 5.1, 7.1, …
|
||||
language: str | None # ISO 639-2: fre, eng, und, …
|
||||
is_default: bool = False
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SubtitleTrack:
|
||||
"""A single embedded subtitle track as reported by ffprobe."""
|
||||
|
||||
index: int
|
||||
codec: str | None # subrip, ass, hdmv_pgs_subtitle, …
|
||||
language: str | None # ISO 639-2: fre, eng, und, …
|
||||
is_default: bool = False
|
||||
is_forced: bool = False
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class VideoTrack:
|
||||
"""A single video track as reported by ffprobe.
|
||||
|
||||
A media file typically has one video track but can have several (alt
|
||||
camera angles, attached thumbnail images reported as still-image streams,
|
||||
etc.), hence the ``tuple[VideoTrack, ...]`` on MediaInfo.
|
||||
"""
|
||||
|
||||
index: int
|
||||
codec: str | None # h264, hevc, av1, …
|
||||
width: int | None
|
||||
height: int | None
|
||||
is_default: bool = False
|
||||
|
||||
@property
|
||||
def resolution(self) -> str | None:
|
||||
"""
|
||||
Best-effort resolution string: 2160p, 1080p, 720p, …
|
||||
|
||||
Width takes priority over height to handle widescreen/cinema crops
|
||||
(e.g. 1920×960 scope → 1080p, not 720p). Falls back to height when
|
||||
width is unavailable.
|
||||
"""
|
||||
match (self.width, self.height):
|
||||
case (None, None):
|
||||
return None
|
||||
case (w, h) if w is not None:
|
||||
match True:
|
||||
case _ if w >= 3840:
|
||||
return "2160p"
|
||||
case _ if w >= 1920:
|
||||
return "1080p"
|
||||
case _ if w >= 1280:
|
||||
return "720p"
|
||||
case _ if w >= 720:
|
||||
return "576p"
|
||||
case _ if w >= 640:
|
||||
return "480p"
|
||||
case _:
|
||||
return f"{h}p" if h else f"{w}w"
|
||||
case (None, h):
|
||||
match True:
|
||||
case _ if h >= 2160:
|
||||
return "2160p"
|
||||
case _ if h >= 1080:
|
||||
return "1080p"
|
||||
case _ if h >= 720:
|
||||
return "720p"
|
||||
case _ if h >= 576:
|
||||
return "576p"
|
||||
case _ if h >= 480:
|
||||
return "480p"
|
||||
case _:
|
||||
return f"{h}p"
|
||||
|
||||
|
||||
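The width-first rule in `VideoTrack.resolution` can be checked against a few sample sizes. The sketch below assumes the same thresholds as the property above; `resolution_label` and `_BUCKETS` are hypothetical names, and the lookup table is a swapped-in alternative to the nested `match` statements, not the project's implementation.

```python
from __future__ import annotations

# (minimum width, minimum height, label), ordered from largest to smallest.
_BUCKETS = (
    (3840, 2160, "2160p"),
    (1920, 1080, "1080p"),
    (1280, 720, "720p"),
    (720, 576, "576p"),
    (640, 480, "480p"),
)


def resolution_label(width: int | None, height: int | None) -> str | None:
    """Hypothetical free-function version of the resolution rule."""
    if width is None and height is None:
        return None
    if width is not None:
        # Width wins when present, so cropped widescreen keeps its class.
        for min_w, _, label in _BUCKETS:
            if width >= min_w:
                return label
        return f"{height}p" if height else f"{width}w"
    # Width unknown: fall back to height thresholds.
    for _, min_h, label in _BUCKETS:
        if height >= min_h:
            return label
    return f"{height}p"


print(resolution_label(1920, 960))   # 1080p
print(resolution_label(None, 960))   # 720p
print(resolution_label(320, None))   # 320w
```

The first two calls show why width takes priority: the same 960-pixel height is classed as 1080p when the 1920-pixel width is known, and as 720p when only the height is available.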
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# MediaInfo — assembles video/audio/subtitle tracks for a media file
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class MediaInfo:
|
||||
"""
|
||||
File-level media metadata extracted by ffprobe — immutable snapshot.
|
||||
|
||||
Symmetric design: every stream type is a tuple of typed track objects
|
||||
(immutable on purpose — a MediaInfo is a frozen view of one ffprobe run,
|
||||
not a mutable collection to append to).
|
||||
Backwards-compatible flat accessors (``resolution``, ``width``, …) read
|
||||
from the first video track when present.
|
||||
"""
|
||||
|
||||
video_tracks: tuple[VideoTrack, ...] = field(default_factory=tuple)
|
||||
audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
|
||||
subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
|
||||
|
||||
# File-level (from ffprobe ``format`` block, not from any single stream)
|
||||
duration_seconds: float | None = None
|
||||
bitrate_kbps: int | None = None
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────
|
||||
# Video conveniences — read the first video track
|
||||
# ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
@property
|
||||
def primary_video(self) -> VideoTrack | None:
|
||||
return self.video_tracks[0] if self.video_tracks else None
|
||||
|
||||
@property
|
||||
def width(self) -> int | None:
|
||||
v = self.primary_video
|
||||
return v.width if v else None
|
||||
|
||||
@property
|
||||
def height(self) -> int | None:
|
||||
v = self.primary_video
|
||||
return v.height if v else None
|
||||
|
||||
@property
|
||||
def video_codec(self) -> str | None:
|
||||
v = self.primary_video
|
||||
return v.codec if v else None
|
||||
|
||||
@property
|
||||
def resolution(self) -> str | None:
|
||||
v = self.primary_video
|
||||
return v.resolution if v else None
|
||||
|
||||
# ──────────────────────────────────────────────────────────────────────
|
||||
# Audio conveniences
|
||||
# ──────────────────────────────────────────────────────────────────────
|
||||
|
||||
@property
|
||||
def audio_languages(self) -> list[str]:
|
||||
"""Unique audio languages across all tracks (ISO 639-2)."""
|
||||
seen: set[str] = set()
|
||||
result: list[str] = []
|
||||
for track in self.audio_tracks:
|
||||
if track.language and track.language not in seen:
|
||||
seen.add(track.language)
|
||||
result.append(track.language)
|
||||
return result
|
||||
|
||||
@property
|
||||
def is_multi_audio(self) -> bool:
|
||||
"""True if more than one audio language is present."""
|
||||
return len(self.audio_languages) > 1
|
||||
|
||||
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# Language matching — shared helper + mixin
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
|
||||
"""
|
||||
Match a track's language string against a query (contract "C+").
|
||||
|
||||
* ``Language`` query → matches if the track string is any known
|
||||
representation of that Language (delegates to ``Language.matches``).
|
||||
Powerful, cross-format mode.
|
||||
* ``str`` query → case-insensitive direct comparison against
|
||||
``track_lang``. Simple, no normalization, no registry lookup.
|
||||
|
||||
Callers needing cross-format resolution (``"fr"`` ↔ ``"fre"`` ↔
|
||||
``"french"``) should resolve their string through a ``LanguageRegistry``
|
||||
once and pass the resulting ``Language``.
|
||||
"""
|
||||
if track_lang is None:
|
||||
return False
|
||||
if isinstance(query, Language):
|
||||
return query.matches(track_lang)
|
||||
if isinstance(query, str):
|
||||
return track_lang.lower().strip() == query.lower().strip()
|
||||
return False
|
||||
|
||||
|
||||
class MediaWithTracks:
|
||||
"""
|
||||
Mixin providing audio/subtitle helpers for entities with track collections.
|
||||
|
||||
Hosts must expose two attributes:
|
||||
|
||||
* ``audio_tracks: tuple[AudioTrack, ...]``
|
||||
* ``subtitle_tracks: tuple[SubtitleTrack, ...]``
|
||||
|
||||
The helpers follow the "C+" matching contract: pass a :class:`Language`
|
||||
for cross-format matching, or a ``str`` for case-insensitive comparison.
|
||||
"""
|
||||
|
||||
# These attributes are provided by the host entity (Movie, Episode, …).
|
||||
# Declared here only for type-checkers and to make the contract explicit.
|
||||
audio_tracks: tuple[AudioTrack, ...]
|
||||
subtitle_tracks: tuple[SubtitleTrack, ...]
|
||||
|
||||
# ── Audio helpers ──────────────────────────────────────────────────────
|
||||
|
||||
def has_audio_in(self, lang: str | Language) -> bool:
|
||||
"""True if at least one audio track is in the given language."""
|
||||
return any(track_lang_matches(t.language, lang) for t in self.audio_tracks)
|
||||
|
||||
def audio_languages(self) -> list[str]:
|
||||
"""Unique audio languages across all tracks, in track order."""
|
||||
seen: set[str] = set()
|
||||
result: list[str] = []
|
||||
for t in self.audio_tracks:
|
||||
if t.language and t.language not in seen:
|
||||
seen.add(t.language)
|
||||
result.append(t.language)
|
||||
return result
|
||||
|
||||
# ── Subtitle helpers ───────────────────────────────────────────────────
|
||||
|
||||
def has_subtitles_in(self, lang: str | Language) -> bool:
|
||||
"""True if at least one subtitle track is in the given language."""
|
||||
return any(track_lang_matches(t.language, lang) for t in self.subtitle_tracks)
|
||||
|
||||
def has_forced_subs(self) -> bool:
|
||||
"""True if at least one subtitle track is flagged as forced."""
|
||||
return any(t.is_forced for t in self.subtitle_tracks)
|
||||
|
||||
def subtitle_languages(self) -> list[str]:
|
||||
"""Unique subtitle languages across all tracks, in track order."""
|
||||
seen: set[str] = set()
|
||||
result: list[str] = []
|
||||
for t in self.subtitle_tracks:
|
||||
if t.language and t.language not in seen:
|
||||
seen.add(t.language)
|
||||
result.append(t.language)
|
||||
return result
|
||||
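The two matching modes of the "C+" contract can be illustrated with a reduced example. This is a sketch under stated assumptions: `Lang` is a hypothetical stand-in for the `Language` value object, `lang_matches` mirrors the shape of `track_lang_matches`, and the alias list is invented for the demonstration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lang:
    """Hypothetical stand-in for the Language value object."""

    iso: str
    aliases: tuple[str, ...] = ()

    def matches(self, raw: str) -> bool:
        needle = raw.lower().strip()
        return bool(needle) and (needle == self.iso or needle in self.aliases)


def lang_matches(track_lang: str | None, query: str | Lang) -> bool:
    if track_lang is None:
        return False
    if isinstance(query, Lang):
        return query.matches(track_lang)  # cross-format mode
    # Plain mode: case-insensitive comparison, no alias lookup.
    return track_lang.lower().strip() == query.lower().strip()


french = Lang(iso="fre", aliases=("fr", "fra", "french"))  # illustrative aliases

print(lang_matches("FRE", "fre"))   # True
print(lang_matches("fr", "fre"))    # False
print(lang_matches("fr", french))   # True
print(lang_matches(None, french))   # False
```

A plain string query never bridges `"fr"` and `"fre"`; only a resolved language object does, which is why callers are expected to resolve their string once and pass the object.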
@@ -1,19 +0,0 @@
|
||||
"""Ports — Protocol interfaces the domain depends on.
|
||||
|
||||
Adapters live in ``alfred/infrastructure/`` and implement these protocols.
|
||||
Domain code never imports infrastructure; it accepts a port via constructor
|
||||
injection and calls it. Tests can pass in-memory fakes that satisfy the
|
||||
Protocol without going through real I/O.
|
||||
"""
|
||||
|
||||
from .filesystem_scanner import FileEntry, FilesystemScanner
|
||||
from .language_repository import LanguageRepository
|
||||
from .media_prober import MediaProber, SubtitleStreamInfo
|
||||
|
||||
__all__ = [
|
||||
"FileEntry",
|
||||
"FilesystemScanner",
|
||||
"LanguageRepository",
|
||||
"MediaProber",
|
||||
"SubtitleStreamInfo",
|
||||
]
|
||||
@@ -1,59 +0,0 @@
|
||||
"""FilesystemScanner port — abstracts filesystem inspection.
|
||||
|
||||
The domain never calls ``Path.iterdir``, ``Path.is_file``, ``Path.stat`` or
|
||||
``open()`` directly. It asks the scanner for a ``FileEntry`` snapshot and
|
||||
reasons from there. One scan = one I/O round-trip; no callbacks back to disk.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
from typing import Protocol
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class FileEntry:
|
||||
"""Frozen snapshot of one filesystem entry, taken at scan time.
|
||||
|
||||
The entry carries enough metadata for the domain to classify and order
|
||||
files without re-querying the OS. ``size_kb`` is ``None`` for directories
|
||||
and for files whose size could not be read.
|
||||
"""
|
||||
|
||||
path: Path
|
||||
is_file: bool
|
||||
is_dir: bool
|
||||
size_kb: float | None
|
||||
|
||||
@property
|
||||
def name(self) -> str:
|
||||
return self.path.name
|
||||
|
||||
@property
|
||||
def stem(self) -> str:
|
||||
return self.path.stem
|
||||
|
||||
@property
|
||||
def suffix(self) -> str:
|
||||
return self.path.suffix
|
||||
|
||||
|
||||
class FilesystemScanner(Protocol):
|
||||
"""Read-only filesystem inspection."""
|
||||
|
||||
def scan_dir(self, path: Path) -> list[FileEntry]:
|
||||
"""Return sorted entries directly inside ``path``.
|
||||
|
||||
Returns an empty list when ``path`` is not a directory or is
|
||||
unreadable. Adapters must not raise.
|
||||
"""
|
||||
...
|
||||
|
||||
def stat(self, path: Path) -> FileEntry | None:
|
||||
"""Stat a single path; ``None`` when it doesn't exist or is unreadable."""
|
||||
...
|
||||
|
||||
def read_text(self, path: Path, encoding: str = "utf-8") -> str | None:
|
||||
"""Read a text file in one go; ``None`` on any error."""
|
||||
...
|
||||
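The port docstrings say tests can pass in-memory fakes instead of touching the disk. A minimal sketch of such a fake follows. `Entry`, `FakeScanner` and `subtitle_names` are hypothetical names, the fake models files only (no directory entries), and the method names are taken from the `FilesystemScanner` protocol shown in the diff.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for FileEntry."""

    path: Path
    is_file: bool
    is_dir: bool
    size_kb: float | None


class FakeScanner:
    """In-memory fake with the scanner's method shape; no disk access."""

    def __init__(self, files: dict[str, str]) -> None:
        # Maps a file path to its text content.
        self._files = {Path(p): text for p, text in files.items()}

    def scan_dir(self, path: Path) -> list[Entry]:
        hits = [p for p in self._files if p.parent == path]
        return [self._entry(p) for p in sorted(hits)]

    def stat(self, path: Path) -> Entry | None:
        return self._entry(path) if path in self._files else None

    def read_text(self, path: Path, encoding: str = "utf-8") -> str | None:
        return self._files.get(path)

    def _entry(self, path: Path) -> Entry:
        size_kb = len(self._files[path].encode("utf-8")) / 1024
        return Entry(path=path, is_file=True, is_dir=False, size_kb=size_kb)


def subtitle_names(scanner: FakeScanner, folder: Path) -> list[str]:
    """Hypothetical domain function: reasons from the snapshot only."""
    return [e.path.name for e in scanner.scan_dir(folder) if e.path.suffix == ".srt"]


scanner = FakeScanner(
    {"/lib/Oz/ep1.mkv": "", "/lib/Oz/ep1.fr.srt": "1\n", "/lib/Oz/ep1.en.srt": "1\n"}
)
print(subtitle_names(scanner, Path("/lib/Oz")))  # ['ep1.en.srt', 'ep1.fr.srt']
print(scanner.scan_dir(Path("/missing")))        # []
```

Since `Protocol` classes are matched structurally, a fake like this does not need to inherit from the port; it only needs the same method signatures.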
@@ -1,36 +0,0 @@
|
||||
"""LanguageRepository port — abstracts canonical language lookup.
|
||||
|
||||
The adapter (typically loading from ISO 639 YAML knowledge) maps a wide
|
||||
range of raw forms (codes, English/native names, aliases) onto the
|
||||
canonical :class:`Language` value object. Domain code accepts the port
|
||||
via constructor injection; tests can pass a small in-memory fake.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Protocol
|
||||
|
||||
from alfred.domain.shared.value_objects import Language
|
||||
|
||||
|
||||
class LanguageRepository(Protocol):
|
||||
"""Canonical language lookup."""
|
||||
|
||||
def from_iso(self, code: str) -> Language | None:
|
||||
"""Look up by canonical ISO 639-2/B code (case-insensitive)."""
|
||||
...
|
||||
|
||||
def from_any(self, raw: str) -> Language | None:
|
||||
"""Look up by any known representation: ISO code, name, alias.
|
||||
|
||||
Case-insensitive. Returns ``None`` when the raw form is unknown.
|
||||
"""
|
||||
...
|
||||
|
||||
def all(self) -> list[Language]:
|
||||
"""Return all known languages, in a stable order."""
|
||||
...
|
||||
|
||||
def __contains__(self, raw: str) -> bool: ...
|
||||
|
||||
def __len__(self) -> int: ...
|
||||
@@ -1,52 +0,0 @@
|
||||
"""MediaProber port — abstracts media stream inspection (e.g. ffprobe).
|
||||
|
||||
The adapter (typically wrapping ffprobe) maps low-level container metadata
|
||||
into the small set of stream attributes the domain reasons about. Replacing
|
||||
ffprobe with another tool only requires a new adapter — domain stays put.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
from typing import TYPE_CHECKING, Protocol
|
||||
|
||||
if TYPE_CHECKING:
|
||||
from alfred.domain.shared.media import MediaInfo
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SubtitleStreamInfo:
|
||||
"""A single embedded subtitle stream, as seen by the prober.
|
||||
|
||||
``language`` is the raw language tag emitted by the container (typically
|
||||
ISO 639-2 like ``"fre"``, ``"eng"``); may be empty/None when the stream
|
||||
has no language tag. The domain resolves it to a canonical ``Language``
|
||||
via the knowledge base.
|
||||
"""
|
||||
|
||||
language: str | None
|
||||
is_hearing_impaired: bool
|
||||
is_forced: bool
|
||||
|
||||
|
||||
class MediaProber(Protocol):
|
||||
"""Inspect a media file's stream metadata."""
|
||||
|
||||
def list_subtitle_streams(self, video: Path) -> list[SubtitleStreamInfo]:
|
||||
"""Return all subtitle streams in ``video``.
|
||||
|
||||
Returns an empty list when the file is missing, unreadable, or has
|
||||
no subtitle streams. Adapters must not raise.
|
||||
"""
|
||||
...
|
||||
|
||||
def probe(self, video: Path) -> MediaInfo | None:
|
||||
"""Return the full :class:`MediaInfo` for ``video``, or ``None``.
|
||||
|
||||
Covers all stream families (video, audio, subtitle) plus
|
||||
file-level duration / bitrate. ``None`` signals that ffprobe is
|
||||
unavailable or the file can't be read — adapters must not
|
||||
raise.
|
||||
"""
|
||||
...
|
||||
@@ -1,7 +1,5 @@
|
||||
"""Shared value objects used across multiple domains."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass
|
||||
from pathlib import Path
|
||||
@@ -45,21 +43,41 @@ class ImdbId:
|
||||
@dataclass(frozen=True)
|
||||
class FilePath:
|
||||
"""
|
||||
Value object representing a file path.
|
||||
Value object representing a file path with validation.
|
||||
|
||||
Accepts either ``str`` or :class:`pathlib.Path` at construction;
|
||||
the value is normalized to ``Path`` in ``__post_init__``.
|
||||
Ensures the path is valid and optionally checks existence.
|
||||
"""
|
||||
|
||||
value: Path
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if isinstance(self.value, Path):
|
||||
return
|
||||
if isinstance(self.value, str):
|
||||
object.__setattr__(self, "value", Path(self.value))
|
||||
return
|
||||
raise ValidationError(f"Path must be str or Path, got {type(self.value)}")
|
||||
def __init__(self, path: str | Path):
|
||||
"""
|
||||
Initialize FilePath.
|
||||
|
||||
Args:
|
||||
path: String or Path object representing the file path
|
||||
"""
|
||||
if isinstance(path, str):
|
||||
path_obj = Path(path)
|
||||
elif isinstance(path, Path):
|
||||
path_obj = path
|
||||
else:
|
||||
raise ValidationError(f"Path must be str or Path, got {type(path)}")
|
||||
|
||||
# Use object.__setattr__ because dataclass is frozen
|
||||
object.__setattr__(self, "value", path_obj)
|
||||
|
||||
def exists(self) -> bool:
|
||||
"""Check if the path exists."""
|
||||
return self.value.exists()
|
||||
|
||||
def is_file(self) -> bool:
|
||||
"""Check if the path is a file."""
|
||||
return self.value.is_file()
|
||||
|
||||
def is_dir(self) -> bool:
|
||||
"""Check if the path is a directory."""
|
||||
return self.value.is_dir()
|
||||
|
||||
def __str__(self) -> str:
|
||||
return str(self.value)
|
||||
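The `__post_init__` variant in this hunk normalizes a `str` to `Path` on a frozen dataclass. The sketch below shows that mechanism in isolation. `PathValue` is a hypothetical stand-in for `FilePath`, and `TypeError` replaces the project's `ValidationError` so the example stays self-contained.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path


@dataclass(frozen=True)
class PathValue:
    """Hypothetical stand-in for FilePath."""

    value: Path

    def __post_init__(self) -> None:
        if isinstance(self.value, Path):
            return
        if isinstance(self.value, str):
            # A frozen dataclass rejects normal assignment, so the one-time
            # normalization goes through object.__setattr__.
            object.__setattr__(self, "value", Path(self.value))
            return
        # The diff raises ValidationError; TypeError stands in for it here.
        raise TypeError(f"Path must be str or Path, got {type(self.value)}")


from_str = PathValue("movies/Inception.mkv")
from_path = PathValue(Path("movies/Inception.mkv"))

print(from_str == from_path)             # True
print(isinstance(from_str.value, Path))  # True

try:
    from_str.value = Path("elsewhere")
except FrozenInstanceError:
    print("immutable after construction")
```

Keeping the generated `__init__` and normalizing in `__post_init__` leaves the dataclass machinery (equality, `repr`, `dataclasses.replace`) working as usual, which a hand-written `__init__` does not guarantee.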
@@ -113,127 +131,3 @@ class FileSize:
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"FileSize({self.bytes})"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class Language:
|
||||
"""
|
||||
Canonical language value object.
|
||||
|
||||
The primary identifier is the ISO 639-2/B code (3 letters, bibliographic form,
|
||||
e.g. "fre", "eng", "ger"). This is what ffprobe emits and the project-wide
|
||||
canonical form. All other representations (ISO 639-1 code, ISO 639-2/T
|
||||
variant, English/native names, common spellings) live in ``aliases`` and are
|
||||
used by ``matches()`` for case-insensitive lookup.
|
||||
|
||||
Equality and hashing are based solely on ``iso`` so two Language objects with
|
||||
the same canonical code are interchangeable regardless of aliases.
|
||||
"""
|
||||
|
||||
iso: str
|
||||
english_name: str
|
||||
native_name: str
|
||||
aliases: tuple[str, ...] = ()
|
||||
|
||||
def __post_init__(self):
|
||||
if not isinstance(self.iso, str) or not self.iso:
|
||||
raise ValidationError(
|
||||
f"Language.iso must be a non-empty string, got {self.iso!r}"
|
||||
)
|
||||
if len(self.iso) != 3:
|
||||
raise ValidationError(
|
||||
f"Language.iso must be a 3-letter ISO 639-2/B code, got {self.iso!r}"
|
||||
)
|
||||
if self.iso != self.iso.lower():
|
||||
raise ValidationError(
|
||||
f"Language.iso must be lowercase, got {self.iso!r} — "
|
||||
f"use Language.from_raw() to construct from arbitrary input"
|
||||
)
|
||||
for alias in self.aliases:
|
||||
if not isinstance(alias, str) or alias != alias.lower().strip() or not alias:
|
||||
raise ValidationError(
|
||||
f"Language.aliases must be lowercase non-empty strings, "
|
||||
f"got {alias!r} — use Language.from_raw() to normalize"
|
||||
)
|
||||
|
||||
@classmethod
|
||||
def from_raw(
|
||||
cls,
|
||||
iso: str,
|
||||
english_name: str,
|
||||
native_name: str,
|
||||
aliases: tuple[str, ...] | list[str] = (),
|
||||
) -> Language:
|
||||
"""
|
||||
Construct a Language from arbitrary (possibly un-normalized) input.
|
||||
|
||||
Use this factory when loading from external sources (YAML, user input,
|
||||
third-party APIs) — it lowercases the iso code and normalizes/dedups
|
||||
the alias tuple. The direct constructor is strict and rejects
|
||||
un-normalized input.
|
||||
"""
|
||||
seen: set[str] = set()
|
||||
normalized: list[str] = []
|
||||
for alias in aliases:
|
||||
if not isinstance(alias, str):
|
||||
continue
|
||||
a = alias.lower().strip()
|
||||
if a and a not in seen:
|
||||
seen.add(a)
|
||||
normalized.append(a)
|
||||
return cls(
|
||||
iso=iso.lower(),
|
||||
english_name=english_name,
|
||||
native_name=native_name,
|
||||
aliases=tuple(normalized),
|
||||
)
|
||||
|
||||
def matches(self, raw: str) -> bool:
|
||||
"""
|
||||
True if ``raw`` is any known representation of this language.
|
||||
|
||||
Comparison is case-insensitive and whitespace-trimmed. The match space is
|
||||
the union of the canonical ``iso`` code, the English/native names, and
|
||||
every alias.
|
||||
"""
|
||||
if not isinstance(raw, str):
|
||||
return False
|
||||
needle = raw.lower().strip()
|
||||
if not needle:
|
||||
return False
|
||||
if needle == self.iso:
|
||||
return True
|
||||
if needle == self.english_name.lower():
|
||||
return True
|
||||
if needle == self.native_name.lower():
|
||||
return True
|
||||
return needle in self.aliases
|
||||
|
||||
def __eq__(self, other: object) -> bool:
|
||||
if not isinstance(other, Language):
|
||||
return NotImplemented
|
||||
return self.iso == other.iso
|
||||
|
||||
def __hash__(self) -> int:
|
||||
return hash(self.iso)
|
||||
|
||||
def __str__(self) -> str:
|
||||
return self.iso
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"Language({self.iso!r}, {self.english_name!r})"
|
||||
|
||||
|
||||
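The identity rule of `Language` (equality and hashing on `iso` only, aliases normalized by the factory) can be sketched in a few lines. `Lang` is a hypothetical, reduced version of the class; the alias values are invented for the demonstration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lang:
    """Hypothetical, reduced version of the Language value object."""

    iso: str
    aliases: tuple[str, ...] = ()

    @classmethod
    def from_raw(cls, iso: str, aliases: list[str]) -> Lang:
        # Lowercase, trim and de-duplicate while keeping first-seen order.
        normalized = dict.fromkeys(a.lower().strip() for a in aliases)
        normalized.pop("", None)
        return cls(iso=iso.lower(), aliases=tuple(normalized))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Lang):
            return NotImplemented
        return self.iso == other.iso

    def __hash__(self) -> int:
        return hash(self.iso)


a = Lang.from_raw("FRE", ["FR", " french ", "fr", ""])
b = Lang(iso="fre")

print(a.aliases)    # ('fr', 'french')
print(a == b)       # True
print(len({a, b}))  # 1
```

Two objects with the same canonical code collapse to one entry in a set or dict key, regardless of how many aliases each carries.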
# Characters allowed in dot-separated folder/filename forms:
|
||||
# alphanumerics, underscores, spaces (about to be replaced with dots),
|
||||
# literal dots, and hyphens. Everything else is stripped.
|
||||
_FS_SAFE_CHARS = re.compile(r"[^\w\s\.\-]")
|
||||
|
||||
|
||||
def to_dot_folder_name(title: str) -> str:
|
||||
"""Sanitize ``title`` for filesystem use and convert spaces to dots.
|
||||
|
||||
Produces e.g. ``Breaking.Bad`` from ``"Breaking Bad"`` or
|
||||
``Spider-Man.No.Way.Home`` from ``"Spider-Man: No Way Home"``.
|
||||
"""
|
||||
return _FS_SAFE_CHARS.sub("", title).replace(" ", ".")
|
||||
|
||||
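The sanitizer above is small enough to run directly. The block repeats the regular expression and function from the diff unchanged; only the sample titles are chosen for illustration.

```python
import re

# Keep word characters, whitespace, literal dots and hyphens; drop the rest.
_FS_SAFE_CHARS = re.compile(r"[^\w\s\.\-]")


def to_dot_folder_name(title: str) -> str:
    return _FS_SAFE_CHARS.sub("", title).replace(" ", ".")


print(to_dot_folder_name("Breaking Bad"))             # Breaking.Bad
print(to_dot_folder_name("Spider-Man: No Way Home"))  # Spider-Man.No.Way.Home
print(to_dot_folder_name("What If...?"))              # What.If...
```

Hyphens and dots survive because they are in the allowed set; the colon and question mark are stripped before spaces become dots.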
@@ -1,12 +1,12 @@
|
||||
"""Subtitles domain — subtitle identification, classification and placement."""
|
||||
|
||||
from .aggregates import SubtitleRuleSet
|
||||
from .entities import MediaSubtitleMetadata, SubtitleScanResult
|
||||
from .entities import MediaSubtitleMetadata, SubtitleTrack
|
||||
from .exceptions import SubtitleNotFound
|
||||
from .knowledge import KnowledgeLoader, SubtitleKnowledgeBase
|
||||
from .services import PatternDetector, SubtitleIdentifier, SubtitleMatcher
|
||||
from .value_objects import (
|
||||
RuleScope,
|
||||
RuleScopeLevel,
|
||||
ScanStrategy,
|
||||
SubtitleFormat,
|
||||
SubtitleLanguage,
|
||||
@@ -17,9 +17,11 @@ from .value_objects import (
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"SubtitleScanResult",
|
||||
"SubtitleTrack",
|
||||
"MediaSubtitleMetadata",
|
||||
"SubtitleRuleSet",
|
||||
"SubtitleKnowledgeBase",
|
||||
"KnowledgeLoader",
|
||||
"SubtitleIdentifier",
|
||||
"SubtitleMatcher",
|
||||
"PatternDetector",
|
||||
@@ -31,6 +33,5 @@ __all__ = [
|
||||
"TypeDetectionMethod",
|
||||
"SubtitleMatchingRules",
|
||||
"RuleScope",
|
||||
"RuleScopeLevel",
|
||||
"SubtitleNotFound",
|
||||
]
|
||||
|
||||
@@ -4,7 +4,13 @@ from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
from ..shared.value_objects import ImdbId
|
||||
from .value_objects import RuleScope, RuleScopeLevel, SubtitleMatchingRules
|
||||
from .knowledge.base import SubtitleKnowledgeBase
|
||||
from .value_objects import RuleScope, SubtitleMatchingRules
|
||||
|
||||
|
||||
def DEFAULT_RULES() -> SubtitleMatchingRules:
|
||||
"""Load default matching rules from subtitles.yaml (defaults section)."""
|
||||
return SubtitleKnowledgeBase().default_rules()
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -20,7 +26,7 @@ class SubtitleRuleSet:
|
||||
"""
|
||||
|
||||
scope: RuleScope
|
||||
parent: SubtitleRuleSet | None = None
|
||||
parent: "SubtitleRuleSet | None" = None
|
||||
pinned_to: ImdbId | None = None
|
||||
|
||||
# Deltas — None = inherit
|
||||
@@ -30,26 +36,18 @@ class SubtitleRuleSet:
|
||||
_format_priority: list[str] | None = field(default=None, repr=False)
|
||||
_min_confidence: float | None = field(default=None, repr=False)
|
||||
|
||||
def resolve(self, default_rules: SubtitleMatchingRules) -> SubtitleMatchingRules:
|
||||
def resolve(self) -> SubtitleMatchingRules:
|
||||
"""
|
||||
Walk the parent chain and merge deltas into effective rules.
|
||||
|
||||
``default_rules`` seeds the top of the chain — it is the caller's
|
||||
responsibility to load these from the knowledge base (infrastructure).
|
||||
Keeping the default rules as a parameter keeps the domain free of
|
||||
any I/O dependency.
|
||||
Falls back to DEFAULT_RULES at the top of the chain.
|
||||
"""
|
||||
base = (
|
||||
self.parent.resolve(default_rules) if self.parent else default_rules
|
||||
)
|
||||
base = self.parent.resolve() if self.parent else DEFAULT_RULES()
|
||||
return SubtitleMatchingRules(
|
||||
preferred_languages=self._languages or base.preferred_languages,
|
||||
preferred_formats=self._formats or base.preferred_formats,
|
||||
allowed_types=self._types or base.allowed_types,
|
||||
format_priority=self._format_priority or base.format_priority,
|
||||
min_confidence=self._min_confidence
|
||||
if self._min_confidence is not None
|
||||
else base.min_confidence,
|
||||
min_confidence=self._min_confidence if self._min_confidence is not None else base.min_confidence,
|
||||
)
|
||||
|
||||
def override(
|
||||
@@ -85,14 +83,8 @@ class SubtitleRuleSet:
|
||||
delta["format_priority"] = self._format_priority
|
||||
if self._min_confidence is not None:
|
||||
delta["min_confidence"] = self._min_confidence
|
||||
return {
|
||||
"scope": {
|
||||
"level": self.scope.level.value,
|
||||
"identifier": self.scope.identifier,
|
||||
},
|
||||
"override": delta,
|
||||
}
|
||||
return {"scope": {"level": self.scope.level, "identifier": self.scope.identifier}, "override": delta}
|
||||
|
||||
@classmethod
|
||||
def global_default(cls) -> SubtitleRuleSet:
|
||||
return cls(scope=RuleScope(level=RuleScopeLevel.GLOBAL))
|
||||
def global_default(cls) -> "SubtitleRuleSet":
|
||||
return cls(scope=RuleScope(level="global"))
|
||||
|
||||
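The parent-chain merge in `resolve` can be sketched with two fields. This follows the variant of the hunk that takes `default_rules` as a parameter. `Rules` and `RuleSet` are hypothetical stand-ins for `SubtitleMatchingRules` and `SubtitleRuleSet`, and the default values are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rules:
    """Hypothetical stand-in for SubtitleMatchingRules."""

    preferred_languages: tuple[str, ...]
    min_confidence: float


@dataclass
class RuleSet:
    """Hypothetical stand-in for SubtitleRuleSet; None means "inherit"."""

    parent: RuleSet | None = None
    languages: tuple[str, ...] | None = None
    min_confidence: float | None = None

    def resolve(self, default_rules: Rules) -> Rules:
        base = self.parent.resolve(default_rules) if self.parent else default_rules
        return Rules(
            preferred_languages=self.languages or base.preferred_languages,
            # Explicit None check: 0.0 is a valid override and must not be
            # mistaken for "inherit".
            min_confidence=(
                self.min_confidence
                if self.min_confidence is not None
                else base.min_confidence
            ),
        )


defaults = Rules(preferred_languages=("fre", "eng"), min_confidence=0.7)
global_rules = RuleSet()
show_rules = RuleSet(parent=global_rules, languages=("eng",))
episode_rules = RuleSet(parent=show_rules, min_confidence=0.0)

print(episode_rules.resolve(defaults))
# Rules(preferred_languages=('eng',), min_confidence=0.0)
```

Passing the defaults in keeps the rule set free of any loading step, so a test can resolve a chain without a knowledge base on disk.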
@@ -4,26 +4,16 @@ from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
|
||||
from ..shared.value_objects import ImdbId
|
||||
from .value_objects import (
|
||||
SubtitleFormat,
|
||||
SubtitleLanguage,
|
||||
SubtitleType,
|
||||
)
|
||||
from .value_objects import SubtitleFormat, SubtitleLanguage, SubtitleMatchingRules, SubtitleType
|
||||
|
||||
|
||||
@dataclass
|
||||
class SubtitleScanResult:
|
||||
class SubtitleTrack:
|
||||
"""
|
||||
A subtitle observed during a scan — either an external file or an embedded stream.
|
||||
A single subtitle track — either an external file or an embedded stream.
|
||||
|
||||
Unlike ``alfred.domain.shared.media.SubtitleTrack`` (the pure container-view
|
||||
populated from ffprobe), a ``SubtitleScanResult`` carries the **flow state**
|
||||
of the subtitle matching pipeline: language/format are typed value objects
|
||||
that may be ``None`` while classification is in progress, ``confidence``
|
||||
reflects how certain we are, and ``raw_tokens`` holds the filename fragments
|
||||
still under analysis. State evolves: unknown → resolved after user
|
||||
clarification. The name reflects this — it's the **output of a scan pass**,
|
||||
not a value object.
|
||||
State can evolve: unknown → resolved after user clarification.
|
||||
confidence reflects how certain we are about language + type classification.
|
||||
"""
|
||||
|
||||
# Classification (may be None if not yet resolved)
|
||||
@@ -33,15 +23,13 @@ class SubtitleScanResult:
|
||||
|
||||
# Source
|
||||
is_embedded: bool = False
|
||||
file_path: Path | None = None # None if embedded
|
||||
file_path: Path | None = None # None if embedded
|
||||
file_size_kb: float | None = None
|
||||
entry_count: int | None = None # number of subtitle cues in the file
|
||||
entry_count: int | None = None # number of subtitle cues in the file
|
||||
|
||||
# Matching state
|
||||
confidence: float = 0.0 # 0.0 → 1.0, not applicable for embedded
|
||||
raw_tokens: list[str] = field(
|
||||
default_factory=list
|
||||
) # tokens extracted from filename
|
||||
confidence: float = 0.0 # 0.0 → 1.0, not applicable for embedded
|
||||
raw_tokens: list[str] = field(default_factory=list) # tokens extracted from filename
|
||||
|
||||
def is_resolved(self) -> bool:
|
||||
return self.language is not None
|
||||
@@ -55,9 +43,7 @@ class SubtitleScanResult:
|
||||
{lang}.forced.{ext}
|
||||
"""
|
||||
if not self.language or not self.format:
|
||||
raise ValueError(
|
||||
"Cannot compute destination_name: language or format missing"
|
||||
)
|
||||
raise ValueError("Cannot compute destination_name: language or format missing")
|
||||
ext = self.format.extensions[0].lstrip(".")
|
||||
parts = [self.language.code]
|
||||
if self.subtitle_type == SubtitleType.SDH:
|
||||
@@ -69,12 +55,8 @@ class SubtitleScanResult:
|
||||
def __repr__(self) -> str:
|
||||
lang = self.language.code if self.language else "?"
|
||||
fmt = self.format.id if self.format else "?"
|
||||
src = (
|
||||
"embedded"
|
||||
if self.is_embedded
|
||||
else str(self.file_path.name if self.file_path else "?")
|
||||
)
|
||||
return f"SubtitleScanResult({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"
|
||||
src = "embedded" if self.is_embedded else str(self.file_path.name if self.file_path else "?")
|
||||
return f"SubtitleTrack({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -85,15 +67,15 @@ class MediaSubtitleMetadata:
|
||||
"""
|
||||
|
||||
media_id: ImdbId | None
|
||||
media_type: str # "movie" | "tv_show"
|
||||
embedded_tracks: list[SubtitleScanResult] = field(default_factory=list)
|
||||
external_tracks: list[SubtitleScanResult] = field(default_factory=list)
|
||||
media_type: str # "movie" | "tv_show"
|
||||
embedded_tracks: list[SubtitleTrack] = field(default_factory=list)
|
||||
external_tracks: list[SubtitleTrack] = field(default_factory=list)
|
||||
release_group: str | None = None
|
||||
detected_pattern_id: str | None = None # pattern id from knowledge base
|
||||
pattern_confirmed: bool = False
|
||||
|
||||
@property
|
||||
def all_tracks(self) -> list[SubtitleScanResult]:
|
||||
def all_tracks(self) -> list[SubtitleTrack]:
|
||||
return self.embedded_tracks + self.external_tracks
|
||||
|
||||
@property
|
||||
@@ -101,5 +83,5 @@ class MediaSubtitleMetadata:
|
||||
return len(self.embedded_tracks) + len(self.external_tracks)
|
||||
|
||||
@property
|
||||
def unresolved_tracks(self) -> list[SubtitleScanResult]:
|
||||
def unresolved_tracks(self) -> list[SubtitleTrack]:
|
||||
return [t for t in self.external_tracks if t.language is None]
|
||||
|
||||
+13
-49
@@ -1,9 +1,9 @@
|
||||
"""SubtitleKnowledgeBase — parsed, typed view of the loaded knowledge."""
|
||||
|
||||
import logging
|
||||
from functools import cached_property
|
||||
|
||||
from alfred.domain.shared.ports import LanguageRepository
|
||||
from alfred.domain.subtitles.value_objects import (
|
||||
from ..value_objects import (
|
||||
ScanStrategy,
|
||||
SubtitleFormat,
|
||||
SubtitleLanguage,
|
||||
@@ -12,8 +12,6 @@ from alfred.domain.subtitles.value_objects import (
|
||||
SubtitleType,
|
||||
TypeDetectionMethod,
|
||||
)
|
||||
from alfred.infrastructure.knowledge.language_registry import LanguageRegistry
|
||||
|
||||
from .loader import KnowledgeLoader
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@@ -27,18 +25,11 @@ class SubtitleKnowledgeBase:
|
||||
without restarting.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
loader: KnowledgeLoader | None = None,
|
||||
language_registry: LanguageRepository | None = None,
|
||||
):
|
||||
def __init__(self, loader: KnowledgeLoader | None = None):
|
||||
self._loader = loader or KnowledgeLoader()
|
||||
self._language_registry: LanguageRepository = (
|
||||
language_registry or LanguageRegistry()
|
||||
)
|
||||
self._build()
|
||||
|
||||
def _build(self) -> None: # noqa: PLR0912 — straight-line YAML projection
|
||||
def _build(self) -> None:
|
||||
data = self._loader.subtitles()
|
||||
|
||||
self._formats: dict[str, SubtitleFormat] = {}
|
||||
@@ -49,44 +40,17 @@ class SubtitleKnowledgeBase:
|
||||
description=fdata.get("description", ""),
|
||||
)
|
||||
|
||||
# Languages are sourced primarily from the canonical LanguageRegistry
|
||||
# (alfred/knowledge/iso_languages.yaml — ISO 639-2/B). Subtitle-specific
|
||||
# tokens (VOSTFR, VF, VFF…) are merged on top from subtitles.yaml's
|
||||
# ``language_tokens`` section.
|
||||
subtitle_extras: dict[str, list[str]] = {
|
||||
code: list(tokens or [])
|
||||
for code, tokens in (data.get("language_tokens", {}) or {}).items()
|
||||
}
|
||||
|
||||
self._languages: dict[str, SubtitleLanguage] = {}
|
||||
self._lang_token_map: dict[str, str] = {}
|
||||
|
||||
for language in self._language_registry.all():
|
||||
tokens: list[str] = [language.iso, language.english_name.lower()]
|
||||
if language.native_name.lower() not in tokens:
|
||||
tokens.append(language.native_name.lower())
|
||||
for alias in language.aliases:
|
||||
if alias not in tokens:
|
||||
tokens.append(alias)
|
||||
for extra in subtitle_extras.get(language.iso, []):
|
||||
if extra.lower() not in tokens:
|
||||
tokens.append(extra.lower())
|
||||
|
||||
self._languages[language.iso] = SubtitleLanguage(
|
||||
code=language.iso,
|
||||
tokens=tokens,
|
||||
for code, ldata in data.get("languages", {}).items():
|
||||
self._languages[code] = SubtitleLanguage(
|
||||
code=code,
|
||||
tokens=ldata.get("tokens", []),
|
||||
)
|
||||
for token in tokens:
|
||||
self._lang_token_map[token.lower()] = language.iso
|
||||
|
||||
# Subtitle-specific tokens for languages NOT in the canonical registry
|
||||
# are still honored: register them as a minimal SubtitleLanguage.
|
||||
for code, extras in subtitle_extras.items():
|
||||
if code in self._languages:
|
||||
continue
|
||||
tokens = [code] + [e.lower() for e in extras]
|
||||
self._languages[code] = SubtitleLanguage(code=code, tokens=tokens)
|
||||
for token in tokens:
|
||||
# Build reverse token → language code map
|
||||
self._lang_token_map: dict[str, str] = {}
|
||||
for code, lang in self._languages.items():
|
||||
for token in lang.tokens:
|
||||
self._lang_token_map[token.lower()] = code
|
||||
|
||||
# Build reverse token → type map
|
||||
@@ -98,7 +62,7 @@ class SubtitleKnowledgeBase:
|
||||
|
||||
d = data.get("defaults", {})
|
||||
self._default_rules = SubtitleMatchingRules(
|
||||
preferred_languages=d.get("languages", ["fre", "eng"]),
|
||||
preferred_languages=d.get("languages", ["fra", "eng"]),
|
||||
preferred_formats=d.get("formats", ["srt"]),
|
||||
allowed_types=d.get("types", ["standard", "forced"]),
|
||||
format_priority=d.get("format_priority", ["srt", "ass"]),
|
||||
+4
-8
@@ -5,10 +5,10 @@ from pathlib import Path
|
||||
|
||||
import yaml
|
||||
|
||||
import alfred as _alfred_pkg
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
import alfred as _alfred_pkg
|
||||
|
||||
# Builtin knowledge — anchored on the alfred package itself, not on this file's depth
|
||||
_BUILTIN_ROOT = Path(_alfred_pkg.__file__).parent / "knowledge"
|
||||
|
||||
@@ -84,9 +84,7 @@ class KnowledgeLoader:
|
||||
data = _load_yaml(path)
|
||||
pid = data.get("id", path.stem)
|
||||
if pid in self._cache["patterns"]:
|
||||
self._cache["patterns"][pid] = _merge(
|
||||
self._cache["patterns"][pid], data
|
||||
)
|
||||
self._cache["patterns"][pid] = _merge(self._cache["patterns"][pid], data)
|
||||
else:
|
||||
self._cache["patterns"][pid] = data
|
||||
logger.info(f"KnowledgeLoader: learned new pattern '{pid}'")
|
||||
@@ -102,9 +100,7 @@ class KnowledgeLoader:
|
||||
data = _load_yaml(path)
|
||||
name = data.get("name", path.stem)
|
||||
if name in self._cache["release_groups"]:
|
||||
self._cache["release_groups"][name] = _merge(
|
||||
self._cache["release_groups"][name], data
|
||||
)
|
||||
self._cache["release_groups"][name] = _merge(self._cache["release_groups"][name], data)
|
||||
else:
|
||||
self._cache["release_groups"][name] = data
|
||||
logger.info(f"KnowledgeLoader: learned new release group '{name}'")
|
||||
@@ -1,6 +0,0 @@
|
||||
"""Domain ports for the subtitles domain — Protocol-based abstractions
|
||||
that decouple domain services from concrete infrastructure adapters."""
|
||||
|
||||
from .knowledge import SubtitleKnowledge
|
||||
|
||||
__all__ = ["SubtitleKnowledge"]
|
||||
@@ -1,38 +0,0 @@
|
||||
"""SubtitleKnowledge port — the query surface domain services need from the
|
||||
subtitle knowledge base, expressed as a Protocol so the domain never imports
|
||||
the infrastructure adapter that backs it.
|
||||
|
||||
The concrete implementation lives in
|
||||
``alfred/infrastructure/knowledge/subtitles/base.py`` (the YAML-backed
|
||||
``SubtitleKnowledgeBase``). Tests can supply any object that satisfies this
|
||||
structural contract.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from typing import Protocol
|
||||
|
||||
from ..value_objects import SubtitleFormat, SubtitleLanguage, SubtitlePattern, SubtitleType
|
||||
|
||||
|
||||
class SubtitleKnowledge(Protocol):
|
||||
"""Read-only query surface for subtitle knowledge consumed by the domain.
|
||||
|
||||
Only the methods that domain services actually call belong here — anything
|
||||
else (defaults loading, reload, pattern groups, raw dicts) stays on the
|
||||
concrete class and is reserved for the application layer.
|
||||
"""
|
||||
|
||||
def known_extensions(self) -> set[str]: ...
|
||||
|
||||
def format_for_extension(self, ext: str) -> SubtitleFormat | None: ...
|
||||
|
||||
def language_for_token(self, token: str) -> SubtitleLanguage | None: ...
|
||||
|
||||
def is_known_lang_token(self, token: str) -> bool: ...
|
||||
|
||||
def type_for_token(self, token: str) -> SubtitleType | None: ...
|
||||
|
||||
def is_known_type_token(self, token: str) -> bool: ...
|
||||
|
||||
def patterns(self) -> dict[str, SubtitlePattern]: ...
|
||||
@@ -0,0 +1,60 @@
|
||||
"""Subtitle repository interfaces (abstract)."""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
from ..shared.value_objects import ImdbId
|
||||
from .entities import Subtitle
|
||||
from .value_objects import Language
|
||||
|
||||
|
||||
class SubtitleRepository(ABC):
|
||||
"""
|
||||
Abstract repository for subtitle persistence.
|
||||
|
||||
This defines the interface that infrastructure implementations must follow.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def save(self, subtitle: Subtitle) -> None:
|
||||
"""
|
||||
Save a subtitle to the repository.
|
||||
|
||||
Args:
|
||||
subtitle: Subtitle entity to save
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_by_media(
|
||||
self,
|
||||
media_imdb_id: ImdbId,
|
||||
language: Language | None = None,
|
||||
season: int | None = None,
|
||||
episode: int | None = None,
|
||||
) -> list[Subtitle]:
|
||||
"""
|
||||
Find subtitles for a media item.
|
||||
|
||||
Args:
|
||||
media_imdb_id: IMDb ID of the media
|
||||
language: Optional language filter
|
||||
season: Optional season number (for TV shows)
|
||||
episode: Optional episode number (for TV shows)
|
||||
|
||||
Returns:
|
||||
List of matching subtitles
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def delete(self, subtitle: Subtitle) -> bool:
|
||||
"""
|
||||
Delete a subtitle from the repository.
|
||||
|
||||
Args:
|
||||
subtitle: Subtitle to delete
|
||||
|
||||
Returns:
|
||||
True if deleted, False if not found
|
||||
"""
|
||||
pass
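To make the repository contract above concrete, here is a minimal in-memory sketch. It is not part of the commit. The `Subtitle` dataclass is a hypothetical stand-in with only the fields the filter needs, and plain strings replace the `ImdbId` and `Language` value objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Subtitle:
    """Hypothetical stand-in for the real entity (assumed fields only)."""
    media_imdb_id: str
    language: str
    season: Optional[int] = None
    episode: Optional[int] = None


class InMemorySubtitleRepository:
    """Illustrative implementation of save / find_by_media / delete."""

    def __init__(self) -> None:
        self._items: List[Subtitle] = []

    def save(self, subtitle: Subtitle) -> None:
        # Avoid storing the same subtitle twice.
        if subtitle not in self._items:
            self._items.append(subtitle)

    def find_by_media(self, media_imdb_id, language=None, season=None, episode=None):
        # Each optional argument narrows the result only when it is provided.
        return [
            s for s in self._items
            if s.media_imdb_id == media_imdb_id
            and (language is None or s.language == language)
            and (season is None or s.season == season)
            and (episode is None or s.episode == episode)
        ]

    def delete(self, subtitle: Subtitle) -> bool:
        # True if deleted, False if not found, matching the abstract docstring.
        try:
            self._items.remove(subtitle)
            return True
        except ValueError:
            return False
```

A test double along these lines would let a service that depends on `SubtitleRepository` be exercised without any storage backend.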
|
||||
@@ -0,0 +1,221 @@
|
||||
"""SubtitleScanner — inspects local subtitle files and filters them per user preferences.
|
||||
|
||||
Given a video file path, the scanner:
1. Looks for subtitle files in the same directory as the video.
2. Optionally also inspects a Subs/ subfolder adjacent to the video.
3. Classifies each file (language, SDH, forced) from its filename.
4. Filters according to SubtitlePreferences (languages, min_size_kb, keep_sdh, keep_forced).
5. Returns a list of SubtitleCandidate, one per file that passes the filter,
   with the destination filename already computed.

Filename classification heuristics
-----------------------------------
We parse the stem of each subtitle file looking for known patterns:

    fr.srt                          → lang=fr, sdh=False, forced=False
    fr.sdh.srt                      → lang=fr, sdh=True
    fr.hi.srt                       → lang=fr, sdh=True (hi = hearing-impaired, alias for sdh)
    fr.forced.srt                   → lang=fr, forced=True
    Breaking.Bad.S01E01.French.srt  → lang=fr (keyword match)
    Breaking.Bad.S01E01.VOSTFR.srt  → lang=fr (VOSTFR = original audio with French subtitles)

Output naming convention (matches SubtitlePreferences docstring):
    {lang}.srt
    {lang}.sdh.srt
    {lang}.forced.srt
|
||||
"""
|
||||
|
||||
import logging
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Subtitle file extensions we handle
|
||||
SUBTITLE_EXTENSIONS = {".srt", ".ass", ".ssa", ".vtt", ".sub"}
|
||||
|
||||
# Language keyword map: lowercase token → ISO 639-1 code
|
||||
_LANG_KEYWORDS: dict[str, str] = {
|
||||
# French
|
||||
"fr": "fr",
|
||||
"fra": "fr",
|
||||
"french": "fr",
|
||||
"francais": "fr",
|
||||
"français": "fr",
|
||||
"vf": "fr",
|
||||
"vff": "fr",
|
||||
"vostfr": "fr",
|
||||
# English
|
||||
"en": "en",
|
||||
"eng": "en",
|
||||
"english": "en",
|
||||
# Spanish
|
||||
"es": "es",
|
||||
"spa": "es",
|
||||
"spanish": "es",
|
||||
"espanol": "es",
|
||||
# German
|
||||
"de": "de",
|
||||
"deu": "de",
|
||||
"ger": "de",
|
||||
"german": "de",
|
||||
# Italian
|
||||
"it": "it",
|
||||
"ita": "it",
|
||||
"italian": "it",
|
||||
# Portuguese
|
||||
"pt": "pt",
|
||||
"por": "pt",
|
||||
"portuguese": "pt",
|
||||
# Dutch
|
||||
"nl": "nl",
|
||||
"nld": "nl",
|
||||
"dutch": "nl",
|
||||
# Japanese
|
||||
"ja": "ja",
|
||||
"jpn": "ja",
|
||||
"japanese": "ja",
|
||||
}
|
||||
|
||||
# Tokens that indicate SDH / hearing-impaired
|
||||
_SDH_TOKENS = {"sdh", "hi", "hearing", "impaired", "cc", "closedcaption"}
|
||||
|
||||
# Tokens that indicate forced subtitles
|
||||
_FORCED_TOKENS = {"forced", "foreign"}
|
||||
|
||||
|
||||
@dataclass
|
||||
class SubtitleCandidate:
|
||||
"""A subtitle file that passed the filter, ready to be placed."""
|
||||
|
||||
source_path: Path
|
||||
language: str # ISO 639-1 code, e.g. "fr"
|
||||
is_sdh: bool
|
||||
is_forced: bool
|
||||
extension: str # e.g. ".srt"
|
||||
|
||||
@property
|
||||
def destination_name(self) -> str:
|
||||
"""
|
||||
Compute the destination filename per naming convention:
|
||||
{lang}.srt
|
||||
{lang}.sdh.srt
|
||||
{lang}.forced.srt
|
||||
"""
|
||||
ext = self.extension.lstrip(".")
|
||||
parts = [self.language]
|
||||
if self.is_sdh:
|
||||
parts.append("sdh")
|
||||
elif self.is_forced:
|
||||
parts.append("forced")
|
||||
return ".".join(parts) + "." + ext
|
||||
|
||||
|
||||
def _classify(path: Path) -> tuple[str | None, bool, bool]:
|
||||
"""
|
||||
Parse a subtitle filename and return (language_code, is_sdh, is_forced).
|
||||
|
||||
Returns (None, False, False) if the language cannot be determined.
|
||||
"""
|
||||
stem = path.stem.lower()
|
||||
# Split on dots, spaces, underscores, hyphens
|
||||
import re
|
||||
tokens = re.split(r"[\.\s_\-]+", stem)
|
||||
|
||||
language: str | None = None
|
||||
is_sdh = False
|
||||
is_forced = False
|
||||
|
||||
for token in tokens:
|
||||
if token in _LANG_KEYWORDS:
|
||||
language = _LANG_KEYWORDS[token]
|
||||
if token in _SDH_TOKENS:
|
||||
is_sdh = True
|
||||
if token in _FORCED_TOKENS:
|
||||
is_forced = True
|
||||
|
||||
return language, is_sdh, is_forced
|
||||
|
||||
|
||||
class SubtitleScanner:
|
||||
"""
|
||||
Scans subtitle files next to a video and filters them per SubtitlePreferences.
|
||||
|
||||
Usage:
|
||||
scanner = SubtitleScanner(languages=["fr", "en"], min_size_kb=1, keep_sdh=True, keep_forced=False)
|
||||
candidates = scanner.scan(video_path)
|
||||
# Each candidate has .source_path and .destination_name
|
||||
"""
|
||||
|
||||
def __init__(self, languages: list[str], min_size_kb: int, keep_sdh: bool, keep_forced: bool):
|
||||
self.languages = [l.lower() for l in languages]
|
||||
self.min_size_kb = min_size_kb
|
||||
self.keep_sdh = keep_sdh
|
||||
self.keep_forced = keep_forced
|
||||
|
||||
def scan(self, video_path: Path) -> list[SubtitleCandidate]:
|
||||
"""
|
||||
Return all subtitle candidates found next to the video that pass the filter.
|
||||
|
||||
Scans:
|
||||
- Same directory as the video (flat siblings)
|
||||
- Subs/ subfolder if present
|
||||
"""
|
||||
candidates: list[SubtitleCandidate] = []
|
||||
search_dirs = [video_path.parent]
|
||||
|
||||
subs_dir = video_path.parent / "Subs"
|
||||
if subs_dir.is_dir():
|
||||
search_dirs.append(subs_dir)
|
||||
logger.debug(f"SubtitleScanner: found Subs/ folder at {subs_dir}")
|
||||
|
||||
for directory in search_dirs:
|
||||
for path in sorted(directory.iterdir()):
|
||||
if not path.is_file():
|
||||
continue
|
||||
if path.suffix.lower() not in SUBTITLE_EXTENSIONS:
|
||||
continue
|
||||
|
||||
candidate = self._evaluate(path)
|
||||
if candidate is not None:
|
||||
candidates.append(candidate)
|
||||
|
||||
logger.info(f"SubtitleScanner: {len(candidates)} candidate(s) found for {video_path.name}")
|
||||
return candidates
|
||||
|
||||
def _evaluate(self, path: Path) -> SubtitleCandidate | None:
|
||||
"""Apply all filters to a single subtitle file. Returns None if it should be dropped."""
|
||||
# Size filter
|
||||
size_kb = path.stat().st_size / 1024
|
||||
if size_kb < self.min_size_kb:
|
||||
logger.debug(f"SubtitleScanner: skip {path.name} (too small: {size_kb:.1f} KB)")
|
||||
return None
|
||||
|
||||
language, is_sdh, is_forced = _classify(path)
|
||||
|
||||
# Language filter
|
||||
if language is None:
|
||||
logger.debug(f"SubtitleScanner: skip {path.name} (language unknown)")
|
||||
return None
|
||||
|
||||
if language not in self.languages:
|
||||
logger.debug(f"SubtitleScanner: skip {path.name} (language '{language}' not in prefs)")
|
||||
return None
|
||||
|
||||
# SDH filter
|
||||
if is_sdh and not self.keep_sdh:
|
||||
logger.debug(f"SubtitleScanner: skip {path.name} (SDH not wanted)")
|
||||
return None
|
||||
|
||||
# Forced filter
|
||||
if is_forced and not self.keep_forced:
|
||||
logger.debug(f"SubtitleScanner: skip {path.name} (forced not wanted)")
|
||||
return None
|
||||
|
||||
return SubtitleCandidate(
|
||||
source_path=path,
|
||||
language=language,
|
||||
is_sdh=is_sdh,
|
||||
is_forced=is_forced,
|
||||
extension=path.suffix.lower(),
|
||||
)
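The filename heuristics and naming convention in this file can be tried in isolation with the sketch below. It is an illustration, not the committed code: the keyword tables are a small subset of the ones above, and `classify` and `destination_name` are hypothetical free functions standing in for `_classify` and `SubtitleCandidate.destination_name`.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Illustrative subset of the keyword tables defined in the scanner.
LANG_KEYWORDS = {"fr": "fr", "french": "fr", "vostfr": "fr",
                 "en": "en", "eng": "en", "english": "en"}
SDH_TOKENS = {"sdh", "hi", "cc"}
FORCED_TOKENS = {"forced", "foreign"}


def classify(filename: str) -> Tuple[Optional[str], bool, bool]:
    """Return (language, is_sdh, is_forced) parsed from a subtitle filename."""
    stem = Path(filename).stem.lower()
    # Split on dots, whitespace, underscores and hyphens; drop empty tokens.
    tokens = [t for t in re.split(r"[.\s_\-]+", stem) if t]
    language = None
    for token in tokens:
        # The last language token wins, as in the loop of _classify.
        language = LANG_KEYWORDS.get(token, language)
    is_sdh = any(t in SDH_TOKENS for t in tokens)
    is_forced = any(t in FORCED_TOKENS for t in tokens)
    return language, is_sdh, is_forced


def destination_name(language: str, is_sdh: bool, is_forced: bool, extension: str) -> str:
    """Build {lang}.srt, {lang}.sdh.srt or {lang}.forced.srt (SDH takes precedence)."""
    parts = [language]
    if is_sdh:
        parts.append("sdh")
    elif is_forced:
        parts.append("forced")
    return ".".join(parts) + "." + extension.lstrip(".")
```

One consequence of matching bare tokens is that short keywords such as `hi` or `it` can also occur in a release title, so a file like `Say.Hi.en.srt` would be flagged as SDH under this scheme.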
|
||||
@@ -0,0 +1,149 @@
|
||||
"""Subtitle domain services - Business logic."""
|
||||
|
||||
import logging
|
||||
|
||||
from ..shared.value_objects import FilePath, ImdbId
|
||||
from .entities import Subtitle
|
||||
from .exceptions import SubtitleNotFound
|
||||
from .repositories import SubtitleRepository
|
||||
from .value_objects import Language, SubtitleFormat
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class SubtitleService:
|
||||
"""
|
||||
Domain service for subtitle-related business logic.
|
||||
|
||||
This service is SHARED between movies and TV shows domains.
|
||||
Both can use this service to manage subtitles.
|
||||
"""
|
||||
|
||||
def __init__(self, repository: SubtitleRepository):
|
||||
"""
|
||||
Initialize subtitle service.
|
||||
|
||||
Args:
|
||||
repository: Subtitle repository for persistence
|
||||
"""
|
||||
self.repository = repository
|
||||
|
||||
def add_subtitle(self, subtitle: Subtitle) -> None:
|
||||
"""
|
||||
Add a subtitle to the library.
|
||||
|
||||
Args:
|
||||
subtitle: Subtitle entity to add
|
||||
"""
|
||||
self.repository.save(subtitle)
|
||||
logger.info(
|
||||
f"Added subtitle: {subtitle.language.value} for {subtitle.media_imdb_id}"
|
||||
)
|
||||
|
||||
def find_subtitles_for_movie(
|
||||
self, imdb_id: ImdbId, languages: list[Language] | None = None
|
||||
) -> list[Subtitle]:
|
||||
"""
|
||||
Find subtitles for a movie.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID of the movie
|
||||
languages: Optional list of languages to filter by
|
||||
|
||||
Returns:
|
||||
List of matching subtitles
|
||||
"""
|
||||
if languages:
|
||||
all_subtitles = []
|
||||
for lang in languages:
|
||||
subs = self.repository.find_by_media(imdb_id, language=lang)
|
||||
all_subtitles.extend(subs)
|
||||
return all_subtitles
|
||||
else:
|
||||
return self.repository.find_by_media(imdb_id)
|
||||
|
||||
def find_subtitles_for_episode(
|
||||
self,
|
||||
imdb_id: ImdbId,
|
||||
season: int,
|
||||
episode: int,
|
||||
languages: list[Language] | None = None,
|
||||
) -> list[Subtitle]:
|
||||
"""
|
||||
Find subtitles for a TV show episode.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID of the TV show
|
||||
season: Season number
|
||||
episode: Episode number
|
||||
languages: Optional list of languages to filter by
|
||||
|
||||
Returns:
|
||||
List of matching subtitles
|
||||
"""
|
||||
if languages:
|
||||
all_subtitles = []
|
||||
for lang in languages:
|
||||
subs = self.repository.find_by_media(
|
||||
imdb_id, language=lang, season=season, episode=episode
|
||||
)
|
||||
all_subtitles.extend(subs)
|
||||
return all_subtitles
|
||||
else:
|
||||
return self.repository.find_by_media(
|
||||
imdb_id, season=season, episode=episode
|
||||
)
|
||||
|
||||
def remove_subtitle(self, subtitle: Subtitle) -> None:
|
||||
"""
|
||||
Remove a subtitle from the library.
|
||||
|
||||
Args:
|
||||
subtitle: Subtitle to remove
|
||||
|
||||
Raises:
|
||||
SubtitleNotFound: If subtitle not found
|
||||
"""
|
||||
if not self.repository.delete(subtitle):
|
||||
raise SubtitleNotFound(f"Subtitle not found: {subtitle}")
|
||||
|
||||
logger.info(f"Removed subtitle: {subtitle}")
|
||||
|
||||
def detect_format_from_file(self, file_path: FilePath) -> SubtitleFormat:
|
||||
"""
|
||||
Detect subtitle format from file extension.
|
||||
|
||||
Args:
|
||||
file_path: Path to subtitle file
|
||||
|
||||
Returns:
|
||||
Detected subtitle format
|
||||
"""
|
||||
extension = file_path.value.suffix
|
||||
return SubtitleFormat.from_extension(extension)
|
||||
|
||||
def validate_subtitle_file(self, file_path: FilePath) -> bool:
|
||||
"""
|
||||
Validate that a file is a valid subtitle file.
|
||||
|
||||
Args:
|
||||
file_path: Path to the file
|
||||
|
||||
Returns:
|
||||
True if valid subtitle file, False otherwise
|
||||
"""
|
||||
if not file_path.exists():
|
||||
logger.warning(f"File does not exist: {file_path}")
|
||||
return False
|
||||
|
||||
if not file_path.is_file():
|
||||
logger.warning(f"Path is not a file: {file_path}")
|
||||
return False
|
||||
|
||||
# Check file extension
|
||||
try:
|
||||
self.detect_format_from_file(file_path)
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.warning(f"Invalid subtitle format: {e}")
|
||||
return False
|
||||
@@ -1,9 +1,13 @@
|
||||
from .identifier import SubtitleIdentifier
|
||||
from .matcher import SubtitleMatcher
|
||||
from .pattern_detector import PatternDetector
|
||||
from .placer import PlacedTrack, PlaceResult, SubtitlePlacer
|
||||
|
||||
__all__ = [
|
||||
"SubtitleIdentifier",
|
||||
"SubtitleMatcher",
|
||||
"PatternDetector",
|
||||
"SubtitlePlacer",
|
||||
"PlacedTrack",
|
||||
"PlaceResult",
|
||||
]
|
||||
|
||||
@@ -2,48 +2,34 @@
|
||||
|
||||
import logging
|
||||
import re
|
||||
import subprocess
|
||||
import json
|
||||
from pathlib import Path
|
||||
|
||||
from ...shared.ports import FilesystemScanner, MediaProber
|
||||
from ..ports import SubtitleKnowledge
|
||||
from ...shared.value_objects import ImdbId
|
||||
from ..entities import MediaSubtitleMetadata, SubtitleScanResult
|
||||
from ..entities import MediaSubtitleMetadata, SubtitleTrack
|
||||
from ..knowledge.base import SubtitleKnowledgeBase
|
||||
from ..value_objects import ScanStrategy, SubtitlePattern, SubtitleType
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _tokenize(name: str) -> list[str]:
|
||||
"""Split a filename stem into lowercase tokens, stripping parentheses."""
|
||||
# Strip parenthesized qualifiers like (simplified), (canada), (brazil)
|
||||
name = re.sub(r"\([^)]*\)", "", name)
|
||||
"""Split a filename stem into lowercase tokens."""
|
||||
return [t.lower() for t in re.split(r"[\.\s_\-]+", name) if t]
|
||||
|
||||
|
||||
def _tokenize_suffix(stem: str, episode_stem: str) -> list[str]:
|
||||
"""
|
||||
For episode_subfolder pattern: the filename is {episode_stem}.{lang_tokens}.
|
||||
Return only the tokens that come after the episode stem portion.
|
||||
Falls back to full tokenization if the stem doesn't start with episode_stem.
|
||||
"""
|
||||
stem_lower = stem.lower()
|
||||
prefix = episode_stem.lower()
|
||||
if stem_lower.startswith(prefix):
|
||||
suffix = stem[len(prefix) :]
|
||||
tokens = _tokenize(suffix)
|
||||
if tokens:
|
||||
return tokens
|
||||
return _tokenize(stem)
|
||||
|
||||
|
||||
def _count_entries(text: str | None) -> int | None:
|
||||
"""Return the entry count of an SRT body by finding the last cue number."""
|
||||
if text is None:
|
||||
return None
|
||||
for line in reversed(text.splitlines()):
|
||||
if line.strip().isdigit():
|
||||
return int(line.strip())
|
||||
return 0
|
||||
def _count_entries(path: Path) -> int:
|
||||
"""Return the entry count of an SRT file by finding the last cue number."""
|
||||
try:
|
||||
with open(path, encoding="utf-8", errors="replace") as f:
|
||||
lines = f.read().splitlines()
|
||||
for line in reversed(lines):
|
||||
if line.strip().isdigit():
|
||||
return int(line.strip())
|
||||
return 0
|
||||
except Exception:
|
||||
return 0
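The cue-counting heuristic used by `_count_entries` can be exercised on an in-memory SRT body, as in this sketch. `count_srt_entries` and `SAMPLE` are hypothetical names used only for illustration.

```python
def count_srt_entries(text: str) -> int:
    """Return the number on the last digit-only line, or 0 if there is none."""
    for line in reversed(text.splitlines()):
        stripped = line.strip()
        if stripped.isdigit():
            return int(stripped)
    return 0


SAMPLE = """1
00:00:01,000 --> 00:00:02,000
Hello

2
00:00:03,000 --> 00:00:04,000
World
"""

print(count_srt_entries(SAMPLE))
```

The heuristic assumes that the last digit-only line is a cue index. A final cue whose text is itself a bare number (for example `1984`) would be reported as the entry count, so the result is best treated as an estimate.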
|
||||
|
||||
|
||||
class SubtitleIdentifier:
|
||||
@@ -56,15 +42,8 @@ class SubtitleIdentifier:
|
||||
the caller (use case) decides whether to ask the user for clarification.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
kb: SubtitleKnowledge,
|
||||
prober: MediaProber,
|
||||
scanner: FilesystemScanner,
|
||||
):
|
||||
def __init__(self, kb: SubtitleKnowledgeBase):
|
||||
self.kb = kb
|
||||
self.prober = prober
|
||||
self.scanner = scanner
|
||||
|
||||
def identify(
|
||||
self,
|
||||
@@ -91,119 +70,125 @@ class SubtitleIdentifier:
|
||||
return metadata
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# Embedded tracks — via MediaProber
|
||||
# Embedded tracks — ffprobe
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _scan_embedded(self, video_path: Path) -> list[SubtitleScanResult]:
|
||||
streams = self.prober.list_subtitle_streams(video_path)
|
||||
def _scan_embedded(self, video_path: Path) -> list[SubtitleTrack]:
|
||||
if not video_path.exists():
|
||||
return []
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[
|
||||
"ffprobe", "-v", "quiet",
|
||||
"-print_format", "json",
|
||||
"-show_streams",
|
||||
"-select_streams", "s",
|
||||
str(video_path),
|
||||
],
|
||||
capture_output=True, text=True, timeout=30,
|
||||
)
|
||||
data = json.loads(result.stdout)
|
||||
except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as e:
|
||||
logger.debug(f"SubtitleIdentifier: ffprobe failed for {video_path.name}: {e}")
|
||||
return []
|
||||
|
||||
tracks = []
|
||||
for stream in streams:
|
||||
lang = (
|
||||
self.kb.language_for_token(stream.language) if stream.language else None
|
||||
)
|
||||
for stream in data.get("streams", []):
|
||||
tags = stream.get("tags", {})
|
||||
disposition = stream.get("disposition", {})
|
||||
lang_code = tags.get("language", "")
|
||||
title = tags.get("title", "")
|
||||
|
||||
if stream.is_hearing_impaired:
|
||||
lang = self.kb.language_for_token(lang_code) if lang_code else None
|
||||
|
||||
if disposition.get("hearing_impaired"):
|
||||
stype = SubtitleType.SDH
|
||||
elif stream.is_forced:
|
||||
elif disposition.get("forced"):
|
||||
stype = SubtitleType.FORCED
|
||||
else:
|
||||
stype = SubtitleType.STANDARD
|
||||
|
||||
tracks.append(
|
||||
SubtitleScanResult(
|
||||
language=lang,
|
||||
format=None,
|
||||
subtitle_type=stype,
|
||||
is_embedded=True,
|
||||
raw_tokens=[stream.language] if stream.language else [],
|
||||
)
|
||||
)
|
||||
tracks.append(SubtitleTrack(
|
||||
language=lang,
|
||||
format=None,
|
||||
subtitle_type=stype,
|
||||
is_embedded=True,
|
||||
raw_tokens=[lang_code] if lang_code else [],
|
||||
))
|
||||
|
||||
logger.debug(
|
||||
f"SubtitleIdentifier: {len(tracks)} embedded track(s) in {video_path.name}"
|
||||
)
|
||||
logger.debug(f"SubtitleIdentifier: {len(tracks)} embedded track(s) in {video_path.name}")
|
||||
return tracks
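The mapping from ffprobe stream metadata to a subtitle type can be checked without invoking ffprobe, by feeding a hand-written JSON payload through the same branching. The sample payload and the `classify_streams` helper below are assumptions made for illustration. Real ffprobe output carries many more fields, and plain strings stand in for the `SubtitleType` members.

```python
import json
from typing import List, Tuple

# Hand-written sample shaped like `ffprobe -print_format json -show_streams` output.
SAMPLE_FFPROBE_JSON = """
{"streams": [
  {"index": 2, "tags": {"language": "eng"},
   "disposition": {"forced": 0, "hearing_impaired": 1}},
  {"index": 3, "tags": {"language": "fre"},
   "disposition": {"forced": 1, "hearing_impaired": 0}},
  {"index": 4, "disposition": {"forced": 0, "hearing_impaired": 0}}
]}
"""


def classify_streams(raw_json: str) -> List[Tuple[str, str]]:
    """Return (language_tag, subtitle_type) pairs, one per subtitle stream."""
    data = json.loads(raw_json)
    results = []
    for stream in data.get("streams", []):
        tags = stream.get("tags", {})
        disposition = stream.get("disposition", {})
        lang_code = tags.get("language", "")
        # hearing_impaired is checked before forced, as in _scan_embedded.
        if disposition.get("hearing_impaired"):
            stype = "sdh"
        elif disposition.get("forced"):
            stype = "forced"
        else:
            stype = "standard"
        results.append((lang_code, stype))
    return results


print(classify_streams(SAMPLE_FFPROBE_JSON))
```

Streams without a `tags` entry yield an empty language tag, which corresponds to the `lang = None` case in the method above.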
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# External tracks — filesystem scan per pattern strategy
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _scan_external(
|
||||
self, video_path: Path, pattern: SubtitlePattern
|
||||
) -> list[SubtitleScanResult]:
|
||||
def _scan_external(self, video_path: Path, pattern: SubtitlePattern) -> list[SubtitleTrack]:
|
||||
strategy = pattern.scan_strategy
|
||||
episode_stem: str | None = None
|
||||
|
||||
if strategy == ScanStrategy.ADJACENT:
|
||||
candidates = self._find_adjacent(video_path)
|
||||
elif strategy == ScanStrategy.FLAT:
|
||||
candidates = self._find_flat(video_path, pattern.root_folder or "Subs")
|
||||
elif strategy == ScanStrategy.EPISODE_SUBFOLDER:
|
||||
candidates, episode_stem = self._find_episode_subfolder(
|
||||
video_path, pattern.root_folder or "Subs"
|
||||
)
|
||||
candidates = self._find_episode_subfolder(video_path, pattern.root_folder or "Subs")
|
||||
else:
|
||||
return []
|
||||
|
||||
return self._classify_files(candidates, pattern, episode_stem=episode_stem)
|
||||
return self._classify_files(candidates, pattern)
|
||||
|
||||
def _find_adjacent(self, video_path: Path) -> list:
|
||||
known = self.kb.known_extensions()
|
||||
def _find_adjacent(self, video_path: Path) -> list[Path]:
|
||||
return [
|
||||
entry
|
||||
for entry in self.scanner.scan_dir(video_path.parent)
|
||||
if entry.is_file
|
||||
and entry.suffix.lower() in known
|
||||
and entry.stem != video_path.stem
|
||||
p for p in sorted(video_path.parent.iterdir())
|
||||
if p.is_file() and p.suffix.lower() in self.kb.known_extensions()
|
||||
and p.stem != video_path.stem
|
||||
]
|
||||
|
||||
def _find_flat(self, video_path: Path, root_folder: str) -> list:
|
||||
known = self.kb.known_extensions()
|
||||
# Adjacent first, then release root (one level up)
|
||||
for subs_dir in (
|
||||
video_path.parent / root_folder,
|
||||
video_path.parent.parent / root_folder,
|
||||
):
|
||||
entries = self.scanner.scan_dir(subs_dir)
|
||||
if entries:
|
||||
return [
|
||||
e for e in entries if e.is_file and e.suffix.lower() in known
|
||||
]
|
||||
return []
|
||||
def _find_flat(self, video_path: Path, root_folder: str) -> list[Path]:
|
||||
subs_dir = video_path.parent / root_folder
|
||||
if not subs_dir.is_dir():
|
||||
# Also look at release root (one level up)
|
||||
subs_dir = video_path.parent.parent / root_folder
|
||||
if not subs_dir.is_dir():
|
||||
return []
|
||||
return [
|
||||
p for p in sorted(subs_dir.iterdir())
|
||||
if p.is_file() and p.suffix.lower() in self.kb.known_extensions()
|
||||
]
|
||||
|
||||
def _find_episode_subfolder(
|
||||
self, video_path: Path, root_folder: str
|
||||
) -> tuple[list, str]:
|
||||
"""Look for Subs/{episode_stem}/*.srt — adjacent or one level up."""
|
||||
def _find_episode_subfolder(self, video_path: Path, root_folder: str) -> list[Path]:
|
||||
"""
|
||||
Look for Subs/{episode_stem}/*.srt
|
||||
|
||||
Checks two locations:
|
||||
1. Adjacent to the video: video_path.parent / root_folder / video_path.stem
|
||||
2. Release root (one level up): video_path.parent.parent / root_folder / video_path.stem
|
||||
"""
|
||||
episode_stem = video_path.stem
|
||||
known = self.kb.known_extensions()
|
||||
for subs_dir in (
|
||||
candidates_dirs = [
|
||||
video_path.parent / root_folder / episode_stem,
|
||||
video_path.parent.parent / root_folder / episode_stem,
|
||||
):
|
||||
entries = self.scanner.scan_dir(subs_dir)
|
||||
files = [e for e in entries if e.is_file and e.suffix.lower() in known]
|
||||
if files:
|
||||
logger.debug(
|
||||
f"SubtitleIdentifier: found {len(files)} file(s) in {subs_dir}"
|
||||
)
|
||||
return files, episode_stem
|
||||
return [], episode_stem
|
||||
]
|
||||
for subs_dir in candidates_dirs:
|
||||
if subs_dir.is_dir():
|
||||
files = [
|
||||
p for p in sorted(subs_dir.iterdir())
|
||||
if p.is_file() and p.suffix.lower() in self.kb.known_extensions()
|
||||
]
|
||||
if files:
|
||||
logger.debug(f"SubtitleIdentifier: found {len(files)} file(s) in {subs_dir}")
|
||||
return files
|
||||
return []
|
||||
|
||||
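A minimal sketch of the episode-subfolder lookup as written on the pathlib side of this diff, runnable outside the project. `find_episode_subs` and `KNOWN_EXTS` are hypothetical stand-ins for `_find_episode_subfolder` and `self.kb.known_extensions()`; the extension set is an assumption, not the project's actual list.

```python
import tempfile
from pathlib import Path

# Assumed extension set; the real one comes from the knowledge base.
KNOWN_EXTS = {".srt", ".ass", ".sub"}


def find_episode_subs(video_path: Path, root_folder: str = "Subs") -> list[Path]:
    """Return subtitle files from Subs/{episode_stem}/, adjacent first, then one level up."""
    stem = video_path.stem
    for subs_dir in (
        video_path.parent / root_folder / stem,
        video_path.parent.parent / root_folder / stem,
    ):
        if not subs_dir.is_dir():
            continue
        files = [
            p for p in sorted(subs_dir.iterdir())
            if p.is_file() and p.suffix.lower() in KNOWN_EXTS
        ]
        if files:
            return files
    return []


with tempfile.TemporaryDirectory() as tmp:
    release = Path(tmp) / "Show.S01"
    video = release / "Season 1" / "Show.S01E01.mkv"
    subs = release / "Subs" / "Show.S01E01"
    subs.mkdir(parents=True)
    (subs / "3_French.srt").touch()
    (subs / "2_English.srt").touch()
    (subs / "notes.txt").touch()
    found = [p.name for p in find_episode_subs(video)]
    missing = find_episode_subs(release / "Season 1" / "Show.S01E02.mkv")
```

The video file itself never has to exist for the lookup to work; only the stem and the parent directories are used.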
# ------------------------------------------------------------------
|
||||
# Classification
|
||||
# ------------------------------------------------------------------
|
||||
|
||||
def _classify_files(
|
||||
self,
|
||||
entries: list,
|
||||
pattern: SubtitlePattern,
|
||||
episode_stem: str | None = None,
|
||||
) -> list[SubtitleScanResult]:
|
||||
tracks = [
|
||||
self._classify_single(entry, episode_stem=episode_stem) for entry in entries
|
||||
]
|
||||
def _classify_files(self, paths: list[Path], pattern: SubtitlePattern) -> list[SubtitleTrack]:
|
||||
tracks = []
|
||||
for path in paths:
|
||||
track = self._classify_single(path)
|
||||
tracks.append(track)
|
||||
|
||||
# Post-process: if multiple tracks share same language but type is ambiguous,
|
||||
# apply size_and_count disambiguation
|
||||
@@ -212,15 +197,9 @@ class SubtitleIdentifier:

         return tracks

-    def _classify_single(
-        self, entry, episode_stem: str | None = None
-    ) -> SubtitleScanResult:
-        fmt = self.kb.format_for_extension(entry.suffix)
-        tokens = (
-            _tokenize_suffix(entry.stem, episode_stem)
-            if episode_stem
-            else _tokenize(entry.stem)
-        )
+    def _classify_single(self, path: Path) -> SubtitleTrack:
+        fmt = self.kb.format_for_extension(path.suffix)
+        tokens = _tokenize(path.stem)

         language = None
         subtitle_type = SubtitleType.UNKNOWN
@@ -245,29 +224,25 @@ class SubtitleIdentifier:

         if unknown_tokens:
             logger.debug(
-                f"SubtitleIdentifier: unknown tokens in '{entry.name}': {unknown_tokens}"
+                f"SubtitleIdentifier: unknown tokens in '{path.name}': {unknown_tokens}"
             )

-        # Entry count: only meaningful for SRT files; read text on demand.
-        entry_count: int | None = None
-        if entry.suffix.lower() == ".srt":
-            entry_count = _count_entries(self.scanner.read_text(entry.path))
+        size_kb = path.stat().st_size / 1024 if path.exists() else None
+        entry_count = _count_entries(path) if path.exists() else None

-        return SubtitleScanResult(
+        return SubtitleTrack(
             language=language,
             format=fmt,
             subtitle_type=subtitle_type,
             is_embedded=False,
-            file_path=entry.path,
-            file_size_kb=entry.size_kb,
+            file_path=path,
+            file_size_kb=size_kb,
             entry_count=entry_count,
             confidence=confidence,
             raw_tokens=tokens,
         )

-    def _disambiguate_by_size(
-        self, tracks: list[SubtitleScanResult]
-    ) -> list[SubtitleScanResult]:
+    def _disambiguate_by_size(self, tracks: list[SubtitleTrack]) -> list[SubtitleTrack]:
         """
         When multiple tracks share the same language and type is UNKNOWN/STANDARD,
         the one with the most entries (lines) is SDH, the smallest is FORCED if
@@ -275,15 +250,16 @@ class SubtitleIdentifier:

         Only applied when type_detection = size_and_count.
         """
+        from itertools import groupby

         # Group by language code
-        lang_groups: dict[str, list[SubtitleScanResult]] = {}
+        lang_groups: dict[str, list[SubtitleTrack]] = {}
         for track in tracks:
             key = track.language.code if track.language else "__unknown__"
             lang_groups.setdefault(key, []).append(track)

         result = []
-        for group in lang_groups.values():
+        for lang_code, group in lang_groups.items():
             if len(group) == 1:
                 result.extend(group)
                 continue
@@ -306,6 +282,6 @@ class SubtitleIdentifier:

         return result

-    def _set_type(self, track: SubtitleScanResult, stype: SubtitleType) -> None:
+    def _set_type(self, track: SubtitleTrack, stype: SubtitleType) -> None:
         """Mutate track type in-place."""
         track.subtitle_type = stype
|
||||
@@ -2,15 +2,15 @@

 import logging

-from ..entities import SubtitleScanResult
-from ..value_objects import SubtitleMatchingRules
+from ..entities import SubtitleTrack
+from ..value_objects import SubtitleMatchingRules, SubtitleType

 logger = logging.getLogger(__name__)


 class SubtitleMatcher:
     """
-    Filters a list of SubtitleScanResult against effective SubtitleMatchingRules.
+    Filters a list of SubtitleTrack against effective SubtitleMatchingRules.

     Returns matched tracks (pass all filters, confidence >= min_confidence)
     and unresolved tracks (need user clarification).
@@ -21,14 +21,14 @@ class SubtitleMatcher:

     def match(
         self,
-        tracks: list[SubtitleScanResult],
+        tracks: list[SubtitleTrack],
         rules: SubtitleMatchingRules,
-    ) -> tuple[list[SubtitleScanResult], list[SubtitleScanResult]]:
+    ) -> tuple[list[SubtitleTrack], list[SubtitleTrack]]:
         """
         Returns (matched, unresolved).
         """
-        matched: list[SubtitleScanResult] = []
-        unresolved: list[SubtitleScanResult] = []
+        matched: list[SubtitleTrack] = []
+        unresolved: list[SubtitleTrack] = []

         for track in tracks:
             if track.is_embedded:
@@ -50,9 +50,7 @@ class SubtitleMatcher:
         )
         return matched, unresolved

-    def _passes_filters(
-        self, track: SubtitleScanResult, rules: SubtitleMatchingRules
-    ) -> bool:
+    def _passes_filters(self, track: SubtitleTrack, rules: SubtitleMatchingRules) -> bool:
         # Language filter
         if rules.preferred_languages:
             if not track.language:
@@ -76,14 +74,14 @@ class SubtitleMatcher:

     def _resolve_conflicts(
         self,
-        tracks: list[SubtitleScanResult],
+        tracks: list[SubtitleTrack],
         rules: SubtitleMatchingRules,
-    ) -> list[SubtitleScanResult]:
+    ) -> list[SubtitleTrack]:
         """
         When multiple tracks have same language + type, keep only the best one
         according to format_priority. If no format_priority applies, keep the first.
         """
-        seen: dict[tuple, SubtitleScanResult] = {}
+        seen: dict[tuple, SubtitleTrack] = {}

         for track in tracks:
             lang = track.language.code if track.language else None
@@ -106,8 +104,8 @@ class SubtitleMatcher:

     def _prefer(
         self,
-        candidate: SubtitleScanResult,
-        existing: SubtitleScanResult,
+        candidate: SubtitleTrack,
+        existing: SubtitleTrack,
         format_priority: list[str],
     ) -> bool:
         """Return True if candidate is preferable to existing."""
|
||||
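The conflict rule in the `_resolve_conflicts` docstring (one winner per language and type, chosen by `format_priority`, first seen wins otherwise) can be sketched as below. The bodies of `_resolve_conflicts` and `_prefer` are not visible in this diff, so this is an illustrative reading of the docstring, not the project's implementation. Tracks are modelled as plain `(language, type, format)` tuples, and `rank` is a hypothetical helper.

```python
def rank(fmt: str, format_priority: list[str]) -> int:
    """Lower is better; formats absent from the list all rank last."""
    if fmt in format_priority:
        return format_priority.index(fmt)
    return len(format_priority)


def resolve_conflicts(
    tracks: list[tuple[str, str, str]], format_priority: list[str]
) -> list[tuple[str, str, str]]:
    """Keep one track per (language, type); on equal rank the first seen stays."""
    seen: dict[tuple[str, str], tuple[str, str, str]] = {}
    for track in tracks:
        lang, sub_type, fmt = track
        key = (lang, sub_type)
        current = seen.get(key)
        if current is None or rank(fmt, format_priority) < rank(current[2], format_priority):
            seen[key] = track
    return list(seen.values())


tracks = [
    ("en", "standard", "ass"),
    ("en", "standard", "srt"),
    ("fr", "standard", "sub"),
    ("fr", "standard", "vtt"),
]
winners = resolve_conflicts(tracks, ["srt", "ass"])
```

Here the English `srt` replaces the earlier `ass` because it ranks higher, while neither French format is listed, so the first one is kept.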
@@ -1,10 +1,11 @@
|
||||
"""PatternDetector — discovers the subtitle structure of a release folder."""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import subprocess
|
||||
from pathlib import Path
|
||||
|
||||
from ...shared.ports import FilesystemScanner, MediaProber
|
||||
from ..ports import SubtitleKnowledge
|
||||
from ..knowledge.base import SubtitleKnowledgeBase
|
||||
from ..value_objects import ScanStrategy, SubtitlePattern
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
@@ -19,15 +20,8 @@ class PatternDetector:
|
||||
a release follows. The result is proposed to the user for confirmation.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
kb: SubtitleKnowledge,
|
||||
prober: MediaProber,
|
||||
scanner: FilesystemScanner,
|
||||
):
|
||||
def __init__(self, kb: SubtitleKnowledgeBase):
|
||||
self.kb = kb
|
||||
self.prober = prober
|
||||
self.scanner = scanner
|
||||
|
||||
def detect(self, release_root: Path, sample_video: Path) -> dict:
|
||||
"""
|
||||
@@ -51,14 +45,29 @@ class PatternDetector:
|
||||
}
|
||||
|
||||
def _has_embedded_subtitles(self, video_path: Path) -> bool:
|
||||
return len(self.prober.list_subtitle_streams(video_path)) > 0
|
||||
"""Run ffprobe to check whether the video has embedded subtitle streams."""
|
||||
try:
|
||||
result = subprocess.run(
|
||||
[
|
||||
"ffprobe", "-v", "quiet",
|
||||
"-print_format", "json",
|
||||
"-show_streams",
|
||||
"-select_streams", "s",
|
||||
str(video_path),
|
||||
],
|
||||
capture_output=True, text=True, timeout=30,
|
||||
)
|
||||
data = json.loads(result.stdout)
|
||||
return len(data.get("streams", [])) > 0
|
||||
except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError):
|
||||
return False
|
||||
|
||||
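One way to keep the ffprobe check testable without a video file is to separate the JSON parsing from the process call. The sketch below reuses the ffprobe arguments shown in the diff; `count_subtitle_streams` and `has_embedded_subtitles` are hypothetical names, and the split itself is an assumption about how one might structure it, not something the diff does.

```python
import json
import subprocess
from pathlib import Path


def count_subtitle_streams(ffprobe_stdout: str) -> int:
    """Count entries under "streams" in ffprobe JSON output; 0 if unparsable."""
    try:
        data = json.loads(ffprobe_stdout)
    except json.JSONDecodeError:
        return 0
    if not isinstance(data, dict):
        return 0
    return len(data.get("streams", []))


def has_embedded_subtitles(video_path: Path, timeout: float = 30) -> bool:
    """Run ffprobe restricted to subtitle streams and report whether any exist."""
    try:
        result = subprocess.run(
            [
                "ffprobe", "-v", "quiet",
                "-print_format", "json",
                "-show_streams",
                "-select_streams", "s",
                str(video_path),
            ],
            capture_output=True, text=True, timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # ffprobe hung or is not installed: treat as "no embedded subtitles".
        return False
    return count_subtitle_streams(result.stdout) > 0


sample = '{"streams": [{"index": 2, "codec_type": "subtitle"}]}'
sample_count = count_subtitle_streams(sample)
```

Only `count_subtitle_streams` is exercised here, so the example runs on machines without ffprobe installed.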
def _inspect(self, release_root: Path, sample_video: Path) -> dict:
|
||||
"""Gather structural facts about the release."""
|
||||
known_exts = self.kb.known_extensions()
|
||||
findings: dict = {
|
||||
"has_subs_folder": False,
|
||||
"subs_strategy": None, # "flat" | "episode_subfolder"
|
||||
"subs_strategy": None, # "flat" | "episode_subfolder"
|
||||
"subs_root": None,
|
||||
"adjacent_subs": False,
|
||||
"has_embedded": self._has_embedded_subtitles(sample_video),
|
||||
@@ -68,59 +77,49 @@ class PatternDetector:
|
||||
}
|
||||
|
||||
# Check for Subs/ folder — adjacent or at release root
|
||||
for subs_candidate in (
|
||||
for subs_candidate in [
|
||||
sample_video.parent / "Subs",
|
||||
release_root / "Subs",
|
||||
):
|
||||
children = self.scanner.scan_dir(subs_candidate)
|
||||
if not children:
|
||||
continue
|
||||
]:
|
||||
if subs_candidate.is_dir():
|
||||
findings["has_subs_folder"] = True
|
||||
findings["subs_root"] = str(subs_candidate)
|
||||
|
||||
findings["has_subs_folder"] = True
|
||||
findings["subs_root"] = str(subs_candidate)
|
||||
# Is it flat or episode_subfolder?
|
||||
children = list(subs_candidate.iterdir())
|
||||
sub_files = [c for c in children if c.is_file() and c.suffix.lower() in known_exts]
|
||||
sub_dirs = [c for c in children if c.is_dir()]
|
||||
|
||||
# Is it flat or episode_subfolder?
|
||||
sub_files = [
|
||||
c for c in children if c.is_file and c.suffix.lower() in known_exts
|
||||
]
|
||||
sub_dirs = [c for c in children if c.is_dir]
|
||||
|
||||
if sub_dirs and not sub_files:
|
||||
findings["subs_strategy"] = "episode_subfolder"
|
||||
# Count files in a sample subfolder
|
||||
sample_files = [
|
||||
f
|
||||
for f in self.scanner.scan_dir(sub_dirs[0].path)
|
||||
if f.is_file and f.suffix.lower() in known_exts
|
||||
]
|
||||
findings["files_per_episode"] = len(sample_files)
|
||||
# Check naming conventions
|
||||
for f in sample_files:
|
||||
parts = f.stem.split("_")
|
||||
if parts[0].isdigit():
|
||||
findings["has_numeric_prefix"] = True
|
||||
if any(
|
||||
self.kb.is_known_lang_token(t.lower())
|
||||
for t in f.stem.replace("_", ".").split(".")
|
||||
):
|
||||
findings["has_lang_tokens"] = True
|
||||
else:
|
||||
findings["subs_strategy"] = "flat"
|
||||
findings["files_per_episode"] = len(sub_files)
|
||||
for f in sub_files:
|
||||
if any(
|
||||
self.kb.is_known_lang_token(t.lower())
|
||||
for t in f.stem.replace("_", ".").split(".")
|
||||
):
|
||||
findings["has_lang_tokens"] = True
|
||||
break
|
||||
if sub_dirs and not sub_files:
|
||||
findings["subs_strategy"] = "episode_subfolder"
|
||||
# Count files in a sample subfolder
|
||||
sample_sub = sub_dirs[0]
|
||||
sample_files = [f for f in sample_sub.iterdir()
|
||||
if f.is_file() and f.suffix.lower() in known_exts]
|
||||
findings["files_per_episode"] = len(sample_files)
|
||||
# Check naming conventions
|
||||
for f in sample_files:
|
||||
stem = f.stem
|
||||
parts = stem.split("_")
|
||||
if parts[0].isdigit():
|
||||
findings["has_numeric_prefix"] = True
|
||||
if any(self.kb.is_known_lang_token(t.lower())
|
||||
for t in stem.replace("_", ".").split(".")):
|
||||
findings["has_lang_tokens"] = True
|
||||
else:
|
||||
findings["subs_strategy"] = "flat"
|
||||
findings["files_per_episode"] = len(sub_files)
|
||||
for f in sub_files:
|
||||
if any(self.kb.is_known_lang_token(t.lower())
|
||||
for t in f.stem.replace("_", ".").split(".")):
|
||||
findings["has_lang_tokens"] = True
|
||||
break
|
||||
|
||||
# Check adjacent subs (next to the video)
|
||||
if not findings["has_subs_folder"]:
|
||||
adjacent = [
|
||||
e
|
||||
for e in self.scanner.scan_dir(sample_video.parent)
|
||||
if e.is_file and e.suffix.lower() in known_exts
|
||||
p for p in sample_video.parent.iterdir()
|
||||
if p.is_file() and p.suffix.lower() in known_exts
|
||||
]
|
||||
if adjacent:
|
||||
findings["adjacent_subs"] = True
|
||||
@@ -158,9 +157,7 @@ class PatternDetector:
|
||||
total += 1
|
||||
if findings.get("has_embedded"):
|
||||
score += 1.0
|
||||
if not findings.get("has_subs_folder") and not findings.get(
|
||||
"adjacent_subs"
|
||||
):
|
||||
if not findings.get("has_subs_folder") and not findings.get("adjacent_subs"):
|
||||
score += 0.5
|
||||
total += 0.5
|
||||
|
||||
@@ -203,6 +200,6 @@ class PatternDetector:
|
||||
parts.append("no external subtitle files found")
|
||||
|
||||
if findings.get("has_embedded"):
|
||||
parts.append("embedded tracks detected")
|
||||
parts.append("embedded tracks detected (ffprobe)")
|
||||
|
||||
return " — ".join(parts) if parts else "nothing found"
|
||||
|
||||
+19
-42
@@ -5,32 +5,11 @@ import os
 from dataclasses import dataclass
 from pathlib import Path

-from alfred.domain.subtitles.entities import SubtitleScanResult
-from alfred.domain.subtitles.value_objects import SubtitleType
+from ..entities import SubtitleTrack

 logger = logging.getLogger(__name__)


-def _build_dest_name(track: SubtitleScanResult, video_stem: str) -> str:
-    """
-    Build the destination filename for a subtitle track.
-
-    Format: {video_stem}.{lang}.{ext}
-            {video_stem}.{lang}.sdh.{ext}
-            {video_stem}.{lang}.forced.{ext}
-    """
-    if not track.language or not track.format:
-        raise ValueError("Cannot compute destination name: language or format missing")
-
-    ext = track.format.extensions[0].lstrip(".")
-    parts = [video_stem, track.language.code]
-    if track.subtitle_type == SubtitleType.SDH:
-        parts.append("sdh")
-    elif track.subtitle_type == SubtitleType.FORCED:
-        parts.append("forced")
-    return ".".join(parts) + "." + ext
-
-
 @dataclass
 class PlacedTrack:
     source: Path
@@ -41,7 +20,7 @@ class PlacedTrack:
 @dataclass
 class PlaceResult:
     placed: list[PlacedTrack]
-    skipped: list[tuple[SubtitleScanResult, str]]  # (track, reason)
+    skipped: list[tuple[SubtitleTrack, str]]  # (track, reason)

     @property
     def placed_count(self) -> int:
@@ -54,7 +33,7 @@ class PlaceResult:

 class SubtitlePlacer:
     """
-    Hard-links matched SubtitleScanResult files next to a destination video.
+    Hard-links matched SubtitleTrack files next to a destination video.

     Uses the same hard-link strategy as FileManager.copy_file:
     instant, no data duplication, qBittorrent keeps seeding.
@@ -64,11 +43,11 @@ class SubtitlePlacer:

     def place(
         self,
-        tracks: list[SubtitleScanResult],
+        tracks: list[SubtitleTrack],
         destination_video: Path,
     ) -> PlaceResult:
         placed: list[PlacedTrack] = []
-        skipped: list[tuple[SubtitleScanResult, str]] = []
+        skipped: list[tuple[SubtitleTrack, str]] = []

         dest_dir = destination_video.parent

@@ -78,33 +57,31 @@ class SubtitlePlacer:
                 skipped.append((track, "embedded — no file to place"))
                 continue

-            if not track.file_path:
-                skipped.append((track, "source file not set"))
+            if not track.file_path or not track.file_path.exists():
+                skipped.append((track, "source file not found"))
                 continue

             try:
-                dest_name = _build_dest_name(track, destination_video.stem)
+                dest_name = track.destination_name
             except ValueError as e:
                 skipped.append((track, str(e)))
                 continue

             dest_path = dest_dir / dest_name

-            try:
-                os.link(track.file_path, dest_path)
-                placed.append(
-                    PlacedTrack(
-                        source=track.file_path,
-                        destination=dest_path,
-                        filename=dest_name,
-                    )
-                )
-                logger.info(f"SubtitlePlacer: placed {dest_name}")
-            except FileNotFoundError:
-                skipped.append((track, "source file not found"))
-            except FileExistsError:
+            if dest_path.exists():
                 logger.debug(f"SubtitlePlacer: skip {dest_name} — already exists")
                 skipped.append((track, "destination already exists"))
+                continue
+
+            try:
+                os.link(track.file_path, dest_path)
+                placed.append(PlacedTrack(
+                    source=track.file_path,
+                    destination=dest_path,
+                    filename=dest_name,
+                ))
+                logger.info(f"SubtitlePlacer: placed {dest_name}")
             except OSError as e:
                 logger.warning(f"SubtitlePlacer: failed to place {dest_name}: {e}")
                 skipped.append((track, str(e)))
|
||||
@@ -1,9 +1,9 @@
 """Subtitle service utilities."""

-from ..entities import SubtitleScanResult
+from ..entities import SubtitleTrack


-def available_subtitles(tracks: list[SubtitleScanResult]) -> list[SubtitleScanResult]:
+def available_subtitles(tracks: list[SubtitleTrack]) -> list[SubtitleTrack]:
     """
     Return the distinct subtitle tracks available, deduped by (language, type).

@@ -11,7 +11,7 @@ def available_subtitles(tracks: list[SubtitleScanResult]) -> list[SubtitleScanRe
     preferences — e.g. eng, eng.sdh, fra all show up as separate entries.
     """
     seen: set[tuple] = set()
-    result: list[SubtitleScanResult] = []
+    result: list[SubtitleTrack] = []
     for track in tracks:
         lang = track.language.code if track.language else None
         key = (lang, track.subtitle_type)
|
||||
@@ -2,15 +2,17 @@
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from enum import Enum
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
|
||||
class ScanStrategy(Enum):
|
||||
"""How to locate subtitle files for a given release."""
|
||||
|
||||
ADJACENT = "adjacent" # .srt next to the video
|
||||
FLAT = "flat" # Subs/*.srt
|
||||
ADJACENT = "adjacent" # .srt next to the video
|
||||
FLAT = "flat" # Subs/*.srt
|
||||
EPISODE_SUBFOLDER = "episode_subfolder" # Subs/{episode_name}/*.srt
|
||||
EMBEDDED = "embedded" # tracks inside the video container
|
||||
EMBEDDED = "embedded" # tracks inside the video container
|
||||
|
||||
|
||||
class TypeDetectionMethod(Enum):
|
||||
@@ -44,7 +46,7 @@ class SubtitleFormat:
|
||||
class SubtitleLanguage:
|
||||
"""A known subtitle language with its recognition tokens."""
|
||||
|
||||
code: str # ISO 639-1
|
||||
code: str # ISO 639-1
|
||||
tokens: list[str] # lowercase
|
||||
|
||||
def matches_token(self, token: str) -> bool:
|
||||
@@ -64,7 +66,7 @@ class SubtitlePattern:
|
||||
id: str
|
||||
description: str
|
||||
scan_strategy: ScanStrategy
|
||||
root_folder: str | None # e.g. "Subs", None for adjacent/embedded
|
||||
root_folder: str | None # e.g. "Subs", None for adjacent/embedded
|
||||
type_detection: TypeDetectionMethod
|
||||
version: str = "1.0"
|
||||
|
||||
@@ -76,27 +78,16 @@ class SubtitleMatchingRules:
|
||||
Only stores actual values — None means "inherited, not overridden at this level".
|
||||
"""
|
||||
|
||||
preferred_languages: list[str] = field(default_factory=list) # ISO 639-1 codes
|
||||
preferred_formats: list[str] = field(default_factory=list) # format ids
|
||||
allowed_types: list[str] = field(default_factory=list) # SubtitleType ids
|
||||
format_priority: list[str] = field(default_factory=list) # ordered format ids
|
||||
preferred_languages: list[str] = field(default_factory=list) # ISO 639-1 codes
|
||||
preferred_formats: list[str] = field(default_factory=list) # format ids
|
||||
allowed_types: list[str] = field(default_factory=list) # SubtitleType ids
|
||||
format_priority: list[str] = field(default_factory=list) # ordered format ids
|
||||
min_confidence: float = 0.7
|
||||
|
||||
|
||||
class RuleScopeLevel(str, Enum):
|
||||
"""At which level a subtitle rule set applies."""
|
||||
|
||||
GLOBAL = "global"
|
||||
RELEASE_GROUP = "release_group"
|
||||
MOVIE = "movie"
|
||||
SHOW = "show"
|
||||
SEASON = "season"
|
||||
EPISODE = "episode"
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class RuleScope:
|
||||
"""At which level a rule set applies."""
|
||||
|
||||
level: RuleScopeLevel
|
||||
identifier: str | None = None # imdb_id, group name, "S01", "S01E03"…
|
||||
level: str # "global" | "release_group" | "movie" | "show" | "season" | "episode"
|
||||
identifier: str | None = None # imdb_id, group name, "S01", "S01E03"…
|
||||
|
||||
@@ -2,22 +2,18 @@
|
||||
|
||||
from .entities import Episode, Season, TVShow
|
||||
from .exceptions import InvalidEpisode, SeasonNotFound, TVShowNotFound
|
||||
from .value_objects import (
|
||||
CollectionStatus,
|
||||
EpisodeNumber,
|
||||
SeasonNumber,
|
||||
ShowStatus,
|
||||
)
|
||||
from .services import TVShowService
|
||||
from .value_objects import EpisodeNumber, SeasonNumber, ShowStatus
|
||||
|
||||
__all__ = [
|
||||
"TVShow",
|
||||
"Season",
|
||||
"Episode",
|
||||
"ShowStatus",
|
||||
"CollectionStatus",
|
||||
"SeasonNumber",
|
||||
"EpisodeNumber",
|
||||
"TVShowNotFound",
|
||||
"InvalidEpisode",
|
||||
"SeasonNotFound",
|
||||
"TVShowService",
|
||||
]
|
||||
|
||||
+118
-346
@@ -1,249 +1,120 @@
|
||||
"""TV Show domain entities.
|
||||
|
||||
This module implements the TVShow aggregate following DDD principles.
|
||||
|
||||
Aggregate ownership::
|
||||
|
||||
TVShow ← aggregate root (the repo returns this)
|
||||
└── seasons: dict[SeasonNumber, Season]
|
||||
└── Season
|
||||
└── episodes: dict[EpisodeNumber, Episode]
|
||||
└── Episode ← file metadata + audio/subtitle tracks
|
||||
|
||||
Rules:
|
||||
|
||||
* ``TVShow`` is the aggregate **root** — the only entity exposed by the
|
||||
repository.
|
||||
* ``Season`` is owned by TVShow. ``Episode`` is owned by Season.
|
||||
* Children do not back-reference the root (no ``show_imdb_id`` on
|
||||
Season/Episode): they are only ever reached *through* TVShow.
|
||||
* Mutation invariants are enforced through aggregate-root methods such as
|
||||
``TVShow.add_episode()`` — never reach into ``show.seasons[...].episodes``
|
||||
to mutate without going through the root, otherwise invariants are not
|
||||
guaranteed.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
"""TV Show domain entities."""
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from ..shared.media import AudioTrack, MediaWithTracks, SubtitleTrack
|
||||
from ..shared.value_objects import (
|
||||
FilePath,
|
||||
FileSize,
|
||||
ImdbId,
|
||||
to_dot_folder_name,
|
||||
)
|
||||
from .value_objects import (
|
||||
CollectionStatus,
|
||||
EpisodeNumber,
|
||||
SeasonNumber,
|
||||
ShowStatus,
|
||||
)
|
||||
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
# Episode
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
from ..shared.value_objects import FilePath, FileSize, ImdbId
|
||||
from .value_objects import EpisodeNumber, SeasonNumber, ShowStatus
|
||||
|
||||
|
||||
@dataclass(frozen=True, eq=False)
|
||||
class Episode(MediaWithTracks):
|
||||
@dataclass
|
||||
class TVShow:
|
||||
"""
|
||||
A single episode of a TV show — leaf of the TVShow aggregate.
|
||||
TV Show entity representing a TV show in the media library.
|
||||
|
||||
Carries the file metadata (path, size) and the discovered tracks
|
||||
(audio + subtitle). Track tuples are populated by the ffprobe + subtitle
|
||||
scan pipeline; they may be empty when the episode is known but not yet
|
||||
scanned, or when no file is downloaded yet.
|
||||
|
||||
Frozen: rebuild via ``dataclasses.replace`` to project enrichment results
|
||||
onto a new instance.
|
||||
|
||||
Equality is identity-based within the aggregate: two ``Episode`` instances
|
||||
are equal iff they share the same ``(season_number, episode_number)``,
|
||||
regardless of title/file/track contents. The root TVShow guarantees
|
||||
cross-show uniqueness.
|
||||
This is the main aggregate root for the TV shows domain.
|
||||
Migrated from agent/models/tv_show.py
|
||||
"""
|
||||
|
||||
season_number: SeasonNumber
|
||||
episode_number: EpisodeNumber
|
||||
imdb_id: ImdbId
|
||||
title: str
|
||||
file_path: FilePath | None = None
|
||||
file_size: FileSize | None = None
|
||||
audio_tracks: tuple[AudioTrack, ...] = field(default_factory=tuple)
|
||||
subtitle_tracks: tuple[SubtitleTrack, ...] = field(default_factory=tuple)
|
||||
seasons_count: int
|
||||
status: ShowStatus
|
||||
tmdb_id: int | None = None
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
# Coerce numbers if raw ints were passed
|
||||
if not isinstance(self.season_number, SeasonNumber):
|
||||
if isinstance(self.season_number, int):
|
||||
object.__setattr__(
|
||||
self, "season_number", SeasonNumber(self.season_number)
|
||||
)
|
||||
if not isinstance(self.episode_number, EpisodeNumber):
|
||||
if isinstance(self.episode_number, int):
|
||||
object.__setattr__(
|
||||
self, "episode_number", EpisodeNumber(self.episode_number)
|
||||
def __post_init__(self):
|
||||
"""Validate TV show entity."""
|
||||
# Ensure ImdbId is actually an ImdbId instance
|
||||
if not isinstance(self.imdb_id, ImdbId):
|
||||
if isinstance(self.imdb_id, str):
|
||||
object.__setattr__(self, "imdb_id", ImdbId(self.imdb_id))
|
||||
else:
|
||||
raise ValueError(
|
||||
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
|
||||
)
|
||||
|
||||
def __eq__(self, other: object) -> bool:
|
||||
if not isinstance(other, Episode):
|
||||
return NotImplemented
|
||||
return (
|
||||
self.season_number == other.season_number
|
||||
and self.episode_number == other.episode_number
|
||||
)
|
||||
# Ensure ShowStatus is actually a ShowStatus instance
|
||||
if not isinstance(self.status, ShowStatus):
|
||||
if isinstance(self.status, str):
|
||||
object.__setattr__(self, "status", ShowStatus.from_string(self.status))
|
||||
else:
|
||||
raise ValueError(
|
||||
f"status must be ShowStatus or str, got {type(self.status)}"
|
||||
)
|
||||
|
||||
def __hash__(self) -> int:
|
||||
return hash((self.season_number, self.episode_number))
|
||||
# Validate seasons_count
|
||||
if not isinstance(self.seasons_count, int) or self.seasons_count < 0:
|
||||
raise ValueError(
|
||||
f"seasons_count must be a non-negative integer, got {self.seasons_count}"
|
||||
)
|
||||
|
||||
# Track helpers (has_audio_in / audio_languages / has_subtitles_in /
|
||||
# has_forced_subs / subtitle_languages) come from MediaWithTracks.
|
||||
def is_ongoing(self) -> bool:
|
||||
"""Check if the show is still ongoing."""
|
||||
return self.status == ShowStatus.ONGOING
|
||||
|
||||
# ── Naming ─────────────────────────────────────────────────────────────
|
||||
def is_ended(self) -> bool:
|
||||
"""Check if the show has ended."""
|
||||
return self.status == ShowStatus.ENDED
|
||||
|
||||
def get_filename(self) -> str:
|
||||
"""Suggested filename: ``S01E05.Pilot``."""
|
||||
season_str = f"S{self.season_number.value:02d}"
|
||||
episode_str = f"E{self.episode_number.value:02d}"
|
||||
clean_title = re.sub(r"[^\w\s\-]", "", self.title)
|
||||
clean_title = clean_title.replace(" ", ".")
|
||||
return f"{season_str}{episode_str}.{clean_title}"
|
||||
def get_folder_name(self) -> str:
|
||||
"""
|
||||
Get the folder name for this TV show.
|
||||
|
||||
Format: "Title"
|
||||
Example: "Breaking.Bad"
|
||||
"""
|
||||
# Remove special characters and replace spaces with dots
|
||||
cleaned = re.sub(r"[^\w\s\.\-]", "", self.title)
|
||||
return cleaned.replace(" ", ".")
|
||||
|
||||
def __str__(self) -> str:
|
||||
return f"S{self.season_number.value:02d}E{self.episode_number.value:02d} - {self.title}"
|
||||
return f"{self.title} ({self.status.value}, {self.seasons_count} seasons)"
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return (
|
||||
f"Episode(S{self.season_number.value:02d}E{self.episode_number.value:02d})"
|
||||
)
|
||||
|
||||
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
# Season
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
return f"TVShow(imdb_id={self.imdb_id}, title='{self.title}')"
|
||||
|
||||
|
||||
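The `Episode` docstring on the aggregate side of this diff describes identity-based equality on a frozen dataclass. The sketch below is a reduced illustration of that pattern, with plain `int` fields standing in for `SeasonNumber` and `EpisodeNumber` and without the `MediaWithTracks` base class; both simplifications are assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True, eq=False)
class Episode:
    season_number: int
    episode_number: int
    title: str

    def __eq__(self, other: object) -> bool:
        # Identity is (season, episode); title and file details are ignored.
        if not isinstance(other, Episode):
            return NotImplemented
        return (
            self.season_number == other.season_number
            and self.episode_number == other.episode_number
        )

    def __hash__(self) -> int:
        return hash((self.season_number, self.episode_number))


pilot = Episode(1, 1, "Pilot")
# Frozen instances are "updated" by building a new one.
renamed = replace(pilot, title="Pilot (Extended)")
```

`eq=False` stops the dataclass from generating a field-by-field `__eq__`, so the hand-written identity comparison and matching `__hash__` are the ones in effect.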
@dataclass
|
||||
class Season:
|
||||
"""
|
||||
A season of a TV show — owned by ``TVShow``.
|
||||
|
||||
Owns its episodes via the ``episodes`` dict keyed by ``EpisodeNumber``.
|
||||
|
||||
Two TMDB-sourced counts shape the collection logic:
|
||||
|
||||
* ``expected_episodes`` — total episodes planned for the season
|
||||
(``None`` if unknown).
|
||||
* ``aired_episodes`` — episodes **already aired** as of the latest TMDB
|
||||
refresh. ``None`` falls back to ``expected_episodes`` (best-effort).
|
||||
|
||||
The split matters: ``is_complete()`` checks owned against aired, so a season
|
||||
in the middle of broadcasting can be "complete" today and become "partial"
|
||||
later when new episodes air — that is correct behavior.
|
||||
Season entity representing a season of a TV show.
|
||||
"""
|
||||
|
||||
show_imdb_id: ImdbId
|
||||
season_number: SeasonNumber
|
||||
episodes: dict[EpisodeNumber, Episode] = field(default_factory=dict)
|
||||
expected_episodes: int | None = None
|
||||
aired_episodes: int | None = None
|
||||
episode_count: int
|
||||
name: str | None = None
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
def __post_init__(self):
|
||||
"""Validate season entity."""
|
||||
# Ensure ImdbId is actually an ImdbId instance
|
||||
if not isinstance(self.show_imdb_id, ImdbId):
|
||||
if isinstance(self.show_imdb_id, str):
|
||||
object.__setattr__(self, "show_imdb_id", ImdbId(self.show_imdb_id))
|
||||
|
||||
# Ensure SeasonNumber is actually a SeasonNumber instance
|
||||
if not isinstance(self.season_number, SeasonNumber):
|
||||
if isinstance(self.season_number, int):
|
||||
self.season_number = SeasonNumber(self.season_number)
|
||||
object.__setattr__(
|
||||
self, "season_number", SeasonNumber(self.season_number)
|
||||
)
|
||||
|
||||
if self.expected_episodes is not None and self.expected_episodes < 0:
|
||||
# Validate episode_count
|
||||
if not isinstance(self.episode_count, int) or self.episode_count < 0:
|
||||
raise ValueError(
|
||||
f"expected_episodes must be >= 0, got {self.expected_episodes}"
|
||||
f"episode_count must be a non-negative integer, got {self.episode_count}"
|
||||
)
|
||||
if self.aired_episodes is not None and self.aired_episodes < 0:
|
||||
raise ValueError(f"aired_episodes must be >= 0, got {self.aired_episodes}")
|
||||
if (
|
||||
self.expected_episodes is not None
|
||||
and self.aired_episodes is not None
|
||||
and self.aired_episodes > self.expected_episodes
|
||||
):
|
||||
raise ValueError(
|
||||
f"aired_episodes ({self.aired_episodes}) cannot exceed "
|
||||
f"expected_episodes ({self.expected_episodes})"
|
||||
)
|
||||
|
||||
# ── Properties ─────────────────────────────────────────────────────────
|
||||
|
||||
@property
|
||||
def episode_count(self) -> int:
|
||||
"""Number of episodes currently owned in this season."""
|
||||
return len(self.episodes)
|
||||
|
||||
# ── Collection state ───────────────────────────────────────────────────
|
||||
|
||||
def _effective_aired(self) -> int | None:
|
||||
"""``aired_episodes`` if set, else fall back to ``expected_episodes``."""
|
||||
return (
|
||||
self.aired_episodes
|
||||
if self.aired_episodes is not None
|
||||
else self.expected_episodes
|
||||
)
|
||||
|
||||
def is_complete(self) -> bool:
|
||||
"""
|
||||
True if every aired episode is owned.
|
||||
|
||||
Returns False (conservative) when the aired count is unknown — without
|
||||
knowing how many episodes have aired we cannot claim completeness.
|
||||
"""
|
||||
aired = self._effective_aired()
|
||||
if aired is None:
|
||||
return False
|
||||
if aired == 0:
|
||||
# No episode has aired yet → trivially "complete"
|
||||
return True
|
||||
return len(self.episodes) >= aired
|
||||
|
||||
def is_fully_aired(self) -> bool:
|
||||
"""True if all planned episodes have already aired."""
|
||||
if self.expected_episodes is None or self.aired_episodes is None:
|
||||
return False
|
||||
return self.aired_episodes >= self.expected_episodes
|
||||
|
||||
def missing_episodes(self) -> list[EpisodeNumber]:
|
||||
"""
|
||||
List of episode numbers that have aired but are not owned.
|
||||
|
||||
Episodes beyond ``aired_episodes`` are **not** considered missing
|
||||
(they have not aired yet). When the aired count is unknown, returns
|
||||
an empty list — we cannot reason about gaps without a target.
|
||||
"""
|
||||
aired = self._effective_aired()
|
||||
if aired is None or aired <= 0:
|
||||
return []
|
||||
present = {ep.value for ep in self.episodes}
|
||||
return [EpisodeNumber(n) for n in range(1, aired + 1) if n not in present]
|
||||
|
||||
# ── Mutation (called through the aggregate root) ───────────────────────
|
||||
|
||||
def add_episode(self, episode: Episode) -> None:
|
||||
"""
|
||||
Insert an episode into this season. Replaces any episode with the same
|
||||
number — callers wishing to detect conflicts should check beforehand.
|
||||
"""
|
||||
if episode.season_number != self.season_number:
|
||||
raise ValueError(
|
||||
f"Episode season ({episode.season_number}) does not match season "
|
||||
f"({self.season_number})"
|
||||
)
|
||||
self.episodes[episode.episode_number] = episode
|
||||
|
||||
# ── Naming ─────────────────────────────────────────────────────────────
|
||||
|
||||
def is_special(self) -> bool:
|
||||
"""Check if this is the specials season."""
|
||||
return self.season_number.is_special()
|
||||
|
||||
def get_folder_name(self) -> str:
|
||||
"""``Season 01`` or ``Specials`` for season 0."""
|
||||
"""
|
||||
Get the folder name for this season.
|
||||
|
||||
Format: "Season 01" or "Specials" for season 0
|
||||
"""
|
||||
if self.is_special():
|
||||
return "Specials"
|
||||
return f"Season {self.season_number.value:02d}"
|
||||
@@ -254,168 +125,69 @@ class Season:
|
||||
return f"Season {self.season_number.value}"
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return (
|
||||
f"Season(number={self.season_number.value}, episodes={len(self.episodes)})"
|
||||
)
|
||||
|
||||
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
# TVShow — aggregate root
|
||||
# ════════════════════════════════════════════════════════════════════════════
|
||||
return f"Season(show={self.show_imdb_id}, number={self.season_number.value})"
|
||||
|
||||
|
||||
@dataclass
|
||||
class TVShow:
|
||||
class Episode:
|
||||
"""
|
||||
Aggregate root for the TV shows domain.
|
||||
|
||||
Owns its seasons via the ``seasons`` dict keyed by ``SeasonNumber``.
|
||||
All mutations (adding episodes, creating seasons) MUST go through the
|
||||
methods on this class — that is how invariants are preserved.
|
||||
|
||||
Two axes describe the show, kept deliberately orthogonal:
|
||||
|
||||
* ``status`` (``ShowStatus``) — production state (TMDB-sourced).
|
||||
* ``collection_status()`` — what the user owns vs what has aired today.
|
||||
|
||||
A third axis (upcoming/scheduled) will be added later as a separate flag
|
||||
when scheduling support is introduced; for now we make no claim about
|
||||
future episodes.
|
||||
Episode entity representing an episode of a TV show.
|
||||
"""
|
||||
|
||||
imdb_id: ImdbId
|
||||
show_imdb_id: ImdbId
|
||||
season_number: SeasonNumber
|
||||
episode_number: EpisodeNumber
|
||||
title: str
|
||||
status: ShowStatus
|
||||
seasons: dict[SeasonNumber, Season] = field(default_factory=dict)
|
||||
expected_seasons: int | None = None
|
||||
tmdb_id: int | None = None
|
||||
file_path: FilePath | None = None
|
||||
file_size: FileSize | None = None
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if not isinstance(self.imdb_id, ImdbId):
|
||||
if isinstance(self.imdb_id, str):
|
||||
self.imdb_id = ImdbId(self.imdb_id)
|
||||
else:
|
||||
raise ValueError(
|
||||
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
|
||||
def __post_init__(self):
|
||||
"""Validate episode entity."""
|
||||
# Ensure ImdbId is actually an ImdbId instance
|
||||
if not isinstance(self.show_imdb_id, ImdbId):
|
||||
if isinstance(self.show_imdb_id, str):
|
||||
object.__setattr__(self, "show_imdb_id", ImdbId(self.show_imdb_id))
|
||||
|
||||
# Ensure SeasonNumber is actually a SeasonNumber instance
|
||||
if not isinstance(self.season_number, SeasonNumber):
|
||||
if isinstance(self.season_number, int):
|
||||
object.__setattr__(
|
||||
self, "season_number", SeasonNumber(self.season_number)
|
||||
)
|
||||
|
||||
if not isinstance(self.status, ShowStatus):
|
||||
if isinstance(self.status, str):
|
||||
self.status = ShowStatus.from_string(self.status)
|
||||
else:
|
||||
raise ValueError(
|
||||
f"status must be ShowStatus or str, got {type(self.status)}"
|
||||
# Ensure EpisodeNumber is actually an EpisodeNumber instance
|
||||
if not isinstance(self.episode_number, EpisodeNumber):
|
||||
if isinstance(self.episode_number, int):
|
||||
object.__setattr__(
|
||||
self, "episode_number", EpisodeNumber(self.episode_number)
|
||||
)
|
||||
|
||||
if self.expected_seasons is not None and self.expected_seasons < 0:
|
||||
raise ValueError(
|
||||
f"expected_seasons must be >= 0, got {self.expected_seasons}"
|
||||
)
|
||||
def has_file(self) -> bool:
|
||||
"""Check if the episode has an associated file."""
|
||||
return self.file_path is not None and self.file_path.exists()
|
||||
|
||||
# ── Production-state queries ───────────────────────────────────────────
|
||||
def is_downloaded(self) -> bool:
|
||||
"""Check if the episode is downloaded."""
|
||||
return self.has_file()
|
||||
|
||||
def is_ongoing(self) -> bool:
|
||||
return self.status == ShowStatus.ONGOING
|
||||
|
||||
def is_ended(self) -> bool:
|
||||
return self.status == ShowStatus.ENDED
|
||||
|
||||
# ── Properties ─────────────────────────────────────────────────────────
|
||||
|
||||
@property
|
||||
def seasons_count(self) -> int:
|
||||
"""Number of seasons currently owned (any episode count, even 0)."""
|
||||
return len(self.seasons)
|
||||
|
||||
@property
|
||||
def episode_count(self) -> int:
|
||||
"""Total episodes owned across all seasons."""
|
||||
return sum(s.episode_count for s in self.seasons.values())
|
||||
|
||||
# ── Mutation — the sole entry point for adding content ─────────────────
|
||||
|
||||
def add_episode(self, episode: Episode) -> None:
|
||||
def get_filename(self) -> str:
|
||||
"""
|
||||
Add an episode to the appropriate season, creating the season if needed.
|
||||
Get the suggested filename for this episode.
|
||||
|
||||
This is the **only** sanctioned way to add content to the aggregate —
|
||||
it preserves the invariant that an episode is always reachable through
|
||||
``show.seasons[s].episodes[e]``.
|
||||
Format: "S01E01 - Episode Title.ext"
|
||||
Example: "S01E05 - Pilot.mkv"
|
||||
"""
|
||||
season = self.seasons.get(episode.season_number)
|
||||
if season is None:
|
||||
season = Season(season_number=episode.season_number)
|
||||
self.seasons[episode.season_number] = season
|
||||
season.add_episode(episode)
|
||||
season_str = f"S{self.season_number.value:02d}"
|
||||
episode_str = f"E{self.episode_number.value:02d}"
|
||||
|
||||
def add_season(self, season: Season) -> None:
|
||||
"""
|
||||
Attach a (possibly already populated) Season to the show.
|
||||
# Clean title for filename
|
||||
clean_title = re.sub(r"[^\w\s\-]", "", self.title)
|
||||
clean_title = clean_title.replace(" ", ".")
|
||||
|
||||
Replaces any existing season with the same number.
|
||||
"""
|
||||
self.seasons[season.season_number] = season
|
||||
|
||||
# ── Collection state ───────────────────────────────────────────────────
|
||||
|
||||
def collection_status(self) -> CollectionStatus:
|
||||
"""
|
||||
High-level state of the user's collection for this show.
|
||||
|
||||
* ``EMPTY`` — no episode owned
|
||||
* ``COMPLETE`` — every season is complete relative to its aired count
|
||||
* ``PARTIAL`` — at least one aired episode is missing
|
||||
|
||||
Seasons with an unknown aired count are treated conservatively: if no
|
||||
season has any episode, the show is EMPTY; otherwise the unknown
|
||||
seasons cannot prove completeness, so the show is PARTIAL.
|
||||
"""
|
||||
if self.episode_count == 0:
|
||||
return CollectionStatus.EMPTY
|
||||
|
||||
# Check completeness across all seasons we know about
|
||||
for season in self.seasons.values():
|
||||
if not season.is_complete():
|
||||
return CollectionStatus.PARTIAL
|
||||
|
||||
# We also need to consider whether seasons themselves are missing.
|
||||
# If expected_seasons is known and we have fewer seasons than expected,
|
||||
# the missing seasons may have aired episodes → cannot claim COMPLETE.
|
||||
if (
|
||||
self.expected_seasons is not None
|
||||
and len(self.seasons) < self.expected_seasons
|
||||
):
|
||||
return CollectionStatus.PARTIAL
|
||||
|
||||
return CollectionStatus.COMPLETE
|
||||
|
||||
def is_complete_series(self) -> bool:
|
||||
"""
|
||||
True if the show is finished (ENDED) **and** the collection is complete.
|
||||
|
||||
This is the strongest "I own the entire series, no more to come" claim
|
||||
we can make today, before scheduling/upcoming-episode awareness lands.
|
||||
"""
|
||||
return self.is_ended() and self.collection_status() == CollectionStatus.COMPLETE
|
||||
|
||||
def missing_episodes(self) -> list[tuple[SeasonNumber, EpisodeNumber]]:
|
||||
"""All aired-but-not-owned ``(season, episode)`` pairs across the show."""
|
||||
result: list[tuple[SeasonNumber, EpisodeNumber]] = []
|
||||
for season_number, season in sorted(
|
||||
self.seasons.items(), key=lambda kv: kv[0].value
|
||||
):
|
||||
for ep_number in season.missing_episodes():
|
||||
result.append((season_number, ep_number))
|
||||
return result
|
||||
|
||||
# ── Naming ─────────────────────────────────────────────────────────────
|
||||
|
||||
def get_folder_name(self) -> str:
|
||||
"""Dot-separated folder name (e.g. ``Breaking.Bad``)."""
|
||||
return to_dot_folder_name(self.title)
|
||||
return f"{season_str}{episode_str}.{clean_title}"
|
||||
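For reference, the cleaning steps in `get_filename` can be run standalone. The sketch below copies the regular expression and the dot join from the lines above; the free function `episode_filename` and its integer parameters are hypothetical stand-ins for the method and its value objects. As written, the result is joined with a dot and carries no extension, so it differs from the "S01E05 - Pilot.mkv" form given in the docstring.

```python
import re


def episode_filename(season: int, episode: int, title: str) -> str:
    """Hypothetical free-function version of ``Episode.get_filename``."""
    # Drop everything except word characters, whitespace and hyphens.
    clean_title = re.sub(r"[^\w\s\-]", "", title)
    # Use dots instead of spaces.
    clean_title = clean_title.replace(" ", ".")
    return f"S{season:02d}E{episode:02d}.{clean_title}"


name = episode_filename(1, 5, "Cat's in the Bag...")
```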
|
||||
def __str__(self) -> str:
|
||||
return f"{self.title} ({self.status.value}, {self.seasons_count} seasons)"
|
||||
return f"S{self.season_number.value:02d}E{self.episode_number.value:02d} - {self.title}"
|
||||
|
||||
def __repr__(self) -> str:
|
||||
return f"TVShow(imdb_id={self.imdb_id}, title='{self.title}')"
|
||||
return f"Episode(show={self.show_imdb_id}, S{self.season_number.value:02d}E{self.episode_number.value:02d})"
|
||||
|
||||
@@ -1,40 +1,126 @@
|
||||
"""TV Show repository interface.
|
||||
|
||||
A single repository for the aggregate root only — Season and Episode are
|
||||
**inside** the TVShow aggregate and are never persisted independently. The
|
||||
aggregate is always loaded and saved as a whole.
|
||||
"""
|
||||
"""TV Show repository interfaces (abstract)."""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
from ..shared.value_objects import ImdbId
|
||||
from .entities import TVShow
|
||||
from .entities import Episode, Season, TVShow
|
||||
from .value_objects import EpisodeNumber, SeasonNumber
|
||||
|
||||
|
||||
class TVShowRepository(ABC):
|
||||
"""
|
||||
Abstract repository for the TVShow aggregate.
|
||||
Abstract repository for TV show persistence.
|
||||
|
||||
Implementations are responsible for persisting the full aggregate graph
|
||||
(TVShow + all its Seasons + all their Episodes) atomically.
|
||||
This defines the interface that infrastructure implementations must follow.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def save(self, show: TVShow) -> None:
|
||||
"""Persist the full TVShow aggregate."""
|
||||
"""
|
||||
Save a TV show to the repository.
|
||||
|
||||
Args:
|
||||
show: TVShow entity to save
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_by_imdb_id(self, imdb_id: ImdbId) -> TVShow | None:
|
||||
"""Load the full TVShow aggregate by IMDb ID, or None if absent."""
|
||||
"""
|
||||
Find a TV show by its IMDb ID.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID to search for
|
||||
|
||||
Returns:
|
||||
TVShow if found, None otherwise
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_all(self) -> list[TVShow]:
|
||||
"""Load all TVShow aggregates."""
|
||||
"""
|
||||
Get all TV shows in the repository.
|
||||
|
||||
Returns:
|
||||
List of all TV shows
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def delete(self, imdb_id: ImdbId) -> bool:
|
||||
"""Remove the aggregate. Returns True if it existed and was deleted."""
|
||||
"""
|
||||
Delete a TV show from the repository.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID of the show to delete
|
||||
|
||||
Returns:
|
||||
True if deleted, False if not found
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def exists(self, imdb_id: ImdbId) -> bool:
|
||||
"""True if the aggregate exists in the store."""
|
||||
"""
|
||||
Check if a TV show exists in the repository.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID to check
|
||||
|
||||
Returns:
|
||||
True if exists, False otherwise
|
||||
"""
|
||||
pass
|
||||
|
||||
|
||||
class SeasonRepository(ABC):
|
||||
"""Abstract repository for season persistence."""
|
||||
|
||||
@abstractmethod
|
||||
def save(self, season: Season) -> None:
|
||||
"""Save a season."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_by_show_and_number(
|
||||
self, show_imdb_id: ImdbId, season_number: SeasonNumber
|
||||
) -> Season | None:
|
||||
"""Find a season by show and season number."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_all_by_show(self, show_imdb_id: ImdbId) -> list[Season]:
|
||||
"""Get all seasons for a show."""
|
||||
pass
|
||||
|
||||
|
||||
class EpisodeRepository(ABC):
|
||||
"""Abstract repository for episode persistence."""
|
||||
|
||||
@abstractmethod
|
||||
def save(self, episode: Episode) -> None:
|
||||
"""Save an episode."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_by_show_season_episode(
|
||||
self,
|
||||
show_imdb_id: ImdbId,
|
||||
season_number: SeasonNumber,
|
||||
episode_number: EpisodeNumber,
|
||||
) -> Episode | None:
|
||||
"""Find an episode by show, season, and episode number."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_all_by_season(
|
||||
self, show_imdb_id: ImdbId, season_number: SeasonNumber
|
||||
) -> list[Episode]:
|
||||
"""Get all episodes for a season."""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def find_all_by_show(self, show_imdb_id: ImdbId) -> list[Episode]:
|
||||
"""Get all episodes for a show."""
|
||||
pass
|
||||
|
||||
@@ -0,0 +1,234 @@
|
||||
"""TV Show domain services - Business logic."""
|
||||
|
||||
import logging
|
||||
import re
|
||||
|
||||
from ..shared.value_objects import ImdbId
|
||||
from .entities import TVShow
|
||||
from .exceptions import (
|
||||
TVShowAlreadyExists,
|
||||
TVShowNotFound,
|
||||
)
|
||||
from .repositories import EpisodeRepository, SeasonRepository, TVShowRepository
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class TVShowService:
|
||||
"""
|
||||
Domain service for TV show-related business logic.
|
||||
|
||||
This service contains business rules that don't naturally fit
|
||||
within a single entity.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
show_repository: TVShowRepository,
|
||||
season_repository: SeasonRepository | None = None,
|
||||
episode_repository: EpisodeRepository | None = None,
|
||||
):
|
||||
"""
|
||||
Initialize TV show service.
|
||||
|
||||
Args:
|
||||
show_repository: TV show repository for persistence
|
||||
season_repository: Optional season repository
|
||||
episode_repository: Optional episode repository
|
||||
"""
|
||||
self.show_repository = show_repository
|
||||
self.season_repository = season_repository
|
||||
self.episode_repository = episode_repository
|
||||
|
||||
def track_show(self, show: TVShow) -> None:
|
||||
"""
|
||||
Start tracking a TV show.
|
||||
|
||||
Args:
|
||||
show: TVShow entity to track
|
||||
|
||||
Raises:
|
||||
TVShowAlreadyExists: If show is already being tracked
|
||||
"""
|
||||
if self.show_repository.exists(show.imdb_id):
|
||||
raise TVShowAlreadyExists(
|
||||
f"TV show with IMDb ID {show.imdb_id} is already tracked"
|
||||
)
|
||||
|
||||
self.show_repository.save(show)
|
||||
logger.info(f"Started tracking TV show: {show.title} ({show.imdb_id})")
|
||||
|
||||
def get_show(self, imdb_id: ImdbId) -> TVShow:
|
||||
"""
|
||||
Get a TV show by IMDb ID.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID of the show
|
||||
|
||||
Returns:
|
||||
TVShow entity
|
||||
|
||||
Raises:
|
||||
TVShowNotFound: If show not found
|
||||
"""
|
||||
show = self.show_repository.find_by_imdb_id(imdb_id)
|
||||
if not show:
|
||||
raise TVShowNotFound(f"TV show with IMDb ID {imdb_id} not found")
|
||||
return show
|
||||
|
||||
def get_all_shows(self) -> list[TVShow]:
|
||||
"""
|
||||
Get all tracked TV shows.
|
||||
|
||||
Returns:
|
||||
List of all TV shows
|
||||
"""
|
||||
return self.show_repository.find_all()
|
||||
|
||||
def get_ongoing_shows(self) -> list[TVShow]:
|
||||
"""
|
||||
Get all ongoing TV shows.
|
||||
|
||||
Returns:
|
||||
List of ongoing TV shows
|
||||
"""
|
||||
all_shows = self.show_repository.find_all()
|
||||
return [show for show in all_shows if show.is_ongoing()]
|
||||
|
||||
def get_ended_shows(self) -> list[TVShow]:
|
||||
"""
|
||||
Get all ended TV shows.
|
||||
|
||||
Returns:
|
||||
List of ended TV shows
|
||||
"""
|
||||
all_shows = self.show_repository.find_all()
|
||||
return [show for show in all_shows if show.is_ended()]
|
||||
|
||||
def update_show(self, show: TVShow) -> None:
|
||||
"""
|
||||
Update an existing TV show.
|
||||
|
||||
Args:
|
||||
show: TVShow entity with updated data
|
||||
|
||||
Raises:
|
||||
TVShowNotFound: If show doesn't exist
|
||||
"""
|
||||
if not self.show_repository.exists(show.imdb_id):
|
||||
raise TVShowNotFound(f"TV show with IMDb ID {show.imdb_id} not found")
|
||||
|
||||
self.show_repository.save(show)
|
||||
logger.info(f"Updated TV show: {show.title} ({show.imdb_id})")
|
||||
|
||||
def untrack_show(self, imdb_id: ImdbId) -> None:
|
||||
"""
|
||||
Stop tracking a TV show.
|
||||
|
||||
Args:
|
||||
imdb_id: IMDb ID of the show to untrack
|
||||
|
||||
Raises:
|
||||
TVShowNotFound: If show not found
|
||||
"""
|
||||
if not self.show_repository.delete(imdb_id):
|
||||
raise TVShowNotFound(f"TV show with IMDb ID {imdb_id} not found")
|
||||
|
||||
logger.info(f"Stopped tracking TV show with IMDb ID: {imdb_id}")
|
||||
|
||||
def parse_episode_from_filename(self, filename: str) -> tuple[int, int] | None:
|
||||
"""
|
||||
Parse season and episode numbers from filename.
|
||||
|
||||
Supports formats:
|
||||
- S01E05
|
||||
- 1x05
|
||||
- Season 1 Episode 5
|
||||
|
||||
Args:
|
||||
filename: Filename to parse
|
||||
|
||||
Returns:
|
||||
Tuple of (season, episode) if found, None otherwise
|
||||
"""
|
||||
filename_lower = filename.lower()
|
||||
|
||||
# Pattern 1: S01E05
|
||||
pattern1 = r"s(\d{1,2})e(\d{1,2})"
|
||||
match = re.search(pattern1, filename_lower)
|
||||
if match:
|
||||
return (int(match.group(1)), int(match.group(2)))
|
||||
|
||||
# Pattern 2: 1x05
|
||||
pattern2 = r"(\d{1,2})x(\d{1,2})"
|
||||
match = re.search(pattern2, filename_lower)
|
||||
if match:
|
||||
return (int(match.group(1)), int(match.group(2)))
|
||||
|
||||
# Pattern 3: Season 1 Episode 5
|
||||
pattern3 = r"season\s*(\d{1,2})\s*episode\s*(\d{1,2})"
|
||||
match = re.search(pattern3, filename_lower)
|
||||
if match:
|
||||
return (int(match.group(1)), int(match.group(2)))
|
||||
|
||||
return None
|
||||
|
||||
def validate_episode_file(self, filename: str) -> bool:
|
||||
"""
|
||||
Validate that a file is a valid episode file.
|
||||
|
||||
Args:
|
||||
filename: Filename to validate
|
||||
|
||||
Returns:
|
||||
True if valid episode file, False otherwise
|
||||
"""
|
||||
# Check file extension
|
||||
valid_extensions = {".mkv", ".mp4", ".avi", ".mov", ".wmv", ".flv", ".webm"}
|
||||
extension = filename[filename.rfind(".") :].lower() if "." in filename else ""
|
||||
|
||||
if extension not in valid_extensions:
|
||||
logger.warning(f"Invalid file extension: {extension}")
|
||||
return False
|
||||
|
||||
# Check if we can parse episode info
|
||||
episode_info = self.parse_episode_from_filename(filename)
|
||||
if not episode_info:
|
||||
logger.warning(f"Could not parse episode info from filename: {filename}")
|
||||
return False
|
||||
|
||||
return True
|
||||
|
||||
def find_next_episode(
|
||||
self, show: TVShow, last_season: int, last_episode: int
|
||||
) -> tuple[int, int] | None:
|
||||
"""
|
||||
Find the next episode to download for a show.
|
||||
|
||||
Args:
|
||||
show: TVShow entity
|
||||
last_season: Last downloaded season number
|
||||
last_episode: Last downloaded episode number
|
||||
|
||||
Returns:
|
||||
Tuple of (season, episode) for next episode, or None if show is complete
|
||||
"""
|
||||
# If show has ended and we've watched all seasons, no next episode
|
||||
if show.is_ended() and last_season >= show.seasons_count:
|
||||
return None
|
||||
|
||||
# Simple logic: next episode in same season, or first episode of next season
|
||||
# This could be enhanced with actual episode counts per season
|
||||
next_episode = last_episode + 1
|
||||
next_season = last_season
|
||||
|
||||
# Assume max 50 episodes per season (could be improved with actual data)
|
||||
if next_episode > 50:
|
||||
next_season += 1
|
||||
next_episode = 1
|
||||
|
||||
# Don't go beyond known seasons
|
||||
if next_season > show.seasons_count:
|
||||
return None
|
||||
|
||||
return (next_season, next_episode)
|
||||
@@ -1,7 +1,5 @@
|
||||
"""TV Show domain value objects."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from enum import Enum
|
||||
|
||||
@@ -9,48 +7,28 @@ from ..shared.exceptions import ValidationError
|
||||
|
||||
|
||||
class ShowStatus(Enum):
|
||||
"""
|
||||
Production status of a TV show (real-world, source of truth = TMDB).
|
||||
|
||||
Describes the **production** state of the show, independently of what
|
||||
the user owns. Orthogonal to ``CollectionStatus``.
|
||||
"""
|
||||
"""Status of a TV show - whether it's still airing or has ended."""
|
||||
|
||||
ONGOING = "ongoing"
|
||||
ENDED = "ended"
|
||||
UNKNOWN = "unknown"
|
||||
|
||||
@classmethod
|
||||
def from_string(cls, status_str: str) -> ShowStatus:
|
||||
def from_string(cls, status_str: str) -> "ShowStatus":
|
||||
"""
|
||||
Parse a production status string into a ShowStatus.
|
||||
Parse status from string.
|
||||
|
||||
Accepts our internal vocabulary ("ongoing", "ended") as well as the
|
||||
statuses returned by TMDB ("Returning Series", "In Production",
|
||||
"Pilot", "Ended", "Canceled"). The mapping is intentionally binary:
|
||||
Args:
|
||||
status_str: Status string (e.g., "ongoing", "ended")
|
||||
|
||||
* ONGOING — any state where new episodes may still ship
|
||||
* ENDED — production has stopped (naturally or cancelled)
|
||||
* UNKNOWN — anything else / unrecognized
|
||||
|
||||
Comparison is case-insensitive and whitespace-trimmed.
|
||||
Returns:
|
||||
ShowStatus enum value
|
||||
"""
|
||||
if not status_str:
|
||||
return cls.UNKNOWN
|
||||
key = status_str.strip().lower()
|
||||
status_map = {
|
||||
# Internal
|
||||
"ongoing": cls.ONGOING,
|
||||
"ended": cls.ENDED,
|
||||
# TMDB
|
||||
"returning series": cls.ONGOING,
|
||||
"in production": cls.ONGOING,
|
||||
"pilot": cls.ONGOING,
|
||||
"planned": cls.ONGOING,
|
||||
"canceled": cls.ENDED,
|
||||
"cancelled": cls.ENDED,
|
||||
}
|
||||
return status_map.get(key, cls.UNKNOWN)
|
||||
return status_map.get(status_str.lower(), cls.UNKNOWN)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
@@ -92,23 +70,6 @@ class SeasonNumber:
|
||||
return self.value
|
||||
|
||||
|
||||
class CollectionStatus(Enum):
|
||||
"""
|
||||
State of the user's **collection** for a TV show (orthogonal to ShowStatus).
|
||||
|
||||
Compares possessed episodes against episodes **already aired** — never
|
||||
against announced/upcoming ones. A returning show with all aired episodes
|
||||
owned is ``COMPLETE``, not ``PARTIAL``, even if more seasons are upcoming.
|
||||
|
||||
Future scheduling info (upcoming seasons, next airing date) will live on
|
||||
the TVShow aggregate as separate flags, not in this enum.
|
||||
"""
|
||||
|
||||
EMPTY = "empty" # no episodes owned
|
||||
PARTIAL = "partial" # some aired episodes are missing
|
||||
COMPLETE = "complete" # all aired-to-date episodes are owned
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class EpisodeNumber:
|
||||
"""
|
||||
|
||||
Some files were not shown because too many files have changed in this diff