21 Commits

Author SHA1 Message Date
francwa f6eef59fca refactor: tech debt mini-pass (items 5, 6, 7, 20)
Low-risk cleanup items, no functional change to the parser. The
philosophy remains: keep the parser simple, the AI handles edge cases.

- Extract duplicated 'fs-safe title → dot-folder-name' regex into
  to_dot_folder_name() in domain/shared/value_objects.py. Used by both
  MovieTitle.normalized() and TVShow.get_folder_name() (item #5).
- ParsedRelease.languages now uses field(default_factory=list) instead
  of a manual __post_init__ assigning [] via object.__setattr__ (#6).
- tv_shows/entities.py module docstring: prepend ASCII ownership tree
  for quicker visual scan of the aggregate hierarchy (#7).
- file_extensions.yaml: split subtitle sidecars (.srt/.sub/.idx/.ass/.ssa)
  into a dedicated 'subtitle:' category instead of lumping them under
  'metadata:'. _METADATA_EXTENSIONS at the value_objects.py level remains
  the union of both — detect_media_type behavior unchanged. New loader
  load_subtitle_extensions() exposes the distinct subtitle set for future
  callers in the subtitles domain (#20).

Suite: 1020 passed, 8 skipped.
2026-05-18 16:24:28 +02:00
francwa 273510dff8 test(fixtures): seed PATH OF PAIN bucket with 10 worst-case fixtures
10 pathological release names mined from the real downloads folder.
Each fixture locks in the current parse_release output (including
its silent losses and false positives) so future parser improvements
are intentional, not silent drift.

Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '|', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio
  = full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception:
  COMPLETE token correctly drives tv_complete + REPACK + 10bit)

Also adds shitty + path_of_pain to the per-bucket sanity assertion.

Suite: 1020 passed, 8 skipped.
2026-05-18 15:57:56 +02:00
francwa c1831e3f46 test(fixtures): drop derry_duplicate_naming (was a copy-paste artifact)
The release name mixed two distinct releases — not a real-world case
worth anti-regression. SHITTY bucket now holds 14 fixtures (down from 15).
2026-05-18 15:51:11 +02:00
francwa aa182458b8 test(fixtures): seed SHITTY release bucket with 15 anti-regression cases
Add 15 expected.yaml fixtures under tests/fixtures/releases/shitty/
covering the awkward but real-world release names from the downloads
folder. Each fixture locks in the current parse_release behavior so
future parser changes are intentional, not silent drift.

Cases captured:
- Angel INTEGRALE 3-level hierarchy (tv_complete media_type)
- Buffy custom French title with dots preserved
- Archer S14E09E10E11 multi-episode (E11 lost — tech debt)
- Notre Planète lowercase s01e01
- Vinyl ' - 1x01 - FHD' (stray dash artifact — tech debt)
- Deutschland.83 (year-suffix as part of title)
- Tatortreiniger S01-06 range (falls to movie — tech debt)
- Derry Girls duplicated title
- Jurassic Park bare folder (media_type=unknown)
- La Nuit au Musée bilingual MULTI
- Chérie j'ai agrandi (ASCII-stripped apostrophe, parses fine)
- Honey Don't (unescaped apostrophe — full AI-path degeneration)
- Hook MULTi.SUBS movie with Subs/ folder
- Predator Badlands space separators (group=UNKNOWN — tech debt)
- Westworld S04 Subs.Only (no video file)

Each fixture also captures the future 3-flow routing (library /
torrents / seed_hardlinks) ahead of the organize_media refactor.

Suite: 1011 passed, 8 skipped.
2026-05-18 15:48:41 +02:00
francwa 774f71c8cc chore(gitignore): track CHANGELOG.md explicitly
The blanket *.md ignore was hiding CHANGELOG.md, forcing 'git add -f' on
every update. Allow-list it so the file lives under normal git tracking.
CLAUDE.md stays local (user keeps it personal until a dedicated repo).
2026-05-18 15:39:04 +02:00
francwa 7bc50fd5b8 test: add real-world release fixtures (EASY bucket)
Captures 5 canonical releases from /mnt/testipool/downloads as parametrized
fixtures under tests/fixtures/releases/easy/. Each fixture declares the
release name, expected ParsedRelease fields, original tree, and the future
routing (library / torrents / seed_hardlinks) for the upcoming organize_media
refactor.

Today only the 'parsed' section is asserted; tree is materialized into a
tmp_path to catch typos. Routing is captured ahead of the planner work — it
becomes verifiable once organize_media lands.

Cases: back_in_action (movie), slow_horses_single_ep (TV single),
foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie +
KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir).

Also tracks CHANGELOG.md under [Unreleased] / Added.
2026-05-18 15:36:19 +02:00
francwa f17abdbaec chore: cleanup — remove shims, fix ruff warnings, ignore noisy rules
- Removed backward-compat shims _sanitise_for_fs / _strip_episode_from_normalised
  in domain/release/value_objects.py (zero callers).
- Fixed ruff warnings across the codebase:
    * PLW1510: explicit check=False on subprocess.run calls
    * PLC0415: promoted lazy imports to module top where no cycle exists
      (manage_subtitles, placer, qbittorrent/client, file_manager)
    * E402: fixed module-level import ordering in language_registry.py and
      subtitles/knowledge/loader.py
    * F841 / B007: removed unused locals (identifier.py)
    * C416: replaced unnecessary set comprehension with set() in
      release/knowledge.py
- Ruff config: ignore PLR0911/PLR0912 globally (noisy on mappers and
  orchestrator use-cases) and PLW0603 (intentional for the memory singleton).
- Updated tech debt memory: P1 done, ShowStatus actually complete (was a
  stale note).
2026-05-18 00:02:45 +02:00
francwa 1d50b63af2 Merge branch 'dev/sprint-cleanup'
Multi-week sprint: ISO 639-2/B language unification, release parser
unification + data-driven tokenizer, removal of fossil services
(movies/tv_shows/subtitles), subtitle services split into a package,
MediaInfo split, test suite expansion (990 passing).

See CHANGELOG.md [Unreleased] for the user-facing summary.
2026-05-17 23:42:05 +02:00
francwa 891ba502a2 chore: apply pre-commit auto-fixes (trim trailing whitespace, EOF)
2026-05-17 23:41:54 +02:00
francwa e07c9ec77b chore: sprint cleanup — language unification, parser unification, fossils removal
Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects truly
  forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
2026-05-17 23:38:00 +02:00
francwa ba6f016d49 feat: generic MetadataStore + read_release_metadata + query_library
- Extract MetadataStore from SubtitleMetadataStore (alfred/infrastructure/metadata/).
  Generic load/save + typed update helpers (update_parse, update_probe, update_tmdb)
  for the per-release .alfred/metadata.yaml.
- SubtitleMetadataStore becomes a thin facade — owns subtitle_history shape,
  delegates I/O to MetadataStore.
- Agent._execute_tool_call auto-persists successful analyze_release / probe_media /
  find_media_imdb_id results to the release's .alfred file. find_media_imdb_id
  follows release_focus when it has no path argument.
- New tools:
  · read_release_metadata(release_path) — cacheable, key=release_path.
    Returns the .alfred content or has_metadata=false.
  · query_library(name) — substring scan across configured library roots.
- Both new tools added to CORE_TOOLS (always visible).
2026-05-15 11:02:25 +02:00
francwa 3c7c6695f2 feat(memory): Phase 1 — STM ToolResultsCache + ReleaseFocus + cache flag in YAML specs
Adds two STM components and a transparent cache hook in the agent loop so
read-only tools don't re-do work the agent already did in this session.

New STM components:
  - ToolResultsCache  — {tool_name: {key: result}}, session-scoped.
    to_dict() exposes only the key inventory (not payloads) to keep the
    prompt cheap.
  - ReleaseFocus      — current_release_path + working_set list, updated
    automatically when a path-keyed inspector runs.

YAML spec layer:
  - New optional 'cache: { key: <param_name> }' block in ToolSpec.
  - Validated at load time: cache.key must be a declared parameter.
  - Surfaced on Tool dataclass as cache_key: str | None.

Agent._execute_tool_call:
  - Pre-exec cache lookup; hit short-circuits and adds _from_cache=true.
  - Post-exec: stores successful results, updates release_focus for
    path-keyed tools, refreshes episodic.last_search_results when
    find_torrent's hit served the response (so get_torrent_by_index
    keeps pointing at the right list).

Cacheable tools (5): analyze_release, probe_media, list_folder,
find_media_imdb_id, find_torrent.
2026-05-15 10:44:14 +02:00
francwa 2db3198ef2 feat(agent): migrate all remaining tools to YAML specs (21/21 covered)
Adds YAML specs for the 14 tools that were still description-from-docstring:

  filesystem:
    - set_path_for_folder, list_folder, analyze_release, probe_media,
      move_media, manage_subtitles, create_seed_links, learn
  api:
    - find_media_imdb_id, find_torrent, get_torrent_by_index,
      add_torrent_to_qbittorrent, add_torrent_by_index
  language:
    - set_language

Each spec follows the established shape (summary / description /
when_to_use / when_not_to_use / next_steps / parameters with
why_needed + example / returns) and the Python function docstring is
slimmed to a one-line pointer.

Registry now reports: 21 tools, 21 with YAML spec, 0 doc-only.
2026-05-14 21:18:43 +02:00
francwa 23a9dd7990 refactor(memory): rename workflow.target -> params, type -> name
The Workflow STM component stored an active workflow as
{type, target, stage, started_at}. Now that start_workflow takes a
workflow_name and a params dict, those keys match what they actually
hold:

  type   -> name    (the YAML workflow name, e.g. media.organize_media)
  target -> params  (the dict passed to start_workflow)

ShortTermMemory.start_workflow parameters renamed accordingly. All
consumers (prompt builder workflow scope + STM context, start/end
workflow tools) updated.
2026-05-14 21:11:23 +02:00
francwa 74a52ba6a3 feat(agent): workflow-scoped tool catalog + start/end_workflow meta-tools
Introduce a scope-aware agent so the LLM never sees the full 21-tool
catalog at once. The system prompt now describes either:
  - idle mode: core set (5 tools: set_language, set_path_for_folder,
    list_folder, start_workflow, end_workflow) + a list of available
    workflows with their goals;
  - active mode: the core set plus the tools declared by the active
    workflow's YAML, with the step plan inlined into the prompt.

Pieces:
- alfred/agent/tools/workflow.py: start_workflow / end_workflow tools
  (with YAML specs under tools/specs/) that drive memory.stm.workflow.
- alfred/agent/prompt.py: CORE_TOOLS constant, visible_tool_names(),
  filtered build_tools_spec() / _format_tools_description(), and a new
  _format_workflow_scope() section in the system prompt.
- alfred/agent/agent.py: WorkflowLoader wired into Agent, defensive
  out-of-scope check in _execute_tool_call.
- alfred/agent/registry.py: registers the two new meta-tools (21 total,
  7 with YAML spec).
- workflows/media.organize_media.yaml: tools/steps list refreshed to
  match the current resolver split (analyze_release, probe_media,
  resolve_*_destination, move_to_destination).
2026-05-14 21:07:36 +02:00
francwa 97adfbda45 refactor(workflows): adopt media.* naming convention
Rename workflow files and their 'name' field with a 'media.' domain
prefix to anticipate future multi-domain expansion (mail.*, calendar.*, ...).

- organize_media -> media.organize_media
- manage_subtitles -> media.manage_subtitles

WorkflowLoader picks them up unchanged (uses data['name']).
2026-05-14 20:55:35 +02:00
francwa 239fce9e4e chore(agent): remove dead parameters.py
The ParameterSchema / REQUIRED_PARAMETERS / get_missing_required_parameters
machinery in alfred/agent/parameters.py was used in early prototypes for
the prompt-required-params check but has been unwired from production for
several refactors. The new YAML tool-spec layer (alfred/agent/tools/specs/)
covers the same need (rich, LLM-facing parameter descriptions) without
the parallel registration plumbing.

Tests in tests/test_config_edge_cases.py still reference the deleted
module — left untouched per the project policy of treating test sync
as a dedicated end-of-week task.
2026-05-14 18:06:34 +02:00
francwa 99c95af64e feat(agent): YAML tool specs as the LLM-facing semantic layer
Introduce a first-class semantic layer for tool descriptions, separated
from Python signatures (which stay the source of truth for types and
required-ness).

New
- alfred/agent/tools/spec.py — ToolSpec / ParameterSpec / ReturnsSpec
  dataclasses with strict YAML validation (ToolSpecError on malformed
  or inconsistent specs). compile_description() builds the rich text
  passed to the LLM as Tool.description, with sections for summary,
  description, when_to_use, when_not_to_use, next_steps, and returns.
  compile_parameter_description() injects the 'why_needed' field next
  to each parameter so the LLM sees the *intent* of each argument.
- alfred/agent/tools/spec_loader.py — discovers tools/specs/*.yaml,
  enforces filename ↔ spec.name match, rejects duplicates.
- alfred/agent/tools/specs/ — one YAML per tool:
    * resolve_season_destination.yaml
    * resolve_episode_destination.yaml
    * resolve_movie_destination.yaml
    * resolve_series_destination.yaml
    * move_to_destination.yaml

Refactor
- alfred/agent/registry.py
    * _create_tool_from_function now takes an optional ToolSpec.
      When provided, the long description + per-parameter descriptions
      come from the spec; types and required-ness still come from the
      Python signature.
    * Cross-validates spec.parameters against the function signature —
      crashes on missing or extra entries.
    * make_tools() loads all specs at startup and hands the right one
      to each tool. Tools without a spec fall back to the old
      docstring-only behaviour, so the 14 not-yet-migrated tools keep
      working unchanged.
    * Adds 'array' and 'object' to the Python→JSON type mapping and
      handles Optional[X] / X | None annotations.

- alfred/agent/tools/filesystem.py
    * Drops the '_tool' suffix on the 4 resolve_* wrappers (option 1:
      alias the use-case imports as _resolve_*). Tool names exposed to
      the LLM now match the underlying use case verbatim.
    * Wrapper docstrings shrink to a one-liner pointing to the YAML
      spec — no more duplicated when_to_use/Args/Returns in Python.

Verified
- make_tools() loads 19 tools (5 with YAML spec, 14 doc-only).
- Compiled descriptions render cleanly with all sections.
2026-05-14 18:06:27 +02:00
francwa b5025bb5f8 refactor(resolve_destination): factor shared series-folder resolution + DTO base
- New _Clarification sentinel and _resolve_series_folder() helper —
  the three TV use cases now share one matching/clarification path
  instead of triplicating the same if/elif/else block.
- New _ResolvedDestinationBase carrying status/question/options/error/
  message plus a _base_dict() helper; the four concrete DTOs only
  declare their own ok-state fields and a slim to_dict().
- No behaviour change: same outputs for ok/needs_clarification/error
  cases (verified by import + DTO smoke tests).
2026-05-14 16:09:33 +02:00
francwa e45465d52d feat: split resolve_destination, persona-driven prompts, qBittorrent relocation
Destination resolution
- Replace the single ResolveDestinationUseCase with four dedicated
  functions, one per release type:
    resolve_season_destination    (pack season, folder move)
    resolve_episode_destination   (single episode, file move)
    resolve_movie_destination     (movie, file move)
    resolve_series_destination    (multi-season pack, folder move)
- Each returns a dedicated DTO carrying only the fields relevant to
  that release type — no more polymorphic ResolvedDestination with
  half the fields unused depending on the case.
- Looser series folder matching: exact computed-name match is reused
  silently; any deviation (different group, multiple candidates) now
  prompts the user with all options including the computed name.

Agent tools
- Four new tools wrapping the use cases above; old resolve_destination
  removed from the registry.
- New move_to_destination tool: create_folder + move, chained — used
  after a resolve_* call to perform the actual relocation.
- Low-level filesystem_operations module (create_folder, move via mv)
  for instant same-FS renames (ZFS).

Prompt & persona
- New PromptBuilder (alfred/agent/prompt.py) replacing prompts.py:
  identity + personality block, situational expressions, memory
  schema, episodic/STM/config context, tool catalogue.
- Per-user expression system: knowledge/users/common.yaml +
  {username}.yaml are merged at runtime; one phrase per situation
  (greeting/success/error/...) is sampled into the system prompt.

qBittorrent integration
- Credentials now come from settings (qbittorrent_url/username/password)
  instead of hardcoded defaults.
- New client methods: find_by_name, set_location, recheck — the trio
  needed to update a torrent's save path and re-verify after a move.
- Host→container path translation settings (qbittorrent_host_path /
  qbittorrent_container_path) for docker-mounted setups.

Subtitles
- Identifier: strip parenthesized qualifiers (simplified, brazil…) at
  tokenization; new _tokenize_suffix used for the episode_subfolder
  pattern so episode-stem tokens no longer pollute language detection.
- Placer: extract _build_dest_name so it can be reused by the new
  dry_run path in ManageSubtitlesUseCase.
- Knowledge: add yue, ell, ind, msa, rus, vie, heb, tam, tel, tha,
  hin, ukr; add 'fre' to fra; add 'simplified'/'traditional' to zho.

Misc
- LTM workspace: add 'trash' folder slot.
- Default LLM provider switched to deepseek.
- testing/debug_release.py: CLI to parse a release, hit TMDB, and
  dry-run the destination resolution end-to-end.
2026-05-14 05:01:59 +02:00
francwa 1723b9fa53 feat: release parser, media type detection, ffprobe integration
Replace the old domain/media release parser with a full rewrite under
domain/release/:
- ParsedRelease with media_type ("movie" | "tv_show" | "tv_complete" |
  "documentary" | "concert" | "other" | "unknown"), site_tag, parse_path,
  languages, audio_codec, audio_channels, bit_depth, hdr_format, edition
- Well-formedness check + sanitize pipeline (_is_well_formed, _sanitize,
  _strip_site_tag) before token-level parsing
- Multi-token sequence matching for audio (DTS-HD.MA, TrueHD.Atmos…),
  HDR (DV.HDR10…) and editions (DIRECTORS.CUT…)
- Knowledge YAML: file_extensions, release_format, languages, audio,
  video, editions, sites/c411

New infrastructure:
- ffprobe.py — single-pass probe returning MediaInfo (video, audio
  tracks, subtitle tracks)
- find_video.py — locate first video file in a release folder

New application helpers:
- detect_media_type — filesystem-based type refinement
- enrich_from_probe — fill missing ParsedRelease fields from MediaInfo

New agent tools:
- analyze_release — parse + detect type + ffprobe in one call
- probe_media — standalone ffprobe for a specific file

New domain value object:
- MediaInfo + AudioTrack + SubtitleTrack (domain/shared/media_info.py)

Testing CLIs:
- recognize_folders_in_downloads.py — full pipeline with colored output
- probe_video.py — display MediaInfo for a video file
2026-05-12 16:14:20 +02:00
238 changed files with 17549 additions and 7850 deletions
+6 -3
@@ -46,9 +46,12 @@ TMDB_BASE_URL=https://api.themoviedb.org/3
# qBittorrent
# → QBITTORRENT_PASSWORD goes in .env.secrets
QBITTORRENT_URL=http://qbittorrent:16140
QBITTORRENT_USERNAME=admin
QBITTORRENT_URL=https://qb.lan.anustart.top
QBITTORRENT_USERNAME=letmein
QBITTORRENT_PORT=16140
# Path translation: host-side prefix → container-side prefix
QBITTORRENT_HOST_PATH=/mnt/testipool
QBITTORRENT_CONTAINER_PATH=/mnt/data
# Meilisearch
# → MEILI_MASTER_KEY goes in .env.secrets
@@ -60,7 +63,7 @@ MEILI_HOST=http://meilisearch:7700
# --- LLM CONFIGURATION ---
# Providers: local, openai, anthropic, deepseek, google, kimi
# → API keys go in .env.secrets
DEFAULT_LLM_PROVIDER=local
DEFAULT_LLM_PROVIDER=deepseek
# Local LLM (Ollama)
#OLLAMA_BASE_URL=http://ollama:11434
+4 -1
@@ -59,6 +59,8 @@ Thumbs.db
# Backup files
*.backup
*.bak
env_backup/
# Application data dir
data/*
@@ -69,7 +71,8 @@ logs/*
# Documentation folder
docs/
# .md files
# .md files (project-level Markdown is brol-y; allow-list the ones we track)
*.md
!CHANGELOG.md
#
+261
@@ -0,0 +1,261 @@
# Changelog
All notable changes to Alfred are documented here.
The format is loosely based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
Alfred is not yet on SemVer — entries are grouped by **dated work blocks** instead
of release numbers. Granularity targets behavioral or API-visible changes; refer
to `git log` for commit-level detail.
Sections used per block: **Added** / **Changed** / **Deprecated** / **Removed** /
**Fixed** / **Internal** (for tech-debt and refactor noise that doesn't affect
callers).
---
## [Unreleased]
### Added
- **Real-world release fixtures** under `tests/fixtures/releases/{easy,shitty,path_of_pain}/`,
each documenting an expected `ParsedRelease` plus the future `routing`
(library / torrents / seed_hardlinks) for the upcoming `organize_media`
refactor. EASY bucket seeded with 5 cases (movie, single-episode, season
pack, movie + noise, YTS bracket-heavy). SHITTY bucket seeded with 15
anti-regression cases covering: 3-level INTEGRALE hierarchy (Angel),
French custom titles (Buffy, La Nuit au Musée, Chérie j'ai agrandi),
multi-episode chain `S14E09E10E11` (Archer, captures E11 loss),
lowercase `s01e01` (Notre Planète), `NxNN` with ` - ` separators
(Vinyl, captures dash artifact), title-with-year-suffix (Deutschland.83),
season-range `S01-06` (Tatortreiniger, captures movie misclassification),
bare folder name (Jurassic Park,
media_type=unknown), apostrophe-in-name (Honey Don't, captures full AI-path
degeneration), SUBS-tag movie (Hook), space separators (Predator Badlands,
captures group=UNKNOWN), subs-only release (Westworld S04).
PATH OF PAIN bucket seeded with 10 worst-case fixtures covering:
UTF-8 wide pipe yt-dlp slug (Khruangbin), 3-show franchise box-set
with double season range and parens-wrapped tech (Deutschland 83-86-89,
captures `group=S03` misdetection), accented chars in title (Chérie
BéBé with VFF), 8-word stand-up comedy title (Jimmy Carr), site-tag
prefix + XviD (OxTorrent), episode title + air-date silently lost
(Prodiges), full-chaos apostrophe + spaces + Blu-ray dash + 1080i +
multi-word audio codec (The Prodigy, full AI-path degeneration),
yt-dlp YouTube ID glued to year (Sleaford Mods), bilingual `[FR-EN]`
tag mistaken for group (Super Mario Bros), COMPLETE + S01-S07 range +
REPACK + HEVC (Gilmore Girls, the well-behaved exception).
Parametrized over `tests/domain/test_release_fixtures.py` for anti-regression.
- **`NxNN` alt season/episode form supported** by `parse_release`. Releases like
`Show.1x05.720p.HDTV.x264-GRP` and `Show.2x07x08.1080p.WEB.x265-GRP` (multi-ep
alt form) now parse as TV shows.
- **`alfred/knowledge/release/separators.yaml`** declares the token separators
used by the release-name tokenizer (`.`, ` `, `[`, `]`, `(`, `)`, `_`). New
conventions can be added without code changes. The canonical `.` is always
present even if missing from YAML.
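
The separator-driven split described above could look roughly like the sketch below. It is illustrative only: `CONFIGURED_SEPARATORS`, `build_splitter` and `tokenize` are hypothetical names, and the list stands in for whatever `separators.yaml` actually contains.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the separators loaded from separators.yaml.
CONFIGURED_SEPARATORS = [" ", "[", "]", "(", ")", "_"]


def build_splitter(separators: list[str]) -> re.Pattern:
    """Compile a regex character class from the configured separators."""
    # The canonical "." is always included, even if the config omits it.
    chars = set(separators) | {"."}
    return re.compile("[" + "".join(re.escape(c) for c in sorted(chars)) + "]+")


def tokenize(name: str, splitter: re.Pattern) -> list[str]:
    """Split a release name on any separator and drop empty tokens."""
    return [tok for tok in splitter.split(name) if tok]


splitter = build_splitter(CONFIGURED_SEPARATORS)
tokens = tokenize("The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]", splitter)
# tokens == ["The", "Father", "2020", "1080p", "WEBRip", "5", "1", "YTS", "MX"]
```

Note that a naive split breaks `5.1` and `YTS.MX` into two tokens each; presumably the real parser reassembles such values through its multi-token sequence matching, which this sketch does not attempt.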
### Changed
- **`parse_release` tokenizer is now data-driven**: it splits on any character
listed in `separators.yaml` (regex character class) instead of `name.split(".")`.
This makes YTS-style releases (`The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]`),
space-separated names (`Inception 2010 1080p BluRay x264-GROUP`), and
underscore-separated names parse correctly via the direct path — no more
fallback through sanitization.
- **`parse_release` flow simplified**: site-tag extraction always runs first
(so `parse_path == "sanitized"` now reliably indicates a stripped `[tag]`),
then well-formedness is checked only against truly forbidden chars
(anything not in the configured separator set).
- **ISO 639-2/B is now the canonical language code project-wide** (was a mix of
639-1 and 639-2/T):
- `SubtitlePreferences.languages` default is now `["fre", "eng"]` (was
`["fr", "en"]`). Old LTM files are not auto-migrated — delete
`data/memory/ltm.json` to regenerate with the new defaults.
- Subtitle output filenames are now `{iso639_2b}.srt` (e.g. `fre.srt`,
`fre.sdh.srt`). Existing `fr.srt` files are still **read** correctly
(recognized as French via alias) but new files are written canonically.
- `Language` value object docstring corrected: it has always stored 639-2/B
(matching what ffprobe emits), not 639-2/T as previously documented.
- **`MovieService.validate_movie_file` minimum size is now configurable** via
`settings.min_movie_size_bytes` (default unchanged: 100 MB). Constructor
accepts an optional `min_movie_size_bytes` override for tests.
- **`SubtitleKnowledgeBase` delegates language lookup to `LanguageRegistry`**
rather than duplicating tokens. `subtitles.yaml` now only declares
subtitle-specific tokens (e.g. `vostfr`, `vf`, `vff`) under a new
`language_tokens` section.
### Removed
- **`alfred/domain/tv_shows/services.py`** and **`alfred/domain/movies/services.py`**
deleted entirely. They held fossil parsers (`parse_episode_filename`,
`extract_movie_metadata`, …) with zero production callers — superseded by
`parse_release` as the single source of truth for release-name parsing.
Associated tests (`tests/domain/test_movies.py`, `tests/domain/test_tv_shows_service.py`)
removed as well.
- `_sanitize` and `_normalize` helpers in `alfred/domain/release/services.py`:
the new tokenizer makes them redundant.
- `_LANG_KEYWORDS`, `_SDH_TOKENS`, `_FORCED_TOKENS`, `SUBTITLE_EXTENSIONS`
hardcoded dicts in `alfred/domain/subtitles/scanner.py` — all knowledge now
lives in YAML (CLAUDE.md compliance).
- `_MIN_MOVIE_SIZE_BYTES` module-level constant in
`alfred/domain/movies/services.py` — replaced by the new setting.
- Top-level `languages:` block in `subtitles.yaml` — superseded by
`language_tokens:` (subtitle-specific only) since iso_languages.yaml is the
canonical source.
### Fixed
- **`hi` token no longer marks a subtitle as SDH** (it conflicted with the
ISO 639-1 alias for Hindi). SDH is now detected only via `sdh`, `cc`, and
`hearing` tokens.
- `SubtitleKnowledgeBase` default rules used `"fra"` while
`iso_languages.yaml` exposes French as `"fre"` — preferred languages
defaults now match the canonical form.
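
A minimal sketch of the SDH fix, assuming a subtitle filename is split on dots and compared against a fixed token set. In the project the tokens come from YAML; `SDH_TOKENS` and `is_sdh` are hypothetical names used for illustration.

```python
# Tokens named in the entry above. "hi" is deliberately absent because it
# is also the ISO 639-1 alias for Hindi.
SDH_TOKENS = frozenset({"sdh", "cc", "hearing"})


def is_sdh(filename: str) -> bool:
    """Return True if any dot-separated token marks the subtitle as SDH."""
    tokens = filename.lower().split(".")
    return any(tok in SDH_TOKENS for tok in tokens)


is_sdh("fre.sdh.srt")  # True
is_sdh("hi.srt")       # False: treated as Hindi, not "hearing impaired"
```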
### Internal
- **`to_dot_folder_name(title)` helper** in
`alfred/domain/shared/value_objects.py` — extracts the
`re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")` pattern that was
duplicated between `MovieTitle.normalized()` and `TVShow.get_folder_name()`.
- **`ParsedRelease.languages` uses `field(default_factory=list)`** instead of
a manual `__post_init__` that assigned `[]` via `object.__setattr__`.
- **`file_extensions.yaml` splits subtitle sidecars (`.srt`, `.sub`, `.idx`,
`.ass`, `.ssa`) into a dedicated `subtitle:` category** instead of lumping
them under `metadata:`. The `_METADATA_EXTENSIONS` set used by
`detect_media_type` remains the union of both (same behavior — subtitles
are still ignored when deciding the media type of a folder), but a new
`load_subtitle_extensions()` loader is now available for the subtitles
domain. Semantic clarity, no functional change.
- **`tv_shows/entities.py` module docstring** now shows the aggregate
ownership as an ASCII tree before the rule text — quicker visual scan
of the DDD structure.
- Removed backward-compat shims `_sanitise_for_fs` /
`_strip_episode_from_normalised` from `domain/release/value_objects.py`
(zero callers).
- Cleaned ruff warnings across the codebase: `subprocess.run` calls now pass
explicit `check=False` (PLW1510); lazy imports promoted to module top where
there was no cycle (PLC0415 in `manage_subtitles.py`, `placer.py`,
`qbittorrent/client.py`, `file_manager.py`); fixed module-level import
ordering (E402) in `language_registry.py` and `subtitles/knowledge/loader.py`;
removed unused locals (F841 / B007); replaced unnecessary set comprehension
with `set()` in `release/knowledge.py` (C416).
- Ruff config: ignore `PLR0911` / `PLR0912` (too-many-returns / too-many-branches)
globally — noisy on parser mappers and orchestrator use-cases where early-return
validation is essential complexity. Ignore `PLW0603` for the documented memory
singleton (`infrastructure/persistence/context.py`).
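
The first two Internal items can be sketched as follows. The regex is the one quoted above. `ParsedReleaseSketch` is a hypothetical two-field stand-in for `ParsedRelease`, and treating it as a frozen dataclass is an assumption inferred from the earlier `object.__setattr__` workaround.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def to_dot_folder_name(title: str) -> str:
    """Drop chars outside word/space/dot/dash, then turn spaces into dots."""
    return re.sub(r"[^\w\s\.\-]", "", title).replace(" ", ".")


@dataclass(frozen=True)
class ParsedReleaseSketch:
    """Hypothetical stand-in; the real ParsedRelease has many more fields."""

    title: str
    # default_factory gives each instance its own list, with no __post_init__.
    languages: list[str] = field(default_factory=list)


to_dot_folder_name("Honey Don't")  # "Honey.Dont"
```

Because `\w` is Unicode-aware for `str` patterns in Python 3, accented titles such as `Chérie` pass through unchanged.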
---
## [2026-05-17] — TVShow & Movie aggregate refactor
Multi-phase overhaul of the TV show domain into a real DDD aggregate, with
matching parity work on `Movie`, a language knowledge system, and the
`shared/media` restructure that supports both.
### Added
- **Language knowledge system** (`alfred/knowledge/iso_languages.yaml` + 42
languages including `und` for undetermined).
- `Language` value object (frozen dataclass) with `iso`, `english_name`,
`native_name`, `aliases`, and a `matches(raw)` cross-format helper.
- `LanguageRegistry` loader (`alfred/domain/shared/knowledge/`) merging
builtin + learned YAML. Not a singleton — the application layer
instantiates it.
- ISO 639-2/B is the canonical key; aliases cover 639-1, 639-2/T, English
name, native name, and common spellings.
- **`VideoTrack`** dataclass (`alfred/domain/shared/media/video.py`) with a
`resolution` property using width-priority bucket detection (handles
cinema/scope crops like 1920×960 → 1080p).
- **`shared/media/matching.py`** — `track_lang_matches` helper shared by
`Episode` and `Movie`. Implements the **"C+" contract** for language helpers:
- `Language` query → cross-format match via `Language.matches()`
- `str` query → case-insensitive direct comparison (no normalization)
- **TVShow aggregate composition**:
- `TVShow.seasons: dict[SeasonNumber, Season]`
- `Season.episodes: dict[EpisodeNumber, Episode]`
- `Season.expected_episodes` / `Season.aired_episodes` (split so collection
state can compare "owned vs aired today" without confusing in-flight
seasons with future ones)
- **Aggregate methods on `TVShow`**:
- `add_episode(ep)` — sole sanctioned mutation entry point (creates the
season if missing)
- `add_season(season)` — replaces a season wholesale
- `collection_status()` → `CollectionStatus.{EMPTY, PARTIAL, COMPLETE}`
- `is_complete_series()` — true iff `ENDED + COMPLETE`
- `missing_episodes()` — flat list of all aired-but-not-owned
`(season, episode)` pairs
- **`CollectionStatus`** enum (orthogonal to `ShowStatus`).
- **Episode track helpers** (`has_audio_in`, `has_subtitles_in`,
`has_forced_subs`, `audio_languages`, `subtitle_languages`), driven by
`Episode.audio_tracks` / `Episode.subtitle_tracks`.
- **Movie aggregate parity** — `Movie` now carries `audio_tracks` /
`subtitle_tracks` and exposes the same helpers as `Episode` (same C+
contract).
- **`CHANGELOG.md`** (this file).
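The "C+" contract listed above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `shared/media/matching.py`: the `Language` class here is a simplified stand-in, the `track_lang_matches` signature (a raw language string plus a query) is hypothetical, and the whitespace trimming inside `matches` is a guess.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Simplified stand-in for the Language value object (assumed shape)."""

    iso: str  # ISO 639-2/B code, e.g. "fre"
    english_name: str
    native_name: str
    aliases: tuple[str, ...] = ()

    def matches(self, raw: str) -> bool:
        """Cross-format match: compare raw against every known spelling."""
        needle = raw.strip().lower()
        known = {self.iso, self.english_name, self.native_name, *self.aliases}
        return needle in {k.lower() for k in known}


def track_lang_matches(track_lang: str | None, query: Language | str) -> bool:
    """Hypothetical signature; the real helper may take a track object."""
    if track_lang is None:
        return False
    if isinstance(query, Language):
        # Language query: cross-format match via Language.matches().
        return query.matches(track_lang)
    # str query: case-insensitive direct comparison, no normalization.
    return track_lang.lower() == query.lower()


french = Language("fre", "French", "Français", aliases=("fr", "fra"))
```

Under this sketch, `track_lang_matches("fr", french)` is true because the alias list covers the 639-1 code, while `track_lang_matches("fr", "fre")` is false: a plain string query is compared as-is and is never expanded to other formats.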
### Changed
- **`shared/media_info.py` exploded into `shared/media/{audio,video,subtitle,info,matching}.py`.**
`MediaInfo` is now symmetric: every stream type is a `list[Track]`. Flat
accessors (`width`, `height`, `video_codec`, `resolution`) remain as
properties that read the first video track.
- **`MediaInfo.duration_seconds` / `bitrate_kbps`** moved from `VideoTrack` to
`MediaInfo` (file-level — they come from the ffprobe `format` block, not a
stream). Files without a video stream now correctly expose duration.
- **`ShowStatus.from_string`** extended to map TMDB strings (`Returning
Series`, `In Production`, `Pilot`, `Planned`, `Canceled`, `Cancelled`).
Comparison is whitespace-trimmed and case-insensitive.
- **`Season` / `Episode`** dropped their `show_imdb_id` back-references. They
are owned by `TVShow` and reached only through it.
- **`TVShow.seasons_count` and `episode_count`** are now `@property` (computed
from the dict) instead of stored ints.
- **`TVShowService.parse_episode_from_filename`** rewritten in string
operations (no regex). Supports `S01E05` / `s1e5` and `1x05` / `01x5` forms.
- **`TVShowService.find_next_episode`** now drives off
`show.missing_episodes()` instead of the hardcoded "max 50 episodes per
season" heuristic.
- **`TVShowService` constructor** no longer takes `season_repository` /
`episode_repository` — the aggregate persists in one block via
`TVShowRepository` only.
- **`SubtitleTrack` in `alfred.domain.subtitles.entities` renamed to
`SubtitleCandidate`.** Coexists with the `shared.media.SubtitleTrack`
ffprobe-view dataclass (different bounded contexts, kept separate
intentionally).
- **`tv_shows/services.py` `_VIDEO_EXTENSIONS`** now loaded from
`knowledge/release/file_extensions.yaml` via `load_video_extensions()`
(single source of truth).
- **`CLAUDE.md`** updated with three new policy sections:
- "Tests" — small updates OK during normal work, no mass-update sprees
- "Backwards-compatibility shims" — prefer clean migration over shims
- "Regex" — not forbidden, use judgment when string ops would be fragile
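As a rough illustration of the regex-free episode parsing mentioned above, the sketch below handles the `S01E05` / `s1e5` and `1x05` / `01x5` forms with string operations only. The function name `parse_episode`, the separator set, and the return type are assumptions made for this example; the real `TVShowService.parse_episode_from_filename` may differ in all three.

```python
from __future__ import annotations


def parse_episode(filename: str) -> tuple[int, int] | None:
    """Return (season, episode) from the first matching token, else None."""
    name = filename
    # Turn common release-name separators into spaces (assumed separator set).
    for sep in "._-[]()":
        name = name.replace(sep, " ")

    for token in name.lower().split():
        # SxxEyy form: "s01e05", "s1e5"
        if token.startswith("s") and "e" in token:
            season, _, episode = token[1:].partition("e")
            if season.isdecimal() and episode.isdecimal():
                return int(season), int(episode)
        # NxNN form: "1x05", "01x5"
        if "x" in token:
            season, _, episode = token.partition("x")
            if season.isdecimal() and episode.isdecimal():
                return int(season), int(episode)
    return None
```

A token such as `x264` is rejected because the part before the `x` is empty. A dimension-like token such as `1920x1080` would still be read as an episode by this simplified version, so a real implementation would need an extra guard.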
### Removed
- **Legacy `Season N Episode N` filename form** in
`TVShowService.parse_episode_from_filename`. It never appears in the release
names Alfred handles, and supporting it forced a regex.
- **`SeasonRepository` and `EpisodeRepository`** — only the aggregate root has
a repository (DDD rule: one repo per aggregate).
- **`shared/media_info.py`** compatibility shim — callers updated.
- **`SubtitleTrack` compatibility alias** in `subtitles.entities` — callers
updated to `SubtitleCandidate`.
### Fixed
- **`MediaInfo.duration_seconds` returns `None` on audio-only files** instead
of crashing through `primary_video.duration_seconds` (see the duration/bitrate
move under **Changed**).
- **`MediaOrganizer`** (`infrastructure/filesystem/organizer.py`) no longer
passes the removed `show_imdb_id` / `episode_count` kwargs when constructing
a `Season` for folder-name generation.
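The duration fix above follows from where the fields live. A minimal sketch, assuming simplified field names (`video_tracks`, `codec`) that may not match the real `shared/media` dataclasses: file-level values sit on `MediaInfo` itself, and the flat accessors read the first video track only when one exists.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class VideoTrack:
    """Per-stream data only (assumed fields)."""

    width: int | None = None
    height: int | None = None
    codec: str | None = None


@dataclass
class MediaInfo:
    """File-level view: stream lists plus values from the container."""

    video_tracks: list[VideoTrack] = field(default_factory=list)
    duration_seconds: float | None = None  # file-level, not per-stream
    bitrate_kbps: int | None = None

    @property
    def width(self) -> int | None:
        # Flat accessor: first video track, or None when there is none.
        return self.video_tracks[0].width if self.video_tracks else None

    @property
    def video_codec(self) -> str | None:
        return self.video_tracks[0].codec if self.video_tracks else None


# An audio-only file: no video stream, but the duration is still available.
audio_only = MediaInfo(duration_seconds=215.0)
movie = MediaInfo(
    video_tracks=[VideoTrack(width=1920, height=960, codec="hevc")],
    duration_seconds=7260.0,
)
```

Because `duration_seconds` no longer goes through a primary video track, `audio_only.duration_seconds` is simply the stored value and `audio_only.width` is `None` rather than an error.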
### Internal
- Test suite rewritten where the aggregate redesign broke fixtures:
`tests/domain/test_tv_shows.py` (69 tests), `tests/domain/test_media_info.py`
(rewritten for `VideoTrack`), `tests/application/test_enrich_from_probe.py`
(helper added), `tests/infrastructure/test_filesystem_extras.py` (fixtures),
`tests/domain/test_tv_shows_service.py` (find_next_episode driven by real
aggregate state).
- Subtitle services internal migration: `matcher.py`, `utils.py`, `placer.py`,
`identifier.py` updated to import `SubtitleCandidate`.
- Suite status at end of block: **1066 passed, 8 skipped, 0 failed**.
+145 -8
@@ -3,13 +3,16 @@
import json
import logging
from collections.abc import AsyncGenerator
from pathlib import Path
from typing import Any
from alfred.infrastructure.metadata import MetadataStore
from alfred.infrastructure.persistence import get_memory
from alfred.settings import settings
from .prompts import PromptBuilder
from .prompt import PromptBuilder
from .registry import Tool, make_tools
from .workflows import WorkflowLoader
logger = logging.getLogger(__name__)
@@ -33,8 +36,8 @@ class Agent:
self.settings = settings
self.llm = llm
self.tools: dict[str, Tool] = make_tools(settings)
self.prompt_builder = PromptBuilder(self.tools)
self.settings = settings
self.workflow_loader = WorkflowLoader()
self.prompt_builder = PromptBuilder(self.tools, self.workflow_loader)
self.max_tool_iterations = max_tool_iterations
def step(self, user_input: str) -> str:
@@ -139,7 +142,7 @@ class Agent:
memory.save()
return final_response
def _execute_tool_call(self, tool_call: dict[str, Any]) -> dict[str, Any]:
def _execute_tool_call(self, tool_call: dict[str, Any]) -> dict[str, Any]: # noqa: PLR0911
"""
Execute a single tool call.
@@ -168,29 +171,163 @@ class Agent:
"available_tools": available,
}
# Defensive: reject calls to tools that are not currently in scope.
visible = set(self.prompt_builder.visible_tool_names())
if tool_name not in visible:
return {
"error": "tool_out_of_scope",
"message": (
f"Tool '{tool_name}' is not available in the current "
"workflow scope. Call end_workflow first or start the "
"appropriate workflow."
),
"available_tools": sorted(visible),
}
tool = self.tools[tool_name]
memory = get_memory()
# Cache lookup — for tools flagged cacheable, short-circuit on hit.
cache_key_value = self._cache_key_for(tool, args)
if cache_key_value is not None:
cached = memory.stm.tool_results.get(tool_name, cache_key_value)
if cached is not None:
logger.info(f"Tool cache HIT: {tool_name}[{cache_key_value}]")
self._post_tool_side_effects(tool_name, args, cached, from_cache=True)
return {**cached, "_from_cache": True}
# Execute tool
try:
result = tool.func(**args)
return result
except KeyboardInterrupt:
# Don't catch KeyboardInterrupt - let it propagate
raise
except TypeError as e:
# Bad arguments
memory = get_memory()
memory.episodic.add_error(tool_name, f"bad_args: {e}")
return {"error": "bad_args", "message": str(e), "tool": tool_name}
except Exception as e:
# Other errors
memory = get_memory()
memory.episodic.add_error(tool_name, str(e))
return {"error": "execution_failed", "message": str(e), "tool": tool_name}
# Persist + side effects only on successful results.
if isinstance(result, dict) and result.get("status") == "ok":
if cache_key_value is not None:
memory.stm.tool_results.put(tool_name, cache_key_value, result)
self._post_tool_side_effects(tool_name, args, result, from_cache=False)
memory.save()
return result
@staticmethod
def _cache_key_for(tool: Tool, args: dict[str, Any]) -> str | None:
"""Return the cache key value for this call, or None if not cacheable."""
if tool.cache_key is None:
return None
value = args.get(tool.cache_key)
if value is None:
return None
return str(value)
def _post_tool_side_effects(
self,
tool_name: str,
args: dict[str, Any],
result: dict[str, Any],
*,
from_cache: bool,
) -> None:
"""
Tool-agnostic side effects applied after a successful run or cache hit.
Today:
- Update release_focus when a path-keyed inspector runs.
- Persist inspector results into the release's `.alfred/metadata.yaml`.
- Refresh episodic.last_search_results on find_torrent cache hits so
get_torrent_by_index keeps pointing at the right list.
"""
memory = get_memory()
tool = self.tools.get(tool_name)
# Release focus: any path-keyed inspector updates current_release_path.
if tool is not None and tool.cache_key in {"source_path"}:
path = args.get(tool.cache_key)
if isinstance(path, str) and path:
memory.stm.release_focus.focus(path)
# Persist inspector results to .alfred/metadata.yaml (skip on cache
# hit — the file is already up to date from the original run).
if not from_cache:
self._maybe_update_alfred(tool_name, args, result)
# Episodic refresh when find_torrent's cache short-circuits the call.
if from_cache and tool_name == "find_torrent":
torrents = result.get("torrents") or []
query = args.get("media_title") or ""
memory.episodic.store_search_results(
query=query, results=torrents, search_type="torrent"
)
def _maybe_update_alfred(
self,
tool_name: str,
args: dict[str, Any],
result: dict[str, Any],
) -> None:
"""
Persist a successful inspector result into the release's
`.alfred/metadata.yaml`. No-op when the release root can't be resolved.
"""
if tool_name not in {"analyze_release", "probe_media", "find_media_imdb_id"}:
return
release_root = self._resolve_release_root(tool_name, args)
if release_root is None:
return
try:
store = MetadataStore(release_root)
if tool_name == "analyze_release":
store.update_parse(result)
elif tool_name == "probe_media":
store.update_probe(result)
elif tool_name == "find_media_imdb_id":
store.update_tmdb(result)
except Exception as e:
logger.warning(
f"Failed to update .alfred for {tool_name} at {release_root}: {e}"
)
@staticmethod
def _resolve_release_root(
tool_name: str,
args: dict[str, Any],
) -> Path | None:
"""
Figure out which release folder owns this call.
- analyze_release / probe_media: derived from source_path
(folder kept as-is, file walked up to its parent).
- find_media_imdb_id: follow the current release focus in STM.
"""
if tool_name in {"analyze_release", "probe_media"}:
raw = args.get("source_path")
if not isinstance(raw, str) or not raw:
return None
path = Path(raw)
return path if path.is_dir() else path.parent
# find_media_imdb_id has no path arg — rely on release focus.
focus = get_memory().stm.release_focus.current_release_path
if not focus:
return None
path = Path(focus)
return path if path.is_dir() else path.parent
async def step_streaming(
self, user_input: str, completion_id: str, created_ts: int, model: str
) -> AsyncGenerator[dict[str, Any], None]:
) -> AsyncGenerator[dict[str, Any]]:
"""
Execute agent step with streaming support for LibreChat.
+79
@@ -0,0 +1,79 @@
"""Expression loader: loads and merges the per-user expression YAML files."""
import random
from pathlib import Path
import yaml
_USERS_DIR = Path(__file__).parent.parent / "knowledge" / "users"
def _load_yaml(path: Path) -> dict:
if not path.exists():
return {}
return yaml.safe_load(path.read_text(encoding="utf-8")) or {}
def load_expressions(username: str | None) -> dict:
"""
Load common.yaml and merge it with {username}.yaml.
Returns a dict with:
- nickname: str (the user's nickname, falling back to username)
- expressions: dict[situation -> list[str]]
"""
common = _load_yaml(_USERS_DIR / "common.yaml")
user_data = _load_yaml(_USERS_DIR / f"{username}.yaml") if username else {}
# Merge expressions: common + user (user phrases are appended)
common_exprs: dict[str, list] = common.get("expressions", {})
user_exprs: dict[str, list] = user_data.get("expressions", {})
merged: dict[str, list] = {}
all_situations = set(common_exprs) | set(user_exprs)
for situation in all_situations:
base = list(common_exprs.get(situation, []))
extra = list(user_exprs.get(situation, []))
merged[situation] = base + extra
nickname = user_data.get("user", {}).get("nickname") or username or "mec"
return {
"nickname": nickname,
"expressions": merged,
}
def pick(expressions: dict, situation: str, nickname: str | None = None) -> str:
"""
Pick a random expression for the given situation.
Substitutes {user} with the nickname when provided.
Returns an empty string if the situation does not exist.
"""
options = expressions.get("expressions", {}).get(situation, [])
if not options:
return ""
chosen = random.choice(options)
if nickname:
chosen = chosen.replace("{user}", nickname)
return chosen
def build_expressions_context(username: str | None) -> dict:
"""
Main entry point.
Returns:
- nickname: str
- samples: dict[situation -> one resolved phrase], a single phrase per situation
"""
data = load_expressions(username)
nickname = data["nickname"]
samples = {
situation: pick(data, situation, nickname) for situation in data["expressions"]
}
return {
"nickname": nickname,
"samples": samples,
}
+4 -2
@@ -6,7 +6,8 @@ from typing import Any
import requests
from requests.exceptions import HTTPError, RequestException, Timeout
from alfred.settings import Settings, settings
from alfred.settings import Settings
from alfred.settings import settings as default_settings
from .exceptions import LLMAPIError, LLMConfigurationError
@@ -36,6 +37,7 @@ class DeepSeekClient:
Raises:
LLMConfigurationError: If API key is missing
"""
self.settings = settings or default_settings
self.api_key = api_key or self.settings.deepseek_api_key
self.base_url = base_url or self.settings.deepseek_base_url
self.model = model or self.settings.deepseek_model
@@ -96,7 +98,7 @@ class DeepSeekClient:
payload = {
"model": self.model,
"messages": messages,
"temperature": settings.llm_temperature,
"temperature": self.settings.llm_temperature,
}
# Add tools if provided
+6 -4
@@ -7,6 +7,7 @@ import requests
from requests.exceptions import HTTPError, RequestException, Timeout
from alfred.settings import Settings
from alfred.settings import settings as default_settings
from .exceptions import LLMAPIError, LLMConfigurationError
@@ -46,11 +47,12 @@ class OllamaClient:
Raises:
LLMConfigurationError: If configuration is invalid
"""
self.base_url = base_url or settings.ollama_base_url
self.model = model or settings.ollama_model
self.timeout = timeout or settings.request_timeout
self.settings = settings or default_settings
self.base_url = base_url or self.settings.ollama_base_url
self.model = model or self.settings.ollama_model
self.timeout = timeout or self.settings.request_timeout
self.temperature = (
temperature if temperature is not None else settings.llm_temperature
temperature if temperature is not None else self.settings.llm_temperature
)
if not self.base_url:
-101
@@ -1,101 +0,0 @@
# agent/parameters.py
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any
@dataclass
class ParameterSchema:
"""Describes a required parameter for the agent."""
key: str
description: str
why_needed: str # Explanation for the AI
type: str # "string", "number", "object", etc.
validator: Callable[[Any], bool] | None = None
default: Any = None
required: bool = True
# Define all required parameters
REQUIRED_PARAMETERS = [
ParameterSchema(
key="config",
description="Configuration object containing all folder paths",
why_needed=(
"This contains the paths to all important folders:\n"
"- download_folder: Where downloaded files arrive before being organized\n"
"- tvshow_folder: Where TV show files are organized and stored\n"
"- movie_folder: Where movie files are organized and stored\n"
"- torrent_folder: Where torrent structures are saved for the torrent client"
),
type="object",
validator=lambda x: isinstance(x, dict),
required=True,
default={},
),
ParameterSchema(
key="tv_shows",
description="List of TV shows the user is following",
why_needed=(
"This tracks which TV shows you're following. "
"Each show includes: IMDB ID, title, number of seasons, and status (ongoing or ended)."
),
type="array",
validator=lambda x: isinstance(x, list),
required=False,
default=[],
),
]
def get_parameter_schema(key: str) -> ParameterSchema | None:
"""Get schema for a specific parameter."""
for param in REQUIRED_PARAMETERS:
if param.key == key:
return param
return None
def get_missing_required_parameters(memory_data: dict) -> list[ParameterSchema]:
"""Get list of required parameters that are missing or None."""
missing = []
for param in REQUIRED_PARAMETERS:
if param.required:
value = memory_data.get(param.key)
if value is None:
missing.append(param)
return missing
def format_parameters_for_prompt() -> str:
"""Format parameter descriptions for the AI system prompt."""
lines = ["REQUIRED PARAMETERS:"]
for param in REQUIRED_PARAMETERS:
status = "REQUIRED" if param.required else "OPTIONAL"
lines.append(f"\n- {param.key} ({status}):")
lines.append(f" Description: {param.description}")
lines.append(f" Why needed: {param.why_needed}")
lines.append(f" Type: {param.type}")
return "\n".join(lines)
def validate_parameter(key: str, value: Any) -> tuple[bool, str | None]:
"""
Validate a parameter value against its schema.
Returns:
(is_valid, error_message)
"""
schema = get_parameter_schema(key)
if not schema:
return True, None # Unknown parameters are allowed
if schema.validator:
try:
if not schema.validator(value):
return False, f"Validation failed for {key}"
except Exception as e:
return False, f"Validation error for {key}: {str(e)}"
return True, None
+333
@@ -0,0 +1,333 @@
"""Prompt builder for the agent system."""
import json
from typing import Any
from alfred.infrastructure.persistence import get_memory
from alfred.infrastructure.persistence.memory import MemoryRegistry
from .expressions import build_expressions_context
from .registry import Tool
from .workflows import WorkflowLoader
# Tools that are always available, regardless of workflow scope.
# Kept small on purpose: this core set is what the agent uses to either
# answer trivially or pivot into a workflow.
CORE_TOOLS: tuple[str, ...] = (
"set_language",
"set_path_for_folder",
"list_folder",
"read_release_metadata",
"query_library",
"start_workflow",
"end_workflow",
)
class PromptBuilder:
"""Builds system prompts for the agent with memory context."""
def __init__(
self,
tools: dict[str, Tool],
workflow_loader: WorkflowLoader | None = None,
):
self.tools = tools
self.workflow_loader = workflow_loader or WorkflowLoader()
self._memory_registry = MemoryRegistry()
def _active_workflow(self, memory) -> dict | None:
"""Return the YAML definition of the active workflow, or None."""
current = memory.stm.workflow.current
if current is None:
return None
return self.workflow_loader.get(current.get("name"))
def visible_tool_names(self) -> list[str]:
"""
Return the names of the tools currently in scope.
- Idle (no workflow): core tools only. The LLM enters a workflow
via start_workflow to access more tools.
- Workflow active: core tools + the workflow's declared tools.
"""
memory = get_memory()
visible = set(CORE_TOOLS)
workflow = self._active_workflow(memory)
if workflow is not None:
for name in workflow.get("tools", []):
visible.add(name)
# Only return tools that actually exist in the registry.
return [name for name in self.tools if name in visible]
def _format_identity(self, memory) -> str:
"""Build Alfred's identity and personality section."""
username = memory.stm.get_entity("username")
expr = build_expressions_context(username)
nickname = expr["nickname"]
samples = expr["samples"]
# Format expressions as situational guidance for the LLM
expr_lines = []
situation_labels = {
"greeting": "Salutation",
"success": "Succès",
"working": "En cours",
"error": "Erreur",
"unclear": "Demande floue",
"warning": "Avertissement",
"not_found": "Introuvable",
}
for situation, label in situation_labels.items():
phrase = samples.get(situation, "")
if phrase:
expr_lines.append(f' {label}: "{phrase}"')
expressions_block = "\n".join(expr_lines)
return f"""Tu t'appelles Alfred. Tu es un assistant d'organisation de médiathèque — direct, opérationnel, légèrement impertinent.
PERSONNALITÉ:
- Tu parles franglais : français avec des touches d'anglais quand ça colle mieux
- Tu es sarcastique et n'as pas peur de te moquer si l'user fait une connerie
- Tu envoies chier poliment (mais clairement) quand la demande est trop floue
- Tu ne fais pas de blabla inutile. Si "ok c'est fait" suffit, c'est tout ce que tu dis
- Tu peux jurer (putain, merde, con, ...) — c'est naturel, pas du remplissage
- Jamais de "Great question!" ou de politesse creuse
USER COURANT: {nickname}
EXPRESSIONS À UTILISER (une par situation, naturellement intégrées dans ta réponse) :
{expressions_block}"""
def build_tools_spec(self) -> list[dict[str, Any]]:
"""Build the tool specification for the LLM API (scope-filtered)."""
visible = set(self.visible_tool_names())
tool_specs = []
for tool in self.tools.values():
if tool.name not in visible:
continue
spec = {
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.parameters,
},
}
tool_specs.append(spec)
return tool_specs
def _format_tools_description(self) -> str:
"""Format the currently-visible tools with description + params."""
visible = set(self.visible_tool_names())
visible_tools = [t for t in self.tools.values() if t.name in visible]
if not visible_tools:
return ""
return "\n".join(
f"- {tool.name}: {tool.description}\n"
f" Parameters: {json.dumps(tool.parameters, ensure_ascii=False)}"
for tool in visible_tools
)
def _format_workflow_scope(self, memory) -> str:
"""Describe the current workflow scope so the LLM has a plan."""
workflow = self._active_workflow(memory)
if workflow is None:
available = self.workflow_loader.names()
if not available:
return ""
lines = ["WORKFLOW SCOPE: idle (broad catalog narrowed to core noyau)."]
lines.append(
" Call start_workflow(workflow_name, params) to enter a scope."
)
lines.append(" Available workflows:")
for name in available:
wf = self.workflow_loader.get(name) or {}
desc = (wf.get("description") or "").strip().splitlines()
summary = desc[0] if desc else ""
lines.append(f" - {name}: {summary}")
return "\n".join(lines)
current = memory.stm.workflow.current or {}
lines = [
f"WORKFLOW SCOPE: active — {current.get('name')} "
f"(stage: {current.get('stage')})",
]
params = current.get("params")
if params:
lines.append(f" Params: {params}")
wf_desc = (workflow.get("description") or "").strip()
if wf_desc:
lines.append(f" Goal: {wf_desc}")
steps = workflow.get("steps", [])
if steps:
lines.append(" Steps:")
for step in steps:
step_id = step.get("id", "?")
step_tool = step.get("tool") or (
"ask_user" if step.get("ask_user") else ""
)
lines.append(f" - {step_id} ({step_tool})")
lines.append(" Call end_workflow(reason) when done, cancelled, or off-topic.")
return "\n".join(lines)
def _format_episodic_context(self, memory) -> str:
"""Format episodic memory context for the prompt."""
lines = []
if memory.episodic.last_search_results:
results = memory.episodic.last_search_results
result_list = results.get("results", [])
lines.append(
f"\nLAST SEARCH: '{results.get('query')}' ({len(result_list)} results)"
)
# Show first 5 results
for i, result in enumerate(result_list[:5]):
name = result.get("name", "Unknown")
lines.append(f" {i + 1}. {name}")
if len(result_list) > 5:
lines.append(f" ... and {len(result_list) - 5} more")
if memory.episodic.pending_question:
question = memory.episodic.pending_question
lines.append(f"\nPENDING QUESTION: {question.get('question')}")
lines.append(f" Type: {question.get('type')}")
if question.get("options"):
lines.append(f" Options: {len(question.get('options'))}")
if memory.episodic.active_downloads:
lines.append(f"\nACTIVE DOWNLOADS: {len(memory.episodic.active_downloads)}")
for dl in memory.episodic.active_downloads[:3]:
lines.append(f" - {dl.get('name')}: {dl.get('progress', 0)}%")
if memory.episodic.recent_errors:
lines.append("\nRECENT ERRORS (up to 3):")
for error in memory.episodic.recent_errors[-3:]:
lines.append(
f" - Action '{error.get('action')}' failed: {error.get('error')}"
)
# Unread events
unread = [e for e in memory.episodic.background_events if not e.get("read")]
if unread:
lines.append(f"\nUNREAD EVENTS: {len(unread)}")
for event in unread[:3]:
lines.append(f" - {event.get('type')}: {event.get('data')}")
return "\n".join(lines)
def _format_stm_context(self, memory) -> str:
"""Format short-term memory context for the prompt."""
lines = []
if memory.stm.current_workflow:
workflow = memory.stm.current_workflow
lines.append(
f"CURRENT WORKFLOW: {workflow.get('name')} (stage: {workflow.get('stage')})"
)
if workflow.get("params"):
lines.append(f" Params: {workflow.get('params')}")
if memory.stm.current_topic:
lines.append(f"CURRENT TOPIC: {memory.stm.current_topic}")
if memory.stm.extracted_entities:
lines.append("EXTRACTED ENTITIES:")
for key, value in memory.stm.extracted_entities.items():
lines.append(f" - {key}: {value}")
if memory.stm.language:
lines.append(f"CONVERSATION LANGUAGE: {memory.stm.language}")
return "\n".join(lines)
def _format_memory_schema(self) -> str:
"""Describe available memory components so the agent knows what to read/write and when."""
schema = self._memory_registry.schema()
tier_labels = {
"ltm": "LONG-TERM (persisted)",
"stm": "SHORT-TERM (session)",
"episodic": "EPISODIC (volatile)",
}
lines = ["MEMORY COMPONENTS:"]
for tier, components in schema.items():
if not components:
continue
lines.append(f"\n [{tier_labels.get(tier, tier.upper())}]")
for c in components:
access = c.get("access", "read")
lines.append(f" {c['name']} ({access}): {c['description']}")
for field_name, field_desc in c.get("fields", {}).items():
lines.append(f" · {field_name}: {field_desc}")
return "\n".join(lines)
def _format_config_context(self, memory) -> str:
"""Format configuration context."""
lines = ["CURRENT CONFIGURATION:"]
folders = {
**memory.ltm.workspace.as_dict(),
**memory.ltm.library_paths.to_dict(),
}
if folders:
for key, value in folders.items():
lines.append(f" - {key}: {value}")
else:
lines.append(" (no configuration set)")
return "\n".join(lines)
def build_system_prompt(self) -> str:
"""Build the complete system prompt."""
memory = get_memory()
# Identity + personality
identity = self._format_identity(memory)
# Language instruction
language_instruction = (
"Si la langue de l'user est différente de la langue courante en STM, "
"appelle `set_language` en premier avant de répondre."
)
# Configuration
config_section = self._format_config_context(memory)
# STM context
stm_context = self._format_stm_context(memory)
# Episodic context
episodic_context = self._format_episodic_context(memory)
# Memory schema
memory_schema = self._format_memory_schema()
# Workflow scope (active workflow plan or list of options)
workflow_section = self._format_workflow_scope(memory)
# Available tools (already filtered by scope)
tools_desc = self._format_tools_description()
tools_section = f"\nOUTILS DISPONIBLES:\n{tools_desc}" if tools_desc else ""
rules = """
RÈGLES:
- Utilise les outils pour accomplir les tâches, pas pour décorer
- Si des résultats de recherche sont dispo en mémoire épisodique, référence-les par index
- Confirme toujours avant une opération destructive (move, delete, overwrite)
- Réponses courtes — si c'est fait, dis-le en une ligne
- Si la demande est floue, demande un éclaircissement AVANT de lancer quoi que ce soit
"""
sections = [
identity,
language_instruction,
config_section,
stm_context,
episodic_context,
memory_schema,
workflow_section,
tools_section,
rules,
]
return "\n\n".join(s for s in sections if s and s.strip())
-206
@@ -1,206 +0,0 @@
"""Prompt builder for the agent system."""
import json
from typing import Any
from alfred.infrastructure.persistence import get_memory
from alfred.infrastructure.persistence.memory import MemoryRegistry
from .registry import Tool
class PromptBuilder:
"""Builds system prompts for the agent with memory context."""
def __init__(self, tools: dict[str, Tool]):
self.tools = tools
self._memory_registry = MemoryRegistry()
def build_tools_spec(self) -> list[dict[str, Any]]:
"""Build the tool specification for the LLM API."""
tool_specs = []
for tool in self.tools.values():
spec = {
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.parameters,
},
}
tool_specs.append(spec)
return tool_specs
def _format_tools_description(self) -> str:
"""Format tools with their descriptions and parameters."""
if not self.tools:
return ""
return "\n".join(
f"- {tool.name}: {tool.description}\n"
f" Parameters: {json.dumps(tool.parameters, ensure_ascii=False)}"
for tool in self.tools.values()
)
def _format_episodic_context(self, memory) -> str:
"""Format episodic memory context for the prompt."""
lines = []
if memory.episodic.last_search_results:
results = memory.episodic.last_search_results
result_list = results.get("results", [])
lines.append(
f"\nLAST SEARCH: '{results.get('query')}' ({len(result_list)} results)"
)
# Show first 5 results
for i, result in enumerate(result_list[:5]):
name = result.get("name", "Unknown")
lines.append(f" {i + 1}. {name}")
if len(result_list) > 5:
lines.append(f" ... and {len(result_list) - 5} more")
if memory.episodic.pending_question:
question = memory.episodic.pending_question
lines.append(f"\nPENDING QUESTION: {question.get('question')}")
lines.append(f" Type: {question.get('type')}")
if question.get("options"):
lines.append(f" Options: {len(question.get('options'))}")
if memory.episodic.active_downloads:
lines.append(f"\nACTIVE DOWNLOADS: {len(memory.episodic.active_downloads)}")
for dl in memory.episodic.active_downloads[:3]:
lines.append(f" - {dl.get('name')}: {dl.get('progress', 0)}%")
if memory.episodic.recent_errors:
lines.append("\nRECENT ERRORS (up to 3):")
for error in memory.episodic.recent_errors[-3:]:
lines.append(
f" - Action '{error.get('action')}' failed: {error.get('error')}"
)
# Unread events
unread = [e for e in memory.episodic.background_events if not e.get("read")]
if unread:
lines.append(f"\nUNREAD EVENTS: {len(unread)}")
for event in unread[:3]:
lines.append(f" - {event.get('type')}: {event.get('data')}")
return "\n".join(lines)
def _format_stm_context(self, memory) -> str:
"""Format short-term memory context for the prompt."""
lines = []
if memory.stm.current_workflow:
workflow = memory.stm.current_workflow
lines.append(
f"CURRENT WORKFLOW: {workflow.get('type')} (stage: {workflow.get('stage')})"
)
if workflow.get("target"):
lines.append(f" Target: {workflow.get('target')}")
if memory.stm.current_topic:
lines.append(f"CURRENT TOPIC: {memory.stm.current_topic}")
if memory.stm.extracted_entities:
lines.append("EXTRACTED ENTITIES:")
for key, value in memory.stm.extracted_entities.items():
lines.append(f" - {key}: {value}")
if memory.stm.language:
lines.append(f"CONVERSATION LANGUAGE: {memory.stm.language}")
return "\n".join(lines)
def _format_memory_schema(self) -> str:
"""Describe available memory components so the agent knows what to read/write and when."""
schema = self._memory_registry.schema()
tier_labels = {"ltm": "LONG-TERM (persisted)", "stm": "SHORT-TERM (session)", "episodic": "EPISODIC (volatile)"}
lines = ["MEMORY COMPONENTS:"]
for tier, components in schema.items():
if not components:
continue
lines.append(f"\n [{tier_labels.get(tier, tier.upper())}]")
for c in components:
access = c.get("access", "read")
lines.append(f" {c['name']} ({access}): {c['description']}")
for field_name, field_desc in c.get("fields", {}).items():
lines.append(f" · {field_name}: {field_desc}")
return "\n".join(lines)
def _format_config_context(self, memory) -> str:
"""Format configuration context."""
lines = ["CURRENT CONFIGURATION:"]
folders = {**memory.ltm.workspace.as_dict(), **memory.ltm.library_paths.to_dict()}
if folders:
for key, value in folders.items():
lines.append(f" - {key}: {value}")
else:
lines.append(" (no configuration set)")
return "\n".join(lines)
def build_system_prompt(self) -> str:
"""Build the complete system prompt."""
# Get memory once for all context formatting
memory = get_memory()
# Base instruction
base = "You are a helpful AI assistant for managing a media library."
# Language instruction
language_instruction = (
"Your first task is to determine the user's language from their message "
"and use the `set_language` tool if it's different from the current one. "
"After that, proceed to help the user."
)
# Available tools
tools_desc = self._format_tools_description()
tools_section = f"\nAVAILABLE TOOLS:\n{tools_desc}" if tools_desc else ""
# Memory schema
memory_schema = self._format_memory_schema()
# Configuration
config_section = self._format_config_context(memory)
if config_section:
config_section = f"\n{config_section}"
# STM context
stm_context = self._format_stm_context(memory)
if stm_context:
stm_context = f"\n{stm_context}"
# Episodic context
episodic_context = self._format_episodic_context(memory)
# Important rules
rules = """
IMPORTANT RULES:
- Use tools to accomplish tasks
- When search results are available, reference them by index (e.g., "add_torrent_by_index")
- Always confirm actions with the user before executing destructive operations
- Provide clear, concise responses
"""
# Examples
examples = """
EXAMPLES:
- User: "Find Inception" → Use find_media_imdb_id, then find_torrent
- User: "download the 3rd one" → Use add_torrent_by_index with index=3
- User: "List my downloads" → Use list_folder with folder_type="download"
"""
return f"""{base}
{language_instruction}
{tools_section}
{memory_schema}
{config_section}
{stm_context}
{episodic_context}
{rules}
{examples}
"""
+104 -46
@@ -1,4 +1,4 @@
"""Tool registry - defines and registers all available tools for the agent."""
"""Tool registry defines and registers all available tools for the agent."""
import inspect
import logging
@@ -6,6 +6,9 @@ from collections.abc import Callable
from dataclasses import dataclass
from typing import Any
from .tools.spec import ToolSpec, ToolSpecError
from .tools.spec_loader import load_tool_specs
logger = logging.getLogger(__name__)
@@ -17,51 +20,63 @@ class Tool:
description: str
func: Callable[..., dict[str, Any]]
parameters: dict[str, Any]
cache_key: str | None = None # Parameter name to use as STM cache key.
def _create_tool_from_function(func: Callable) -> Tool:
_PY_TYPE_TO_JSON = {
str: "string",
int: "integer",
float: "number",
bool: "boolean",
list: "array",
dict: "object",
}
def _json_type_for(annotation) -> str:
"""Map a Python type annotation to a JSON Schema 'type' string."""
if annotation is inspect.Parameter.empty:
return "string"
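# Strip Optional[X] / X | None to X. Only unwrap when None is really one of
# the args, so list[str] is not mistaken for Optional[str].
args = getattr(annotation, "__args__", None)
if args and type(None) in args:
non_none = [a for a in args if a is not type(None)]
if len(non_none) == 1:
annotation = non_none[0]
# Resolve parameterized generics (list[str] -> list) before the lookup.
annotation = getattr(annotation, "__origin__", annotation)
return _PY_TYPE_TO_JSON.get(annotation, "string")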
def _create_tool_from_function(func: Callable, spec: ToolSpec | None = None) -> Tool:
"""
Create a Tool object from a function.
Create a Tool object from a function, optionally enriched with a spec.
Args:
func: Function to convert to a tool
Returns:
Tool object with metadata extracted from function
Types and required-ness always come from the Python signature (source of
truth for the API contract). When a spec is provided, the description
and per-parameter docs come from the YAML spec instead of the docstring.
"""
sig = inspect.signature(func)
doc = inspect.getdoc(func)
sig_params = {name: p for name, p in sig.parameters.items() if name != "self"}
# Extract description from docstring (first line)
description = doc.strip().split("\n")[0] if doc else func.__name__
# Build JSON schema from function signature
properties = {}
required = []
for param_name, param in sig.parameters.items():
if param_name == "self":
continue
# Map Python types to JSON schema types
param_type = "string" # default
if param.annotation != inspect.Parameter.empty:
if param.annotation is str:
param_type = "string"
elif param.annotation is int:
param_type = "integer"
elif param.annotation is float:
param_type = "number"
elif param.annotation is bool:
param_type = "boolean"
properties[param_name] = {
"type": param_type,
"description": f"Parameter {param_name}",
if spec is not None:
_validate_spec_matches_signature(func.__name__, sig_params, spec)
description = spec.compile_description()
param_descriptions = {
name: spec.compile_parameter_description(name) for name in sig_params
}
else:
doc = inspect.getdoc(func)
description = doc.strip().split("\n")[0] if doc else func.__name__
param_descriptions = {name: f"Parameter {name}" for name in sig_params}
# Add to required if no default value
if param.default == inspect.Parameter.empty:
properties: dict[str, dict[str, Any]] = {}
required: list[str] = []
for param_name, param in sig_params.items():
properties[param_name] = {
"type": _json_type_for(param.annotation),
"description": param_descriptions[param_name],
}
if param.default is inspect.Parameter.empty:
required.append(param_name)
parameters = {
@@ -70,11 +85,38 @@ def _create_tool_from_function(func: Callable) -> Tool:
"required": required,
}
cache_key = spec.cache.key if spec is not None and spec.cache is not None else None
return Tool(
name=func.__name__,
description=description,
func=func,
parameters=parameters,
cache_key=cache_key,
)
def _validate_spec_matches_signature(
func_name: str,
sig_params: dict[str, inspect.Parameter],
spec: ToolSpec,
) -> None:
"""Ensure every signature param has a spec entry and vice versa."""
sig_names = set(sig_params.keys())
spec_names = set(spec.parameters.keys())
missing_in_spec = sig_names - spec_names
if missing_in_spec:
raise ToolSpecError(
f"tool '{func_name}': spec is missing entries for parameter(s) "
f"{sorted(missing_in_spec)}"
)
extra_in_spec = spec_names - sig_names
if extra_in_spec:
raise ToolSpecError(
f"tool '{func_name}': spec has entries for unknown parameter(s) "
f"{sorted(extra_in_spec)} (not in function signature)"
)
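The two-way check above boils down to two set differences between the signature's parameter names and the spec's parameter names. A minimal standalone sketch of that idea follows; `SpecMismatch` and `check_spec_covers_signature` are hypothetical names used only for illustration, and the spec is reduced to a plain set of names rather than a real `ToolSpec`.

```python
import inspect


class SpecMismatch(ValueError):
    """Hypothetical stand-in for the commit's ToolSpecError."""


def check_spec_covers_signature(func, spec_param_names: set[str]) -> None:
    """Raise if the spec and the function signature disagree on parameter names."""
    sig_names = {n for n in inspect.signature(func).parameters if n != "self"}
    missing = sig_names - spec_param_names  # in the signature, absent from the spec
    if missing:
        raise SpecMismatch(f"spec is missing {sorted(missing)}")
    extra = spec_param_names - sig_names  # in the spec, absent from the signature
    if extra:
        raise SpecMismatch(f"spec has unknown {sorted(extra)}")


def move_media(source: str, destination: str) -> dict:
    """Dummy tool with the same parameter names as the real wrapper."""
    return {"status": "ok"}


# Matching names: no exception is raised.
check_spec_covers_signature(move_media, {"source", "destination"})

# A renamed parameter in the spec is caught by the first difference.
try:
    check_spec_covers_signature(move_media, {"source", "dest"})
except SpecMismatch as exc:
    mismatch_message = str(exc)
```

Checking both directions means a parameter renamed in Python but not in the YAML (or the reverse) fails at registration time instead of reaching the LLM as a stale description.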
@@ -83,22 +125,29 @@ def make_tools(settings) -> dict[str, Tool]:
Create and register all available tools.
Args:
settings: Application settings instance
settings: Application settings instance.
Returns:
Dictionary mapping tool names to Tool objects
Dictionary mapping tool names to Tool objects.
"""
# Import tools here to avoid circular dependencies
from .tools import api as api_tools # noqa: PLC0415
from .tools import filesystem as fs_tools # noqa: PLC0415
from .tools import language as lang_tools # noqa: PLC0415
from .tools import workflow as wf_tools # noqa: PLC0415
# List of all tool functions
tool_functions = [
fs_tools.set_path_for_folder,
fs_tools.list_folder,
fs_tools.resolve_destination,
fs_tools.read_release_metadata,
fs_tools.query_library,
fs_tools.analyze_release,
fs_tools.probe_media,
fs_tools.resolve_season_destination,
fs_tools.resolve_episode_destination,
fs_tools.resolve_movie_destination,
fs_tools.resolve_series_destination,
fs_tools.move_media,
fs_tools.move_to_destination,
fs_tools.manage_subtitles,
fs_tools.create_seed_links,
fs_tools.learn,
@@ -108,13 +157,22 @@ def make_tools(settings) -> dict[str, Tool]:
api_tools.add_torrent_to_qbittorrent,
api_tools.get_torrent_by_index,
lang_tools.set_language,
wf_tools.start_workflow,
wf_tools.end_workflow,
]
# Create Tool objects from functions
tools = {}
specs = load_tool_specs()
tools: dict[str, Tool] = {}
for func in tool_functions:
tool = _create_tool_from_function(func)
spec = specs.get(func.__name__)
tool = _create_tool_from_function(func, spec=spec)
tools[tool.name] = tool
logger.info(f"Registered {len(tools)} tools: {list(tools.keys())}")
with_spec = sum(1 for fn in tool_functions if fn.__name__ in specs)
logger.info(
f"Registered {len(tools)} tools "
f"({with_spec} with YAML spec, {len(tools) - with_spec} doc-only): "
f"{list(tools.keys())}"
)
return tools
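To make the signature-to-schema step concrete, here is a simplified, standalone sketch of how a function signature can be turned into a JSON Schema style `parameters` dict. It mirrors the idea in `_create_tool_from_function` but is not the repository's code: `schema_for`, `json_type_for`, and the dummy `learn_like` function are illustrative names, the `"type": "object"` and `"properties"` keys are assumed because the hunk above elides them, and resolving `list[str]` to its origin type is a choice made in this sketch.

```python
import inspect
from typing import Any, Optional

# Same mapping idea as the diff's _PY_TYPE_TO_JSON table.
PY_TYPE_TO_JSON = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
}


def json_type_for(annotation: Any) -> str:
    """Map an annotation to a JSON Schema type name, defaulting to 'string'."""
    if annotation is inspect.Parameter.empty:
        return "string"
    args = getattr(annotation, "__args__", None)
    if args and type(None) in args:  # Optional[X] or X | None
        non_none = [a for a in args if a is not type(None)]
        if len(non_none) == 1:
            annotation = non_none[0]
    # list[str] -> list, dict[str, Any] -> dict (sketch-specific choice)
    origin = getattr(annotation, "__origin__", annotation)
    return PY_TYPE_TO_JSON.get(origin, "string")


def schema_for(func) -> dict[str, Any]:
    """Build a parameters schema from the function signature alone."""
    properties: dict[str, dict[str, Any]] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        if name == "self":
            continue
        properties[name] = {
            "type": json_type_for(param.annotation),
            "description": f"Parameter {name}",  # doc-only fallback text
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the caller must supply it
    return {"type": "object", "properties": properties, "required": required}


def learn_like(pack: str, values: list[str], key: Optional[str] = None) -> dict:
    """Dummy function used only to exercise the sketch."""
    return {}


schema = schema_for(learn_like)
```

In this sketch, required-ness comes only from the absence of a default value, which matches the commit's stated rule that the Python signature stays the source of truth for types and required parameters.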
+5 -52
@@ -14,15 +14,7 @@ logger = logging.getLogger(__name__)
def find_media_imdb_id(media_title: str) -> dict[str, Any]:
"""
Find the IMDb ID for a given media title using TMDB API.
Args:
media_title: Title of the media to search for.
Returns:
Dict with IMDb ID and media info, or error details.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_media_imdb_id.yaml."""
use_case = SearchMovieUseCase(tmdb_client)
response = use_case.execute(media_title)
result = response.to_dict()
@@ -45,18 +37,7 @@ def find_media_imdb_id(media_title: str) -> dict[str, Any]:
def find_torrent(media_title: str) -> dict[str, Any]:
"""
Find torrents for a given media title using Knaben API.
Results are stored in episodic memory so the user can reference them
by index (e.g., "download the 3rd one").
Args:
media_title: Title of the media to search for.
Returns:
Dict with torrent list or error details.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_torrent.yaml."""
logger.info(f"Searching torrents for: {media_title}")
use_case = SearchTorrentsUseCase(knaben_client)
@@ -76,17 +57,7 @@ def find_torrent(media_title: str) -> dict[str, Any]:
def get_torrent_by_index(index: int) -> dict[str, Any]:
"""
Get a torrent from the last search results by its index.
Allows the user to reference results by number after a search.
Args:
index: 1-based index of the torrent in the search results.
Returns:
Dict with torrent data or error if not found.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/get_torrent_by_index.yaml."""
logger.info(f"Getting torrent at index: {index}")
memory = get_memory()
@@ -113,15 +84,7 @@ def get_torrent_by_index(index: int) -> dict[str, Any]:
def add_torrent_to_qbittorrent(magnet_link: str) -> dict[str, Any]:
"""
Add a torrent to qBittorrent using a magnet link.
Args:
magnet_link: Magnet link of the torrent to add.
Returns:
Dict with success status or error details.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/add_torrent_to_qbittorrent.yaml."""
logger.info("Adding torrent to qBittorrent")
use_case = AddTorrentUseCase(qbittorrent_client)
@@ -157,17 +120,7 @@ def add_torrent_to_qbittorrent(magnet_link: str) -> dict[str, Any]:
def add_torrent_by_index(index: int) -> dict[str, Any]:
"""
Add a torrent from the last search results by its index.
Combines get_torrent_by_index and add_torrent_to_qbittorrent.
Args:
index: 1-based index of the torrent in the search results.
Returns:
Dict with success status or error details.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/add_torrent_by_index.yaml."""
logger.info(f"Adding torrent by index: {index}")
torrent_result = get_torrent_by_index(index)
+261 -126
@@ -3,42 +3,68 @@
from pathlib import Path
from typing import Any
import alfred as _alfred_pkg
import yaml
import alfred as _alfred_pkg
from alfred.application.filesystem import (
CreateSeedLinksUseCase,
ListFolderUseCase,
ManageSubtitlesUseCase,
MoveMediaUseCase,
ResolveDestinationUseCase,
SetFolderPathUseCase,
)
from alfred.infrastructure.filesystem import FileManager
from alfred.application.filesystem.detect_media_type import detect_media_type
from alfred.application.filesystem.enrich_from_probe import enrich_from_probe
from alfred.application.filesystem.resolve_destination import (
resolve_episode_destination as _resolve_episode_destination,
)
from alfred.application.filesystem.resolve_destination import (
resolve_movie_destination as _resolve_movie_destination,
)
from alfred.application.filesystem.resolve_destination import (
resolve_season_destination as _resolve_season_destination,
)
from alfred.application.filesystem.resolve_destination import (
resolve_series_destination as _resolve_series_destination,
)
from alfred.infrastructure.filesystem import FileManager, create_folder, move
from alfred.infrastructure.filesystem.ffprobe import probe
from alfred.infrastructure.filesystem.find_video import find_video_file
from alfred.infrastructure.metadata import MetadataStore
from alfred.infrastructure.persistence import get_memory
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
def move_media(source: str, destination: str) -> dict[str, Any]:
"""
Move a media file to a destination path.
Copies the file safely first (with integrity check), then deletes the source.
Use this to organise a downloaded file into the media library.
Args:
source: Absolute path to the source file.
destination: Absolute path to the destination file (must not already exist).
Returns:
Dict with status, source, destination, filename, and size — or error details.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml."""
file_manager = FileManager()
use_case = MoveMediaUseCase(file_manager)
return use_case.execute(source, destination).to_dict()
def resolve_destination(
def move_to_destination(source: str, destination: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml."""
parent = str(Path(destination).parent)
result = create_folder(parent)
if result["status"] != "ok":
return result
return move(source, destination)
def resolve_season_destination(
release_name: str,
tmdb_title: str,
tmdb_year: int,
confirmed_folder: str | None = None,
) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_season_destination.yaml."""
return _resolve_season_destination(
release_name, tmdb_title, tmdb_year, confirmed_folder
).to_dict()
def resolve_episode_destination(
release_name: str,
source_file: str,
tmdb_title: str,
@@ -46,119 +72,75 @@ def resolve_destination(
tmdb_episode_title: str | None = None,
confirmed_folder: str | None = None,
) -> dict[str, Any]:
"""
Compute the destination path in the media library for a release.
Call this before move_media to get the correct library path. Handles:
- Parsing the release name (quality, codec, group, season/episode)
- Looking up any existing series folder in the library
- Applying group-conflict rules (asks user if ambiguous)
- Building the full destination path with correct naming conventions
Args:
release_name: Raw release folder or file name
(e.g. "Oz.S03.1080p.WEBRip.x265-KONTRAST").
source_file: Absolute path to the source video file (used for extension).
tmdb_title: Canonical show/movie title from TMDB (e.g. "Oz").
tmdb_year: Release/start year from TMDB (e.g. 1997).
tmdb_episode_title: Episode title from TMDB for single-episode releases
(e.g. "The Routine"). Omit for season packs and movies.
confirmed_folder: If a previous call returned needs_clarification, pass
the user-chosen folder name here to proceed.
Returns:
On success: dict with status, library_file, series_folder, season_folder,
series_folder_name, season_folder_name, filename,
is_new_series_folder.
On ambiguity: dict with status="needs_clarification", question, options.
On error: dict with status="error", error, message.
"""
use_case = ResolveDestinationUseCase()
return use_case.execute(
release_name=release_name,
source_file=source_file,
tmdb_title=tmdb_title,
tmdb_year=tmdb_year,
tmdb_episode_title=tmdb_episode_title,
confirmed_folder=confirmed_folder,
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_episode_destination.yaml."""
return _resolve_episode_destination(
release_name,
source_file,
tmdb_title,
tmdb_year,
tmdb_episode_title,
confirmed_folder,
).to_dict()
def create_seed_links(library_file: str, original_download_folder: str) -> dict[str, Any]:
"""
Prepare a torrent subfolder so qBittorrent can keep seeding after a move.
def resolve_movie_destination(
release_name: str,
source_file: str,
tmdb_title: str,
tmdb_year: int,
) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_movie_destination.yaml."""
return _resolve_movie_destination(
release_name, source_file, tmdb_title, tmdb_year
).to_dict()
Hard-links the video file from the library into torrents/<original_folder_name>/,
then copies all remaining files from the original download folder (subtitles,
.nfo, .jpg, .txt, …) so the torrent data is complete.
Call this after move_media when the user wants to keep seeding.
def resolve_series_destination(
release_name: str,
tmdb_title: str,
tmdb_year: int,
confirmed_folder: str | None = None,
) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/resolve_series_destination.yaml."""
return _resolve_series_destination(
release_name, tmdb_title, tmdb_year, confirmed_folder
).to_dict()
Args:
library_file: Absolute path to the video file now in the library.
original_download_folder: Absolute path to the original download folder
(may still contain subs, nfo, and other release files).
Returns:
Dict with status, torrent_subfolder, linked_file, copied_files,
copied_count, skipped — or error details.
"""
def create_seed_links(
library_file: str, original_download_folder: str
) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_seed_links.yaml."""
file_manager = FileManager()
use_case = CreateSeedLinksUseCase(file_manager)
return use_case.execute(library_file, original_download_folder).to_dict()
def manage_subtitles(source_video: str, destination_video: str) -> dict[str, Any]:
"""
Place subtitle files alongside an organised video file.
Scans for subtitle files (.srt, .ass, .ssa, .vtt, .sub) next to the source
video, filters them according to the user's SubtitlePreferences (languages,
min size, SDH, forced), and hard-links the passing files next to the
destination video with the correct naming convention:
fr.srt / fr.sdh.srt / fr.forced.srt / en.srt …
Call this right after move_media or copy_media, passing the same source and
destination paths. If no subtitles are found, returns ok with placed_count=0.
Args:
source_video: Absolute path to the original video file (in the download folder).
destination_video: Absolute path to the placed video file (in the library).
Returns:
Dict with status, placed list (source, destination, filename), placed_count,
skipped_count — or error details.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/manage_subtitles.yaml."""
file_manager = FileManager()
use_case = ManageSubtitlesUseCase(file_manager)
return use_case.execute(source_video, destination_video).to_dict()
def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, Any]:
"""
Teach Alfred a new token mapping and persist it to the learned knowledge pack.
Use this when a subtitle file contains an unrecognised token — after confirming
with the user what the token means, call learn() to persist it so Alfred
recognises it in future scans.
Args:
pack: Knowledge pack name. Currently only "subtitles" is supported.
category: Category within the pack: "languages", "types", or "formats".
key: The entry key — e.g. ISO 639-1 language code ("es"), type id ("sdh").
values: List of tokens to add — e.g. ["spanish", "espanol", "spa"].
Returns:
Dict with status, added_count, and the updated token list.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/learn.yaml."""
_VALID_PACKS = {"subtitles"}
_VALID_CATEGORIES = {"languages", "types", "formats"}
if pack not in _VALID_PACKS:
return {"status": "error", "error": "unknown_pack", "message": f"Unknown pack '{pack}'. Valid: {sorted(_VALID_PACKS)}"}
return {
"status": "error",
"error": "unknown_pack",
"message": f"Unknown pack '{pack}'. Valid: {sorted(_VALID_PACKS)}",
}
if category not in _VALID_CATEGORIES:
return {"status": "error", "error": "unknown_category", "message": f"Unknown category '{category}'. Valid: {sorted(_VALID_CATEGORIES)}"}
return {
"status": "error",
"error": "unknown_category",
"message": f"Unknown category '{category}'. Valid: {sorted(_VALID_CATEGORIES)}",
}
learned_path = _LEARNED_ROOT / "subtitles_learned.yaml"
_LEARNED_ROOT.mkdir(parents=True, exist_ok=True)
@@ -180,7 +162,9 @@ def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, An
tmp = learned_path.with_suffix(".yaml.tmp")
try:
with open(tmp, "w", encoding="utf-8") as f:
yaml.safe_dump(data, f, allow_unicode=True, default_flow_style=False, sort_keys=False)
yaml.safe_dump(
data, f, allow_unicode=True, default_flow_style=False, sort_keys=False
)
tmp.replace(learned_path)  # replace() overwrites an existing target on every platform
except Exception as e:
tmp.unlink(missing_ok=True)
@@ -197,34 +181,185 @@ def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, An
def set_path_for_folder(folder_name: str, path_value: str) -> dict[str, Any]:
"""
Set a folder path in the configuration.
Args:
folder_name: Name of folder to set (download, tvshow, movie, torrent).
path_value: Absolute path to the folder.
Returns:
Dict with status or error information.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_path_for_folder.yaml."""
file_manager = FileManager()
use_case = SetFolderPathUseCase(file_manager)
response = use_case.execute(folder_name, path_value)
return response.to_dict()
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
from alfred.domain.release.services import parse_release # noqa: PLC0415
path = Path(source_path)
parsed = parse_release(release_name)
parsed.media_type = detect_media_type(parsed, path)
probe_used = False
if parsed.media_type not in ("unknown", "other"):
video_file = find_video_file(path)
if video_file:
media_info = probe(video_file)
if media_info:
enrich_from_probe(parsed, media_info)
probe_used = True
return {
"status": "ok",
"media_type": parsed.media_type,
"parse_path": parsed.parse_path,
"title": parsed.title,
"year": parsed.year,
"season": parsed.season,
"episode": parsed.episode,
"episode_end": parsed.episode_end,
"quality": parsed.quality,
"source": parsed.source,
"codec": parsed.codec,
"group": parsed.group,
"languages": parsed.languages,
"audio_codec": parsed.audio_codec,
"audio_channels": parsed.audio_channels,
"bit_depth": parsed.bit_depth,
"hdr_format": parsed.hdr_format,
"edition": parsed.edition,
"site_tag": parsed.site_tag,
"is_season_pack": parsed.is_season_pack,
"probe_used": probe_used,
}
def probe_media(source_path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/probe_media.yaml."""
path = Path(source_path)
if not path.exists():
return {
"status": "error",
"error": "not_found",
"message": f"{source_path} does not exist",
}
media_info = probe(path)
if media_info is None:
return {
"status": "error",
"error": "probe_failed",
"message": "ffprobe failed to read the file",
}
return {
"status": "ok",
"video": {
"codec": media_info.video_codec,
"resolution": media_info.resolution,
"width": media_info.width,
"height": media_info.height,
"duration_seconds": media_info.duration_seconds,
"bitrate_kbps": media_info.bitrate_kbps,
},
"audio_tracks": [
{
"index": t.index,
"codec": t.codec,
"channels": t.channels,
"channel_layout": t.channel_layout,
"language": t.language,
"is_default": t.is_default,
}
for t in media_info.audio_tracks
],
"subtitle_tracks": [
{
"index": t.index,
"codec": t.codec,
"language": t.language,
"is_default": t.is_default,
"is_forced": t.is_forced,
}
for t in media_info.subtitle_tracks
],
"audio_languages": media_info.audio_languages,
"is_multi_audio": media_info.is_multi_audio,
}
def list_folder(folder_type: str, path: str = ".") -> dict[str, Any]:
"""
List contents of a configured folder.
Args:
folder_type: Type of folder to list (download, tvshow, movie, torrent).
path: Relative path within the folder (default: root).
Returns:
Dict with folder contents or error information.
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
file_manager = FileManager()
use_case = ListFolderUseCase(file_manager)
response = use_case.execute(folder_type, path)
return response.to_dict()
def read_release_metadata(release_path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/read_release_metadata.yaml."""
path = Path(release_path)
if not path.exists():
return {
"status": "error",
"error": "not_found",
"message": f"{release_path} does not exist",
}
root = path if path.is_dir() else path.parent
store = MetadataStore(root)
if not store.exists():
return {
"status": "ok",
"release_path": str(root),
"has_metadata": False,
"metadata": {},
}
return {
"status": "ok",
"release_path": str(root),
"has_metadata": True,
"metadata": store.load(),
}
def query_library(name: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/query_library.yaml."""
needle = name.strip().lower()
if not needle:
return {
"status": "error",
"error": "empty_name",
"message": "name must be a non-empty string",
}
memory = get_memory()
roots = memory.ltm.library_paths.to_dict() or {}
if not roots:
return {
"status": "error",
"error": "no_libraries",
"message": "No library paths configured — call set_path_for_folder first.",
}
matches: list[dict[str, Any]] = []
for collection, root in roots.items():
root_path = Path(root)
if not root_path.is_dir():
continue
for entry in root_path.iterdir():
if not entry.is_dir():
continue
if needle not in entry.name.lower():
continue
store = MetadataStore(entry)
matches.append(
{
"collection": collection,
"name": entry.name,
"path": str(entry),
"has_metadata": store.exists(),
}
)
return {
"status": "ok",
"query": name,
"match_count": len(matches),
"matches": matches,
}
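The lookup in `query_library` is a case-insensitive substring match over the first-level directories of each configured library root. The following standalone sketch reproduces that matching rule against a temporary directory tree; `find_folders` is a hypothetical name, the folder names are made up for the demo, and the `MetadataStore` check from the real tool is left out.

```python
import tempfile
from pathlib import Path


def find_folders(roots: dict[str, str], name: str) -> list[dict[str, str]]:
    """Return first-level directories whose name contains `name`, ignoring case."""
    needle = name.strip().lower()
    matches: list[dict[str, str]] = []
    for collection, root in roots.items():
        root_path = Path(root)
        if not root_path.is_dir():
            continue  # unconfigured or missing roots are skipped, not errors
        for entry in sorted(root_path.iterdir()):
            if entry.is_dir() and needle in entry.name.lower():
                matches.append(
                    {"collection": collection, "name": entry.name, "path": str(entry)}
                )
    return matches


with tempfile.TemporaryDirectory() as tmp:
    tv_root = Path(tmp) / "tv"
    (tv_root / "Oz.1997").mkdir(parents=True)
    (tv_root / "Prodiges").mkdir()
    (tv_root / "notes.txt").write_text("plain files are ignored")
    roots = {"tvshow": str(tv_root), "movie": str(Path(tmp) / "missing")}
    results = find_folders(roots, "  OZ ")
```

Because the match is a plain substring test, a short query can return several folders; the caller is expected to inspect `match_count` rather than assume a single hit.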
+1 -9
@@ -9,15 +9,7 @@ logger = logging.getLogger(__name__)
def set_language(language: str) -> dict[str, Any]:
"""
Set the conversation language.
Args:
language: Language code (e.g., 'en', 'fr', 'es', 'de')
Returns:
Status dictionary
"""
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_language.yaml."""
try:
memory = get_memory()
memory.stm.set_language(language)
+221
@@ -0,0 +1,221 @@
"""
ToolSpec — semantic description of a tool, loaded from YAML.
Each tool exposed to the agent has a matching YAML spec under
alfred/agent/tools/specs/{tool_name}.yaml. The spec carries everything the
LLM needs to decide *when* and *why* to call the tool — separated from the
Python signature, which remains the source of truth for *how* (types,
required-ness).
The YAML structure is documented in the dataclasses below. Loading a spec
validates its shape; missing or unexpected fields raise ToolSpecError.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
import yaml
class ToolSpecError(ValueError):
"""Raised when a YAML tool spec is malformed or inconsistent."""
@dataclass(frozen=True)
class ParameterSpec:
"""Semantic description of a single tool parameter."""
description: str # Short: what the value represents.
why_needed: str # Why the tool needs this — drives LLM reasoning.
example: str | None = None # Concrete example value, shown to the LLM.
@classmethod
def from_dict(cls, name: str, data: dict) -> ParameterSpec:
_require(data, "description", f"parameter '{name}'")
_require(data, "why_needed", f"parameter '{name}'")
return cls(
description=str(data["description"]).strip(),
why_needed=str(data["why_needed"]).strip(),
example=str(data["example"]).strip()
if data.get("example") is not None
else None,
)
@dataclass(frozen=True)
class ReturnsSpec:
"""Description of one possible return shape (ok / needs_clarification / error / ...)."""
description: str
fields: dict[str, str] = field(default_factory=dict)
@classmethod
def from_dict(cls, key: str, data: dict) -> ReturnsSpec:
_require(data, "description", f"returns.{key}")
fields = data.get("fields") or {}
if not isinstance(fields, dict):
raise ToolSpecError(
f"returns.{key}.fields must be a dict, got {type(fields).__name__}"
)
return cls(
description=str(data["description"]).strip(),
fields={str(k): str(v).strip() for k, v in fields.items()},
)
@dataclass(frozen=True)
class CacheSpec:
"""Marks a tool as cacheable in STM.tool_results, keyed by one of its parameters."""
key: str # Name of the parameter whose value is the cache key.
@classmethod
def from_dict(cls, data: dict) -> CacheSpec:
_require(data, "key", "cache")
return cls(key=str(data["key"]).strip())
@dataclass(frozen=True)
class ToolSpec:
"""Full semantic spec for one tool."""
name: str
summary: str # One-liner — becomes Tool.description.
description: str # Longer paragraph.
when_to_use: str
when_not_to_use: str | None
next_steps: str | None
parameters: dict[str, ParameterSpec] # name -> ParameterSpec
returns: dict[str, ReturnsSpec] # status_key -> ReturnsSpec
cache: CacheSpec | None = None # If present, tool is cached.
@classmethod
def from_yaml_path(cls, path: Path) -> ToolSpec:
with open(path, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
if not isinstance(data, dict):
raise ToolSpecError(f"{path}: top-level must be a mapping")
try:
return cls.from_dict(data)
except ToolSpecError as e:
raise ToolSpecError(f"{path}: {e}") from e
@classmethod
def from_dict(cls, data: dict) -> ToolSpec:
_require(data, "name", "spec")
_require(data, "summary", "spec")
_require(data, "description", "spec")
_require(data, "when_to_use", "spec")
params_raw = data.get("parameters") or {}
if not isinstance(params_raw, dict):
raise ToolSpecError("parameters must be a mapping")
parameters = {
pname: ParameterSpec.from_dict(pname, pdata or {})
for pname, pdata in params_raw.items()
}
returns_raw = data.get("returns") or {}
if not isinstance(returns_raw, dict):
raise ToolSpecError("returns must be a mapping")
returns = {
rkey: ReturnsSpec.from_dict(rkey, rdata or {})
for rkey, rdata in returns_raw.items()
}
cache_raw = data.get("cache")
if cache_raw is not None and not isinstance(cache_raw, dict):
raise ToolSpecError("cache must be a mapping")
cache = CacheSpec.from_dict(cache_raw) if cache_raw else None
spec = cls(
name=str(data["name"]).strip(),
summary=str(data["summary"]).strip(),
description=str(data["description"]).strip(),
when_to_use=str(data["when_to_use"]).strip(),
when_not_to_use=_strip_or_none(data.get("when_not_to_use")),
next_steps=_strip_or_none(data.get("next_steps")),
parameters=parameters,
returns=returns,
cache=cache,
)
if cache is not None and cache.key not in parameters:
raise ToolSpecError(
f"cache.key '{cache.key}' is not a declared parameter "
f"(declared: {sorted(parameters)})"
)
return spec
def compile_description(self) -> str:
"""
Build the long description text passed to the LLM as Tool.description.
Layout:
<summary>
<description>
When to use:
<when_to_use>
When NOT to use: (if present)
<when_not_to_use>
Next steps: (if present)
<next_steps>
Returns:
<status>: <description>
· <field>: <desc>
"""
parts = [self.summary, "", self.description]
parts += ["", "When to use:", _indent(self.when_to_use)]
if self.when_not_to_use:
parts += ["", "When NOT to use:", _indent(self.when_not_to_use)]
if self.next_steps:
parts += ["", "Next steps:", _indent(self.next_steps)]
if self.returns:
parts += ["", "Returns:"]
for status, ret in self.returns.items():
parts.append(f" {status}: {ret.description}")
for fname, fdesc in ret.fields.items():
parts.append(f" · {fname}: {fdesc}")
return "\n".join(parts)
def compile_parameter_description(self, name: str) -> str:
"""Build the JSON Schema 'description' field for one parameter."""
p = self.parameters.get(name)
if p is None:
raise ToolSpecError(f"tool '{self.name}': no spec for parameter '{name}'")
text = f"{p.description} (Why: {p.why_needed})"
if p.example:
text += f" Example: {p.example}"
return text


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------


def _require(data: dict, key: str, where: str) -> None:
    # A field counts as missing when it is absent, None, or a blank string.
    if data.get(key) is None or (isinstance(data[key], str) and not data[key].strip()):
        raise ToolSpecError(f"{where}: missing required field '{key}'")


def _strip_or_none(value) -> str | None:
    if value is None:
        return None
    s = str(value).strip()
    return s or None


def _indent(text: str, prefix: str = " ") -> str:
    return "\n".join(prefix + line for line in text.splitlines())
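To make the compiled layout concrete, here is a minimal, self-contained sketch of the same assembly logic. `MiniReturn`, the free function `compile_description`, and the two-space/four-space indentation widths are assumptions made for illustration; the real `ToolSpec` classes and their exact prefixes may differ.

```python
from dataclasses import dataclass, field


@dataclass
class MiniReturn:
    # Hypothetical stand-in for one entry of the spec's 'returns' mapping.
    description: str
    fields: dict = field(default_factory=dict)


def indent(text: str, prefix: str = "  ") -> str:
    # Prefix every line of a (possibly multi-line) block.
    return "\n".join(prefix + line for line in text.splitlines())


def compile_description(summary, description, when_to_use, returns):
    # Same section order as the layout described in the docstring above,
    # limited to the mandatory sections plus 'Returns'.
    parts = [summary, "", description]
    parts += ["", "When to use:", indent(when_to_use)]
    if returns:
        parts += ["", "Returns:"]
        for status, ret in returns.items():
            parts.append(f"  {status}: {ret.description}")
            for fname, fdesc in ret.fields.items():
                parts.append(f"    · {fname}: {fdesc}")
    return "\n".join(parts)


text = compile_description(
    summary="Leave the current workflow.",
    description="Clears the active workflow.",
    when_to_use="When all steps completed.",
    returns={"ok": MiniReturn("Workflow ended.", {"reason": "The reason passed in."})},
)
```

Printing `text` shows the summary first, then the description, the "When to use:" block, and finally one line per return status followed by its fields.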
@@ -0,0 +1,53 @@
"""
ToolSpecLoader — discover and load all YAML tool specs from a directory.

Convention: one YAML file per tool, named exactly like the Python function
that implements it (e.g. resolve_season_destination.yaml).
"""

from __future__ import annotations

import logging
from pathlib import Path

from .spec import ToolSpec, ToolSpecError

logger = logging.getLogger(__name__)

_DEFAULT_SPECS_DIR = Path(__file__).parent / "specs"


def load_tool_specs(specs_dir: Path | None = None) -> dict[str, ToolSpec]:
    """
    Load every {tool}.yaml under specs_dir into a {name -> ToolSpec} mapping.

    Args:
        specs_dir: Directory to scan. Defaults to alfred/agent/tools/specs/.

    Returns:
        Mapping from tool name to its parsed ToolSpec.

    Raises:
        ToolSpecError: if a spec is malformed, or if the filename doesn't
            match the 'name' field inside the YAML.
    """
    root = specs_dir or _DEFAULT_SPECS_DIR
    if not root.exists():
        logger.warning(f"Tool specs directory not found: {root}")
        return {}

    specs: dict[str, ToolSpec] = {}
    for path in sorted(root.glob("*.yaml")):
        spec = ToolSpec.from_yaml_path(path)
        expected_name = path.stem
        if spec.name != expected_name:
            raise ToolSpecError(
                f"{path}: filename stem '{expected_name}' "
                f"does not match spec.name '{spec.name}'"
            )
        if spec.name in specs:
            raise ToolSpecError(f"duplicate tool spec name: '{spec.name}'")
        specs[spec.name] = spec

    logger.info(f"Loaded {len(specs)} tool spec(s) from {root}")
    return specs
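The filename convention enforced by the loader can be tried out in isolation. The sketch below is only an illustration: `SpecError`, `read_name` and `load_names` are hypothetical stand-ins, and `read_name` is a toy parser that reads a top-level `name:` line instead of parsing real YAML.

```python
import tempfile
from pathlib import Path


class SpecError(ValueError):
    """Hypothetical stand-in for ToolSpecError."""


def read_name(path: Path) -> str:
    # Toy parser (assumption): only looks for a top-level 'name:' line.
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("name:"):
            return line.split(":", 1)[1].strip()
    raise SpecError(f"{path}: missing required field 'name'")


def load_names(root: Path) -> dict:
    # Mirror of the loader's check: the file stem must equal the spec name.
    found = {}
    for path in sorted(root.glob("*.yaml")):
        name = read_name(path)
        if name != path.stem:
            raise SpecError(
                f"{path}: filename stem '{path.stem}' does not match spec.name '{name}'"
            )
        found[name] = path
    return found


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "find_torrent.yaml").write_text("name: find_torrent\n", encoding="utf-8")
        good = sorted(load_names(root))
        # A file whose stem disagrees with its 'name' field must be rejected.
        (root / "oops.yaml").write_text("name: something_else\n", encoding="utf-8")
        try:
            load_names(root)
            rejected = False
        except SpecError:
            rejected = True
    return good, rejected
```

Failing fast on a mismatch keeps the spec directory and the Python function names from drifting apart silently.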
@@ -0,0 +1,53 @@
name: add_torrent_by_index
summary: >
Pick a torrent from the last find_torrent results by index and add
it to qBittorrent in one call.
description: |
Convenience wrapper that combines get_torrent_by_index +
add_torrent_to_qbittorrent. Looks up the torrent at the given
1-based index, extracts its magnet link, and sends it to
qBittorrent. The result mirrors add_torrent_to_qbittorrent's, with
the chosen torrent's name appended on success.
when_to_use: |
The default action after find_torrent when the user picks a hit by
number ("download the second one"). One call, two side effects:
episodic memory updated + download started.
when_not_to_use: |
- When the user only wants to inspect, not download — use
get_torrent_by_index.
- When the magnet comes from outside the search results — use
add_torrent_to_qbittorrent directly.
next_steps: |
- On status=ok: confirm the download started and end the workflow
if not already ended.
- On status=error (not_found): the index is out of range; show the
available count from episodic memory.
- On status=error (no_magnet): the search result was malformed —
suggest re-running find_torrent.
parameters:
index:
description: 1-based position of the torrent in the last find_torrent results.
why_needed: |
Identifies which torrent to add. Out-of-range indices return
not_found.
example: 3
returns:
ok:
description: Torrent was added to qBittorrent.
fields:
status: "'ok'"
message: Confirmation message.
torrent_name: Name of the torrent that was added.
error:
description: Failed to add.
fields:
error: Short error code (not_found, no_magnet, ...).
message: Human-readable explanation.
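The lookup described in this spec can be sketched as below. This is an illustration under assumptions, not the project's implementation: `pick_by_index` and the shape of the result dicts are hypothetical, and the qBittorrent call is left out entirely.

```python
def pick_by_index(results: list, index: int) -> dict:
    """Look up a search hit by 1-based index (sketch of the documented behaviour)."""
    if index < 1 or index > len(results):
        return {
            "status": "error",
            "error": "not_found",
            "message": f"Index {index} is out of range (1..{len(results)}).",
        }
    torrent = results[index - 1]  # convert the 1-based index to 0-based
    magnet = torrent.get("magnet")
    if not magnet:
        return {
            "status": "error",
            "error": "no_magnet",
            "message": "Search result has no magnet link.",
        }
    return {"status": "ok", "torrent_name": torrent["name"], "magnet": magnet}


# Hypothetical search results, as find_torrent might have stored them.
hits = [
    {"name": "Inception.2010.1080p", "magnet": "magnet:?xt=urn:btih:abc123"},
    {"name": "Broken.Entry"},
]
```

Index 0 and indices past the end both map to `not_found`, so a negative or zero index never wraps around to the end of the list.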
@@ -0,0 +1,48 @@
name: add_torrent_to_qbittorrent
summary: >
Send a magnet link to qBittorrent and start the download.
description: |
Adds a torrent to qBittorrent using its WebUI API. On success, the
download is also recorded in episodic memory as an active_download
so the agent can track its progress later, the STM topic is set to
"downloading", and the current workflow is ended (the user typically
leaves the find-and-download scope at this point).
when_to_use: |
When the user provides a raw magnet link, or when chaining manually
after get_torrent_by_index. For the common "user picked search hit
N" case, prefer add_torrent_by_index — one call instead of two.
when_not_to_use: |
- For .torrent files (not supported by this tool — magnet only).
- When qBittorrent is not configured / reachable — the call will
fail and the user has to fix the config first.
next_steps: |
- On status=ok: the workflow is already ended; confirm to the user
that the download has started.
- On status=error: surface the message; common causes are auth
failure or qBittorrent being unreachable.
parameters:
magnet_link:
description: Magnet URI of the torrent to add (magnet:?xt=urn:btih:...).
why_needed: |
The actual payload sent to qBittorrent. Must be a full magnet
URI, not a hash alone.
example: "magnet:?xt=urn:btih:abc123..."
returns:
ok:
description: Torrent accepted by qBittorrent.
fields:
status: "'ok'"
message: Confirmation message.
error:
description: qBittorrent rejected the request or is unreachable.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,82 @@
name: analyze_release
summary: >
One-shot analyzer that parses a release name, detects its media type
from the folder layout, and enriches the result with ffprobe data.
description: |
Combines three steps in a single call so the agent gets a complete
picture before routing:
1. parse_release(release_name) — extracts title, year, season,
episode, quality, source, codec, group, languages, audio info,
HDR, edition, site tag.
2. detect_media_type(parsed, path) — uses the on-disk layout
(single file vs. folder, presence of S01 dirs, episode count)
to choose: movie / tv_episode / tv_season / tv_complete /
other / unknown.
3. ffprobe enrichment — when the media type is recognised, runs
ffprobe on the first video file found and fills in audio
codec/channels, bit depth, HDR format. Sets probe_used=true.
when_to_use: |
As the very first step of any organize workflow, right after
list_folder, on each release the user wants to handle. The output
drives which resolve_*_destination to call next.
when_not_to_use: |
- When you only need codec/audio info on a specific video file:
use probe_media (no parsing, no media-type detection).
- For releases the user has already analyzed earlier in the same
workflow — the parse is deterministic, no need to re-run.
next_steps: |
- media_type == movie → resolve_movie_destination
- media_type == tv_season → resolve_season_destination
- media_type == tv_episode → resolve_episode_destination
- media_type == tv_complete → resolve_series_destination
- media_type in (other, unknown) → ask the user what to do; do not
auto-route.
cache:
key: source_path
parameters:
release_name:
description: Raw release folder or file name as it appears on disk.
why_needed: |
Source of all the parsed tokens (quality, codec, group, ...).
Don't sanitise it — the parser relies on the exact spelling.
example: Breaking.Bad.S01.1080p.BluRay.x265-GROUP
source_path:
description: Absolute path to the release folder or file on disk.
why_needed: |
Required for layout-based media-type detection and for ffprobe
to find a video file inside the release.
example: /downloads/Breaking.Bad.S01.1080p.BluRay.x265-GROUP
returns:
ok:
description: Release analyzed.
fields:
status: "'ok'"
media_type: "One of: movie, tv_episode, tv_season, tv_complete, other, unknown."
parse_path: "Which parser branch was taken (debug)."
title: Parsed title.
year: Parsed year (int) or null.
season: Season number (int) or null.
episode: Episode number (int) or null.
episode_end: Range end episode (multi-episode releases) or null.
quality: Resolution token (e.g. 1080p, 2160p).
source: Source token (BluRay, WEB-DL, ...).
codec: Video codec token (x264, x265, ...).
group: Release group name or null.
languages: List of detected language tokens.
audio_codec: Audio codec from ffprobe (when probe_used=true).
audio_channels: Audio channel count from ffprobe.
bit_depth: Bit depth from ffprobe.
hdr_format: HDR format from ffprobe (HDR10, DV, ...) or null.
edition: Edition tag (Extended, Director's Cut, ...) or null.
site_tag: Source-site tag if present.
is_season_pack: True when the folder contains a full season.
probe_used: True when ffprobe successfully enriched the result.
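The routing rules listed under next_steps amount to a small lookup table. The sketch below assumes a hypothetical helper `next_tool`; only the media types and resolver names come from the spec.

```python
from typing import Optional

# media_type -> resolver tool, as listed in the spec's next_steps.
ROUTES = {
    "movie": "resolve_movie_destination",
    "tv_season": "resolve_season_destination",
    "tv_episode": "resolve_episode_destination",
    "tv_complete": "resolve_series_destination",
}


def next_tool(media_type: str) -> Optional[str]:
    """Return the resolver to call, or None when the user must be asked."""
    # 'other' and 'unknown' are deliberately absent: no auto-routing.
    return ROUTES.get(media_type)
```

Returning `None` for unrouted types keeps the "ask the user" branch explicit at the call site.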
@@ -0,0 +1,59 @@
name: create_seed_links
summary: >
Recreate the original torrent folder structure with hard-links so
qBittorrent can keep seeding after the library move.
description: |
Hard-links the library video file back into torrents/<original_folder_name>/
and copies all remaining files from the original download folder
(subtitles, .nfo, .jpg, .txt, …) so the torrent data is complete on
disk. qBittorrent then sees the same content at the location it
expects and can keep seeding without rehashing the whole torrent.
when_to_use: |
Only when the user has confirmed they want to keep seeding after a
move. Call right after manage_subtitles (or after move_media if there
are no subs).
when_not_to_use: |
- When the user explicitly answered "no" to "keep seeding?".
- When the download was not from a torrent (e.g. direct download).
- Before the library file is in place — this tool reads it.
next_steps: |
- After success: optionally call qBittorrent to update the torrent's
save path / force a recheck (not yet covered by a tool).
- End the workflow.
parameters:
library_file:
description: Absolute path to the video file now in the library.
why_needed: |
The source for the hard-link — same inode means qBittorrent sees
identical bytes at the seeding path.
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Season 03/Oz.S03E01.mkv
original_download_folder:
description: Absolute path to the original download folder.
why_needed: |
Provides the folder name to recreate under torrents/ and the
auxiliary files (subs, nfo, ...) to copy over.
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Seeding folder rebuilt.
fields:
status: "'ok'"
torrent_subfolder: Absolute path of the recreated folder under torrents/.
linked_file: Absolute path of the hard-linked video.
copied_files: List of auxiliary files that were copied.
copied_count: Number of auxiliary files copied.
skipped: List of files skipped (already present, unreadable, ...).
error:
description: Failed to rebuild the seeding folder.
fields:
error: Short error code.
message: Human-readable explanation.
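A minimal sketch of the hard-link-plus-copy idea follows. It rests on several assumptions: the function signature, the `torrents_root` argument, and reusing the library file's name for the link are all hypothetical, and hard links only work when both paths sit on the same filesystem.

```python
import os
import shutil
import tempfile
from pathlib import Path


def create_seed_links(library_file: Path, original_folder: Path, torrents_root: Path) -> dict:
    """Hard-link the library video under torrents/ and copy auxiliary files."""
    target_dir = torrents_root / original_folder.name
    target_dir.mkdir(parents=True, exist_ok=True)

    linked = target_dir / library_file.name
    os.link(library_file, linked)  # same inode, no extra disk space

    copied, skipped = [], []
    for item in sorted(original_folder.iterdir()):
        dest = target_dir / item.name
        if not item.is_file() or dest.exists():
            skipped.append(item.name)
            continue
        shutil.copy2(item, dest)  # small sidecar files are copied, not linked
        copied.append(item.name)
    return {"status": "ok", "linked_file": linked, "copied_files": copied, "skipped": skipped}


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        library = root / "library"
        downloads = root / "downloads" / "Some.Release"
        library.mkdir()
        downloads.mkdir(parents=True)
        (library / "video.mkv").write_bytes(b"data")
        (downloads / "release.nfo").write_text("info", encoding="utf-8")

        result = create_seed_links(library / "video.mkv", downloads, root / "torrents")
        same_inode = (
            os.stat(library / "video.mkv").st_ino == os.stat(result["linked_file"]).st_ino
        )
        return {"same_inode": same_inode, "copied": result["copied_files"]}
```

Because the link shares an inode with the library file, the seeding copy costs no additional space for the video itself.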
@@ -0,0 +1,48 @@
name: end_workflow
summary: >
Leave the current workflow scope and return to the broad-catalog mode.
description: |
Clears the active workflow from STM. After this call the visible tool
catalog returns to the core set plus start_workflow, so the agent is
ready to handle a different request.
when_to_use: |
- When all the workflow's steps have completed successfully.
- When the user explicitly cancels the current task.
- When the user changes subject mid-conversation and the active
workflow is no longer relevant.
- When an unrecoverable error makes continuing pointless — explain
in 'reason'.
when_not_to_use: |
- Do not call when there is no active workflow — it will return an
error. Just call start_workflow for the new request instead.
- Do not call mid-step just to "free up tools"; finish the step
or fail it explicitly first.
next_steps: |
- After ending, you can either call start_workflow for a new task or
answer the user directly from the broad catalog.
parameters:
reason:
description: Short reason for ending — completed, cancelled, changed_subject, error, ...
why_needed: |
Recorded in episodic memory for debugging and future audits. A
structured short string is more useful than a long sentence.
example: completed
returns:
ok:
description: Workflow ended; catalog is back to the broad core set.
fields:
workflow: Name of the workflow that just ended.
reason: The reason that was passed in.
error:
description: Could not end — typically because nothing was active.
fields:
error: Short error code (no_active_workflow).
message: Human-readable explanation.
@@ -0,0 +1,56 @@
name: find_media_imdb_id
summary: >
Search TMDB for a media title and return its canonical title, year,
IMDb id, and TMDB id.
description: |
Looks up a title on TMDB and returns the canonical metadata needed by
the resolve_*_destination tools. On success, the result is also
stashed in short-term memory under "last_media_search" so later steps
in the workflow can read it without re-calling TMDB. The STM topic
is set to "searching_media".
when_to_use: |
Right after analyze_release, before calling resolve_*_destination —
the resolvers need the canonical title + year and refuse to guess
them from the raw release name.
when_not_to_use: |
- When you already have the IMDb id in STM from an earlier step in
the same workflow.
- For torrent search — use find_torrent instead.
next_steps: |
- On status=ok: call the appropriate resolve_*_destination with
tmdb_title and tmdb_year from the result.
- On status=error (not_found): show the error and ask the user for
a more precise title.
cache:
key: media_title
parameters:
media_title:
description: Title to search for. Free-form — TMDB does the matching.
why_needed: |
Drives the TMDB query. Pass a sanitized version (no resolution
tokens, no group name) for best results.
example: Breaking Bad
returns:
ok:
description: Match found.
fields:
status: "'ok'"
title: Canonical title as returned by TMDB.
year: Release year (movies) or first-air year (series).
media_type: "'movie' or 'tv'."
imdb_id: IMDb identifier (ttXXXXXXX) or null.
tmdb_id: TMDB numeric id.
error:
description: No match or API failure.
fields:
error: Short error code (not_found, api_error, ...).
message: Human-readable explanation.
@@ -0,0 +1,52 @@
name: find_torrent
summary: >
Search Knaben for torrents matching a media title; cache results in
episodic memory.
description: |
Queries the Knaben aggregator for up to 10 torrents matching the
given title, then stores the result list in episodic memory under
"last_search_results". The user can then refer to a torrent by
1-based index ("download the 3rd one") via get_torrent_by_index or
add_torrent_by_index. The STM topic is set to "selecting_torrent".
when_to_use: |
When the user wants to download something new — typically the first
step of a "find + download" sub-task. The agent should usually
pre-filter the title (canonical name + year) before searching for
cleaner results.
when_not_to_use: |
- For TMDB metadata lookup — use find_media_imdb_id.
- When a search was already performed in the same session and the
user is just picking from the existing list.
next_steps: |
- Present the indexed results to the user.
- Once chosen: call add_torrent_by_index(N) — that wraps
get_torrent_by_index + add_torrent_to_qbittorrent.
cache:
key: media_title
parameters:
media_title:
description: Title to search for on Knaben. Free-form.
why_needed: |
Drives the search query. Use the canonical title (from
find_media_imdb_id) plus quality preferences for better hits.
example: Inception 2010 1080p
returns:
ok:
description: Search returned a list of torrents.
fields:
status: "'ok'"
torrents: "List of {name, size, seeders, leechers, magnet, ...}, up to 10."
error:
description: Search failed.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,48 @@
name: get_torrent_by_index
summary: >
Retrieve a torrent from the last find_torrent search by its 1-based
index.
description: |
Reads episodic memory's last_search_results and returns the entry at
the given 1-based position. Pure lookup — does not start a download.
Fails when the search results are missing or the index is out of
range.
when_to_use: |
When the user references a search hit by number ("show me the second
one") but doesn't yet want to download — e.g. inspection, sharing
the magnet, ...
when_not_to_use: |
- When the user wants to start downloading: use add_torrent_by_index
instead (one call instead of two).
- When no search has been performed yet — the result will be
not_found.
next_steps: |
- Display the torrent to the user.
- If they then say "add it", call add_torrent_to_qbittorrent with the
magnet, or add_torrent_by_index with the same index.
parameters:
index:
description: 1-based position in the last find_torrent result list.
why_needed: |
Maps to a specific torrent entry. Out-of-range values return an
error, not a wraparound.
example: 3
returns:
ok:
description: Torrent found at that index.
fields:
status: "'ok'"
torrent: "Full torrent dict (name, size, seeders, leechers, magnet, ...)."
error:
description: No torrent at that index.
fields:
error: Short error code (not_found).
message: Human-readable explanation, e.g. "Search for torrents first."
@@ -0,0 +1,76 @@
name: learn
summary: >
Teach Alfred a new token mapping and persist it to the learned
knowledge pack so future scans recognise it.
description: |
Appends a new token (or list of tokens) to a key inside a knowledge
pack and writes the result to `data/knowledge/<pack>_learned.yaml`.
The change is persisted atomically (write-tmp + rename) so a crash
cannot corrupt the file. Currently only the `subtitles` pack is
supported.
when_to_use: |
When manage_subtitles returns needs_clarification with unresolved
tokens, after confirming with the user what the tokens mean. Call
once per (category, key) — multiple values can be added in a single
call.
when_not_to_use: |
- Without explicit user confirmation of what the token means.
- For knowledge that belongs in the static pack
(alfred/knowledge/<pack>.yaml) — that's editor territory, not
runtime learning.
next_steps: |
- After success: re-run the workflow step that triggered the
clarification (typically manage_subtitles) so the new mapping is
applied.
parameters:
pack:
description: Knowledge pack name. Currently only "subtitles" is supported.
why_needed: |
Decides which `*_learned.yaml` file under data/knowledge/ gets
written. The pack name is namespaced to avoid collisions across
domains.
example: subtitles
category:
description: Category within the pack — "languages", "types", or "formats".
why_needed: |
Different categories use different lookup tables at scan time.
A wrong category silently has no effect.
example: languages
key:
description: Canonical entry id — ISO 639-1 code, type name, format name.
why_needed: |
The destination bucket for the new tokens. Existing tokens under
this key are kept; only new values are appended.
example: es
values:
description: List of token spellings to add.
why_needed: |
Release groups use many spellings for the same language/type;
pass them all in one call instead of multiple round-trips.
example: '["spanish", "espanol", "spa"]'
returns:
ok:
description: Mapping saved.
fields:
status: "'ok'"
pack: Name of the pack that was written to.
category: Category that was updated.
key: Key that was updated.
added_count: Number of values that were actually new (deduplicated).
tokens: Full updated token list for that key.
error:
description: Save failed.
fields:
error: Short error code (unknown_pack, unknown_category, read_failed, write_failed).
message: Human-readable explanation.
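The write-tmp + rename pattern mentioned in the description can be sketched as follows. This is an assumption-laden illustration: the hypothetical `learn` function below stores JSON rather than YAML to stay within the standard library, and its signature is not the tool's real one.

```python
import json
import os
import tempfile
from pathlib import Path


def learn(path: Path, category: str, key: str, values: list) -> dict:
    """Append new tokens under category/key and persist atomically."""
    data = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    tokens = data.setdefault(category, {}).setdefault(key, [])
    # Keep input order, drop duplicates and tokens that are already known.
    new = [v for v in dict.fromkeys(values) if v not in tokens]
    tokens.extend(new)

    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
        os.replace(tmp_name, path)  # atomic on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return {"status": "ok", "added_count": len(new), "tokens": tokens}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "subtitles_learned.json"
        first = learn(store, "languages", "es", ["spanish", "spa"])
        second = learn(store, "languages", "es", ["spa", "espanol"])
        return first["added_count"], second["added_count"], second["tokens"]
```

Creating the temp file in the target's own directory matters: `os.replace` is only atomic when source and destination are on the same filesystem.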
@@ -0,0 +1,63 @@
name: list_folder
summary: >
List the contents of a configured folder, optionally below a
relative subpath.
description: |
Reads a folder previously configured via set_path_for_folder and
returns its entries (files + directories). A relative `path` lets you
drill down without re-specifying the absolute root each time. Path
traversal is rejected (no `..`, no absolute paths) so the agent
cannot escape the configured root.
when_to_use: |
- At the start of an organize workflow to discover what's available
in the download folder.
- To browse a library collection ("what tv shows do I have?").
- As a sanity check before any move to confirm the target exists.
when_not_to_use: |
- For folders that are not configured — call set_path_for_folder
first.
- To list arbitrary system paths — this tool is intentionally scoped
to the known roots.
next_steps: |
- After listing the download folder: typically call analyze_release
on a specific entry.
- After listing a library folder: use the result to disambiguate a
destination during resolve_*_destination.
cache:
key: path
parameters:
folder_type:
description: Logical folder key (download, torrent, movie, tv_show, ...).
why_needed: |
Resolves to an absolute root through LTM. Must have been set via
set_path_for_folder beforehand.
example: download
path:
description: Relative subpath inside the root (default ".").
why_needed: |
Lets you drill into a subfolder without expanding the root. No
".." or absolute path is allowed.
example: Breaking.Bad.S01.1080p.BluRay.x265-GROUP
returns:
ok:
description: Listing returned.
fields:
status: "'ok'"
folder_type: The key that was listed.
path: The relative path that was listed.
entries: List of {name, type, size?} for each entry.
error:
description: Could not list the folder.
fields:
error: Short error code (folder_not_configured, path_not_found, path_traversal, ...).
message: Human-readable explanation.
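The traversal guard described above (no `..`, no absolute paths) could look roughly like this. `resolve_inside` is a hypothetical helper, and the sketch only checks the textual path; it does not defend against symlinks that point outside the root.

```python
from pathlib import Path, PurePosixPath


def resolve_inside(root: Path, relative: str = ".") -> Path:
    """Join a relative subpath onto root, rejecting traversal attempts."""
    rel = PurePosixPath(relative)
    # Pure paths never collapse '..', so it stays visible in .parts.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError("path_traversal")
    return root.joinpath(*rel.parts)


def is_rejected(relative: str) -> bool:
    try:
        resolve_inside(Path("/downloads"), relative)
    except ValueError:
        return True
    return False
```

Checking the path parts before joining avoids touching the filesystem at all for rejected inputs.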
@@ -0,0 +1,67 @@
name: manage_subtitles
summary: >
Detect, filter, and place subtitle tracks next to a video that has just
been organised into the library.
description: |
Scans the source video's surroundings for subtitle files
(.srt, .ass, .ssa, .vtt, .sub), classifies them by language and type
(standard / SDH / forced), filters by the user's SubtitlePreferences
(languages, min size, keep_sdh, keep_forced), and hard-links the
passing files next to the destination video using the convention
`<lang>.<ext>`, `<lang>.sdh.<ext>`, `<lang>.forced.<ext>`.
If no subtitles are found, returns status=ok with placed_count=0 — not
an error.
when_to_use: |
Always after a successful move_media / move_to_destination, before
closing the workflow. Pass the original source path (where subs live)
and the new library path (where they should land).
when_not_to_use: |
- Do not call before the video itself has been moved — the destination
must exist for hard-links to make sense.
- Skip when the user explicitly asks not to handle subtitles.
next_steps: |
- On status=ok: continue with create_seed_links (if seeding) or end
the workflow.
- On status=needs_clarification: ask the user about the unresolved
tokens, then optionally call learn() to teach the new mapping.
parameters:
source_video:
description: Absolute path to the original video file (in the download folder).
why_needed: |
Subtitles typically live next to the source, either as siblings or
in a Subs/ subfolder. The scanner walks from this path.
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST/Oz.S03E01.mkv
destination_video:
description: Absolute path to the video file in its library location.
why_needed: |
Subtitles are hard-linked next to this file so media players pick
them up automatically.
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Season 03/Oz.S03E01.mkv
returns:
ok:
description: Subtitles scanned (and possibly placed).
fields:
status: "'ok'"
placed: List of {source, destination, filename} for each linked file.
placed_count: Number of subtitle files placed.
skipped_count: Number of subtitle files filtered out.
needs_clarification:
description: One or more tokens could not be classified.
fields:
unresolved: List of unrecognised tokens with their context.
question: Human-readable question to relay to the user.
error:
description: Scan or placement failed.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,58 @@
name: move_media
summary: >
Safely move a media file with copy + integrity check + delete source.
description: |
Copies the source file to the destination with an integrity check,
then deletes the source. Slower than move_to_destination (which is a
plain rename) but safer across filesystems where rename is not atomic
or when you want a checksum verification.
when_to_use: |
Use to move a single file across filesystems or when paranoia about
data integrity is justified — e.g. moving a finished download from a
scratch disk to the main library array.
when_not_to_use: |
- For same-filesystem moves where speed matters: use move_to_destination
(instant rename on ZFS/ext4 within the same dataset).
- For folder-level moves of complete packs: use move_to_destination —
move_media is a single-file operation.
next_steps: |
- After a successful move: call manage_subtitles to place any subtitle
tracks, then create_seed_links if the user wants to keep seeding.
- On error: surface the error code (file_not_found, destination_exists,
integrity_check_failed) and ask the user how to proceed.
parameters:
source:
description: Absolute path to the source video file.
why_needed: |
The file being moved. Typically lives under the downloads folder
after a torrent completes.
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
destination:
description: Absolute path of the destination file — must not already exist.
why_needed: |
Where the file lands in the library. Comes from a resolve_*_destination
call so the naming convention is respected.
example: /movies/Inception.2010.1080p.BluRay.x265-GROUP/Inception.2010.1080p.BluRay.x265-GROUP.mkv
returns:
ok:
description: Move succeeded.
fields:
status: "'ok'"
source: Absolute path of the source (now gone).
destination: Absolute path of the destination (now in place).
filename: Basename of the destination file.
size: Size in bytes.
error:
description: Move failed.
fields:
error: Short error code (file_not_found, destination_exists, integrity_check_failed, ...).
message: Human-readable explanation.
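The copy + integrity check + delete sequence can be sketched as below. The spec does not say which integrity check the tool uses, so the SHA-256 comparison is an assumption, as are the function names and the exact result dicts.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Hash in chunks so large video files are never loaded into memory at once.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_media(source: Path, destination: Path) -> dict:
    if not source.is_file():
        return {"status": "error", "error": "file_not_found"}
    if destination.exists():
        return {"status": "error", "error": "destination_exists"}
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)
    if sha256_of(source) != sha256_of(destination):
        destination.unlink()  # never leave a corrupt copy behind
        return {"status": "error", "error": "integrity_check_failed"}
    size = destination.stat().st_size
    source.unlink()  # delete the source only after verification
    return {"status": "ok", "destination": destination, "size": size}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "downloads" / "movie.mkv"
        src.parent.mkdir()
        src.write_bytes(b"video-bytes")
        dst = root / "movies" / "Movie" / "movie.mkv"
        first = move_media(src, dst)
        second = move_media(src, dst)  # the source is gone by now
        return first["status"], first["size"], src.exists(), second["error"]
```

Deleting the source last means a failure at any earlier step leaves the original download untouched.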
@@ -0,0 +1,55 @@
name: move_to_destination
summary: >
Move a file or folder to a destination, creating parent directories as needed.
description: |
Performs an actual move on disk. Uses the system 'mv' command, so on the
same filesystem (e.g. ZFS) this is an instant rename. Creates the parent
directory of the destination if it doesn't exist yet, then moves. Returns
before/after paths on success, or an error if the destination already
exists or the source can't be moved.
when_to_use: |
Use after one of the resolve_*_destination tools returned status=ok, to
perform the move it described. The 'source' and 'destination' arguments
come directly from the resolved paths.
when_not_to_use: |
- Never move when status was not 'ok' (clarification still pending or
error happened) — that would leave the library in a half-broken state.
- Don't use this for the seed-link step; use create_seed_links for that.
next_steps: |
- After a successful move: call manage_subtitles to place any subtitle
tracks, then create_seed_links to keep qBittorrent seeding.
- On error: surface the message; do not retry blindly — check whether
the destination already exists or the source path is correct.
parameters:
source:
description: Absolute path to the source file or folder to move.
why_needed: |
The thing being moved. Comes from the user's download folder or from
a previous tool's output.
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
destination:
description: Absolute path of the destination — must not already exist.
why_needed: |
Where to put the source. Comes from a resolve_*_destination call so
that the path matches the library's naming convention.
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Oz.S03.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Move succeeded.
fields:
source: Absolute path of the source (now gone).
destination: Absolute path of the destination (now in place).
error:
description: Move failed.
fields:
error: Short error code (source_not_found, destination_exists, mkdir_failed, move_failed).
message: Human-readable explanation of what went wrong.
@@ -0,0 +1,56 @@
name: probe_media
summary: >
Run ffprobe on a single video file and return its technical details.
description: |
Inspects a specific video file with ffprobe and returns codec,
resolution, duration, bitrate, the list of audio tracks (with
language and channel layout), and the list of embedded subtitle
tracks. Independent of any release-name parsing — works on any file
you can point at.
when_to_use: |
- To inspect a file's audio/subtitle tracks before deciding what to
do (e.g. choose a default audio language).
- To verify a video's resolution / codec when the release name is
unreliable.
- As a building block when analyze_release is overkill.
when_not_to_use: |
- For full release routing — analyze_release does parsing + media
type detection + probe in one call.
- On non-video files — ffprobe will return probe_failed.
next_steps: |
- The returned info typically feeds a user-facing decision (e.g.
"this is 7.1 DTS, want to keep it?"); rarely chained directly to
another tool.
cache:
key: source_path
parameters:
source_path:
description: Absolute path to the video file to probe.
why_needed: |
ffprobe needs the exact file (not a folder). For releases use
analyze_release; for a known file path, pass it here.
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
returns:
ok:
description: Probe succeeded.
fields:
status: "'ok'"
video: "Dict with codec, resolution, width, height, duration_seconds, bitrate_kbps."
audio_tracks: "List of {index, codec, channels, channel_layout, language, is_default}."
subtitle_tracks: "List of {index, codec, language, is_default, is_forced}."
audio_languages: List of language codes present in audio tracks.
is_multi_audio: True when more than one audio language is present.
error:
description: Probe failed.
fields:
error: Short error code (not_found, probe_failed).
message: Human-readable explanation.
@@ -0,0 +1,54 @@
name: query_library
summary: >
Find release folders across all configured library roots whose name
contains a substring (case-insensitive).
description: |
Scans every configured library root (movies, tv_shows, …) at depth 1
and returns folders whose name contains the query. For each match,
reports whether a `.alfred/metadata.yaml` exists — handy to spot
releases that have not been inspected yet. Does not recurse into
seasons / episodes; one entry per release folder.
when_to_use: |
- To answer "do I already have X?" without listing whole library
roots one by one.
- To pick the release_path to feed read_release_metadata or any
inspector tool.
when_not_to_use: |
- To list the *whole* library — that scan should live behind a
dedicated tool (not implemented yet).
- To browse a single root — use list_folder instead, it's cheaper
and doesn't open every library.
next_steps: |
- When one match is found: feed its path to read_release_metadata or
analyze_release.
- When several match: surface the indexed list to the user and ask
which one they mean.
parameters:
name:
description: Case-insensitive substring of the release name to look for.
why_needed: |
Library folders are named after the release (Title.Year.... or
Title (Year)). A substring is enough to catch typical user
phrasings ("foundation", "inception 2010").
example: foundation
returns:
ok:
description: Scan completed (possibly zero matches).
fields:
status: "'ok'"
query: The query string as received.
match_count: Number of matching folders.
matches: "List of {collection, name, path, has_metadata}."
error:
description: Scan could not run.
fields:
error: Short error code (no_libraries, empty_name).
message: Human-readable explanation.
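A depth-1, case-insensitive substring scan along these lines might look as follows. The `roots` mapping argument and the function signature are assumptions made for the sketch; the real tool presumably reads the configured roots from memory.

```python
import tempfile
from pathlib import Path


def query_library(roots: dict, name: str) -> dict:
    """Scan each library root at depth 1 for folders whose name contains the query."""
    query = name.strip().lower()
    if not query:
        return {"status": "error", "error": "empty_name"}
    if not roots:
        return {"status": "error", "error": "no_libraries"}

    matches = []
    for collection, root in roots.items():
        for entry in sorted(root.iterdir()):  # depth 1 only, no recursion
            if entry.is_dir() and query in entry.name.lower():
                matches.append({
                    "collection": collection,
                    "name": entry.name,
                    "path": str(entry),
                    "has_metadata": (entry / ".alfred" / "metadata.yaml").is_file(),
                })
    return {"status": "ok", "query": name, "match_count": len(matches), "matches": matches}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        tv = Path(tmp) / "tv_shows"
        (tv / "Foundation.2021.1080p" / ".alfred").mkdir(parents=True)
        (tv / "Foundation.2021.1080p" / ".alfred" / "metadata.yaml").write_text(
            "parse: {}\n", encoding="utf-8"
        )
        (tv / "Oz.1997.1080p").mkdir()
        result = query_library({"tv_shows": tv}, "FOUNDATION")
        return result["match_count"], result["matches"][0]["has_metadata"]
```

A scan with zero matches still returns `status: ok`, matching the spec's "possibly zero matches" wording.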
@@ -0,0 +1,55 @@
name: read_release_metadata
summary: >
Read the `.alfred/metadata.yaml` file for a release folder.
description: |
Returns whatever has been previously persisted by inspector tools
(analyze_release, probe_media, find_media_imdb_id) and by the subtitle
pipeline. Works for any folder — download or library — as long as the
release has been touched at least once. Missing metadata is not an
error: the tool returns `has_metadata=false` with an empty dict.
when_to_use: |
- Before re-running analyze_release / probe_media on a release you
might have already seen — saves a full re-inspection.
- To answer "what do we know about X?" without scanning.
- To list which releases in a library have no `.alfred` yet (loop +
`has_metadata`).
when_not_to_use: |
- To search a library by name — use query_library.
- When you need a fresh probe/parse — call the inspector directly,
the result will be persisted automatically.
next_steps: |
- If `has_metadata=false`, decide whether to inspect now
(analyze_release / probe_media).
- If `has_metadata=true`, read `metadata.parse`, `metadata.probe`,
`metadata.tmdb` blocks before deciding next actions.
cache:
key: release_path
parameters:
release_path:
description: Absolute path to the release folder (or any file inside it).
why_needed: |
The store lives at `<release_root>/.alfred/metadata.yaml`. A file
path is auto-resolved to its parent folder.
example: /mnt/library/tv_shows/Foundation.2021.1080p.WEBRip.x265-RARBG
returns:
ok:
description: Release inspected (file may or may not exist).
fields:
status: "'ok'"
release_path: Absolute path of the release folder.
has_metadata: True if `.alfred/metadata.yaml` exists.
metadata: Full content of the file, or empty dict.
error:
description: Path does not exist on disk.
fields:
error: Short error code (not_found).
message: Human-readable explanation.
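The path handling described above (a file path resolves to its parent folder, and a missing store is not an error) might look like the sketch below. `locate_metadata` is a hypothetical helper; it only locates the store and deliberately leaves YAML parsing out.

```python
from pathlib import Path


def locate_metadata(release_path: str) -> dict:
    """Hypothetical sketch: find <release_root>/.alfred/metadata.yaml."""
    path = Path(release_path)
    if not path.exists():
        return {"error": "not_found", "message": f"{release_path} does not exist"}

    # A file path is resolved to the folder that contains it.
    release_root = path.parent if path.is_file() else path
    store = release_root / ".alfred" / "metadata.yaml"
    return {
        "status": "ok",
        "release_path": str(release_root),
        "has_metadata": store.is_file(),  # a missing store is not an error
    }
```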
@@ -0,0 +1,93 @@
name: resolve_episode_destination
summary: >
Compute destination paths for a single TV episode file (file move).
description: |
Resolves the target series folder, season subfolder, and full destination
filename for a single-episode release. Returns paths only — does not move
anything. If a series folder with a different name already exists, returns
needs_clarification.
when_to_use: |
Use after analyze_release has identified the release as a single episode
(media_type=tv_show, season AND episode both set). TMDB must already be
queried for the canonical title/year, and optionally the episode title.
when_not_to_use: |
- Season packs (folder containing many episodes): use resolve_season_destination.
- Multi-season packs: use resolve_series_destination.
- Movies: use resolve_movie_destination.
next_steps: |
- On status=ok: call move_to_destination with the source video file and
destination=library_file.
- On status=needs_clarification: present question/options to the user,
then re-call with confirmed_folder set.
- On status=error: surface the message; do not move.
parameters:
release_name:
description: Raw release file name (with extension).
why_needed: |
Drives extraction of quality/source/codec/group, which become part of
the destination filename so each file is self-describing.
example: Oz.S03E01.1080p.WEBRip.x265-KONTRAST.mkv
source_file:
description: Absolute path to the source video file on disk.
why_needed: |
Used to read the source file extension (.mkv, .mp4, .avi…) for the
destination filename — release names don't always carry the extension.
example: /downloads/Oz.S03E01.1080p.WEBRip.x265-KONTRAST/file.mkv
tmdb_title:
description: Canonical show title from TMDB.
why_needed: |
Title prefix for both the series folder and the destination filename;
ensures consistent naming across all episodes of the show.
example: Oz
tmdb_year:
description: Show start year from TMDB.
why_needed: |
Disambiguates remakes/reboots sharing a title; year is part of the
series folder identity.
example: "1997"
tmdb_episode_title:
description: Episode title from TMDB. Optional.
why_needed: |
When present, the destination filename embeds the episode title for
human-readability (e.g. Oz.S01E01.The.Routine...).
example: The Routine
confirmed_folder:
description: Folder name the user picked after needs_clarification.
why_needed: |
Forces the use case to skip detection and use this exact folder name.
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Paths resolved; ready to move the episode file.
fields:
series_folder: Absolute path to the series root folder.
season_folder: Absolute path to the season subfolder.
library_file: Absolute path to the destination video file (move target); the extension is taken from source_file.
series_folder_name: Series folder name for display.
season_folder_name: Season folder name for display.
filename: Destination filename for display.
is_new_series_folder: True if the series folder doesn't exist yet.
needs_clarification:
description: A folder exists with a different name; user must choose.
fields:
question: Human-readable question.
options: List of folder names to pick from.
error:
description: Resolution failed.
fields:
error: Short error code.
message: Human-readable explanation.
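As a rough illustration of how the parameters above could combine into a destination filename, here is a minimal sketch. The exact token order used by the real resolver is not shown in this spec, so the layout (title, SxxExx, optional episode title, tech tokens, extension) and the helper name `build_episode_filename` are assumptions.

```python
from pathlib import Path
from typing import Optional


def build_episode_filename(
    title: str,
    season: int,
    episode: int,
    tech: str,
    source_file: str,
    episode_title: Optional[str] = None,
) -> str:
    """Hypothetical sketch: assemble a self-describing episode filename."""
    parts = [title.replace(" ", "."), f"S{season:02d}E{episode:02d}"]
    if episode_title:
        parts.append(episode_title.replace(" ", "."))
    parts.append(tech)  # e.g. quality.source.codec-GROUP from the release name
    # The extension comes from the source file, not from the release name.
    return ".".join(parts) + Path(source_file).suffix.lower()
```

The real code presumably also sanitises characters that are unsafe on the filesystem; that step is omitted here.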
@@ -0,0 +1,72 @@
name: resolve_movie_destination
summary: >
Compute destination paths for a movie file (file move).
description: |
Resolves the target movie folder and full destination filename for a movie
release. Returns paths only — does not move anything. Movies do not have
the existing-folder disambiguation problem that TV shows have (each
release lands in its own folder named after the canonical title + year +
tech).
when_to_use: |
Use after analyze_release has identified the release as a movie
(media_type=movie). TMDB must already be queried for the canonical title
and release year.
when_not_to_use: |
- TV shows in any form: use resolve_season_destination /
resolve_episode_destination / resolve_series_destination.
- Documentaries when they're treated as series rather than standalone
films: route them through the TV-show resolvers.
next_steps: |
- On status=ok: call move_to_destination with the source video file and
destination=library_file.
- On status=error: surface the message; do not move.
parameters:
release_name:
description: Raw release folder or file name.
why_needed: |
Drives extraction of quality/source/codec/group/edition tokens, which
become part of both the movie folder and filename so each release is
self-describing on disk.
example: Inception.2010.1080p.BluRay.x265-GROUP
source_file:
description: Absolute path to the source video file on disk.
why_needed: |
Used to read the file extension for the destination filename.
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
tmdb_title:
description: Canonical movie title from TMDB.
why_needed: |
Title prefix for the destination folder/file; ensures the library
uses the canonical title and not a sanitized release-name title.
example: Inception
tmdb_year:
description: Movie release year from TMDB.
why_needed: |
Disambiguates remakes that share a title (Dune 1984 vs Dune 2021)
and locks the folder identity in time.
example: "2010"
returns:
ok:
description: Paths resolved; ready to move.
fields:
movie_folder: Absolute path to the movie folder.
library_file: Absolute path to the destination video file (move target); the extension is taken from source_file.
movie_folder_name: Folder name for display.
filename: Destination filename for display.
is_new_folder: True if the movie folder doesn't exist yet.
error:
description: Resolution failed.
fields:
error: Short error code (e.g. library_not_set).
message: Human-readable explanation.
@@ -0,0 +1,84 @@
name: resolve_season_destination
summary: >
Compute destination paths for a season pack (folder move) in the TV library.
description: |
Resolves the target series folder and season subfolder for a complete-season
download. Returns the paths only — does not perform any move. If a series
folder for this show already exists in the library with a different name
(different group/quality/source), returns needs_clarification so the user
can decide whether to merge into the existing folder or create a new one.
when_to_use: |
Use after analyze_release has identified the release as a season pack
(media_type=tv_show, season set, episode unset). TMDB must already be
queried so tmdb_title and tmdb_year are canonical values, not raw tokens
from the release name.
when_not_to_use: |
- Single-episode files: use resolve_episode_destination instead.
- Multi-season packs (S01-S05 etc.): use resolve_series_destination.
- Movies: use resolve_movie_destination.
next_steps: |
- On status=ok: call move_to_destination with source=<download folder> and
destination=season_folder.
- On status=needs_clarification: present the question and options to the
user, then re-call this tool with confirmed_folder set to the user's pick.
- On status=error: surface the message to the user; do not move anything.
parameters:
release_name:
description: Raw release folder name as it appears on disk.
why_needed: |
Drives extraction of quality/source/codec/group tokens — these are
embedded in the target folder name (Title.Year.Quality.Source.Codec-GROUP)
to make releases self-describing on the filesystem.
example: Oz.S03.1080p.WEBRip.x265-KONTRAST
tmdb_title:
description: Canonical show title from TMDB.
why_needed: |
Builds the title prefix of the folder name. Must come from TMDB to
avoid typos and variant spellings present in the raw release name.
example: Oz
tmdb_year:
description: Show start year from TMDB.
why_needed: |
Disambiguates shows that share a title across decades (e.g. multiple
remakes of "The Office") and locks the folder identity.
example: "1997"
confirmed_folder:
description: |
Folder name chosen by the user after a previous needs_clarification
response.
why_needed: |
Short-circuits the existing-folder detection and forces the use case
to use this exact folder name, even if it doesn't match the computed
one.
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Paths resolved unambiguously; ready to move.
fields:
series_folder: Absolute path to the series root folder.
season_folder: Absolute path to the season subfolder (move target).
series_folder_name: Just the series folder name, for display.
season_folder_name: Just the season folder name, for display.
is_new_series_folder: True if the series folder doesn't exist yet.
needs_clarification:
description: A folder already exists with a different name; ask the user.
fields:
question: Human-readable question for the user.
options: List of folder names the user can pick from.
error:
description: Resolution failed (config missing, invalid release name, etc.).
fields:
error: Short error code (e.g. library_not_set).
message: Human-readable explanation.
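The three outcomes above (ok, needs_clarification, and the confirmed_folder short-circuit) can be sketched as follows. This is an assumption-laden illustration: `resolve_season_sketch`, the prefix-based detection of existing folders, and the `Season NN` subfolder name are all hypothetical choices, not the repository's implementation.

```python
from pathlib import Path
from typing import Optional


def resolve_season_sketch(
    tv_root: Path,
    title: str,
    year: str,
    tech: str,
    season: int,
    confirmed_folder: Optional[str] = None,
) -> dict:
    """Hypothetical sketch of the documented resolve_season_destination contract."""
    dotted_title = title.replace(" ", ".")
    computed = f"{dotted_title}.{year}.{tech}"  # Title.Year.Quality.Source.Codec-GROUP
    prefix = f"{dotted_title}.{year}.".lower()

    # Folders already present for this show (same title + year prefix).
    existing = []
    if tv_root.is_dir():
        existing = sorted(
            p.name
            for p in tv_root.iterdir()
            if p.is_dir() and p.name.lower().startswith(prefix)
        )

    if confirmed_folder is not None:
        folder_name = confirmed_folder  # the user's pick wins; detection is skipped
    elif existing and computed not in existing:
        return {
            "status": "needs_clarification",
            "question": f"A folder already exists for {title} ({year}). Which one should be used?",
            "options": existing + [computed],
        }
    else:
        folder_name = computed

    series_folder = tv_root / folder_name
    season_name = f"Season {season:02d}"
    return {
        "status": "ok",
        "series_folder": str(series_folder),
        "season_folder": str(series_folder / season_name),
        "series_folder_name": folder_name,
        "season_folder_name": season_name,
        "is_new_series_folder": not series_folder.exists(),
    }
```

Nothing is moved or created on disk; the function only reports paths, as the spec requires.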
@@ -0,0 +1,77 @@
name: resolve_series_destination
summary: >
Compute the destination path for a complete multi-season series pack (folder move).
description: |
Resolves the target series folder for a pack that contains multiple seasons
(e.g. S01-S05 in a single release). Returns only the series folder — the
whole source folder is moved as-is into the library, no per-season
restructuring. If a folder with a different name already exists for this
show, returns needs_clarification.
when_to_use: |
Use after analyze_release has identified the release as a complete-series
pack (media_type=tv_complete, or multi-season indicators). TMDB must
already be queried for canonical title/year.
when_not_to_use: |
- Single-season packs: use resolve_season_destination.
- Single episodes: use resolve_episode_destination.
- Movies: use resolve_movie_destination.
next_steps: |
- On status=ok: call move_to_destination with source=<download folder> and
destination=series_folder.
- On status=needs_clarification: ask the user, re-call with
confirmed_folder set.
- On status=error: surface the message; do not move.
parameters:
release_name:
description: Raw release folder name as it appears on disk.
why_needed: |
Drives extraction of quality/source/codec/group tokens for the target
folder name, even though the multi-season structure inside is kept
as-is.
example: The.Wire.S01-S05.1080p.BluRay.x265-GROUP
tmdb_title:
description: Canonical show title from TMDB.
why_needed: |
Title prefix of the series folder; comes from TMDB to avoid raw
release-name spellings.
example: The Wire
tmdb_year:
description: Show start year from TMDB.
why_needed: |
Disambiguates shows that share a title across eras and locks the
folder identity.
example: "2002"
confirmed_folder:
description: Folder name chosen by the user after needs_clarification.
why_needed: |
Forces the use case to use this exact folder name and skip detection.
example: The.Wire.2002.1080p.BluRay.x265-GROUP
returns:
ok:
description: Path resolved; ready to move the pack.
fields:
series_folder: Absolute path to the destination series folder.
series_folder_name: Folder name for display.
is_new_series_folder: True if the folder doesn't exist yet.
needs_clarification:
description: A folder exists with a different name; ask the user.
fields:
question: Human-readable question.
options: List of folder names to pick from.
error:
description: Resolution failed.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,47 @@
name: set_language
summary: >
Set the conversation language so all subsequent assistant messages
match it.
description: |
Persists an ISO 639-1 language code in short-term memory under
conversation.language. Read by the prompt builder and any tool that
needs to localise output. Does not validate the code against an ISO
list — the LLM is trusted to pass a sensible value.
when_to_use: |
As the very first call when the user writes in a language different
from the current STM language. Doing it before answering avoids a
mid-reply switch.
when_not_to_use: |
- On every turn — only when the language actually changes.
- To pick a subtitle language — that lives in SubtitlePreferences,
not the conversation language.
next_steps: |
- After success: continue the user's request in the newly set
language.
parameters:
language:
description: ISO 639-1 language code (en, fr, es, de, ...).
why_needed: |
Identifies the target language unambiguously across the UI and
any localisation logic.
example: fr
returns:
ok:
description: Language saved.
fields:
status: "'ok'"
message: Confirmation message.
language: The language code that was saved.
error:
description: Could not save the language.
fields:
status: "'error'"
error: Short error code or exception message.
@@ -0,0 +1,58 @@
name: set_path_for_folder
summary: >
Configure where a known folder lives on disk (download, torrent, or
any library collection).
description: |
Stores an absolute path in long-term memory under a folder key. Two
classes of folders exist:
- Workspace paths: "download", "torrent" — single-valued each, used
by the organize workflows.
- Library paths: any other key (e.g. "movie", "tv_show",
"documentary") — these are the collections you organise into.
The path must exist and be a directory; otherwise the call fails
without changing memory.
when_to_use: |
On first run, or when the user moves a folder, or when introducing a
new library collection (e.g. "set the documentaries folder to ...").
when_not_to_use: |
- For one-off listings: call list_folder directly. It needs no further
configuration once the folder has been set.
- To rename or delete an existing folder — this only sets paths.
next_steps: |
- After success: typical follow-ups are list_folder on the same key,
or starting a workflow that needs the path.
parameters:
folder_name:
description: Logical name of the folder (download, torrent, movie, tv_show, ...).
why_needed: |
The key the agent uses everywhere afterwards. "download" and
"torrent" are reserved for workspace; anything else becomes a
library collection.
example: tv_show
path_value:
description: Absolute path to the folder on disk.
why_needed: |
Must exist and be readable. Stored verbatim in LTM — relative
paths are rejected.
example: /tank/library/tv_shows
returns:
ok:
description: Path saved to long-term memory.
fields:
status: "'ok'"
folder_name: The logical name that was set.
path_value: The absolute path that was saved.
error:
description: Could not set the path.
fields:
error: Short error code (path_not_found, not_a_directory, invalid_path, ...).
message: Human-readable explanation.
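The validation rules in this spec could be checked along these lines. The sketch is hypothetical: `check_folder_path` is not a repository function, the extra `kind` field is added purely for illustration, and mapping a relative path to `invalid_path` is an assumption about which code the real tool uses.

```python
from pathlib import Path

# Reserved workspace keys named in the spec; every other key is a library collection.
WORKSPACE_KEYS = {"download", "torrent"}


def check_folder_path(folder_name: str, path_value: str) -> dict:
    """Hypothetical sketch: validate a path before storing it in memory."""
    path = Path(path_value)
    if not path.is_absolute():
        return {"error": "invalid_path", "message": "path must be absolute"}
    if not path.exists():
        return {"error": "path_not_found", "message": f"{path} does not exist"}
    if not path.is_dir():
        return {"error": "not_a_directory", "message": f"{path} is not a directory"}
    return {
        "status": "ok",
        "folder_name": folder_name,
        "path_value": str(path),
        "kind": "workspace" if folder_name in WORKSPACE_KEYS else "library",
    }
```

Every error branch returns before anything would be written, matching the "fails without changing memory" guarantee.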
@@ -0,0 +1,64 @@
name: start_workflow
summary: >
Enter a workflow scope — narrows the visible tool catalog and gives the
agent a clear multi-step plan to follow.
description: |
Activates a named workflow defined in YAML under agent/workflows/.
Once active, only the workflow's declared tools (plus the core noyau)
are exposed to the LLM, which keeps the decision space small and
focused. The returned plan (description + steps) is the script the
agent should execute until end_workflow is called.
when_to_use: |
Use as the very first action whenever the user request maps to a
known workflow (e.g. "organize Breaking Bad" → media.organize_media).
Pass any parameters you already know (release name, target media,
flags) in 'params' so later steps can read them from STM.
when_not_to_use: |
- Do not start a workflow for purely conversational replies or
one-shot lookups that need a single tool call.
- Do not start a new workflow while one is already active — call
end_workflow first.
next_steps: |
- On status=ok: follow the returned 'steps' list, calling the tools
in order. The visible tool catalog has already been narrowed.
- On status=error (unknown_workflow): surface the available list to
the user and ask which one they meant.
- On status=error (workflow_already_active): either continue the
active workflow or call end_workflow first.
parameters:
workflow_name:
description: Fully-qualified name of the workflow to start (e.g. media.organize_media).
why_needed: |
Identifies which YAML definition to load. Names use the
'domain.action' convention (media.*, mail.*, ...).
example: media.organize_media
params:
description: Initial parameters to seed the workflow with (release name, target, flags).
why_needed: |
Later steps read these from STM instead of asking the user again.
Pass whatever you already extracted from the user's message.
example: '{"release_name": "Breaking.Bad.S01.1080p.BluRay.x265-GROUP", "keep_seeding": true}'
returns:
ok:
description: Workflow activated; catalog has been narrowed.
fields:
workflow: Name of the activated workflow.
description: Human-readable description of what the workflow does.
steps: Ordered list of steps to execute.
tools: Tools that are now visible (in addition to the core noyau).
error:
description: Could not activate the workflow.
fields:
error: Short error code (unknown_workflow, workflow_already_active).
message: Human-readable explanation.
available_workflows: List of valid workflow names (only on unknown_workflow).
active_workflow: Name of the currently active workflow (only on workflow_already_active).
@@ -0,0 +1,86 @@
"""Workflow scoping tools: start_workflow / end_workflow meta-tools.

These tools let the agent enter and leave a workflow scope. While a
workflow is active, the PromptBuilder narrows the visible tool catalog
to the noyau + the workflow's declared tools, so the LLM doesn't have
to reason over the full set.
"""

import logging
from typing import Any

from alfred.infrastructure.persistence import get_memory

from ..workflows import WorkflowLoader

logger = logging.getLogger(__name__)

_loader_cache: list[WorkflowLoader] = []


def _get_loader() -> WorkflowLoader:
    """Lazily build the module-level WorkflowLoader."""
    if not _loader_cache:
        _loader_cache.append(WorkflowLoader())
    return _loader_cache[0]


def start_workflow(workflow_name: str, params: dict) -> dict[str, Any]:
    """See specs/start_workflow.yaml for full description."""
    loader = _get_loader()
    workflow = loader.get(workflow_name)
    if workflow is None:
        return {
            "status": "error",
            "error": "unknown_workflow",
            "message": f"Workflow '{workflow_name}' not found",
            "available_workflows": loader.names(),
        }

    memory = get_memory()
    current = memory.stm.workflow.current
    if current is not None:
        return {
            "status": "error",
            "error": "workflow_already_active",
            "message": (
                f"Workflow '{current.get('name')}' is already active. "
                "Call end_workflow before starting a new one."
            ),
            "active_workflow": current.get("name"),
        }

    memory.stm.start_workflow(workflow_name, params or {})
    memory.save()
    logger.info(f"start_workflow: '{workflow_name}' with params={params}")

    return {
        "status": "ok",
        "workflow": workflow_name,
        "description": workflow.get("description", ""),
        "steps": workflow.get("steps", []),
        "tools": workflow.get("tools", []),
    }


def end_workflow(reason: str) -> dict[str, Any]:
    """See specs/end_workflow.yaml for full description."""
    memory = get_memory()
    current = memory.stm.workflow.current
    if current is None:
        return {
            "status": "error",
            "error": "no_active_workflow",
            "message": "No workflow is currently active.",
        }

    workflow_name = current.get("name")
    memory.stm.end_workflow()
    memory.save()
    logger.info(f"end_workflow: '{workflow_name}' reason={reason!r}")

    return {
        "status": "ok",
        "workflow": workflow_name,
        "reason": reason,
    }
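The catalog narrowing that the module docstring attributes to the PromptBuilder can be pictured with the sketch below. It is a guess at the shape of that logic: `CORE_TOOLS`, `visible_tools`, and the rule that the full catalog is visible when no workflow is active are assumptions, not code from the repository.

```python
from typing import Optional

# Hypothetical contents of the always-visible core set (the "noyau").
CORE_TOOLS = {"start_workflow", "end_workflow", "set_language"}


def visible_tools(all_tools: list, active_workflow: Optional[dict]) -> list:
    """Hypothetical sketch: narrow the tool catalog to core + workflow tools."""
    if active_workflow is None:
        return list(all_tools)  # assumption: no workflow means the full catalog
    allowed = CORE_TOOLS | set(active_workflow.get("tools", []))
    # Keep the original catalog order so prompts stay stable between turns.
    return [tool for tool in all_tools if tool in allowed]
```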
@@ -22,7 +22,7 @@ class WorkflowLoader:
Usage:
loader = WorkflowLoader()
all_workflows = loader.all()
-workflow = loader.get("organize_media")
+workflow = loader.get("media.organize_media")
"""
def __init__(self):
@@ -1,4 +1,4 @@
-name: manage_subtitles
+name: media.manage_subtitles
description: >
Place subtitle files alongside a video that has just been organised into the library.
Detects the release pattern automatically, identifies and classifies all tracks,
@@ -1,4 +1,4 @@
-name: organize_media
+name: media.organize_media
description: >
Organise a downloaded series or movie into the media library.
Triggered when the user asks to move/organize a specific title.
@@ -14,9 +14,14 @@ trigger:
tools:
- list_folder
- analyze_release
- probe_media
- find_media_imdb_id
-- resolve_destination
-- move_media
+- resolve_season_destination
+- resolve_episode_destination
+- resolve_movie_destination
+- resolve_series_destination
+- move_to_destination
- manage_subtitles
- create_seed_links
@@ -34,22 +39,31 @@ steps:
params:
folder_type: download
- id: analyze
tool: analyze_release
description: >
Parse the release name to detect media_type (movie / tv_season /
tv_episode / tv_complete) and extract season/episode info.
- id: identify_media
tool: find_media_imdb_id
-description: Confirm title, type (series/movie), and metadata via TMDB.
+description: Confirm canonical title and year via TMDB.
- id: resolve_destination
tool: resolve_destination
description: >
-Compute the correct destination path in the library.
-Uses the release name + TMDB metadata to build folder and file names.
-If multiple series folders exist for this title, returns
-needs_clarification and the user must pick one (re-call with confirmed_folder).
+Call the resolver that matches media_type from analyze_release:
+movie → resolve_movie_destination
+tv_season → resolve_season_destination
+tv_episode → resolve_episode_destination
+tv_complete → resolve_series_destination
+If the resolver returns needs_clarification, ask the user and
+re-call with confirmed_folder.
- id: move_file
-tool: move_media
+tool: move_to_destination
description: >
-Move the video file to library_file returned by resolve_destination.
+Move the video file/folder to the destination returned by the
+resolver above.
- id: handle_subtitles
tool: manage_subtitles
@@ -63,7 +77,7 @@ steps:
question: "Do you want to keep seeding this torrent?"
answers:
"yes": { next_step: create_seed_links }
-"no": { next_step: update_library }
+"no": { next_step: end }
- id: create_seed_links
tool: create_seed_links
@@ -72,10 +86,6 @@ steps:
and copy all remaining files from the original download folder
(subs, nfo, jpg, …) so the torrent stays complete for seeding.
-- id: update_library
-memory_write: Library
-description: Add the entry to the LTM library after a successful move.
naming_convention:
# Resolved by domain entities (Movie, Episode) — not hardcoded here
tv_show: "{title}/Season {season:02d}/{title}.S{season:02d}E{episode:02d}.{ext}"
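To see what the tv_show template expands to, it can be fed to Python's `str.format`. This only illustrates the placeholder syntax (`{season:02d}` zero-pads to two digits); the YAML comment says the real paths are resolved by domain entities, so `render_tv_path` is a hypothetical helper.

```python
TV_SHOW_TEMPLATE = (
    "{title}/Season {season:02d}/{title}.S{season:02d}E{episode:02d}.{ext}"
)


def render_tv_path(title: str, season: int, episode: int, ext: str) -> str:
    """Hypothetical helper: expand the naming template with str.format."""
    return TV_SHOW_TEMPLATE.format(
        title=title, season=season, episode=episode, ext=ext
    )


print(render_tv_path("Oz", 3, 1, "mkv"))  # Oz/Season 03/Oz.S03E01.mkv
```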
@@ -37,6 +37,21 @@ logger.info(f"Memory context initialized (path: {memory_path})")
llm_provider = settings.default_llm_provider.lower()
class _UnconfiguredLLM:
    """Placeholder LLM used when no provider could be configured at import time.

    Importing the FastAPI app must not fail just because credentials are
    absent (e.g. during test collection). Any actual call surfaces a clear
    503 error at request time via the handlers below.
    """

    def __init__(self, reason: str):
        self.reason = reason

    def complete(self, *args, **kwargs):
        raise LLMAPIError(f"LLM is not configured: {self.reason}")


try:
    if llm_provider == "local":
        logger.info("Using local Ollama LLM")
@@ -49,8 +64,11 @@ try:
    else:
        raise ValueError(f"Unknown LLM provider: {llm_provider}")
except LLMConfigurationError as e:
    # Degrade gracefully: keep the app importable so tests can patch agent.step
    # and so missing credentials surface as a 503 at the endpoint, not as an
    # import error.
    logger.error(f"Failed to initialize LLM: {e}")
-    raise
+    llm = _UnconfiguredLLM(str(e))
# Initialize agent
agent = Agent(
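The placeholder pattern in this hunk can be tried in isolation with the sketch below. The two exception classes are local stand-ins for the project's `LLMAPIError` and `LLMConfigurationError`, and `build_llm` is a hypothetical wrapper around the try/except shown above.

```python
class LLMConfigurationError(Exception):
    """Stand-in for the project's configuration error."""


class LLMAPIError(Exception):
    """Stand-in for the project's API error."""


class UnconfiguredLLM:
    """Placeholder that fails at call time instead of at import time."""

    def __init__(self, reason: str):
        self.reason = reason

    def complete(self, *args, **kwargs):
        raise LLMAPIError(f"LLM is not configured: {self.reason}")


def build_llm(factory):
    """Hypothetical wrapper: fall back to the placeholder on config errors."""
    try:
        return factory()
    except LLMConfigurationError as exc:
        return UnconfiguredLLM(str(exc))


def failing_factory():
    raise LLMConfigurationError("missing API key")


llm = build_llm(failing_factory)  # construction succeeds
try:
    llm.complete("hello")
except LLMAPIError as exc:
    print(exc)  # LLM is not configured: missing API key
```

The design choice is to defer the failure: module import stays cheap and safe for test collection, while a real request still gets an explicit error.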
@@ -12,7 +12,16 @@ from .dto import (
from .list_folder import ListFolderUseCase
from .manage_subtitles import ManageSubtitlesUseCase
from .move_media import MoveMediaUseCase
-from .resolve_destination import ResolveDestinationUseCase, ResolvedDestination
+from .resolve_destination import (
+    ResolvedEpisodeDestination,
+    ResolvedMovieDestination,
+    ResolvedSeasonDestination,
+    ResolvedSeriesDestination,
+    resolve_episode_destination,
+    resolve_movie_destination,
+    resolve_season_destination,
+    resolve_series_destination,
+)
from .set_folder_path import SetFolderPathUseCase
__all__ = [
@@ -21,8 +30,14 @@ __all__ = [
"CreateSeedLinksUseCase",
"MoveMediaUseCase",
"ManageSubtitlesUseCase",
-"ResolveDestinationUseCase",
-"ResolvedDestination",
+"ResolvedSeasonDestination",
+"ResolvedEpisodeDestination",
+"ResolvedMovieDestination",
+"ResolvedSeriesDestination",
+"resolve_season_destination",
+"resolve_episode_destination",
+"resolve_movie_destination",
+"resolve_series_destination",
"SetFolderPathResponse",
"ListFolderResponse",
"CreateSeedLinksResponse",
@@ -0,0 +1,69 @@
"""
detect_media_type: filesystem-based media type refinement.

Enriches a ParsedRelease.media_type with evidence from the actual source path
(file or folder). Called after parse_release() to produce a final classification.

Classification logic:
1. If source_path is a file, check its extension directly.
2. If source_path is a folder, collect the extensions of the files directly
   inside it (first level only; subfolders are not scanned).
3. Decision, after discarding metadata extensions (.nfo, .srt, …):
   - Mixed (video + non_video) → "unknown"
   - Any non_video extension AND no video extension → "other"
   - Any video extension → keep parsed media_type ("movie" | "tv_show" | "unknown")
   - No conclusive extension found → keep parsed media_type as-is
"""

from __future__ import annotations

from pathlib import Path

from alfred.domain.release.value_objects import (
    _METADATA_EXTENSIONS,
    _NON_VIDEO_EXTENSIONS,
    _VIDEO_EXTENSIONS,
    ParsedRelease,
)


def detect_media_type(parsed: ParsedRelease, source_path: Path) -> str:
    """
    Return a refined media_type string for the given source_path.

    Does not mutate parsed; returns the new media_type value only.
    The caller is responsible for updating the ParsedRelease if needed.
    """
    extensions = _collect_extensions(source_path)

    # Metadata extensions (.nfo, .srt, …) are always present alongside releases
    # and must not influence the type decision.
    conclusive = extensions - _METADATA_EXTENSIONS

    has_video = bool(conclusive & _VIDEO_EXTENSIONS)
    has_non_video = bool(conclusive & _NON_VIDEO_EXTENSIONS)

    if has_video and has_non_video:
        return "unknown"
    if has_non_video and not has_video:
        return "other"
    if has_video:
        return parsed.media_type  # trust token-level inference

    # No conclusive extension: trust token-level inference
    return parsed.media_type


def _collect_extensions(path: Path) -> set[str]:
    """Return the set of lowercase extensions found at path (file or folder)."""
    if not path.exists():
        return set()
    if path.is_file():
        return {path.suffix.lower()}

    # Folder: scan first level only
    exts: set[str] = set()
    for child in path.iterdir():
        if child.is_file():
            exts.add(child.suffix.lower())
    return exts
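The decision table can be exercised without touching the filesystem. The sketch below mirrors the branch order of detect_media_type, but the three extension sets are small hypothetical samples; the real sets are loaded from the project's value objects.

```python
# Hypothetical sample sets, for illustration only.
VIDEO = {".mkv", ".mp4", ".avi"}
NON_VIDEO = {".iso", ".exe", ".zip"}
METADATA = {".nfo", ".srt", ".jpg"}


def classify(parsed_media_type: str, extensions: set) -> str:
    """Sketch of the decision step: refine a parsed type from file extensions."""
    conclusive = set(extensions) - METADATA  # sidecar files never decide
    has_video = bool(conclusive & VIDEO)
    has_non_video = bool(conclusive & NON_VIDEO)
    if has_video and has_non_video:
        return "unknown"
    if has_non_video:
        return "other"
    return parsed_media_type  # video only, or nothing conclusive


print(classify("movie", {".mkv", ".nfo"}))    # movie
print(classify("movie", {".zip"}))            # other
print(classify("tv_show", {".mkv", ".exe"}))  # unknown
```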
@@ -2,7 +2,7 @@
from __future__ import annotations
-from dataclasses import dataclass, field
+from dataclasses import dataclass
@dataclass
@@ -88,7 +88,11 @@ class PlacedSubtitle:
filename: str
def to_dict(self) -> dict:
-return {"source": self.source, "destination": self.destination, "filename": self.filename}
+return {
+    "source": self.source,
+    "destination": self.destination,
+    "filename": self.filename,
+}
@dataclass
@@ -0,0 +1,82 @@
"""enrich_from_probe: fill missing ParsedRelease fields from MediaInfo."""

from __future__ import annotations

from alfred.domain.release.value_objects import ParsedRelease
from alfred.domain.shared.media import MediaInfo

# Map ffprobe codec names to scene-style codec tokens
_VIDEO_CODEC_MAP = {
    "hevc": "x265",
    "h264": "x264",
    "h265": "x265",
    "av1": "AV1",
    "vp9": "VP9",
    "mpeg4": "XviD",
}

# Map ffprobe audio codec names to scene-style tokens
_AUDIO_CODEC_MAP = {
    "eac3": "EAC3",
    "ac3": "AC3",
    "dts": "DTS",
    "truehd": "TrueHD",
    "aac": "AAC",
    "flac": "FLAC",
    "opus": "OPUS",
    "mp3": "MP3",
    # ffprobe reports little-endian PCM as pcm_s16le / pcm_s24le
    "pcm_s16le": "PCM",
    "pcm_s24le": "PCM",
}

# Map channel count to standard layout string
_CHANNEL_MAP = {
    8: "7.1",
    6: "5.1",
    2: "2.0",
    1: "1.0",
}


def enrich_from_probe(parsed: ParsedRelease, info: MediaInfo) -> None:
    """
    Fill None fields in parsed using data from ffprobe MediaInfo.

    Only overwrites fields that are currently None; token-level values
    from the release name always take priority.
    Mutates parsed in place.
    """
    if parsed.quality is None and info.resolution:
        parsed.quality = info.resolution

    if parsed.codec is None and info.video_codec:
        parsed.codec = _VIDEO_CODEC_MAP.get(
            info.video_codec.lower(), info.video_codec.upper()
        )

    if parsed.bit_depth is None and info.video_codec:
        # ffprobe exposes bit depth via pix_fmt; not in MediaInfo yet, skip for now
        pass

    # Audio: use the default track, fallback to first
    default_track = next((t for t in info.audio_tracks if t.is_default), None)
    track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)

    if track:
        if parsed.audio_codec is None and track.codec:
            parsed.audio_codec = _AUDIO_CODEC_MAP.get(
                track.codec.lower(), track.codec.upper()
            )

        if parsed.audio_channels is None and track.channels:
            parsed.audio_channels = _CHANNEL_MAP.get(
                track.channels, f"{track.channels}ch"
            )

    # Languages: merge ffprobe languages with token-level ones.
    # "und" = undetermined, not useful.
    if info.audio_languages:
        # Compare case-insensitively and record each addition, so repeated
        # tracks in the same language are appended only once.
        existing = {lang.upper() for lang in parsed.languages}
        for lang in info.audio_languages:
            if lang.lower() != "und" and lang.upper() not in existing:
                parsed.languages.append(lang)
                existing.add(lang.upper())
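The "fill only what is missing" rule and the language merge can be tried on a stand-in object. `ParsedSketch`, `fill_codec`, and `merge_languages` are hypothetical names used only for this sketch, and the codec table is an excerpt of the one in the module. The merge keeps a set of already-seen codes so that several tracks in the same language add a single entry.

```python
from dataclasses import dataclass, field
from typing import Optional

VIDEO_CODEC_MAP = {"hevc": "x265", "h264": "x264"}  # excerpt, for illustration


@dataclass
class ParsedSketch:
    """Stand-in for ParsedRelease with just the fields used here."""

    codec: Optional[str] = None
    languages: list = field(default_factory=list)


def fill_codec(parsed: ParsedSketch, probed_codec: Optional[str]) -> None:
    # Token-level values win: only fill the field when it is still None.
    if parsed.codec is None and probed_codec:
        parsed.codec = VIDEO_CODEC_MAP.get(probed_codec.lower(), probed_codec.upper())


def merge_languages(parsed: ParsedSketch, probed_languages: list) -> None:
    seen = {lang.upper() for lang in parsed.languages}
    for lang in probed_languages:
        if lang.lower() == "und" or lang.upper() in seen:
            continue  # skip undetermined tracks and duplicates
        parsed.languages.append(lang)
        seen.add(lang.upper())


release = ParsedSketch(languages=["FRE"])
fill_codec(release, "hevc")
merge_languages(release, ["fre", "eng", "und", "eng"])
print(release.codec, release.languages)  # x265 ['FRE', 'eng']
```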
@@ -4,20 +4,29 @@ import logging
from pathlib import Path
from alfred.domain.shared.value_objects import ImdbId
-from alfred.domain.subtitles.entities import SubtitleTrack
+from alfred.domain.subtitles.entities import SubtitleCandidate
from alfred.domain.subtitles.knowledge.base import SubtitleKnowledgeBase
from alfred.domain.subtitles.knowledge.loader import KnowledgeLoader
from alfred.domain.subtitles.services.identifier import SubtitleIdentifier
from alfred.domain.subtitles.services.matcher import SubtitleMatcher
from alfred.domain.subtitles.services.pattern_detector import PatternDetector
-from alfred.domain.subtitles.services.placer import PlacedTrack, SubtitlePlacer
+from alfred.domain.subtitles.services.placer import (
+    PlacedTrack,
+    SubtitlePlacer,
+    _build_dest_name,
+)
from alfred.domain.subtitles.services.utils import available_subtitles
from alfred.domain.subtitles.value_objects import ScanStrategy
from alfred.infrastructure.persistence.context import get_memory
from alfred.infrastructure.subtitle.metadata_store import SubtitleMetadataStore
from alfred.infrastructure.subtitle.rule_repository import RuleSetRepository
-from .dto import AvailableSubtitle, ManageSubtitlesResponse, PlacedSubtitle, UnresolvedTrack
+from .dto import (
+    AvailableSubtitle,
+    ManageSubtitlesResponse,
+    PlacedSubtitle,
+    UnresolvedTrack,
+)
logger = logging.getLogger(__name__)
@@ -69,11 +78,12 @@ class ManageSubtitlesUseCase:
season: int | None = None,
episode: int | None = None,
confirmed_pattern_id: str | None = None,
dry_run: bool = False,
) -> ManageSubtitlesResponse:
source_path = Path(source_video)
dest_path = Path(destination_video)
-if not source_path.exists():
+if not source_path.exists() and not source_path.parent.exists():
return ManageSubtitlesResponse(
status="error",
error="source_not_found",
@@ -108,7 +118,9 @@ class ManageSubtitlesUseCase:
)
if metadata.total_count == 0:
-logger.info(f"ManageSubtitles: no subtitle tracks found for {source_path.name}")
+logger.info(
+    f"ManageSubtitles: no subtitle tracks found for {source_path.name}"
+)
return ManageSubtitlesResponse(
status="ok",
video_path=destination_video,
@@ -164,6 +176,30 @@ class ManageSubtitlesUseCase:
skipped_count=metadata.total_count,
)
# --- Dry run: skip placement ---
if dry_run:
placed_dtos = []
for t in matched:
if not t.file_path:
continue
try:
filename = _build_dest_name(t, dest_path.stem)
except ValueError:
continue
placed_dtos.append(
PlacedSubtitle(
source=str(t.file_path),
destination=str(dest_path.parent / filename),
filename=filename,
)
)
return ManageSubtitlesResponse(
status="ok",
video_path=destination_video,
placed=placed_dtos,
skipped_count=0,
)
# --- Place ---
placer = SubtitlePlacer()
place_result = placer.place(matched, dest_path)
@@ -229,7 +265,9 @@ class ManageSubtitlesUseCase:
return kb.pattern("adjacent")
def _to_unresolved_dto(track: SubtitleTrack, min_confidence: float = 0.7) -> UnresolvedTrack:
def _to_unresolved_dto(
track: SubtitleCandidate, min_confidence: float = 0.7
) -> UnresolvedTrack:
reason = "unknown_language" if track.language is None else "low_confidence"
return UnresolvedTrack(
raw_tokens=track.raw_tokens,
@@ -241,10 +279,10 @@ def _to_unresolved_dto(track: SubtitleTrack, min_confidence: float = 0.7) -> Unr
def _pair_placed_with_tracks(
placed: list[PlacedTrack],
tracks: list[SubtitleTrack],
) -> list[tuple[PlacedTrack, SubtitleTrack]]:
tracks: list[SubtitleCandidate],
) -> list[tuple[PlacedTrack, SubtitleCandidate]]:
"""
Pair each PlacedTrack with its originating SubtitleTrack by source path.
Pair each PlacedTrack with its originating SubtitleCandidate by source path.
Falls back to positional matching if paths don't align.
"""
track_by_path = {t.file_path: t for t in tracks if t.file_path}
@@ -1,62 +1,131 @@
"""
ResolveDestinationUseCase — compute the library destination path for a release.
Destination resolution — compute library paths for releases.
Steps:
1. Parse the release name
2. Look up TMDB for title + year (+ episode title if single episode)
3. Scan the library for an existing series folder
4. Apply group-conflict rules
5. Return the computed paths (or needs_clarification if ambiguous)
Four distinct use cases, one per release type:
- resolve_season_destination : season pack (folder move)
- resolve_episode_destination : single episode (file move)
- resolve_movie_destination : movie (file move)
- resolve_series_destination : complete series multi-season pack (folder move)
Each returns a dedicated DTO with only the fields that make sense for that type.
"""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass, field
from dataclasses import dataclass
from pathlib import Path
from alfred.domain.media.release_parser import ParsedRelease, parse_release
from alfred.domain.release import parse_release
from alfred.infrastructure.persistence import get_memory
logger = logging.getLogger(__name__)
# Characters forbidden on Windows filesystems (served via NFS)
_WIN_FORBIDDEN = re.compile(r'[?:*"<>|\\]')
def _sanitise(text: str) -> str:
def _sanitize(text: str) -> str:
return _WIN_FORBIDDEN.sub("", text)
def _find_existing_tvshow_folders(
tv_root: Path, tmdb_title: str, tmdb_year: int
) -> list[str]:
"""Return folder names in tv_root that match title + year prefix."""
if not tv_root.exists():
return []
clean_title = _sanitize(tmdb_title).replace(" ", ".")
prefix = f"{clean_title}.{tmdb_year}".lower()
return sorted(
entry.name
for entry in tv_root.iterdir()
if entry.is_dir() and entry.name.lower().startswith(prefix)
)
def _get_tv_root() -> Path | None:
memory = get_memory()
tv_root = memory.ltm.library_paths.get("tv_show")
return Path(tv_root) if tv_root else None
# ---------------------------------------------------------------------------
# Internal sentinel + series-folder resolver (shared by the 3 TV use cases)
# ---------------------------------------------------------------------------
@dataclass
class _Clarification:
"""Module-private sentinel signalling that user input is needed."""
question: str
options: list[str]
def _resolve_series_folder(
tv_root: Path,
tmdb_title: str,
tmdb_year: int,
computed_name: str,
confirmed_folder: str | None,
) -> tuple[str, bool] | _Clarification:
"""
Resolve which series folder to use.
Returns:
(folder_name, is_new) if resolved unambiguously,
_Clarification(question, options) if the caller must ask the user.
"""
if confirmed_folder:
return confirmed_folder, not (tv_root / confirmed_folder).exists()
existing = _find_existing_tvshow_folders(tv_root, tmdb_title, tmdb_year)
if not existing:
return computed_name, True
if len(existing) == 1 and existing[0] == computed_name:
return existing[0], False
options = existing + ([computed_name] if computed_name not in existing else [])
return _Clarification(
question=(
f"Un dossier série existe déjà pour '{tmdb_title}' "
f"mais son nom diffère du nom calculé ({computed_name}). "
f"Lequel utiliser ?"
),
options=options,
)
# ---------------------------------------------------------------------------
# DTOs
# ---------------------------------------------------------------------------
@dataclass
class ResolvedDestination:
"""All computed paths for a release, ready to hand to move_media."""
class _ResolvedDestinationBase:
"""
Shared shape across all resolution DTOs.
Holds the status flag and the fields used in non-ok states
(error / needs_clarification). Subclasses add their own ok-state fields
and a to_dict() that delegates the non-ok cases via _base_dict().
"""
status: str # "ok" | "needs_clarification" | "error"
# Populated on "ok"
library_file: str | None = None # absolute path of the destination video file
series_folder: str | None = None # absolute path of the series root folder
season_folder: str | None = None # absolute path of the season subfolder
series_folder_name: str | None = None # just the folder name (for display)
season_folder_name: str | None = None
filename: str | None = None
is_new_series_folder: bool = False # True if we're creating the folder
# Populated on "needs_clarification"
# needs_clarification
question: str | None = None
options: list[str] | None = None # existing group folder names to pick from
options: list[str] | None = None
# Populated on "error"
# error
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
def _base_dict(self) -> dict | None:
"""Return the dict for error/needs_clarification, or None for ok."""
if self.status == "error":
return {"status": self.status, "error": self.error, "message": self.message}
if self.status == "needs_clarification":
@@ -65,154 +134,194 @@ class ResolvedDestination:
"question": self.question,
"options": self.options or [],
}
return {
return None
@dataclass
class ResolvedSeasonDestination(_ResolvedDestinationBase):
"""Paths for a season pack — folder move, no individual file paths."""
series_folder: str | None = None
season_folder: str | None = None
series_folder_name: str | None = None
season_folder_name: str | None = None
is_new_series_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"library_file": self.library_file,
"series_folder": self.series_folder,
"season_folder": self.season_folder,
"series_folder_name": self.series_folder_name,
"season_folder_name": self.season_folder_name,
"is_new_series_folder": self.is_new_series_folder,
}
@dataclass
class ResolvedEpisodeDestination(_ResolvedDestinationBase):
"""Paths for a single episode — file move."""
series_folder: str | None = None
season_folder: str | None = None
library_file: str | None = None # full path to destination .mkv
series_folder_name: str | None = None
season_folder_name: str | None = None
filename: str | None = None
is_new_series_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"series_folder": self.series_folder,
"season_folder": self.season_folder,
"library_file": self.library_file,
"series_folder_name": self.series_folder_name,
"season_folder_name": self.season_folder_name,
"filename": self.filename,
"is_new_series_folder": self.is_new_series_folder,
}
@dataclass
class ResolvedMovieDestination(_ResolvedDestinationBase):
"""Paths for a movie — file move."""
movie_folder: str | None = None
library_file: str | None = None
movie_folder_name: str | None = None
filename: str | None = None
is_new_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"movie_folder": self.movie_folder,
"library_file": self.library_file,
"movie_folder_name": self.movie_folder_name,
"filename": self.filename,
"is_new_folder": self.is_new_folder,
}
@dataclass
class ResolvedSeriesDestination(_ResolvedDestinationBase):
"""Paths for a complete multi-season series pack — folder move."""
series_folder: str | None = None
series_folder_name: str | None = None
is_new_series_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"series_folder": self.series_folder,
"series_folder_name": self.series_folder_name,
"is_new_series_folder": self.is_new_series_folder,
}
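Reviewer note: the `to_dict()` methods above all rely on `self._base_dict() or {...}`. The following is a minimal sketch of that delegation, assuming simplified stand-in classes (`BaseResult` and `SeriesResult` are hypothetical names, not the module's DTOs). It shows why the `or` fallback is safe: the non-ok dicts are never empty, so they are always truthy, and the ok state returns `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BaseResult:
    """Hypothetical stand-in for the shared base DTO."""

    status: str  # "ok" | "needs_clarification" | "error"
    question: Optional[str] = None
    options: Optional[List[str]] = None
    error: Optional[str] = None
    message: Optional[str] = None

    def _base_dict(self) -> Optional[dict]:
        # Non-ok states are serialised here; the ok state returns None.
        if self.status == "error":
            return {"status": self.status, "error": self.error, "message": self.message}
        if self.status == "needs_clarification":
            return {
                "status": self.status,
                "question": self.question,
                "options": self.options or [],
            }
        return None


@dataclass
class SeriesResult(BaseResult):
    """Hypothetical subclass carrying only its own ok-state fields."""

    series_folder: Optional[str] = None
    is_new_series_folder: bool = False

    def to_dict(self) -> dict:
        # The base dict always has a "status" key, so it is truthy when present.
        return self._base_dict() or {
            "status": self.status,
            "series_folder": self.series_folder,
            "is_new_series_folder": self.is_new_series_folder,
        }
```

Because every subclass field has a default, the dataclass field ordering stays valid even though the base class declares `status` without one.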
# ---------------------------------------------------------------------------
# Use case
# Use cases
# ---------------------------------------------------------------------------
class ResolveDestinationUseCase:
def resolve_season_destination(
release_name: str,
tmdb_title: str,
tmdb_year: int,
confirmed_folder: str | None = None,
) -> ResolvedSeasonDestination:
"""
Compute the full destination path for a media file being organised.
Compute destination paths for a season pack.
The caller provides:
- release_name: the raw release folder/file name
- source_file: path to the actual video file (to get extension)
- tmdb_title: canonical title from TMDB
- tmdb_year: release year from TMDB
- tmdb_episode_title: episode title from TMDB (None for movies / season packs)
- confirmed_folder: if the user already answered needs_clarification, pass
the chosen folder name here to skip the check
Returns a ResolvedDestination.
Returns series_folder + season_folder. No file paths — the whole
source folder is moved as-is into season_folder.
"""
tv_root = _get_tv_root()
if not tv_root:
return ResolvedSeasonDestination(
status="error",
error="library_not_set",
message="TV show library path is not configured.",
)
def execute(
self,
parsed = parse_release(release_name)
computed_name = _sanitize(parsed.show_folder_name(tmdb_title, tmdb_year))
resolved = _resolve_series_folder(
tv_root, tmdb_title, tmdb_year, computed_name, confirmed_folder
)
if isinstance(resolved, _Clarification):
return ResolvedSeasonDestination(
status="needs_clarification",
question=resolved.question,
options=resolved.options,
)
series_folder_name, is_new = resolved
season_folder_name = parsed.season_folder_name()
series_path = tv_root / series_folder_name
season_path = series_path / season_folder_name
return ResolvedSeasonDestination(
status="ok",
series_folder=str(series_path),
season_folder=str(season_path),
series_folder_name=series_folder_name,
season_folder_name=season_folder_name,
is_new_series_folder=is_new,
)
def resolve_episode_destination(
release_name: str,
source_file: str,
tmdb_title: str,
tmdb_year: int,
tmdb_episode_title: str | None = None,
confirmed_folder: str | None = None,
) -> ResolvedDestination:
parsed = parse_release(release_name)
ext = Path(source_file).suffix # ".mkv"
) -> ResolvedEpisodeDestination:
"""
Compute destination paths for a single episode file.
if parsed.is_movie:
return self._resolve_movie(parsed, tmdb_title, tmdb_year, ext)
return self._resolve_tvshow(
parsed, tmdb_title, tmdb_year, tmdb_episode_title, ext, confirmed_folder
)
# ------------------------------------------------------------------
# Movie
# ------------------------------------------------------------------
def _resolve_movie(
self, parsed: ParsedRelease, tmdb_title: str, tmdb_year: int, ext: str
) -> ResolvedDestination:
memory = get_memory()
movies_root = memory.ltm.library_paths.get("movie")
if not movies_root:
return ResolvedDestination(
status="error",
error="library_not_set",
message="Movie library path is not configured.",
)
folder_name = _sanitise(parsed.movie_folder_name(tmdb_title, tmdb_year))
filename = _sanitise(parsed.movie_filename(tmdb_title, tmdb_year, ext))
folder_path = Path(movies_root) / folder_name
file_path = folder_path / filename
return ResolvedDestination(
status="ok",
library_file=str(file_path),
series_folder=str(folder_path),
series_folder_name=folder_name,
filename=filename,
is_new_series_folder=not folder_path.exists(),
)
# ------------------------------------------------------------------
# TV show
# ------------------------------------------------------------------
def _resolve_tvshow(
self,
parsed: ParsedRelease,
tmdb_title: str,
tmdb_year: int,
tmdb_episode_title: str | None,
ext: str,
confirmed_folder: str | None,
) -> ResolvedDestination:
memory = get_memory()
tv_root = memory.ltm.library_paths.get("tv_show")
Returns series_folder + season_folder + library_file (full path to .mkv).
"""
tv_root = _get_tv_root()
if not tv_root:
return ResolvedDestination(
return ResolvedEpisodeDestination(
status="error",
error="library_not_set",
message="TV show library path is not configured.",
)
tv_root_path = Path(tv_root)
parsed = parse_release(release_name)
ext = Path(source_file).suffix
computed_name = _sanitize(parsed.show_folder_name(tmdb_title, tmdb_year))
# --- Find existing series folders for this title ---
existing = _find_existing_series_folders(tv_root_path, tmdb_title, tmdb_year)
# --- Determine series folder name ---
if confirmed_folder:
series_folder_name = confirmed_folder
is_new = not (tv_root_path / confirmed_folder).exists()
elif len(existing) == 0:
# No existing folder — create with release group
series_folder_name = _sanitise(parsed.show_folder_name(tmdb_title, tmdb_year))
is_new = True
elif len(existing) == 1:
# Exactly one match — use it regardless of group
series_folder_name = existing[0]
is_new = False
else:
# Multiple folders — ask user
return ResolvedDestination(
resolved = _resolve_series_folder(
tv_root, tmdb_title, tmdb_year, computed_name, confirmed_folder
)
if isinstance(resolved, _Clarification):
return ResolvedEpisodeDestination(
status="needs_clarification",
question=(
f"Multiple folders found for '{tmdb_title}' in your library. "
f"Which one should I use for this release ({parsed.group})?"
),
options=existing,
question=resolved.question,
options=resolved.options,
)
# --- Build paths ---
series_folder_name, is_new = resolved
season_folder_name = parsed.season_folder_name()
filename = _sanitise(
parsed.episode_filename(tmdb_episode_title, ext)
if not parsed.is_season_pack
else parsed.season_folder_name() + ext
)
filename = _sanitize(parsed.episode_filename(tmdb_episode_title, ext))
series_path = tv_root_path / series_folder_name
series_path = tv_root / series_folder_name
season_path = series_path / season_folder_name
file_path = season_path / filename
return ResolvedDestination(
return ResolvedEpisodeDestination(
status="ok",
library_file=str(file_path),
series_folder=str(series_path),
season_folder=str(season_path),
library_file=str(file_path),
series_folder_name=series_folder_name,
season_folder_name=season_folder_name,
filename=filename,
@@ -220,27 +329,83 @@ class ResolveDestinationUseCase:
)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _find_existing_series_folders(tv_root: Path, tmdb_title: str, tmdb_year: int) -> list[str]:
def resolve_movie_destination(
release_name: str,
source_file: str,
tmdb_title: str,
tmdb_year: int,
) -> ResolvedMovieDestination:
"""
Return names of folders in tv_root that match the given title + year.
Compute destination paths for a movie file.
Matching is loose: normalised title (dots, no special chars) + year must
appear at the start of the folder name.
Returns movie_folder + library_file (full path to .mkv).
"""
if not tv_root.exists():
return []
memory = get_memory()
movies_root = memory.ltm.library_paths.get("movie")
if not movies_root:
return ResolvedMovieDestination(
status="error",
error="library_not_set",
message="Movie library path is not configured.",
)
# Build a normalised prefix to match against: "Oz.1997"
clean_title = _sanitise(tmdb_title).replace(" ", ".")
prefix = f"{clean_title}.{tmdb_year}".lower()
parsed = parse_release(release_name)
ext = Path(source_file).suffix
matches = []
for entry in tv_root.iterdir():
if entry.is_dir() and entry.name.lower().startswith(prefix):
matches.append(entry.name)
folder_name = _sanitize(parsed.movie_folder_name(tmdb_title, tmdb_year))
filename = _sanitize(parsed.movie_filename(tmdb_title, tmdb_year, ext))
return sorted(matches)
folder_path = Path(movies_root) / folder_name
file_path = folder_path / filename
return ResolvedMovieDestination(
status="ok",
movie_folder=str(folder_path),
library_file=str(file_path),
movie_folder_name=folder_name,
filename=filename,
is_new_folder=not folder_path.exists(),
)
def resolve_series_destination(
release_name: str,
tmdb_title: str,
tmdb_year: int,
confirmed_folder: str | None = None,
) -> ResolvedSeriesDestination:
"""
Compute destination path for a complete multi-season series pack.
Returns only series_folder — the whole pack lands directly inside it.
"""
tv_root = _get_tv_root()
if not tv_root:
return ResolvedSeriesDestination(
status="error",
error="library_not_set",
message="TV show library path is not configured.",
)
parsed = parse_release(release_name)
computed_name = _sanitize(parsed.show_folder_name(tmdb_title, tmdb_year))
resolved = _resolve_series_folder(
tv_root, tmdb_title, tmdb_year, computed_name, confirmed_folder
)
if isinstance(resolved, _Clarification):
return ResolvedSeriesDestination(
status="needs_clarification",
question=resolved.question,
options=resolved.options,
)
series_folder_name, is_new = resolved
series_path = tv_root / series_folder_name
return ResolvedSeriesDestination(
status="ok",
series_folder=str(series_path),
series_folder_name=series_folder_name,
is_new_series_folder=is_new,
)
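Reviewer note: the folder-resolution rule shared by the three TV use cases can be exercised in isolation. The sketch below is a simplified stand-in rather than the module's code: `Clarification`, `find_existing`, and `resolve_series_folder` are hypothetical names, the title/year prefix is passed in already computed, and the question text is a placeholder.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Tuple, Union


@dataclass
class Clarification:
    """Hypothetical stand-in for the module-private sentinel."""

    question: str
    options: List[str]


def find_existing(tv_root: Path, prefix: str) -> List[str]:
    """Folder names under tv_root that start with prefix (case-insensitive)."""
    if not tv_root.exists():
        return []
    wanted = prefix.lower()
    return sorted(
        entry.name
        for entry in tv_root.iterdir()
        if entry.is_dir() and entry.name.lower().startswith(wanted)
    )


def resolve_series_folder(
    tv_root: Path,
    prefix: str,
    computed_name: str,
    confirmed_folder: Optional[str] = None,
) -> Union[Tuple[str, bool], Clarification]:
    # A user-confirmed folder always wins; it is "new" if it does not exist yet.
    if confirmed_folder:
        return confirmed_folder, not (tv_root / confirmed_folder).exists()
    existing = find_existing(tv_root, prefix)
    if not existing:
        return computed_name, True
    # Only an exact, single match is accepted silently.
    if len(existing) == 1 and existing[0] == computed_name:
        return existing[0], False
    options = existing + ([computed_name] if computed_name not in existing else [])
    return Clarification("Which folder should be used?", options)
```

Compared with the removed `_resolve_tvshow` branch, a single existing folder whose name differs from the computed one now produces a clarification instead of being reused regardless of group.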
-5
@@ -1,5 +0,0 @@
"""Media domain — shared naming and release parsing."""
from .release_parser import ParsedRelease, parse_release
__all__ = ["ParsedRelease", "parse_release"]
-306
@@ -1,306 +0,0 @@
"""
release_parser.py — Parse a release name into structured components.
Handles both dot-separated and space-separated release names:
Oz.S03.1080p.WEBRip.x265-KONTRAST
Oz S03 1080p WEBRip x265-KONTRAST
Inception.2010.1080p.BluRay.x265-GROUP
"""
from __future__ import annotations
import re
from dataclasses import dataclass, field
# Known quality tokens
_QUALITIES = {"2160p", "1080p", "720p", "480p", "576p", "4k", "8k"}
# Known source tokens (case-insensitive match)
_SOURCES = {
"bluray", "blu-ray", "bdrip", "brrip",
"webrip", "web-rip", "webdl", "web-dl", "web",
"hdtv", "hdrip", "dvdrip", "dvd", "vodrip",
"amzn", "nf", "dsnp", "hmax", "atvp",
}
# Known codec tokens
_CODECS = {
"x264", "x265", "h264", "h265", "hevc", "avc",
"xvid", "divx", "av1", "vp9",
"h.264", "h.265",
}
# Windows-forbidden characters (we strip these from display names)
_WIN_FORBIDDEN = re.compile(r'[?:*"<>|\\]')
# Episode/season pattern: S01, S01E02, S01E02E03 (the 1x02 form is not matched)
_SEASON_EP_RE = re.compile(
r"S(\d{1,2})(?:E(\d{2})(?:E(\d{2}))?)?",
re.IGNORECASE,
)
# Year pattern
_YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")
@dataclass
class ParsedRelease:
"""Structured representation of a parsed release name."""
raw: str # original release name (untouched)
normalised: str # dots instead of spaces
title: str # show/movie title (dots, no year/season/tech)
year: int | None # year token found after the title in the release name
season: int | None # season number (None for movies)
episode: int | None # first episode number (None if season-pack)
episode_end: int | None # last episode for multi-ep (None otherwise)
quality: str | None # 1080p, 2160p, …
source: str | None # WEBRip, BluRay, …
codec: str | None # x265, HEVC, …
group: str # release group, "UNKNOWN" if missing
tech_string: str # quality.source.codec joined with dots
# -------------------------------------------------------------------------
# Derived helpers
# -------------------------------------------------------------------------
@property
def is_movie(self) -> bool:
return self.season is None
@property
def is_season_pack(self) -> bool:
return self.season is not None and self.episode is None
def show_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
"""
Build the series root folder name.
Format: {Title}.{Year}.{Tech}-{Group}
Example: Oz.1997.1080p.WEBRip.x265-KONTRAST
"""
title_part = _sanitise_for_fs(tmdb_title).replace(" ", ".")
tech = self.tech_string or "Unknown"
return f"{title_part}.{tmdb_year}.{tech}-{self.group}"
def season_folder_name(self) -> str:
"""
Build the season subfolder name = normalised release name (no episode).
Example: Oz.S03.1080p.WEBRip.x265-KONTRAST
For a single-episode release we still strip the episode token so the
folder can hold the whole season.
"""
return _strip_episode_from_normalised(self.normalised)
def episode_filename(self, tmdb_episode_title: str | None, ext: str) -> str:
"""
Build the episode filename.
Format: {Title}.{SxxExx}.{EpisodeTitle}.{Tech}-{Group}.{ext}
Example: Oz.S01E01.The.Routine.1080p.WEBRip.x265-KONTRAST.mkv
If tmdb_episode_title is None, omits the episode title segment.
"""
title_part = _sanitise_for_fs(self.title) # already dotted from normalised
s = f"S{self.season:02d}" if self.season is not None else ""
e = f"E{self.episode:02d}" if self.episode is not None else ""
se = s + e
ep_title = ""
if tmdb_episode_title:
ep_title = "." + _sanitise_for_fs(tmdb_episode_title).replace(" ", ".")
tech = self.tech_string or "Unknown"
ext_clean = ext.lstrip(".")
return f"{title_part}.{se}{ep_title}.{tech}-{self.group}.{ext_clean}"
def movie_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
"""
Build the movie folder name.
Format: {Title}.{Year}.{Tech}-{Group}
Example: Inception.2010.1080p.BluRay.x265-GROUP
"""
return self.show_folder_name(tmdb_title, tmdb_year)
def movie_filename(self, tmdb_title: str, tmdb_year: int, ext: str) -> str:
"""
Build the movie filename (same as folder name + extension).
Example: Inception.2010.1080p.BluRay.x265-GROUP.mkv
"""
ext_clean = ext.lstrip(".")
return f"{self.movie_folder_name(tmdb_title, tmdb_year)}.{ext_clean}"
# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------
def parse_release(name: str) -> ParsedRelease:
"""
Parse a release name and return a ParsedRelease.
Accepts both dot-separated and space-separated names.
"""
normalised = _normalise(name)
tokens = normalised.split(".")
season, episode, episode_end = _extract_season_episode(tokens)
quality, source, codec, group, tech_tokens = _extract_tech(tokens)
title = _extract_title(tokens, season, episode, tech_tokens)
year = _extract_year(tokens, title)
tech_parts = [p for p in [quality, source, codec] if p]
tech_string = ".".join(tech_parts)
return ParsedRelease(
raw=name,
normalised=normalised,
title=title,
year=year,
season=season,
episode=episode,
episode_end=episode_end,
quality=quality,
source=source,
codec=codec,
group=group,
tech_string=tech_string,
)
# ---------------------------------------------------------------------------
# Internal helpers
# ---------------------------------------------------------------------------
def _normalise(name: str) -> str:
"""Replace spaces with dots, collapse multiple dots."""
s = name.replace(" ", ".")
s = re.sub(r"\.{2,}", ".", s)
return s.strip(".")
def _sanitise_for_fs(text: str) -> str:
"""Remove Windows-forbidden characters from a string."""
return _WIN_FORBIDDEN.sub("", text)
def _extract_season_episode(tokens: list[str]) -> tuple[int | None, int | None, int | None]:
joined = ".".join(tokens)
m = _SEASON_EP_RE.search(joined)
if not m:
return None, None, None
season = int(m.group(1))
episode = int(m.group(2)) if m.group(2) else None
episode_end = int(m.group(3)) if m.group(3) else None
return season, episode, episode_end
def _extract_tech(
tokens: list[str],
) -> tuple[str | None, str | None, str | None, str, set[str]]:
"""
Extract quality, source, codec, group from tokens.
Returns (quality, source, codec, group, tech_token_set).
Group extraction strategy (in priority order):
1. Token where prefix is a known codec: x265-GROUP
2. Last token in the list that contains a dash (fallback for 10bit-GROUP, AAC5.1-GROUP, etc.)
"""
quality: str | None = None
source: str | None = None
codec: str | None = None
group = "UNKNOWN"
tech_tokens: set[str] = set()
for tok in tokens:
tl = tok.lower()
if tl in _QUALITIES:
quality = tok
tech_tokens.add(tok)
continue
if tl in _SOURCES:
source = tok
tech_tokens.add(tok)
continue
if "-" in tok:
parts = tok.rsplit("-", 1)
# codec-GROUP (highest priority for group)
if parts[0].lower() in _CODECS:
codec = parts[0]
group = parts[1] if parts[1] else "UNKNOWN"
tech_tokens.add(tok)
continue
# source with dash: Web-DL, WEB-DL, etc.
if parts[0].lower() in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
source = tok
tech_tokens.add(tok)
continue
if tl in _CODECS:
codec = tok
tech_tokens.add(tok)
# Fallback: if group still UNKNOWN, use the rightmost token with a dash
# that isn't a known source (handles "10bit-Protozoan", "AAC5.1-YTS", etc.)
if group == "UNKNOWN":
for tok in reversed(tokens):
if "-" in tok:
parts = tok.rsplit("-", 1)
tl = tok.lower()
if tl in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
continue
if parts[1]: # non-empty group part
group = parts[1]
break
return quality, source, codec, group, tech_tokens
def _extract_title(tokens: list[str], season: int | None, episode: int | None, tech_tokens: set[str]) -> str:
"""
Extract the title portion: everything before the first season/year/tech token.
"""
title_parts = []
for tok in tokens:
# Stop at season token
if _SEASON_EP_RE.match(tok):
break
# Stop at year
if _YEAR_RE.fullmatch(tok):
break
# Stop at tech tokens
if tok in tech_tokens or tok.lower() in _QUALITIES | _SOURCES | _CODECS:
break
# Stop if token contains a dash (likely codec-GROUP)
if "-" in tok and any(p.lower() in _CODECS | _SOURCES for p in tok.split("-")):
break
title_parts.append(tok)
return ".".join(title_parts) if title_parts else tokens[0]
def _extract_year(tokens: list[str], title: str) -> int | None:
"""Extract a 4-digit year from tokens (only after the title)."""
title_len = len(title.split("."))
for tok in tokens[title_len:]:
m = _YEAR_RE.fullmatch(tok)
if m:
return int(m.group(1))
return None
def _strip_episode_from_normalised(normalised: str) -> str:
"""
Remove all episode parts (Exx) from a normalised release name, keeping Sxx.
Oz.S03E01.1080p... → Oz.S03.1080p...
Archer.S14E09E10E11.1080p... → Archer.S14.1080p...
"""
return re.sub(r"(S\d{2})(E\d{2})+", r"\1", normalised, flags=re.IGNORECASE)
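Reviewer note: the season/episode handling in the removed parser can be checked on its own. The sketch below copies the two regular expressions from the file above; the wrapper names `extract_season_episode` and `strip_episode` are illustrative, not the module's private helpers.

```python
import re
from typing import Optional, Tuple

# Same pattern as _SEASON_EP_RE above: S01, S01E02, S01E02E03.
SEASON_EP_RE = re.compile(r"S(\d{1,2})(?:E(\d{2})(?:E(\d{2}))?)?", re.IGNORECASE)


def extract_season_episode(name: str) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (season, episode, episode_end) from the first SxxEyy token found."""
    m = SEASON_EP_RE.search(name)
    if not m:
        return None, None, None
    season = int(m.group(1))
    episode = int(m.group(2)) if m.group(2) else None
    episode_end = int(m.group(3)) if m.group(3) else None
    return season, episode, episode_end


def strip_episode(normalised: str) -> str:
    """Drop every Exx part that follows Sxx, keeping the season token."""
    return re.sub(r"(S\d{2})(E\d{2})+", r"\1", normalised, flags=re.IGNORECASE)
```

Note that the extraction pattern captures at most two episode numbers, so a triple such as `S14E09E10E11` yields `episode_end=10`, while the stripping pattern removes all three.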
-2
@@ -2,7 +2,6 @@
from .entities import Movie
from .exceptions import InvalidMovieData, MovieNotFound
from .services import MovieService
from .value_objects import MovieTitle, Quality, ReleaseYear
__all__ = [
@@ -12,5 +11,4 @@ __all__ = [
"Quality",
"MovieNotFound",
"InvalidMovieData",
"MovieService",
]
+48 -3
@@ -3,16 +3,23 @@
from dataclasses import dataclass, field
from datetime import datetime
from ..shared.value_objects import FilePath, FileSize, ImdbId
from ..shared.media import AudioTrack, SubtitleTrack, track_lang_matches
from ..shared.value_objects import FilePath, FileSize, ImdbId, Language
from .value_objects import MovieTitle, Quality, ReleaseYear
@dataclass
class Movie:
"""
Movie entity representing a movie in the media library.
Movie aggregate root for the movies domain.
This is the main aggregate root for the movies domain.
Carries file metadata (path, size) and the tracks discovered by the
ffprobe + subtitle scan pipeline. The track lists may be empty when the
movie is known but not yet scanned, or when no file is downloaded.
Track helpers follow the same "C+" contract as ``Episode``: pass a
``Language`` for cross-format matching, or a ``str`` for case-insensitive
direct comparison.
"""
imdb_id: ImdbId
@@ -23,6 +30,8 @@ class Movie:
file_size: FileSize | None = None
tmdb_id: int | None = None
added_at: datetime = field(default_factory=datetime.now)
audio_tracks: list[AudioTrack] = field(default_factory=list)
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
def __post_init__(self):
"""Validate movie entity."""
@@ -52,6 +61,42 @@ class Movie:
"""Check if the movie is downloaded (has a file)."""
return self.has_file()
# ── Audio helpers ──────────────────────────────────────────────────────
def has_audio_in(self, lang: str | Language) -> bool:
"""True if at least one audio track is in the given language."""
return any(track_lang_matches(t.language, lang) for t in self.audio_tracks)
def audio_languages(self) -> list[str]:
"""Unique audio languages across all tracks, in track order."""
seen: set[str] = set()
result: list[str] = []
for t in self.audio_tracks:
if t.language and t.language not in seen:
seen.add(t.language)
result.append(t.language)
return result
# ── Subtitle helpers ───────────────────────────────────────────────────
def has_subtitles_in(self, lang: str | Language) -> bool:
"""True if at least one subtitle track is in the given language."""
return any(track_lang_matches(t.language, lang) for t in self.subtitle_tracks)
def has_forced_subs(self) -> bool:
"""True if at least one subtitle track is flagged as forced."""
return any(t.is_forced for t in self.subtitle_tracks)
def subtitle_languages(self) -> list[str]:
"""Unique subtitle languages across all tracks, in track order."""
seen: set[str] = set()
result: list[str] = []
for t in self.subtitle_tracks:
if t.language and t.language not in seen:
seen.add(t.language)
result.append(t.language)
return result
def get_folder_name(self) -> str:
"""
Get the folder name for this movie.
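Reviewer note: `audio_languages()` and `subtitle_languages()` in the hunk above share one order-preserving de-duplication loop. The sketch below isolates it, assuming a hypothetical `Track` dataclass with only a `language` attribute (the real `AudioTrack` and `SubtitleTrack` types are not shown in this diff), and adds a `dict.fromkeys` variant as one possible equivalent.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Track:
    """Hypothetical minimal track; only the language matters here."""

    language: Optional[str]


def unique_languages(tracks: Iterable[Track]) -> List[str]:
    """Unique non-empty languages, in first-seen track order."""
    seen = set()
    result = []
    for t in tracks:
        if t.language and t.language not in seen:
            seen.add(t.language)
            result.append(t.language)
    return result


def unique_languages_compact(tracks: Iterable[Track]) -> List[str]:
    """Same result, relying on dict insertion order (Python 3.7+)."""
    return list(dict.fromkeys(t.language for t in tracks if t.language))
```

Both variants skip `None` and empty-string languages, and neither normalises case, so `"FRE"` and `"fre"` would be reported as two entries.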
-192
@@ -1,192 +0,0 @@
"""Movie domain services - Business logic."""
import logging
import re
from ..shared.value_objects import FilePath, ImdbId
from .entities import Movie
from .exceptions import MovieAlreadyExists, MovieNotFound
from .repositories import MovieRepository
from .value_objects import Quality
logger = logging.getLogger(__name__)
class MovieService:
"""
Domain service for movie-related business logic.
This service contains business rules that don't naturally fit
within a single entity.
"""
def __init__(self, repository: MovieRepository):
"""
Initialize movie service.
Args:
repository: Movie repository for persistence
"""
self.repository = repository
def add_movie(self, movie: Movie) -> None:
"""
Add a new movie to the library.
Args:
movie: Movie entity to add
Raises:
MovieAlreadyExists: If movie with same IMDb ID already exists
"""
if self.repository.exists(movie.imdb_id):
raise MovieAlreadyExists(
f"Movie with IMDb ID {movie.imdb_id} already exists"
)
self.repository.save(movie)
logger.info(f"Added movie: {movie.title.value} ({movie.imdb_id})")
def get_movie(self, imdb_id: ImdbId) -> Movie:
"""
Get a movie by IMDb ID.
Args:
imdb_id: IMDb ID of the movie
Returns:
Movie entity
Raises:
MovieNotFound: If movie not found
"""
movie = self.repository.find_by_imdb_id(imdb_id)
if not movie:
raise MovieNotFound(f"Movie with IMDb ID {imdb_id} not found")
return movie
def get_all_movies(self) -> list[Movie]:
"""
Get all movies in the library.
Returns:
List of all movies
"""
return self.repository.find_all()
def update_movie(self, movie: Movie) -> None:
"""
Update an existing movie.
Args:
movie: Movie entity with updated data
Raises:
MovieNotFound: If movie doesn't exist
"""
if not self.repository.exists(movie.imdb_id):
raise MovieNotFound(f"Movie with IMDb ID {movie.imdb_id} not found")
self.repository.save(movie)
logger.info(f"Updated movie: {movie.title.value} ({movie.imdb_id})")
def remove_movie(self, imdb_id: ImdbId) -> None:
"""
Remove a movie from the library.
Args:
imdb_id: IMDb ID of the movie to remove
Raises:
MovieNotFound: If movie not found
"""
if not self.repository.delete(imdb_id):
raise MovieNotFound(f"Movie with IMDb ID {imdb_id} not found")
logger.info(f"Removed movie with IMDb ID: {imdb_id}")
def detect_quality_from_filename(self, filename: str) -> Quality:
"""
Detect video quality from filename.
Args:
filename: Filename to analyze
Returns:
Detected quality or UNKNOWN
"""
filename_lower = filename.lower()
# Check for quality indicators
if "2160p" in filename_lower or "4k" in filename_lower:
return Quality.UHD_4K
elif "1080p" in filename_lower:
return Quality.FULL_HD
elif "720p" in filename_lower:
return Quality.HD
elif "480p" in filename_lower:
return Quality.SD
return Quality.UNKNOWN
def extract_year_from_filename(self, filename: str) -> int | None:
"""
Extract release year from filename.
Args:
filename: Filename to analyze
Returns:
Year if found, None otherwise
"""
# Look for 4-digit year in parentheses or standalone
# Examples: "Movie (2010)", "Movie.2010.1080p"
patterns = [
r"\((\d{4})\)", # (2010)
r"\.(\d{4})\.", # .2010.
r"\s(\d{4})\s", # 2010
]
for pattern in patterns:
match = re.search(pattern, filename)
if match:
year = int(match.group(1))
# Validate year is reasonable
if 1888 <= year <= 2100:
return year
return None
def validate_movie_file(self, file_path: FilePath) -> bool:
"""
Validate that a file is a valid movie file.
Args:
file_path: Path to the file
Returns:
True if valid movie file, False otherwise
"""
if not file_path.exists():
logger.warning(f"File does not exist: {file_path}")
return False
if not file_path.is_file():
logger.warning(f"Path is not a file: {file_path}")
return False
# Check file extension
valid_extensions = {".mkv", ".mp4", ".avi", ".mov", ".wmv", ".flv", ".webm"}
if file_path.value.suffix.lower() not in valid_extensions:
logger.warning(f"Invalid file extension: {file_path.value.suffix}")
return False
# Check file size (should be at least 100 MB for a movie)
min_size = 100 * 1024 * 1024 # 100 MB
if file_path.value.stat().st_size < min_size:
logger.warning(
f"File too small to be a movie: {file_path.value.stat().st_size} bytes"
)
return False
return True
+3 -7
@@ -1,10 +1,10 @@
"""Movie domain value objects."""
-import re
 from dataclasses import dataclass
 from enum import Enum
 from ..shared.exceptions import ValidationError
+from ..shared.value_objects import to_dot_folder_name
 class Quality(Enum):
@@ -17,7 +17,7 @@ class Quality(Enum):
     UNKNOWN = "unknown"
     @classmethod
-    def from_string(cls, quality_str: str) -> "Quality":
+    def from_string(cls, quality_str: str) -> Quality:
         """
         Parse quality from string.
@@ -67,11 +67,7 @@ class MovieTitle:
         Removes special characters and replaces spaces with dots.
         """
-        # Remove special characters except spaces, dots, and hyphens
-        cleaned = re.sub(r"[^\w\s\.\-]", "", self.value)
-        # Replace spaces with dots
-        normalized = cleaned.replace(" ", ".")
-        return normalized
+        return to_dot_folder_name(self.value)
     def __str__(self) -> str:
         return self.value
+6
@@ -0,0 +1,6 @@
"""Release domain — release name parsing and naming conventions."""
from .services import parse_release
from .value_objects import ParsedRelease
__all__ = ["ParsedRelease", "parse_release"]
+140
@@ -0,0 +1,140 @@
"""Release knowledge loader.
Three-layer merge (lowest → highest priority):
1. Builtin — alfred/knowledge/release/
2. Sites — alfred/knowledge/release/sites/*.yaml (all trackers)
3. Learned — data/knowledge/release/ (user additions via the learn tool)
Lists are extended additively, scalars from higher layers win.
"""
from pathlib import Path
import yaml
import alfred as _alfred_pkg
_BUILTIN_ROOT = Path(_alfred_pkg.__file__).parent / "knowledge" / "release"
_SITES_ROOT = _BUILTIN_ROOT / "sites"
_LEARNED_ROOT = (
Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge" / "release"
)
def _merge(base: dict, overlay: dict) -> dict:
    """Merge overlay into base — lists are extended, scalars from overlay win."""
    result = dict(base)
    for key, val in overlay.items():
        if key in result and isinstance(result[key], list) and isinstance(val, list):
            result[key] = result[key] + [v for v in val if v not in result[key]]
        else:
            result[key] = val
    return result
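A minimal sketch of the merge rule described in the docstring, written as a standalone copy so it can be run outside the package. The sample keys and values (`sources`, `default`, `editions`) are made up for illustration and are not taken from the real YAML files. It also shows one consequence of the shallow merge: a nested dict in the overlay replaces the base dict wholesale instead of being merged recursively.

```python
def merge(base: dict, overlay: dict) -> dict:
    """Standalone copy of the merge rule: lists extend (deduplicated), other values are overwritten."""
    result = dict(base)
    for key, val in overlay.items():
        if key in result and isinstance(result[key], list) and isinstance(val, list):
            # Keep base order, append only the overlay items not already present.
            result[key] = result[key] + [v for v in val if v not in result[key]]
        else:
            result[key] = val
    return result


# Hypothetical sample data, not the project's real knowledge files.
builtin = {
    "sources": ["BluRay", "WEBRip"],
    "default": "BluRay",
    "editions": {"tokens": ["UNRATED"]},
}
learned = {
    "sources": ["WEBRip", "HDLight"],
    "default": "WEBRip",
    "editions": {"tokens": ["EXTENDED"]},
}

merged = merge(builtin, learned)
print(merged["sources"])   # list extended without duplicates
print(merged["default"])   # scalar from the overlay wins
print(merged["editions"])  # nested dict is replaced, not merged
```

The nested-dict behaviour is presumably why `load_editions()` merges the site `tokens` list by hand rather than relying on `_merge`.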
def _read(path: Path) -> dict:
    try:
        with open(path, encoding="utf-8") as f:
            return yaml.safe_load(f) or {}
    except FileNotFoundError:
        return {}


def _load(filename: str) -> dict:
    result = _read(_BUILTIN_ROOT / filename)
    result = _merge(result, _read(_LEARNED_ROOT / filename))
    return result


def _load_sites() -> dict:
    """Merge all site YAML files into a single dict."""
    result: dict = {}
    for site_file in sorted(_SITES_ROOT.glob("*.yaml")):
        result = _merge(result, _read(site_file))
    return result
def load_resolutions() -> set[str]:
return set(_load("resolutions.yaml").get("resolutions", []))
def load_sources() -> set[str]:
return set(_load("sources.yaml").get("sources", []))
def load_codecs() -> set[str]:
return set(_load("codecs.yaml").get("codecs", []))
def load_win_forbidden_chars() -> list[str]:
return _load("filesystem.yaml").get("win_forbidden_chars", [])
def load_video_extensions() -> set[str]:
return set(_load("file_extensions.yaml").get("video", []))
def load_non_video_extensions() -> set[str]:
return set(_load("file_extensions.yaml").get("non_video", []))
def load_metadata_extensions() -> set[str]:
return set(_load("file_extensions.yaml").get("metadata", []))
def load_subtitle_extensions() -> set[str]:
return set(_load("file_extensions.yaml").get("subtitle", []))
def load_forbidden_chars() -> set[str]:
return set(_load("release_format.yaml").get("forbidden_chars", []))
def load_language_tokens() -> set[str]:
base = {t.upper() for t in _load("languages.yaml").get("tokens", [])}
sites = {t.upper() for t in _load_sites().get("languages", [])}
return base | sites
def load_audio() -> dict:
return _load("audio.yaml")
def load_video() -> dict:
return _load("video.yaml")
def load_editions() -> dict:
base = _load("editions.yaml")
site_tokens = _load_sites().get("editions", {}).get("tokens", [])
if site_tokens:
existing = base.get("tokens", [])
base["tokens"] = existing + [t for t in site_tokens if t not in existing]
return base
def load_sources_extra() -> set[str]:
"""Additional source tokens from site files."""
return set(_load_sites().get("sources", []))
def load_hdr_extra() -> set[str]:
"""Additional HDR tokens from site files."""
return {t.upper() for t in _load_sites().get("hdr", [])}
def load_media_type_tokens() -> dict:
"""Site-specific media type tokens (doc, concert, collection, integrale)."""
return _load_sites().get("media_type_tokens", {})
def load_separators() -> list[str]:
    """Single-char token separators used by the release name tokenizer.

    Always includes the canonical "." even if absent from YAML, to prevent a
    misconfigured file from breaking the parser entirely.
    """
    seps = _load("separators.yaml").get("separators", []) or []
    if "." not in seps:
        seps = [".", *seps]
    return seps
+506
@@ -0,0 +1,506 @@
"""Release domain — parsing service."""
from __future__ import annotations
import re
from .knowledge import load_separators
from .value_objects import (
_AUDIO,
_CODECS,
_EDITIONS,
_FORBIDDEN_CHARS,
_HDR_EXTRA,
_LANGUAGE_TOKENS,
_MEDIA_TYPE_TOKENS,
_RESOLUTIONS,
_SOURCES,
_VIDEO_META,
ParsedRelease,
)
def _tokenize(name: str) -> list[str]:
    """Split a release name on the configured separators, dropping empty tokens."""
    pattern = "[" + re.escape("".join(load_separators())) + "]+"
    return [t for t in re.split(pattern, name) if t]
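To illustrate the tokenizer, here is a self-contained sketch that uses a hard-coded separator list in place of `load_separators()`. The list below is an assumption based on the separators named in the `parse_release` docstring; the real values live in `separators.yaml`.

```python
import re

# Assumed sample; the project reads this list from separators.yaml.
SEPARATORS = [".", " ", "[", "]", "(", ")", "_"]


def tokenize(name: str, separators: list[str] = SEPARATORS) -> list[str]:
    """Split on any run of separator characters and drop empty tokens."""
    # re.escape keeps characters such as "]" or "-" literal inside the class.
    pattern = "[" + re.escape("".join(separators)) + "]+"
    return [t for t in re.split(pattern, name) if t]


scene = tokenize("Oz.S03E01.1080p.WEBRip.x265-KONTRAST")
loose = tokenize("The Movie (2010) [1080p]")
print(scene)
print(loose)
```

The dash is deliberately not a separator in this sample, so `x265-KONTRAST` stays one token and the group can be split off later.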
def parse_release(name: str) -> ParsedRelease:
"""
Parse a release name and return a ParsedRelease.
Flow:
1. Strip a leading/trailing [site.tag] if present (sets parse_path="sanitized").
2. Check the remainder for truly forbidden chars (anything not in the
configured separators list). If any remain → media_type="unknown",
parse_path="ai", and the LLM handles it.
3. Tokenize using the configured separators (".", " ", "[", "]", "(", ")", "_", ...)
and run token-level matchers (season/episode, tech, languages, audio,
video, edition, title, year).
"""
parse_path = "direct"
# Always try to extract a bracket-enclosed site tag first.
clean, site_tag = _strip_site_tag(name)
if site_tag is not None:
parse_path = "sanitized"
if not _is_well_formed(clean):
return ParsedRelease(
raw=name,
normalised=clean,
title=clean,
year=None,
season=None,
episode=None,
episode_end=None,
quality=None,
source=None,
codec=None,
group="UNKNOWN",
tech_string="",
media_type="unknown",
site_tag=site_tag,
parse_path="ai",
)
name = clean
tokens = _tokenize(name)
season, episode, episode_end = _extract_season_episode(tokens)
quality, source, codec, group, tech_tokens = _extract_tech(tokens)
languages, lang_tokens = _extract_languages(tokens)
audio_codec, audio_channels, audio_tokens = _extract_audio(tokens)
bit_depth, hdr_format, video_tokens = _extract_video_meta(tokens)
edition, edition_tokens = _extract_edition(tokens)
title = _extract_title(
tokens,
tech_tokens | lang_tokens | audio_tokens | video_tokens | edition_tokens,
)
year = _extract_year(tokens, title)
media_type = _infer_media_type(
season, quality, source, codec, year, edition, tokens
)
tech_parts = [p for p in [quality, source, codec] if p]
tech_string = ".".join(tech_parts)
return ParsedRelease(
raw=name,
normalised=name,
title=title,
year=year,
season=season,
episode=episode,
episode_end=episode_end,
quality=quality,
source=source,
codec=codec,
group=group,
tech_string=tech_string,
media_type=media_type,
site_tag=site_tag,
parse_path=parse_path,
languages=languages,
audio_codec=audio_codec,
audio_channels=audio_channels,
bit_depth=bit_depth,
hdr_format=hdr_format,
edition=edition,
)
def _infer_media_type(
    season: int | None,
    quality: str | None,
    source: str | None,
    codec: str | None,
    year: int | None,
    edition: str | None,
    tokens: list[str],
) -> str:
    """
    Infer media_type from token-level evidence only (no filesystem access).

    - documentary : DOC token present
    - concert     : CONCERT token present
    - tv_complete : INTEGRALE/COMPLETE token, no season
    - tv_show     : season token found
    - movie       : no season, at least one tech marker
    - unknown     : no conclusive evidence
    """
    upper_tokens = {t.upper() for t in tokens}
    doc_tokens = {t.upper() for t in _MEDIA_TYPE_TOKENS.get("doc", [])}
    concert_tokens = {t.upper() for t in _MEDIA_TYPE_TOKENS.get("concert", [])}
    integrale_tokens = {t.upper() for t in _MEDIA_TYPE_TOKENS.get("integrale", [])}

    if upper_tokens & doc_tokens:
        return "documentary"
    if upper_tokens & concert_tokens:
        return "concert"
    if (
        edition in {"COMPLETE", "INTEGRALE", "COLLECTION"}
        or upper_tokens & integrale_tokens
    ) and season is None:
        return "tv_complete"
    if season is not None:
        return "tv_show"
    if any([quality, source, codec, year]):
        return "movie"
    return "unknown"
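The order of the checks above matters: the first rule that fires wins. The sketch below reproduces that precedence with hard-coded token sets, which are assumptions standing in for the site YAML (`media_type_tokens`). The `has_tech_or_year` flag is a simplification of the `any([quality, source, codec, year])` test.

```python
from __future__ import annotations

# Assumed token sets; the project loads these from site YAML files.
DOC_TOKENS = {"DOC"}
CONCERT_TOKENS = {"CONCERT"}
INTEGRALE_TOKENS = {"INTEGRALE"}


def infer_media_type(
    tokens: list[str],
    season: int | None = None,
    edition: str | None = None,
    has_tech_or_year: bool = False,
) -> str:
    """Apply the rules in the same precedence order as the parser."""
    upper = {t.upper() for t in tokens}
    if upper & DOC_TOKENS:
        return "documentary"
    if upper & CONCERT_TOKENS:
        return "concert"
    complete = edition in {"COMPLETE", "INTEGRALE", "COLLECTION"} or bool(
        upper & INTEGRALE_TOKENS
    )
    if complete and season is None:
        return "tv_complete"
    if season is not None:
        return "tv_show"
    if has_tech_or_year:
        return "movie"
    return "unknown"


print(infer_media_type(["Show", "S01", "1080p"], season=1, has_tech_or_year=True))
print(infer_media_type(["Show", "INTEGRALE", "1080p"], has_tech_or_year=True))
print(infer_media_type(["Film", "DOC", "S01"], season=1))
```

In this sketch, a `DOC` token outranks a season marker, and a season marker outranks an `INTEGRALE` token.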
def _is_well_formed(name: str) -> bool:
"""Return True if name contains no forbidden characters per scene naming rules.
Characters listed as token separators (spaces, brackets, parens, …) are NOT
considered malforming — the tokenizer handles them. Only truly broken chars
like '@', '#', '!', '%' make a name malformed.
"""
tokenizable = set(load_separators())
return not any(c in name for c in _FORBIDDEN_CHARS if c not in tokenizable)
def _strip_site_tag(name: str) -> tuple[str, str | None]:
    """
    Strip a site watermark tag from the release name and return (clean_name, tag).

    Handles two positions:
    - Prefix: "[ OxTorrent.vc ] The.Title.S01..."
    - Suffix: "The.Title.S01...-NTb[TGx]"

    Anything between [...] is treated as a site tag.
    Returns (original_name, None) if no tag found.
    """
    s = name.strip()

    if s.startswith("["):
        close = s.find("]")
        if close != -1:
            tag = s[1:close].strip()
            remainder = s[close + 1 :].strip()
            if tag and remainder:
                return remainder, tag

    if s.endswith("]"):
        open_bracket = s.rfind("[")
        if open_bracket != -1:
            tag = s[open_bracket + 1 : -1].strip()
            remainder = s[:open_bracket].strip()
            if tag and remainder:
                return remainder, tag

    return s, None
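The prefix and suffix cases can be exercised with a standalone copy of the same logic. The release names below are illustrative, modelled on the docstring examples rather than taken from the fixture set.

```python
from __future__ import annotations


def strip_site_tag(name: str) -> tuple[str, str | None]:
    """Return (clean_name, tag); tag is None when no bracketed tag is found."""
    s = name.strip()
    if s.startswith("["):
        close = s.find("]")
        if close != -1:
            tag = s[1:close].strip()
            remainder = s[close + 1:].strip()
            if tag and remainder:
                return remainder, tag
    if s.endswith("]"):
        open_bracket = s.rfind("[")
        if open_bracket != -1:
            tag = s[open_bracket + 1:-1].strip()
            remainder = s[:open_bracket].strip()
            if tag and remainder:
                return remainder, tag
    return s, None


prefix = strip_site_tag("[ OxTorrent.vc ] The.Title.S01E01.FRENCH.XviD")
suffix = strip_site_tag("The.Title.S01E01.1080p.WEB.H264-NTb[TGx]")
only_tag = strip_site_tag("[JustATag]")
print(prefix)
print(suffix)
print(only_tag)
```

A name that consists of nothing but a bracketed tag is returned unchanged, because both branches require a non-empty remainder.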
def _parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
    """
    Parse a single token as a season/episode marker.

    Handles:
    - SxxExx / SxxExxExx / Sxx   (canonical scene form)
    - NxNN / NxNNxNN             (alt form: 1x05, 12x07x08)

    Returns (season, episode, episode_end) or None if not a season token.
    """
    upper = tok.upper()

    # SxxExx form
    if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
        season = int(upper[1:3])
        rest = upper[3:]
        if not rest:
            return season, None, None
        episodes: list[int] = []
        while rest.startswith("E") and len(rest) >= 3 and rest[1:3].isdigit():
            episodes.append(int(rest[1:3]))
            rest = rest[3:]
        if not episodes:
            return None  # malformed token like "S03XYZ"
        # episode_end is the last episode of a multi-episode token (S14E09E10E11 → 11)
        return season, episodes[0], episodes[-1] if len(episodes) >= 2 else None

    # NxNN form — split on "X" (uppercased), all parts must be digits
    if "X" in upper:
        parts = upper.split("X")
        # str.isdigit() is False for "", so empty parts are rejected as well
        if len(parts) >= 2 and all(p.isdigit() for p in parts):
            season = int(parts[0])
            episode = int(parts[1])
            episode_end = int(parts[-1]) if len(parts) >= 3 else None
            return season, episode, episode_end

    return None
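A compact sketch of the two token shapes, using a regular expression for the `SxxExx` form instead of the hand-written slicing above. This is an alternative formulation for illustration, not the project's implementation, and it is limited to tokens with at most two episode numbers to keep the expected results unambiguous.

```python
from __future__ import annotations

import re

# Sxx, SxxExx or SxxExxExx (exactly two digits per field, as in the parser).
_SXX = re.compile(r"S(\d{2})(?:E(\d{2})(?:E(\d{2}))?)?", re.IGNORECASE)


def parse_season_episode(tok: str) -> tuple[int, int | None, int | None] | None:
    """Return (season, episode, episode_end) or None if tok is not a marker."""
    m = _SXX.fullmatch(tok)
    if m:
        season, ep, ep_end = m.groups()
        return int(season), int(ep) if ep else None, int(ep_end) if ep_end else None
    # Alternate NxNN / NxNNxNN form: every part around "x" must be digits.
    parts = tok.upper().split("X")
    if len(parts) in (2, 3) and all(p.isdigit() for p in parts):
        nums = [int(p) for p in parts]
        return nums[0], nums[1], nums[2] if len(nums) == 3 else None
    return None


for token in ["S03", "S03E01", "S01E01E02", "1x05", "12x07x08", "x265", "S03XYZ"]:
    print(token, parse_season_episode(token))
```

One caveat applies to both this sketch and the parser's `NxNN` branch: a token such as `1920x1080` has all-digit parts, so it is read as season 1920, episode 1080.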
def _extract_season_episode(
tokens: list[str],
) -> tuple[int | None, int | None, int | None]:
for tok in tokens:
parsed = _parse_season_episode(tok)
if parsed is not None:
return parsed
return None, None, None
def _extract_tech(
tokens: list[str],
) -> tuple[str | None, str | None, str | None, str, set[str]]:
"""
Extract quality, source, codec, group from tokens.
Returns (quality, source, codec, group, tech_token_set).
Group extraction strategy (in priority order):
1. Token where prefix is a known codec: x265-GROUP
2. Rightmost token with a dash that isn't a known source
"""
quality: str | None = None
source: str | None = None
codec: str | None = None
group = "UNKNOWN"
tech_tokens: set[str] = set()
for tok in tokens:
tl = tok.lower()
if tl in _RESOLUTIONS:
quality = tok
tech_tokens.add(tok)
continue
if tl in _SOURCES:
source = tok
tech_tokens.add(tok)
continue
if "-" in tok:
parts = tok.rsplit("-", 1)
# codec-GROUP (highest priority for group)
if parts[0].lower() in _CODECS:
codec = parts[0]
group = parts[1] if parts[1] else "UNKNOWN"
tech_tokens.add(tok)
continue
# source with dash: Web-DL, WEB-DL, etc.
if parts[0].lower() in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
source = tok
tech_tokens.add(tok)
continue
if tl in _CODECS:
codec = tok
tech_tokens.add(tok)
# Fallback: rightmost token with a dash that isn't a known source
if group == "UNKNOWN":
for tok in reversed(tokens):
if "-" in tok:
parts = tok.rsplit("-", 1)
tl = tok.lower()
if tl in _SOURCES or tok.lower().replace("-", "") in _SOURCES:
continue
if parts[1]:
group = parts[1]
break
return quality, source, codec, group, tech_tokens
def _is_year_token(tok: str) -> bool:
    """Return True if tok is a 4-digit year between 1900 and 2099."""
    return len(tok) == 4 and tok.isdigit() and 1900 <= int(tok) <= 2099


def _extract_title(tokens: list[str], tech_tokens: set[str]) -> str:
    """Extract the title portion: everything before the first season/year/tech token."""
    title_parts = []
    for tok in tokens:
        if _parse_season_episode(tok) is not None:
            break
        if _is_year_token(tok):
            break
        if tok in tech_tokens or tok.lower() in _RESOLUTIONS | _SOURCES | _CODECS:
            break
        if "-" in tok and any(p.lower() in _CODECS | _SOURCES for p in tok.split("-")):
            break
        title_parts.append(tok)
    if title_parts:
        return ".".join(title_parts)
    # Fall back to the first token; a name made only of separators yields no
    # tokens at all, so guard against indexing an empty list.
    return tokens[0] if tokens else ""


def _extract_year(tokens: list[str], title: str) -> int | None:
    """Extract a 4-digit year from tokens (only after the title)."""
    title_len = len(title.split("."))
    for tok in tokens[title_len:]:
        if _is_year_token(tok):
            return int(tok)
    return None
# ---------------------------------------------------------------------------
# Sequence matcher
# ---------------------------------------------------------------------------
def _match_sequences(
    tokens: list[str],
    sequences: list[dict],
    key: str,
) -> tuple[str | None, set[str]]:
    """
    Try to match multi-token sequences against consecutive tokens.

    Returns (matched_value, set_of_matched_tokens) or (None, empty_set).
    Sequences must be ordered most-specific first in the YAML.
    """
    upper_tokens = [t.upper() for t in tokens]
    for seq in sequences:
        seq_upper = [s.upper() for s in seq["tokens"]]
        n = len(seq_upper)
        for i in range(len(upper_tokens) - n + 1):
            if upper_tokens[i : i + n] == seq_upper:
                matched = set(tokens[i : i + n])
                return seq[key], matched
    return None, set()
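The "most-specific first" requirement is easiest to see with a concrete run. The sketch below copies the sliding-window match and feeds it two orderings of a hypothetical sequence list. The `{"tokens": [...], "codec": ...}` shape is inferred from how `seq["tokens"]` and `seq[key]` are used, and the entries themselves are invented for illustration.

```python
from __future__ import annotations


def match_sequences(
    tokens: list[str], sequences: list[dict], key: str
) -> tuple[str | None, set[str]]:
    """Return the value of the first sequence found as a consecutive token run."""
    upper_tokens = [t.upper() for t in tokens]
    for seq in sequences:
        seq_upper = [s.upper() for s in seq["tokens"]]
        n = len(seq_upper)
        # Slide a window of length n over the token list.
        for i in range(len(upper_tokens) - n + 1):
            if upper_tokens[i:i + n] == seq_upper:
                return seq[key], set(tokens[i:i + n])
    return None, set()


# Hypothetical entries standing in for the "sequences" list in audio.yaml.
specific_first = [
    {"tokens": ["DTS", "HD", "MA"], "codec": "DTS-HD.MA"},
    {"tokens": ["DTS", "HD"], "codec": "DTS-HD"},
]
tokens = ["Movie", "2010", "1080p", "BluRay", "DTS", "HD", "MA", "x265-GRP"]

good = match_sequences(tokens, specific_first, "codec")
bad = match_sequences(tokens, list(reversed(specific_first)), "codec")
print(good)
print(bad)
```

With the shorter sequence listed first, the match stops at `DTS-HD` and the `MA` token is left unclaimed, which is why the ordering in the YAML matters.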
# ---------------------------------------------------------------------------
# Language extraction
# ---------------------------------------------------------------------------
def _extract_languages(tokens: list[str]) -> tuple[list[str], set[str]]:
"""Extract language tokens. Returns (languages, matched_token_set)."""
languages = []
lang_tokens: set[str] = set()
for tok in tokens:
if tok.upper() in _LANGUAGE_TOKENS:
languages.append(tok.upper())
lang_tokens.add(tok)
return languages, lang_tokens
# ---------------------------------------------------------------------------
# Audio extraction
# ---------------------------------------------------------------------------
def _extract_audio(
tokens: list[str],
) -> tuple[str | None, str | None, set[str]]:
"""
Extract audio codec and channel layout.
Returns (audio_codec, audio_channels, matched_token_set).
Sequences are tried first (DTS.HD.MA, TrueHD.Atmos, …), then single tokens.
"""
audio_codec: str | None = None
audio_channels: str | None = None
audio_tokens: set[str] = set()
known_codecs = {c.upper() for c in _AUDIO.get("codecs", [])}
known_channels = set(_AUDIO.get("channels", []))
# Try multi-token sequences first
matched_codec, matched_set = _match_sequences(
tokens, _AUDIO.get("sequences", []), "codec"
)
if matched_codec:
audio_codec = matched_codec
audio_tokens |= matched_set
# Channel layouts like "5.1" or "7.1" are split into two tokens by the tokenizer
# (since "." is a separator), so detect them as consecutive pairs "X" + "Y" where
# "X.Y" is a known channel. The second token may carry a "-GROUP" suffix
# (e.g. "1-KTH"), which is stripped before matching.
for i in range(len(tokens) - 1):
second = tokens[i + 1].split("-")[0]
candidate = f"{tokens[i]}.{second}"
if candidate in known_channels and audio_channels is None:
audio_channels = candidate
audio_tokens.add(tokens[i])
audio_tokens.add(tokens[i + 1])
for tok in tokens:
if tok in audio_tokens:
continue
if tok.upper() in known_codecs and audio_codec is None:
audio_codec = tok
audio_tokens.add(tok)
elif tok in known_channels and audio_channels is None:
audio_channels = tok
audio_tokens.add(tok)
return audio_codec, audio_channels, audio_tokens
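The channel-pair detection inside `_extract_audio` can be isolated as a small helper. This is a sketch under the assumption that the known layouts are `2.0`, `5.1` and `7.1`; the real set comes from the `channels` list in `audio.yaml`. The helper name `find_channels` is hypothetical.

```python
from __future__ import annotations


def find_channels(
    tokens: list[str], known_channels: set[str]
) -> tuple[str | None, set[str]]:
    """Rejoin neighbouring tokens as "X.Y" and return the first known layout."""
    for i in range(len(tokens) - 1):
        # The second half may still carry the release group, e.g. "1-KTH".
        second = tokens[i + 1].split("-")[0]
        candidate = f"{tokens[i]}.{second}"
        if candidate in known_channels:
            return candidate, {tokens[i], tokens[i + 1]}
    return None, set()


tokens = ["Movie", "2010", "1080p", "BluRay", "DDP", "5", "1-KTH"]
channels, used = find_channels(tokens, {"2.0", "5.1", "7.1"})
print(channels, used)
```

Both halves of the pair are reported as consumed, so the stray `5` is not mistaken for part of the title later on.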
# ---------------------------------------------------------------------------
# Video metadata extraction (bit depth, HDR)
# ---------------------------------------------------------------------------
def _extract_video_meta(
tokens: list[str],
) -> tuple[str | None, str | None, set[str]]:
"""
Extract bit depth and HDR format.
Returns (bit_depth, hdr_format, matched_token_set).
"""
bit_depth: str | None = None
hdr_format: str | None = None
video_tokens: set[str] = set()
known_hdr = {h.upper() for h in _VIDEO_META.get("hdr", [])} | _HDR_EXTRA
known_depth = {d.lower() for d in _VIDEO_META.get("bit_depth", [])}
# Try HDR sequences first
matched_hdr, matched_set = _match_sequences(
tokens, _VIDEO_META.get("sequences", []), "hdr"
)
if matched_hdr:
hdr_format = matched_hdr
video_tokens |= matched_set
for tok in tokens:
if tok in video_tokens:
continue
if tok.upper() in known_hdr and hdr_format is None:
hdr_format = tok.upper()
video_tokens.add(tok)
elif tok.lower() in known_depth and bit_depth is None:
bit_depth = tok.lower()
video_tokens.add(tok)
return bit_depth, hdr_format, video_tokens
# ---------------------------------------------------------------------------
# Edition extraction
# ---------------------------------------------------------------------------
def _extract_edition(tokens: list[str]) -> tuple[str | None, set[str]]:
"""
Extract release edition (UNRATED, EXTENDED, DIRECTORS.CUT, …).
Returns (edition, matched_token_set).
"""
known_tokens = {t.upper() for t in _EDITIONS.get("tokens", [])}
# Try multi-token sequences first
matched_edition, matched_set = _match_sequences(
tokens, _EDITIONS.get("sequences", []), "edition"
)
if matched_edition:
return matched_edition, matched_set
for tok in tokens:
if tok.upper() in known_tokens:
return tok.upper(), {tok}
return None, set()
+165
@@ -0,0 +1,165 @@
"""Release domain — value objects and token sets."""
from __future__ import annotations
from dataclasses import dataclass, field
from .knowledge import (
load_audio,
load_codecs,
load_editions,
load_forbidden_chars,
load_hdr_extra,
load_language_tokens,
load_media_type_tokens,
load_metadata_extensions,
load_non_video_extensions,
load_resolutions,
load_sources,
load_sources_extra,
load_subtitle_extensions,
load_video,
load_video_extensions,
load_win_forbidden_chars,
)
# Token sets — loaded once at import time from alfred/knowledge/release/
_RESOLUTIONS: set[str] = load_resolutions()
_SOURCES: set[str] = load_sources() | load_sources_extra()
_CODECS: set[str] = load_codecs()
_VIDEO_EXTENSIONS: set[str] = load_video_extensions()
_NON_VIDEO_EXTENSIONS: set[str] = load_non_video_extensions()
_SUBTITLE_EXTENSIONS: set[str] = load_subtitle_extensions()
# Both metadata and subtitle extensions are ignored when deciding the media
# type of a folder — neither is a conclusive signal for movie/tv/other.
_METADATA_EXTENSIONS: set[str] = load_metadata_extensions() | _SUBTITLE_EXTENSIONS
_FORBIDDEN_CHARS: set[str] = load_forbidden_chars()
_LANGUAGE_TOKENS: set[str] = load_language_tokens()
_AUDIO: dict = load_audio()
_VIDEO_META: dict = load_video()
_EDITIONS: dict = load_editions()
_HDR_EXTRA: set[str] = load_hdr_extra()
_MEDIA_TYPE_TOKENS: dict = load_media_type_tokens()
# Translation table for stripping Windows-forbidden characters
_WIN_FORBIDDEN_TABLE = str.maketrans("", "", "".join(load_win_forbidden_chars()))
def _sanitize_for_fs(text: str) -> str:
"""Remove Windows-forbidden characters from a string."""
return text.translate(_WIN_FORBIDDEN_TABLE)
def _strip_episode_from_normalized(normalized: str) -> str:
"""
Remove all episode parts (Exx) from a normalized release name, keeping Sxx.
Oz.S03E01.1080p... → Oz.S03.1080p...
Archer.S14E09E10E11.1080p... → Archer.S14.1080p...
"""
tokens = normalized.split(".")
result = []
for tok in tokens:
upper = tok.upper()
# Token is SxxExx... — keep only the Sxx part
if len(upper) >= 3 and upper[0] == "S" and upper[1:3].isdigit():
result.append(tok[:3]) # "S" + two digits
else:
result.append(tok)
return ".".join(result)
@dataclass
class ParsedRelease:
"""Structured representation of a parsed release name."""
raw: str # original release name (untouched)
normalised: str # dots instead of spaces
title: str # show/movie title (dots, no year/season/tech)
year: int | None # year parsed from the release name (None if absent)
season: int | None # season number (None for movies)
episode: int | None # first episode number (None if season-pack)
episode_end: int | None # last episode for multi-ep (None otherwise)
quality: str | None # 1080p, 2160p, …
source: str | None # WEBRip, BluRay, …
codec: str | None # x265, HEVC, …
group: str # release group, "UNKNOWN" if missing
tech_string: str # quality.source.codec joined with dots
media_type: str = (
"unknown" # "movie" | "tv_show" | "tv_complete" | "documentary" | "concert" | "other" | "unknown"
)
site_tag: str | None = (
None # site watermark stripped from name, e.g. "TGx", "OxTorrent.vc"
)
parse_path: str = "direct" # "direct" | "sanitized" | "ai"
languages: list[str] = field(default_factory=list) # ["MULTI", "VFF"], ["FRENCH"], …
audio_codec: str | None = None # "DTS-HD.MA", "DDP", "EAC3", …
audio_channels: str | None = None # "5.1", "7.1", "2.0", …
bit_depth: str | None = None # "10bit", "8bit", …
hdr_format: str | None = None # "DV", "HDR10", "DV.HDR10", …
edition: str | None = None # "UNRATED", "EXTENDED", "DIRECTORS.CUT", …
@property
def is_season_pack(self) -> bool:
return self.season is not None and self.episode is None
def show_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
"""
Build the series root folder name.
Format: {Title}.{Year}.{Tech}-{Group}
Example: Oz.1997.1080p.WEBRip.x265-KONTRAST
"""
title_part = _sanitize_for_fs(tmdb_title).replace(" ", ".")
tech = self.tech_string or "Unknown"
return f"{title_part}.{tmdb_year}.{tech}-{self.group}"
def season_folder_name(self) -> str:
"""
Build the season subfolder name = normalized release name (no episode).
Example: Oz.S03.1080p.WEBRip.x265-KONTRAST
For a single-episode release we still strip the episode token so the
folder can hold the whole season.
"""
return _strip_episode_from_normalized(self.normalised)
def episode_filename(self, tmdb_episode_title: str | None, ext: str) -> str:
"""
Build the episode filename.
Format: {Title}.{SxxExx}.{EpisodeTitle}.{Tech}-{Group}.{ext}
Example: Oz.S01E01.The.Routine.1080p.WEBRip.x265-KONTRAST.mkv
If tmdb_episode_title is None, omits the episode title segment.
"""
title_part = _sanitize_for_fs(self.title)
s = f"S{self.season:02d}" if self.season is not None else ""
e = f"E{self.episode:02d}" if self.episode is not None else ""
se = s + e
ep_title = ""
if tmdb_episode_title:
ep_title = "." + _sanitize_for_fs(tmdb_episode_title).replace(" ", ".")
tech = self.tech_string or "Unknown"
ext_clean = ext.lstrip(".")
return f"{title_part}.{se}{ep_title}.{tech}-{self.group}.{ext_clean}"
def movie_folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
"""
Build the movie folder name.
Format: {Title}.{Year}.{Tech}-{Group}
Example: Inception.2010.1080p.BluRay.x265-GROUP
"""
return self.show_folder_name(tmdb_title, tmdb_year)
def movie_filename(self, tmdb_title: str, tmdb_year: int, ext: str) -> str:
"""
Build the movie filename (same as folder name + extension).
Example: Inception.2010.1080p.BluRay.x265-GROUP.mkv
"""
ext_clean = ext.lstrip(".")
return f"{self.movie_folder_name(tmdb_title, tmdb_year)}.{ext_clean}"
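The naming formats documented in the methods above can be tried out with a cut-down stand-in. `MiniRelease` is a hypothetical class written for this example, and the forbidden-character string is an assumption (a typical Windows set), since the real list is loaded from `filesystem.yaml`.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: a typical Windows-forbidden set; the project loads its own list.
WIN_FORBIDDEN = '<>:"/\\|?*'
_TABLE = str.maketrans("", "", WIN_FORBIDDEN)


def sanitize_for_fs(text: str) -> str:
    """Drop characters that Windows does not allow in file names."""
    return text.translate(_TABLE)


@dataclass
class MiniRelease:
    """Hypothetical cut-down stand-in for ParsedRelease."""

    title: str
    season: int | None
    episode: int | None
    tech_string: str
    group: str

    def folder_name(self, tmdb_title: str, tmdb_year: int) -> str:
        # {Title}.{Year}.{Tech}-{Group}
        title_part = sanitize_for_fs(tmdb_title).replace(" ", ".")
        tech = self.tech_string or "Unknown"
        return f"{title_part}.{tmdb_year}.{tech}-{self.group}"

    def episode_filename(self, tmdb_episode_title: str | None, ext: str) -> str:
        # {Title}.{SxxExx}.{EpisodeTitle}.{Tech}-{Group}.{ext}
        s = f"S{self.season:02d}" if self.season is not None else ""
        e = f"E{self.episode:02d}" if self.episode is not None else ""
        ep_title = ""
        if tmdb_episode_title:
            ep_title = "." + sanitize_for_fs(tmdb_episode_title).replace(" ", ".")
        tech = self.tech_string or "Unknown"
        title_part = sanitize_for_fs(self.title)
        return f"{title_part}.{s}{e}{ep_title}.{tech}-{self.group}.{ext.lstrip('.')}"


oz = MiniRelease("Oz", 1, 1, "1080p.WEBRip.x265", "KONTRAST")
print(oz.folder_name("Oz", 1997))
print(oz.episode_filename("The Routine", ".mkv"))
print(MiniRelease("X", None, None, "", "UNKNOWN").folder_name("Mission: Impossible", 1996))
```

The last call shows the two fallbacks together: the colon is stripped from the title, and an empty `tech_string` becomes `Unknown`.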
+2 -1
@@ -1,7 +1,7 @@
"""Shared kernel - Common domain concepts used across subdomains."""
 from .exceptions import DomainException, ValidationError
-from .value_objects import FilePath, FileSize, ImdbId
+from .value_objects import FilePath, FileSize, ImdbId, Language
 __all__ = [
     "DomainException",
@@ -9,4 +9,5 @@ __all__ = [
     "ImdbId",
     "FilePath",
     "FileSize",
+    "Language",
 ]
@@ -0,0 +1,5 @@
"""Shared knowledge loaders (cross-domain)."""
from .language_registry import LanguageRegistry
__all__ = ["LanguageRegistry"]
@@ -0,0 +1,129 @@
"""LanguageRegistry — loads and queries the canonical language table from YAML.
Builtin entries live in ``alfred/knowledge/iso_languages.yaml`` (versioned).
Learned entries can be added to ``data/knowledge/iso_languages_learned.yaml``
(gitignored, instance-local) and are merged additively — they extend builtin
languages or add new ones, never remove builtin entries.
"""
import logging
from pathlib import Path
import yaml
import alfred as _alfred_pkg
from ..value_objects import Language
logger = logging.getLogger(__name__)
_BUILTIN_ROOT = Path(_alfred_pkg.__file__).parent / "knowledge"
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
def _load_yaml(path: Path) -> dict:
try:
with open(path, encoding="utf-8") as f:
return yaml.safe_load(f) or {}
except FileNotFoundError:
return {}
except Exception as e:
logger.warning(f"LanguageRegistry: could not load {path}: {e}")
return {}
def _merge_language_entries(base: dict, override: dict) -> dict:
"""
Merge learned language entries into builtin entries.
For each language iso, aliases lists are extended (deduped, order preserved);
scalar fields in override win over base.
"""
result = dict(base)
for iso, override_entry in override.items():
if iso not in result:
result[iso] = override_entry
continue
merged = dict(result[iso])
for key, val in override_entry.items():
if key == "aliases" and isinstance(val, list):
existing = merged.get("aliases", []) or []
merged["aliases"] = existing + [v for v in val if v not in existing]
else:
merged[key] = val
result[iso] = merged
return result
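A standalone run of the per-language merge makes the additive behaviour concrete. The entries below are invented sample data (including the `vff` alias) and do not reflect the contents of `iso_languages.yaml`.

```python
def merge_language_entries(base: dict, override: dict) -> dict:
    """Aliases are extended (deduplicated, order kept); other fields are overwritten."""
    result = dict(base)
    for iso, override_entry in override.items():
        if iso not in result:
            result[iso] = override_entry
            continue
        merged = dict(result[iso])
        for key, val in override_entry.items():
            if key == "aliases" and isinstance(val, list):
                existing = merged.get("aliases", []) or []
                merged["aliases"] = existing + [v for v in val if v not in existing]
            else:
                merged[key] = val
        result[iso] = merged
    return result


# Hypothetical sample entries.
builtin = {"fra": {"english_name": "French", "aliases": ["fr", "fre"]}}
learned = {
    "fra": {"aliases": ["fre", "vff"], "native_name": "Français"},
    "jpn": {"english_name": "Japanese", "aliases": ["ja"]},
}

merged = merge_language_entries(builtin, learned)
print(merged["fra"])
print(sorted(merged))
```

The builtin dict is left untouched, because each merged entry is a fresh copy and the alias list is rebuilt rather than extended in place.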
class LanguageRegistry:
"""
Loads the canonical language table and provides lookup methods.
Usage::
registry = LanguageRegistry()
fr = registry.from_iso("fra")
fr2 = registry.from_any("French") # → same Language as `fr`
fr3 = registry.from_any("fr") # → same Language
fr4 = registry.from_any("vostfr") # → None (vostfr is subtitle-specific,
# lives in subtitles knowledge)
"""
def __init__(self) -> None:
self._by_iso: dict[str, Language] = {}
self._lookup: dict[str, Language] = {} # any-form → Language
self._load()
    def _load(self) -> None:
        builtin = (
            _load_yaml(_BUILTIN_ROOT / "iso_languages.yaml").get("languages", {}) or {}
        )
        learned = (
            _load_yaml(_LEARNED_ROOT / "iso_languages_learned.yaml").get(
                "languages", {}
            )
            or {}
        )
        merged = _merge_language_entries(builtin, learned)

        for iso, entry in merged.items():
            language = Language(
                iso=iso,
                english_name=entry.get("english_name", iso),
                native_name=entry.get("native_name", iso),
                aliases=tuple(entry.get("aliases", []) or []),
            )
            self._by_iso[language.iso] = language
            # Build the flat lookup table for from_any
            self._lookup[language.iso] = language
            self._lookup[language.english_name.lower()] = language
            self._lookup[language.native_name.lower()] = language
            for alias in language.aliases:
                # from_any() lowercases its query, so alias keys must be
                # lowercased too or mixed-case aliases would never match.
                self._lookup[alias.lower()] = language

        logger.info(f"LanguageRegistry: {len(self._by_iso)} languages loaded")
def from_iso(self, code: str) -> Language | None:
"""Look up by canonical 639-2/T code (case-insensitive)."""
if not isinstance(code, str):
return None
return self._by_iso.get(code.lower().strip())
def from_any(self, raw: str) -> Language | None:
"""
Look up by any known representation: iso code, 639-1, 639-2/B variant,
english name, native name, or any registered alias. Case-insensitive.
"""
if not isinstance(raw, str):
return None
return self._lookup.get(raw.lower().strip())
def all(self) -> list[Language]:
"""Return all known languages, in load order."""
return list(self._by_iso.values())
def __contains__(self, raw: str) -> bool:
return self.from_any(raw) is not None
def __len__(self) -> int:
return len(self._by_iso)
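The registry's `from_any` lookup boils down to one flat dictionary keyed by every lowercased representation of a language. The sketch below shows that idea in isolation; `Lang` is a hypothetical stand-in for the project's `Language` value object, and the alias values are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lang:
    """Hypothetical stand-in for the Language value object."""

    iso: str
    english_name: str
    native_name: str
    aliases: tuple[str, ...] = ()


def build_lookup(languages: list[Lang]) -> dict[str, Lang]:
    """Index each language under every known form, lowercased."""
    lookup: dict[str, Lang] = {}
    for lang in languages:
        for form in (lang.iso, lang.english_name, lang.native_name, *lang.aliases):
            lookup[form.lower()] = lang
    return lookup


def from_any(lookup: dict[str, Lang], raw: object) -> Lang | None:
    """Case-insensitive lookup that tolerates surrounding whitespace."""
    if not isinstance(raw, str):
        return None
    return lookup.get(raw.lower().strip())


french = Lang("fra", "French", "Français", ("fr", "fre", "VFF"))
lookup = build_lookup([french])
print(from_any(lookup, " FRENCH "))
print(from_any(lookup, "vff"))
print(from_any(lookup, "vostfr"))
```

Lowercasing on both the write side and the read side is what keeps the lookup case-insensitive for every form, aliases included.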
+19
@@ -0,0 +1,19 @@
"""Media — file-level track types (video/audio/subtitle) and MediaInfo container.
These are the **container-view** dataclasses, populated from ffprobe output and
used across the project to describe the content of a media file.
"""
from .audio import AudioTrack
from .info import MediaInfo
from .matching import track_lang_matches
from .subtitle import SubtitleTrack
from .video import VideoTrack
__all__ = [
"AudioTrack",
"MediaInfo",
"SubtitleTrack",
"VideoTrack",
"track_lang_matches",
]
+17
@@ -0,0 +1,17 @@
"""AudioTrack — a single audio stream as reported by ffprobe."""
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class AudioTrack:
"""A single audio track as reported by ffprobe."""
index: int
codec: str | None # aac, ac3, eac3, dts, truehd, flac, …
channels: int | None # 2, 6 (5.1), 8 (7.1), …
channel_layout: str | None # stereo, 5.1, 7.1, …
language: str | None # ISO 639-2: fre, eng, und, …
is_default: bool = False
+76
@@ -0,0 +1,76 @@
"""MediaInfo — assembles video, audio and subtitle tracks for a media file."""
from __future__ import annotations
from dataclasses import dataclass, field
from .audio import AudioTrack
from .subtitle import SubtitleTrack
from .video import VideoTrack
@dataclass
class MediaInfo:
"""
File-level media metadata extracted by ffprobe.
Symmetric design: every stream type is a list of typed track objects.
Backwards-compatible flat accessors (``resolution``, ``width``, …) read
from the first video track when present.
"""
video_tracks: list[VideoTrack] = field(default_factory=list)
audio_tracks: list[AudioTrack] = field(default_factory=list)
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
# File-level (from ffprobe ``format`` block, not from any single stream)
duration_seconds: float | None = None
bitrate_kbps: int | None = None
# ──────────────────────────────────────────────────────────────────────
# Video conveniences — read the first video track
# ──────────────────────────────────────────────────────────────────────
@property
def primary_video(self) -> VideoTrack | None:
return self.video_tracks[0] if self.video_tracks else None
@property
def width(self) -> int | None:
v = self.primary_video
return v.width if v else None
@property
def height(self) -> int | None:
v = self.primary_video
return v.height if v else None
@property
def video_codec(self) -> str | None:
v = self.primary_video
return v.codec if v else None
@property
def resolution(self) -> str | None:
v = self.primary_video
return v.resolution if v else None
# ──────────────────────────────────────────────────────────────────────
# Audio conveniences
# ──────────────────────────────────────────────────────────────────────
@property
def audio_languages(self) -> list[str]:
"""Unique audio languages across all tracks (ISO 639-2)."""
seen: set[str] = set()
result: list[str] = []
for track in self.audio_tracks:
if track.language and track.language not in seen:
seen.add(track.language)
result.append(track.language)
return result
@property
def is_multi_audio(self) -> bool:
"""True if more than one audio language is present."""
return len(self.audio_languages) > 1
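The audio conveniences above can be exercised in isolation. The following is a minimal sketch, assuming a trimmed-down `AudioTrack` (only `index` and `language`) and a `MediaInfo` reduced to its audio side; both are stand-ins for illustration, not the project's full classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AudioTrack:
    """Trimmed-down stand-in for the AudioTrack in this diff (assumption)."""

    index: int
    language: str | None = None


@dataclass
class MediaInfo:
    """Only the audio side of MediaInfo, reproduced for illustration."""

    audio_tracks: list[AudioTrack] = field(default_factory=list)

    @property
    def audio_languages(self) -> list[str]:
        # Order-preserving de-duplication: first occurrence wins,
        # tracks without a language tag are skipped.
        seen: set[str] = set()
        result: list[str] = []
        for track in self.audio_tracks:
            if track.language and track.language not in seen:
                seen.add(track.language)
                result.append(track.language)
        return result

    @property
    def is_multi_audio(self) -> bool:
        return len(self.audio_languages) > 1


info = MediaInfo(
    audio_tracks=[
        AudioTrack(0, "fre"),
        AudioTrack(1, "eng"),
        AudioTrack(2, "fre"),  # duplicate language, ignored
        AudioTrack(3, None),   # untagged track, ignored
    ]
)
languages = info.audio_languages
```

Note that, as written, a track tagged `und` is a non-empty string and would count as a distinct language, so a file with `eng` plus `und` tracks would report `is_multi_audio` as true.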
+33
@@ -0,0 +1,33 @@
"""Language-matching helper shared by media-bearing entities.
Both ``Episode`` and ``Movie`` carry ``audio_tracks`` / ``subtitle_tracks`` and
need to answer "do I have audio in language X?". The matching contract is the
same in both cases, so keep it in one place.
"""
from __future__ import annotations
from ..value_objects import Language
def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
"""
Match a track's language string against a query (contract "C+").
* ``Language`` query: matches if the track string is any known
representation of that Language (delegates to ``Language.matches``).
Powerful, cross-format mode.
* ``str`` query: case-insensitive, whitespace-trimmed direct comparison
against ``track_lang``. Simple: no alias normalization, no registry lookup.
Callers needing cross-format resolution (``"fr"`` / ``"fre"`` /
``"french"``) should resolve their string through a ``LanguageRegistry``
once and pass the resulting ``Language``.
"""
if track_lang is None:
return False
if isinstance(query, Language):
return query.matches(track_lang)
if isinstance(query, str):
return track_lang.lower().strip() == query.lower().strip()
return False
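The two query modes described in the docstring behave quite differently for the same input. The sketch below illustrates this, assuming a minimal `Language` stand-in without the validation and alias normalization of the real value object; the `french` instance and its aliases are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Language:
    """Minimal stand-in for the Language value object (assumption: no validation)."""

    iso: str
    english_name: str
    native_name: str
    aliases: tuple[str, ...] = ()

    def matches(self, raw: str) -> bool:
        needle = raw.lower().strip()
        known = {
            self.iso,
            self.english_name.lower(),
            self.native_name.lower(),
            *self.aliases,
        }
        return bool(needle) and needle in known


def track_lang_matches(track_lang: str | None, query: str | Language) -> bool:
    if track_lang is None:
        return False
    if isinstance(query, Language):
        # Cross-format mode: any known representation matches.
        return query.matches(track_lang)
    if isinstance(query, str):
        # Direct mode: plain case-insensitive, trimmed comparison.
        return track_lang.lower().strip() == query.lower().strip()
    return False


# Hypothetical Language instance; aliases are assumed already lowercase.
french = Language("fre", "French", "Français", aliases=("fr", "fra"))

cross_format = track_lang_matches("FR", french)   # resolved through aliases
direct_miss = track_lang_matches("fr", "fre")     # strings differ, no lookup
direct_hit = track_lang_matches(" FRE ", "fre")   # case and whitespace ignored
```

The `str` path never consults aliases, which is why the docstring recommends resolving a free-form string to a `Language` once and passing that object instead.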
+25
@@ -0,0 +1,25 @@
"""SubtitleTrack — a single embedded subtitle stream as reported by ffprobe.
This is the **container-view** representation (ffprobe output) used uniformly
across the project to describe a subtitle stream embedded in a media file.
Not to be confused with ``alfred.domain.subtitles.entities.SubtitleCandidate``
which models a subtitle being **scanned/matched** (with confidence, raw tokens,
file path, etc.). The two coexist by design: they describe the same real-world
concept seen from two different bounded contexts.
"""
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class SubtitleTrack:
"""A single embedded subtitle track as reported by ffprobe."""
index: int
codec: str | None # subrip, ass, hdmv_pgs_subtitle, …
language: str | None # ISO 639-2: fre, eng, und, …
is_default: bool = False
is_forced: bool = False
+62
@@ -0,0 +1,62 @@
"""VideoTrack — a single video stream as reported by ffprobe."""
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class VideoTrack:
"""A single video track as reported by ffprobe.
A media file typically has one video track but can have several (alt
camera angles, attached thumbnail images reported as still-image streams,
etc.), hence the list[VideoTrack] on MediaInfo.
"""
index: int
codec: str | None # h264, hevc, av1, …
width: int | None
height: int | None
is_default: bool = False
@property
def resolution(self) -> str | None:
"""
Best-effort resolution string: 2160p, 1080p, 720p, …
Width takes priority over height to handle widescreen/cinema crops
(e.g. 1920×960 scope → 1080p, not 720p). Falls back to height when
width is unavailable.
"""
match (self.width, self.height):
case (None, None):
return None
case (w, h) if w is not None:
match True:
case _ if w >= 3840:
return "2160p"
case _ if w >= 1920:
return "1080p"
case _ if w >= 1280:
return "720p"
case _ if w >= 720:
return "576p"
case _ if w >= 640:
return "480p"
case _:
return f"{h}p" if h else f"{w}w"
case (None, h):
match True:
case _ if h >= 2160:
return "2160p"
case _ if h >= 1080:
return "1080p"
case _ if h >= 720:
return "720p"
case _ if h >= 576:
return "576p"
case _ if h >= 480:
return "480p"
case _:
return f"{h}p"
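The width-first rule is easiest to see on a cropped frame. The following is a minimal sketch, assuming a hypothetical free function `resolution_label` that re-expresses the same thresholds with lookup tables rather than the nested `match` used in the diff; it is not the project's implementation.

```python
from __future__ import annotations

# (minimum dimension, label) pairs, checked from largest to smallest.
_WIDTH_BUCKETS = ((3840, "2160p"), (1920, "1080p"), (1280, "720p"), (720, "576p"), (640, "480p"))
_HEIGHT_BUCKETS = ((2160, "2160p"), (1080, "1080p"), (720, "720p"), (576, "576p"), (480, "480p"))


def resolution_label(width: int | None, height: int | None) -> str | None:
    """Hypothetical helper mirroring the width-first bucketing shown above."""
    if width is None and height is None:
        return None
    if width is not None:
        for floor, label in _WIDTH_BUCKETS:
            if width >= floor:
                return label
        # Below every width bucket: report raw height, or raw width as a last resort.
        return f"{height}p" if height else f"{width}w"
    # Width unknown: fall back to height buckets.
    for floor, label in _HEIGHT_BUCKETS:
        if height >= floor:
            return label
    return f"{height}p"


scope_crop = resolution_label(1920, 960)   # width decides
height_only = resolution_label(None, 960)  # same height, width unknown
```

With the width known, the 1920×960 crop lands in the 1080p bucket; with only the height available, the same 960 lines fall into 720p, which is the misclassification the width-first rule avoids.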
+94
@@ -131,3 +131,97 @@ class FileSize:
def __repr__(self) -> str:
return f"FileSize({self.bytes})"
@dataclass(frozen=True)
class Language:
"""
Canonical language value object.
The primary identifier is the ISO 639-2/B code (3 letters, bibliographic form,
e.g. "fre", "eng", "ger"). This is what ffprobe emits and the project-wide
canonical form. All other representations (ISO 639-1 code, ISO 639-2/T
variant, english/native names, common spellings) live in ``aliases`` and are
used by ``matches()`` for case-insensitive lookup.
Equality and hashing are based solely on ``iso`` so two Language objects with
the same canonical code are interchangeable regardless of aliases.
"""
iso: str
english_name: str
native_name: str
aliases: tuple[str, ...] = ()
def __post_init__(self):
if not isinstance(self.iso, str) or not self.iso:
raise ValidationError(
f"Language.iso must be a non-empty string, got {self.iso!r}"
)
if len(self.iso) != 3:
raise ValidationError(
f"Language.iso must be a 3-letter ISO 639-2/B code, got {self.iso!r}"
)
# Normalize iso to lowercase
object.__setattr__(self, "iso", self.iso.lower())
# Normalize aliases to a tuple of lowercase strings (dedup, preserve order)
seen: set[str] = set()
normalized: list[str] = []
for alias in self.aliases:
if not isinstance(alias, str):
continue
a = alias.lower().strip()
if a and a not in seen:
seen.add(a)
normalized.append(a)
object.__setattr__(self, "aliases", tuple(normalized))
def matches(self, raw: str) -> bool:
"""
True if ``raw`` is any known representation of this language.
Comparison is case-insensitive and whitespace-trimmed. The match space is
the union of the canonical ``iso`` code, the english/native names, and
every alias.
"""
if not isinstance(raw, str):
return False
needle = raw.lower().strip()
if not needle:
return False
if needle == self.iso:
return True
if needle == self.english_name.lower():
return True
if needle == self.native_name.lower():
return True
return needle in self.aliases
def __eq__(self, other: object) -> bool:
if not isinstance(other, Language):
return NotImplemented
return self.iso == other.iso
def __hash__(self) -> int:
return hash(self.iso)
def __str__(self) -> str:
return self.iso
def __repr__(self) -> str:
return f"Language({self.iso!r}, {self.english_name!r})"
# Characters allowed in dot-separated folder/filename forms:
# alphanumerics, underscores, spaces (about to be replaced with dots),
# literal dots, and hyphens. Everything else is stripped.
_FS_SAFE_CHARS = re.compile(r"[^\w\s\.\-]")
def to_dot_folder_name(title: str) -> str:
"""Sanitize ``title`` for filesystem use and convert spaces to dots.
Produces e.g. ``Breaking.Bad`` from ``"Breaking Bad"`` or
``Spider-Man.No.Way.Home`` from ``"Spider-Man: No Way Home"`` (hyphens are
kept, the colon is stripped).
"""
return _FS_SAFE_CHARS.sub("", title).replace(" ", ".")
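A quick way to check what the sanitizer keeps and drops is to run it on a few titles. The sketch below copies the regex and function from the hunk above into a standalone script; the sample titles are chosen for illustration only.

```python
import re

# Same pattern as in the diff: drop anything that is not a word character,
# whitespace, a literal dot, or a hyphen.
_FS_SAFE_CHARS = re.compile(r"[^\w\s\.\-]")


def to_dot_folder_name(title: str) -> str:
    return _FS_SAFE_CHARS.sub("", title).replace(" ", ".")


plain = to_dot_folder_name("Breaking Bad")
hyphenated = to_dot_folder_name("Spider-Man: No Way Home")
accented = to_dot_folder_name("Chérie Le BéBé")
ampersand = to_dot_folder_name("Tom & Jerry")
```

Two behaviors are worth noting. Python's `\w` is Unicode-aware for `str` patterns, so accented letters survive. A stripped character that sat between two spaces leaves both spaces behind, so `"Tom & Jerry"` yields a double dot; callers that care may want to collapse repeated dots afterwards.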
+2 -2
@@ -1,7 +1,7 @@
"""Subtitles domain — subtitle identification, classification and placement."""
from .aggregates import SubtitleRuleSet
from .entities import MediaSubtitleMetadata, SubtitleTrack
from .entities import MediaSubtitleMetadata, SubtitleCandidate
from .exceptions import SubtitleNotFound
from .knowledge import KnowledgeLoader, SubtitleKnowledgeBase
from .services import PatternDetector, SubtitleIdentifier, SubtitleMatcher
@@ -17,7 +17,7 @@ from .value_objects import (
)
__all__ = [
"SubtitleTrack",
"SubtitleCandidate",
"MediaSubtitleMetadata",
"SubtitleRuleSet",
"SubtitleKnowledgeBase",
+9 -4
@@ -26,7 +26,7 @@ class SubtitleRuleSet:
"""
scope: RuleScope
parent: "SubtitleRuleSet | None" = None
parent: SubtitleRuleSet | None = None
pinned_to: ImdbId | None = None
# Deltas — None = inherit
@@ -47,7 +47,9 @@ class SubtitleRuleSet:
preferred_formats=self._formats or base.preferred_formats,
allowed_types=self._types or base.allowed_types,
format_priority=self._format_priority or base.format_priority,
min_confidence=self._min_confidence if self._min_confidence is not None else base.min_confidence,
min_confidence=self._min_confidence
if self._min_confidence is not None
else base.min_confidence,
)
def override(
@@ -83,8 +85,11 @@ class SubtitleRuleSet:
delta["format_priority"] = self._format_priority
if self._min_confidence is not None:
delta["min_confidence"] = self._min_confidence
return {"scope": {"level": self.scope.level, "identifier": self.scope.identifier}, "override": delta}
return {
"scope": {"level": self.scope.level, "identifier": self.scope.identifier},
"override": delta,
}
@classmethod
def global_default(cls) -> "SubtitleRuleSet":
def global_default(cls) -> SubtitleRuleSet:
return cls(scope=RuleScope(level="global"))
+29 -13
@@ -4,16 +4,24 @@ from dataclasses import dataclass, field
from pathlib import Path
from ..shared.value_objects import ImdbId
from .value_objects import SubtitleFormat, SubtitleLanguage, SubtitleMatchingRules, SubtitleType
from .value_objects import (
SubtitleFormat,
SubtitleLanguage,
SubtitleType,
)
@dataclass
class SubtitleTrack:
class SubtitleCandidate:
"""
A single subtitle track: either an external file or an embedded stream.
A subtitle being scanned and matched: either an external file or an embedded stream.
State can evolve: unknown → resolved after user clarification.
confidence reflects how certain we are about language + type classification.
Unlike ``alfred.domain.shared.media.SubtitleTrack`` (the pure container-view
populated from ffprobe), a SubtitleCandidate carries the **flow state** of the
subtitle matching pipeline: language/format are typed value objects that may
be ``None`` while classification is in progress, ``confidence`` reflects how
certain we are, and ``raw_tokens`` holds the filename fragments still under
analysis. State evolves: unknown → resolved after user clarification.
"""
# Classification (may be None if not yet resolved)
@@ -29,7 +37,9 @@ class SubtitleTrack:
# Matching state
confidence: float = 0.0 # 0.0 → 1.0, not applicable for embedded
raw_tokens: list[str] = field(default_factory=list) # tokens extracted from filename
raw_tokens: list[str] = field(
default_factory=list
) # tokens extracted from filename
def is_resolved(self) -> bool:
return self.language is not None
@@ -43,7 +53,9 @@ class SubtitleTrack:
{lang}.forced.{ext}
"""
if not self.language or not self.format:
raise ValueError("Cannot compute destination_name: language or format missing")
raise ValueError(
"Cannot compute destination_name: language or format missing"
)
ext = self.format.extensions[0].lstrip(".")
parts = [self.language.code]
if self.subtitle_type == SubtitleType.SDH:
@@ -55,8 +67,12 @@ class SubtitleTrack:
def __repr__(self) -> str:
lang = self.language.code if self.language else "?"
fmt = self.format.id if self.format else "?"
src = "embedded" if self.is_embedded else str(self.file_path.name if self.file_path else "?")
return f"SubtitleTrack({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"
src = (
"embedded"
if self.is_embedded
else str(self.file_path.name if self.file_path else "?")
)
return f"SubtitleCandidate({lang}, {self.subtitle_type.value}, {fmt}, src={src}, conf={self.confidence:.2f})"
@dataclass
@@ -68,14 +84,14 @@ class MediaSubtitleMetadata:
media_id: ImdbId | None
media_type: str # "movie" | "tv_show"
embedded_tracks: list[SubtitleTrack] = field(default_factory=list)
external_tracks: list[SubtitleTrack] = field(default_factory=list)
embedded_tracks: list[SubtitleCandidate] = field(default_factory=list)
external_tracks: list[SubtitleCandidate] = field(default_factory=list)
release_group: str | None = None
detected_pattern_id: str | None = None # pattern id from knowledge base
pattern_confirmed: bool = False
@property
def all_tracks(self) -> list[SubtitleTrack]:
def all_tracks(self) -> list[SubtitleCandidate]:
return self.embedded_tracks + self.external_tracks
@property
@@ -83,5 +99,5 @@ class MediaSubtitleMetadata:
return len(self.embedded_tracks) + len(self.external_tracks)
@property
def unresolved_tracks(self) -> list[SubtitleTrack]:
def unresolved_tracks(self) -> list[SubtitleCandidate]:
return [t for t in self.external_tracks if t.language is None]
+45 -13
@@ -1,8 +1,8 @@
"""SubtitleKnowledgeBase — parsed, typed view of the loaded knowledge."""
import logging
from functools import cached_property
from ...shared.knowledge.language_registry import LanguageRegistry
from ..value_objects import (
ScanStrategy,
SubtitleFormat,
@@ -25,11 +25,16 @@ class SubtitleKnowledgeBase:
without restarting.
"""
def __init__(self, loader: KnowledgeLoader | None = None):
def __init__(
self,
loader: KnowledgeLoader | None = None,
language_registry: LanguageRegistry | None = None,
):
self._loader = loader or KnowledgeLoader()
self._language_registry = language_registry or LanguageRegistry()
self._build()
def _build(self) -> None:
def _build(self) -> None: # noqa: PLR0912 — straight-line YAML projection
data = self._loader.subtitles()
self._formats: dict[str, SubtitleFormat] = {}
@@ -40,17 +45,44 @@ class SubtitleKnowledgeBase:
description=fdata.get("description", ""),
)
self._languages: dict[str, SubtitleLanguage] = {}
for code, ldata in data.get("languages", {}).items():
self._languages[code] = SubtitleLanguage(
code=code,
tokens=ldata.get("tokens", []),
)
# Languages are sourced primarily from the canonical LanguageRegistry
# (alfred/knowledge/iso_languages.yaml — ISO 639-2/B). Subtitle-specific
# tokens (VOSTFR, VF, VFF…) are merged on top from subtitles.yaml's
# ``language_tokens`` section.
subtitle_extras: dict[str, list[str]] = {
code: list(tokens or [])
for code, tokens in (data.get("language_tokens", {}) or {}).items()
}
# Build reverse token → language code map
self._languages: dict[str, SubtitleLanguage] = {}
self._lang_token_map: dict[str, str] = {}
for code, lang in self._languages.items():
for token in lang.tokens:
for language in self._language_registry.all():
tokens: list[str] = [language.iso, language.english_name.lower()]
if language.native_name.lower() not in tokens:
tokens.append(language.native_name.lower())
for alias in language.aliases:
if alias not in tokens:
tokens.append(alias)
for extra in subtitle_extras.get(language.iso, []):
if extra.lower() not in tokens:
tokens.append(extra.lower())
self._languages[language.iso] = SubtitleLanguage(
code=language.iso,
tokens=tokens,
)
for token in tokens:
self._lang_token_map[token.lower()] = language.iso
# Subtitle-specific tokens for languages NOT in the canonical registry
# are still honored: register them as a minimal SubtitleLanguage.
for code, extras in subtitle_extras.items():
if code in self._languages:
continue
tokens = [code] + [e.lower() for e in extras]
self._languages[code] = SubtitleLanguage(code=code, tokens=tokens)
for token in tokens:
self._lang_token_map[token.lower()] = code
# Build reverse token → type map
@@ -62,7 +94,7 @@ class SubtitleKnowledgeBase:
d = data.get("defaults", {})
self._default_rules = SubtitleMatchingRules(
preferred_languages=d.get("languages", ["fra", "eng"]),
preferred_languages=d.get("languages", ["fre", "eng"]),
preferred_formats=d.get("formats", ["srt"]),
allowed_types=d.get("types", ["standard", "forced"]),
format_priority=d.get("format_priority", ["srt", "ass"]),
+8 -4
@@ -5,10 +5,10 @@ from pathlib import Path
import yaml
logger = logging.getLogger(__name__)
import alfred as _alfred_pkg
logger = logging.getLogger(__name__)
# Builtin knowledge — anchored on the alfred package itself, not on this file's depth
_BUILTIN_ROOT = Path(_alfred_pkg.__file__).parent / "knowledge"
@@ -84,7 +84,9 @@ class KnowledgeLoader:
data = _load_yaml(path)
pid = data.get("id", path.stem)
if pid in self._cache["patterns"]:
self._cache["patterns"][pid] = _merge(self._cache["patterns"][pid], data)
self._cache["patterns"][pid] = _merge(
self._cache["patterns"][pid], data
)
else:
self._cache["patterns"][pid] = data
logger.info(f"KnowledgeLoader: learned new pattern '{pid}'")
@@ -100,7 +102,9 @@ class KnowledgeLoader:
data = _load_yaml(path)
name = data.get("name", path.stem)
if name in self._cache["release_groups"]:
self._cache["release_groups"][name] = _merge(self._cache["release_groups"][name], data)
self._cache["release_groups"][name] = _merge(
self._cache["release_groups"][name], data
)
else:
self._cache["release_groups"][name] = data
logger.info(f"KnowledgeLoader: learned new release group '{name}'")
-60
@@ -1,60 +0,0 @@
"""Subtitle repository interfaces (abstract)."""
from abc import ABC, abstractmethod
from ..shared.value_objects import ImdbId
from .entities import Subtitle
from .value_objects import Language
class SubtitleRepository(ABC):
"""
Abstract repository for subtitle persistence.
This defines the interface that infrastructure implementations must follow.
"""
@abstractmethod
def save(self, subtitle: Subtitle) -> None:
"""
Save a subtitle to the repository.
Args:
subtitle: Subtitle entity to save
"""
pass
@abstractmethod
def find_by_media(
self,
media_imdb_id: ImdbId,
language: Language | None = None,
season: int | None = None,
episode: int | None = None,
) -> list[Subtitle]:
"""
Find subtitles for a media item.
Args:
media_imdb_id: IMDb ID of the media
language: Optional language filter
season: Optional season number (for TV shows)
episode: Optional episode number (for TV shows)
Returns:
List of matching subtitles
"""
pass
@abstractmethod
def delete(self, subtitle: Subtitle) -> bool:
"""
Delete a subtitle from the repository.
Args:
subtitle: Subtitle to delete
Returns:
True if deleted, False if not found
"""
pass
+60 -74
@@ -3,8 +3,11 @@
Given a video file path, the scanner:
1. Looks for subtitle files in the same directory as the video.
2. Optionally also inspects a Subs/ subfolder adjacent to the video.
3. Classifies each file (language, SDH, forced) from its filename.
4. Filters according to SubtitlePreferences (languages, min_size_kb, keep_sdh, keep_forced).
3. Classifies each file (language, SDH, forced) from its filename, delegating
all token knowledge to SubtitleKnowledgeBase (which itself merges
LanguageRegistry + subtitle-specific tokens from subtitles.yaml).
4. Filters according to SubtitlePreferences (languages, min_size_kb, keep_sdh,
keep_forced).
5. Returns a list of SubtitleCandidate, one per file that passes the filter,
with the destination filename already computed.
@@ -12,12 +15,14 @@ Filename classification heuristics
-----------------------------------
We parse the stem of each subtitle file looking for known patterns:
fr.srt → lang=fr, sdh=False, forced=False
fr.sdh.srt → lang=fr, sdh=True
fr.hi.srt → lang=fr, sdh=True (hi = hearing-impaired, alias for sdh)
fr.forced.srt → lang=fr, forced=True
Breaking.Bad.S01E01.French.srt → lang=fr (keyword match)
Breaking.Bad.S01E01.VOSTFR.srt → lang=fr (VOSTFR = French forced/foreign subs)
fre.srt → lang=fre, sdh=False, forced=False
fre.sdh.srt → lang=fre, sdh=True
fre.forced.srt → lang=fre, forced=True
Breaking.Bad.S01E01.French.srt → lang=fre (alias match via LanguageRegistry)
Breaking.Bad.S01E01.VOSTFR.srt → lang=fre (subtitle-specific token)
ISO 639-2/B codes are used throughout (matching the project-wide canonical form
from iso_languages.yaml, which is what ffprobe emits).
Output naming convention (matches SubtitlePreferences docstring):
{lang}.srt
@@ -26,62 +31,16 @@ Output naming convention (matches SubtitlePreferences docstring):
"""
import logging
from dataclasses import dataclass, field
import re
from dataclasses import dataclass
from pathlib import Path
from .knowledge.base import SubtitleKnowledgeBase
from .value_objects import SubtitleType
logger = logging.getLogger(__name__)
# Subtitle file extensions we handle
SUBTITLE_EXTENSIONS = {".srt", ".ass", ".ssa", ".vtt", ".sub"}
# Language keyword map: lowercase token → ISO 639-1 code
_LANG_KEYWORDS: dict[str, str] = {
# French
"fr": "fr",
"fra": "fr",
"french": "fr",
"francais": "fr",
"français": "fr",
"vf": "fr",
"vff": "fr",
"vostfr": "fr",
# English
"en": "en",
"eng": "en",
"english": "en",
# Spanish
"es": "es",
"spa": "es",
"spanish": "es",
"espanol": "es",
# German
"de": "de",
"deu": "de",
"ger": "de",
"german": "de",
# Italian
"it": "it",
"ita": "it",
"italian": "it",
# Portuguese
"pt": "pt",
"por": "pt",
"portuguese": "pt",
# Dutch
"nl": "nl",
"nld": "nl",
"dutch": "nl",
# Japanese
"ja": "ja",
"jpn": "ja",
"japanese": "ja",
}
# Tokens that indicate SDH / hearing-impaired
_SDH_TOKENS = {"sdh", "hi", "hearing", "impaired", "cc", "closedcaption"}
# Tokens that indicate forced subtitles
_FORCED_TOKENS = {"forced", "foreign"}
_TOKEN_SPLIT = re.compile(r"[\.\s_\-]+")
@dataclass
@@ -89,7 +48,7 @@ class SubtitleCandidate:
"""A subtitle file that passed the filter, ready to be placed."""
source_path: Path
language: str # ISO 639-1 code, e.g. "fr"
language: str # ISO 639-2/B code, e.g. "fre"
is_sdh: bool
is_forced: bool
extension: str # e.g. ".srt"
@@ -111,27 +70,44 @@ class SubtitleCandidate:
return ".".join(parts) + "." + ext
# Module-level KB instance — built lazily on first use to avoid loading YAML at import.
_KB: SubtitleKnowledgeBase | None = None
def _kb() -> SubtitleKnowledgeBase:
global _KB # noqa: PLW0603 — intentional lazy module-level cache
if _KB is None:
_KB = SubtitleKnowledgeBase()
return _KB
def _classify(path: Path) -> tuple[str | None, bool, bool]:
"""
Parse a subtitle filename and return (language_code, is_sdh, is_forced).
``language_code`` is the ISO 639-2/B canonical code (e.g. ``"fre"``).
Returns (None, False, False) if the language cannot be determined.
"""
stem = path.stem.lower()
# Split on dots, spaces, underscores, hyphens
import re
tokens = re.split(r"[\.\s_\-]+", stem)
tokens = _TOKEN_SPLIT.split(stem)
kb = _kb()
language: str | None = None
is_sdh = False
is_forced = False
for token in tokens:
if token in _LANG_KEYWORDS:
language = _LANG_KEYWORDS[token]
if token in _SDH_TOKENS:
if not token:
continue
if language is None:
lang = kb.language_for_token(token)
if lang is not None:
language = lang.code
continue
stype = kb.type_for_token(token)
if stype is SubtitleType.SDH:
is_sdh = True
if token in _FORCED_TOKENS:
elif stype is SubtitleType.FORCED:
is_forced = True
return language, is_sdh, is_forced
@@ -147,11 +123,15 @@ class SubtitleScanner:
# Each candidate has .source_path and .destination_name
"""
def __init__(self, languages: list[str], min_size_kb: int, keep_sdh: bool, keep_forced: bool):
self.languages = [l.lower() for l in languages]
def __init__(
self, languages: list[str], min_size_kb: int, keep_sdh: bool, keep_forced: bool
):
self.languages = [lang.lower() for lang in languages]
self.min_size_kb = min_size_kb
self.keep_sdh = keep_sdh
self.keep_forced = keep_forced
self._kb = _kb()
self._subtitle_extensions = {e.lower() for e in self._kb.known_extensions()}
def scan(self, video_path: Path) -> list[SubtitleCandidate]:
"""
@@ -173,14 +153,16 @@ class SubtitleScanner:
for path in sorted(directory.iterdir()):
if not path.is_file():
continue
if path.suffix.lower() not in SUBTITLE_EXTENSIONS:
if path.suffix.lower() not in self._subtitle_extensions:
continue
candidate = self._evaluate(path)
if candidate is not None:
candidates.append(candidate)
logger.info(f"SubtitleScanner: {len(candidates)} candidate(s) found for {video_path.name}")
logger.info(
f"SubtitleScanner: {len(candidates)} candidate(s) found for {video_path.name}"
)
return candidates
def _evaluate(self, path: Path) -> SubtitleCandidate | None:
@@ -188,7 +170,9 @@ class SubtitleScanner:
# Size filter
size_kb = path.stat().st_size / 1024
if size_kb < self.min_size_kb:
logger.debug(f"SubtitleScanner: skip {path.name} (too small: {size_kb:.1f} KB)")
logger.debug(
f"SubtitleScanner: skip {path.name} (too small: {size_kb:.1f} KB)"
)
return None
language, is_sdh, is_forced = _classify(path)
@@ -199,7 +183,9 @@ class SubtitleScanner:
return None
if language not in self.languages:
logger.debug(f"SubtitleScanner: skip {path.name} (language '{language}' not in prefs)")
logger.debug(
f"SubtitleScanner: skip {path.name} (language '{language}' not in prefs)"
)
return None
# SDH filter
-149
@@ -1,149 +0,0 @@
"""Subtitle domain services - Business logic."""
import logging
from ..shared.value_objects import FilePath, ImdbId
from .entities import Subtitle
from .exceptions import SubtitleNotFound
from .repositories import SubtitleRepository
from .value_objects import Language, SubtitleFormat
logger = logging.getLogger(__name__)
class SubtitleService:
"""
Domain service for subtitle-related business logic.
This service is SHARED between movies and TV shows domains.
Both can use this service to manage subtitles.
"""
def __init__(self, repository: SubtitleRepository):
"""
Initialize subtitle service.
Args:
repository: Subtitle repository for persistence
"""
self.repository = repository
def add_subtitle(self, subtitle: Subtitle) -> None:
"""
Add a subtitle to the library.
Args:
subtitle: Subtitle entity to add
"""
self.repository.save(subtitle)
logger.info(
f"Added subtitle: {subtitle.language.value} for {subtitle.media_imdb_id}"
)
def find_subtitles_for_movie(
self, imdb_id: ImdbId, languages: list[Language] | None = None
) -> list[Subtitle]:
"""
Find subtitles for a movie.
Args:
imdb_id: IMDb ID of the movie
languages: Optional list of languages to filter by
Returns:
List of matching subtitles
"""
if languages:
all_subtitles = []
for lang in languages:
subs = self.repository.find_by_media(imdb_id, language=lang)
all_subtitles.extend(subs)
return all_subtitles
else:
return self.repository.find_by_media(imdb_id)
def find_subtitles_for_episode(
self,
imdb_id: ImdbId,
season: int,
episode: int,
languages: list[Language] | None = None,
) -> list[Subtitle]:
"""
Find subtitles for a TV show episode.
Args:
imdb_id: IMDb ID of the TV show
season: Season number
episode: Episode number
languages: Optional list of languages to filter by
Returns:
List of matching subtitles
"""
if languages:
all_subtitles = []
for lang in languages:
subs = self.repository.find_by_media(
imdb_id, language=lang, season=season, episode=episode
)
all_subtitles.extend(subs)
return all_subtitles
else:
return self.repository.find_by_media(
imdb_id, season=season, episode=episode
)
def remove_subtitle(self, subtitle: Subtitle) -> None:
"""
Remove a subtitle from the library.
Args:
subtitle: Subtitle to remove
Raises:
SubtitleNotFound: If subtitle not found
"""
if not self.repository.delete(subtitle):
raise SubtitleNotFound(f"Subtitle not found: {subtitle}")
logger.info(f"Removed subtitle: {subtitle}")
def detect_format_from_file(self, file_path: FilePath) -> SubtitleFormat:
"""
Detect subtitle format from file extension.
Args:
file_path: Path to subtitle file
Returns:
Detected subtitle format
"""
extension = file_path.value.suffix
return SubtitleFormat.from_extension(extension)
def validate_subtitle_file(self, file_path: FilePath) -> bool:
"""
Validate that a file is a valid subtitle file.
Args:
file_path: Path to the file
Returns:
True if valid subtitle file, False otherwise
"""
if not file_path.exists():
logger.warning(f"File does not exist: {file_path}")
return False
if not file_path.is_file():
logger.warning(f"Path is not a file: {file_path}")
return False
# Check file extension
try:
self.detect_format_from_file(file_path)
return True
except Exception as e:
logger.warning(f"Invalid subtitle format: {e}")
return False
+96 -35
@@ -1,13 +1,13 @@
"""SubtitleIdentifier — finds and classifies all subtitle tracks for a video file."""
import json
import logging
import re
import subprocess
import json
from pathlib import Path
from ...shared.value_objects import ImdbId
from ..entities import MediaSubtitleMetadata, SubtitleTrack
from ..entities import MediaSubtitleMetadata, SubtitleCandidate
from ..knowledge.base import SubtitleKnowledgeBase
from ..value_objects import ScanStrategy, SubtitlePattern, SubtitleType
@@ -15,10 +15,28 @@ logger = logging.getLogger(__name__)
def _tokenize(name: str) -> list[str]:
"""Split a filename stem into lowercase tokens."""
"""Split a filename stem into lowercase tokens, stripping parentheses."""
# Strip parenthesized qualifiers like (simplified), (canada), (brazil)
name = re.sub(r"\([^)]*\)", "", name)
return [t.lower() for t in re.split(r"[\.\s_\-]+", name) if t]
def _tokenize_suffix(stem: str, episode_stem: str) -> list[str]:
"""
For episode_subfolder pattern: the filename is {episode_stem}.{lang_tokens}.
Return only the tokens that come after the episode stem portion.
Falls back to full tokenization if the stem doesn't start with episode_stem.
"""
stem_lower = stem.lower()
prefix = episode_stem.lower()
if stem_lower.startswith(prefix):
suffix = stem[len(prefix) :]
tokens = _tokenize(suffix)
if tokens:
return tokens
return _tokenize(stem)
def _count_entries(path: Path) -> int:
"""Return the entry count of an SRT file by finding the last cue number."""
try:
@@ -73,23 +91,36 @@ class SubtitleIdentifier:
# Embedded tracks — ffprobe
# ------------------------------------------------------------------
def _scan_embedded(self, video_path: Path) -> list[SubtitleTrack]:
def _scan_embedded(self, video_path: Path) -> list[SubtitleCandidate]:
if not video_path.exists():
return []
try:
result = subprocess.run(
[
"ffprobe", "-v", "quiet",
"-print_format", "json",
"ffprobe",
"-v",
"quiet",
"-print_format",
"json",
"-show_streams",
"-select_streams", "s",
"-select_streams",
"s",
str(video_path),
],
capture_output=True, text=True, timeout=30,
capture_output=True,
text=True,
timeout=30,
check=False,
)
data = json.loads(result.stdout)
except (subprocess.TimeoutExpired, json.JSONDecodeError, FileNotFoundError) as e:
logger.debug(f"SubtitleIdentifier: ffprobe failed for {video_path.name}: {e}")
except (
subprocess.TimeoutExpired,
json.JSONDecodeError,
FileNotFoundError,
) as e:
logger.debug(
f"SubtitleIdentifier: ffprobe failed for {video_path.name}: {e}"
)
return []
tracks = []
@@ -97,7 +128,6 @@ class SubtitleIdentifier:
tags = stream.get("tags", {})
disposition = stream.get("disposition", {})
lang_code = tags.get("language", "")
title = tags.get("title", "")
lang = self.kb.language_for_token(lang_code) if lang_code else None
@@ -108,39 +138,50 @@ class SubtitleIdentifier:
else:
stype = SubtitleType.STANDARD
tracks.append(SubtitleTrack(
tracks.append(
SubtitleCandidate(
language=lang,
format=None,
subtitle_type=stype,
is_embedded=True,
raw_tokens=[lang_code] if lang_code else [],
))
)
)
logger.debug(f"SubtitleIdentifier: {len(tracks)} embedded track(s) in {video_path.name}")
logger.debug(
f"SubtitleIdentifier: {len(tracks)} embedded track(s) in {video_path.name}"
)
return tracks
# ------------------------------------------------------------------
# External tracks — filesystem scan per pattern strategy
# ------------------------------------------------------------------
def _scan_external(self, video_path: Path, pattern: SubtitlePattern) -> list[SubtitleTrack]:
def _scan_external(
self, video_path: Path, pattern: SubtitlePattern
) -> list[SubtitleCandidate]:
strategy = pattern.scan_strategy
episode_stem: str | None = None
if strategy == ScanStrategy.ADJACENT:
candidates = self._find_adjacent(video_path)
elif strategy == ScanStrategy.FLAT:
candidates = self._find_flat(video_path, pattern.root_folder or "Subs")
elif strategy == ScanStrategy.EPISODE_SUBFOLDER:
candidates = self._find_episode_subfolder(video_path, pattern.root_folder or "Subs")
candidates, episode_stem = self._find_episode_subfolder(
video_path, pattern.root_folder or "Subs"
)
else:
return []
return self._classify_files(candidates, pattern)
return self._classify_files(candidates, pattern, episode_stem=episode_stem)
def _find_adjacent(self, video_path: Path) -> list[Path]:
return [
p for p in sorted(video_path.parent.iterdir())
if p.is_file() and p.suffix.lower() in self.kb.known_extensions()
p
for p in sorted(video_path.parent.iterdir())
if p.is_file()
and p.suffix.lower() in self.kb.known_extensions()
and p.stem != video_path.stem
]
@@ -152,17 +193,22 @@ class SubtitleIdentifier:
if not subs_dir.is_dir():
return []
return [
p for p in sorted(subs_dir.iterdir())
p
for p in sorted(subs_dir.iterdir())
if p.is_file() and p.suffix.lower() in self.kb.known_extensions()
]
def _find_episode_subfolder(self, video_path: Path, root_folder: str) -> list[Path]:
def _find_episode_subfolder(
self, video_path: Path, root_folder: str
) -> tuple[list[Path], str]:
"""
Look for Subs/{episode_stem}/*.srt
Checks two locations:
1. Adjacent to the video: video_path.parent / root_folder / video_path.stem
2. Release root (one level up): video_path.parent.parent / root_folder / video_path.stem
Returns (files, episode_stem) so the classifier can strip the prefix.
"""
episode_stem = video_path.stem
candidates_dirs = [
@@ -172,22 +218,30 @@ class SubtitleIdentifier:
for subs_dir in candidates_dirs:
if subs_dir.is_dir():
files = [
p for p in sorted(subs_dir.iterdir())
p
for p in sorted(subs_dir.iterdir())
if p.is_file() and p.suffix.lower() in self.kb.known_extensions()
]
if files:
logger.debug(f"SubtitleIdentifier: found {len(files)} file(s) in {subs_dir}")
return files
return []
logger.debug(
f"SubtitleIdentifier: found {len(files)} file(s) in {subs_dir}"
)
return files, episode_stem
return [], episode_stem
# ------------------------------------------------------------------
# Classification
# ------------------------------------------------------------------
def _classify_files(self, paths: list[Path], pattern: SubtitlePattern) -> list[SubtitleTrack]:
def _classify_files(
self,
paths: list[Path],
pattern: SubtitlePattern,
episode_stem: str | None = None,
) -> list[SubtitleCandidate]:
tracks = []
for path in paths:
track = self._classify_single(path)
track = self._classify_single(path, episode_stem=episode_stem)
tracks.append(track)
# Post-process: if multiple tracks share same language but type is ambiguous,
@@ -197,9 +251,15 @@ class SubtitleIdentifier:
return tracks
def _classify_single(self, path: Path) -> SubtitleTrack:
def _classify_single(
self, path: Path, episode_stem: str | None = None
) -> SubtitleCandidate:
fmt = self.kb.format_for_extension(path.suffix)
tokens = _tokenize(path.stem)
tokens = (
_tokenize_suffix(path.stem, episode_stem)
if episode_stem
else _tokenize(path.stem)
)
language = None
subtitle_type = SubtitleType.UNKNOWN
@@ -230,7 +290,7 @@ class SubtitleIdentifier:
size_kb = path.stat().st_size / 1024 if path.exists() else None
entry_count = _count_entries(path) if path.exists() else None
return SubtitleTrack(
return SubtitleCandidate(
language=language,
format=fmt,
subtitle_type=subtitle_type,
@@ -242,7 +302,9 @@ class SubtitleIdentifier:
raw_tokens=tokens,
)
def _disambiguate_by_size(self, tracks: list[SubtitleTrack]) -> list[SubtitleTrack]:
def _disambiguate_by_size(
self, tracks: list[SubtitleCandidate]
) -> list[SubtitleCandidate]:
"""
When multiple tracks share the same language and type is UNKNOWN/STANDARD,
the one with the most entries (lines) is SDH, the smallest is FORCED if
@@ -250,16 +312,15 @@ class SubtitleIdentifier:
Only applied when type_detection = size_and_count.
"""
from itertools import groupby
# Group by language code
lang_groups: dict[str, list[SubtitleTrack]] = {}
lang_groups: dict[str, list[SubtitleCandidate]] = {}
for track in tracks:
key = track.language.code if track.language else "__unknown__"
lang_groups.setdefault(key, []).append(track)
result = []
for lang_code, group in lang_groups.items():
for group in lang_groups.values():
if len(group) == 1:
result.extend(group)
continue
@@ -282,6 +343,6 @@ class SubtitleIdentifier:
return result
def _set_type(self, track: SubtitleTrack, stype: SubtitleType) -> None:
def _set_type(self, track: SubtitleCandidate, stype: SubtitleType) -> None:
"""Mutate track type in-place."""
track.subtitle_type = stype
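Reviewer note: the diff calls `_tokenize_suffix(path.stem, episode_stem)` but its body is not part of this hunk. The sketch below is only a guess at the intent stated in the docstring ("so the classifier can strip the prefix"). The name `tokenize_suffix`, the separator set, and the lowercasing are assumptions, not the project's actual implementation.

```python
import re


def tokenize_suffix(stem: str, episode_stem: str) -> list[str]:
    """Hypothetical helper: tokenize only what follows the episode stem."""
    # Drop the episode prefix when present, so release-name tokens
    # (resolution, codec, group) are not mistaken for language/type hints.
    rest = stem[len(episode_stem):] if stem.startswith(episode_stem) else stem
    # Split on common separators and lowercase for knowledge-base lookup.
    return [t.lower() for t in re.split(r"[._\-\s]+", rest) if t]


# Prefix matches: only the trailing tokens are kept.
print(tokenize_suffix("Show.S01E01.1080p.English.SDH", "Show.S01E01.1080p"))
# Prefix absent: the whole stem is tokenized.
print(tokenize_suffix("2_English", "Show.S01E01.1080p"))
```

Falling back to the full stem when the prefix is absent keeps the helper usable for files such as `2_English.srt` inside an episode subfolder.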
+15 -13
@@ -2,15 +2,15 @@
import logging
from ..entities import SubtitleTrack
from ..value_objects import SubtitleMatchingRules, SubtitleType
from ..entities import SubtitleCandidate
from ..value_objects import SubtitleMatchingRules
logger = logging.getLogger(__name__)
class SubtitleMatcher:
"""
Filters a list of SubtitleTrack against effective SubtitleMatchingRules.
Filters a list of SubtitleCandidate against effective SubtitleMatchingRules.
Returns matched tracks (pass all filters, confidence >= min_confidence)
and unresolved tracks (need user clarification).
@@ -21,14 +21,14 @@ class SubtitleMatcher:
def match(
self,
tracks: list[SubtitleTrack],
tracks: list[SubtitleCandidate],
rules: SubtitleMatchingRules,
) -> tuple[list[SubtitleTrack], list[SubtitleTrack]]:
) -> tuple[list[SubtitleCandidate], list[SubtitleCandidate]]:
"""
Returns (matched, unresolved).
"""
matched: list[SubtitleTrack] = []
unresolved: list[SubtitleTrack] = []
matched: list[SubtitleCandidate] = []
unresolved: list[SubtitleCandidate] = []
for track in tracks:
if track.is_embedded:
@@ -50,7 +50,9 @@ class SubtitleMatcher:
)
return matched, unresolved
def _passes_filters(self, track: SubtitleTrack, rules: SubtitleMatchingRules) -> bool:
def _passes_filters(
self, track: SubtitleCandidate, rules: SubtitleMatchingRules
) -> bool:
# Language filter
if rules.preferred_languages:
if not track.language:
@@ -74,14 +76,14 @@ class SubtitleMatcher:
def _resolve_conflicts(
self,
tracks: list[SubtitleTrack],
tracks: list[SubtitleCandidate],
rules: SubtitleMatchingRules,
) -> list[SubtitleTrack]:
) -> list[SubtitleCandidate]:
"""
When multiple tracks have same language + type, keep only the best one
according to format_priority. If no format_priority applies, keep the first.
"""
seen: dict[tuple, SubtitleTrack] = {}
seen: dict[tuple, SubtitleCandidate] = {}
for track in tracks:
lang = track.language.code if track.language else None
@@ -104,8 +106,8 @@ class SubtitleMatcher:
def _prefer(
self,
candidate: SubtitleTrack,
existing: SubtitleTrack,
candidate: SubtitleCandidate,
existing: SubtitleCandidate,
format_priority: list[str],
) -> bool:
"""Return True if candidate is preferable to existing."""
@@ -49,13 +49,20 @@ class PatternDetector:
try:
result = subprocess.run(
[
"ffprobe", "-v", "quiet",
"-print_format", "json",
"ffprobe",
"-v",
"quiet",
"-print_format",
"json",
"-show_streams",
"-select_streams", "s",
"-select_streams",
"s",
str(video_path),
],
capture_output=True, text=True, timeout=30,
capture_output=True,
text=True,
timeout=30,
check=False,
)
data = json.loads(result.stdout)
return len(data.get("streams", [])) > 0
@@ -87,15 +94,22 @@ class PatternDetector:
# Is it flat or episode_subfolder?
children = list(subs_candidate.iterdir())
sub_files = [c for c in children if c.is_file() and c.suffix.lower() in known_exts]
sub_files = [
c
for c in children
if c.is_file() and c.suffix.lower() in known_exts
]
sub_dirs = [c for c in children if c.is_dir()]
if sub_dirs and not sub_files:
findings["subs_strategy"] = "episode_subfolder"
# Count files in a sample subfolder
sample_sub = sub_dirs[0]
sample_files = [f for f in sample_sub.iterdir()
if f.is_file() and f.suffix.lower() in known_exts]
sample_files = [
f
for f in sample_sub.iterdir()
if f.is_file() and f.suffix.lower() in known_exts
]
findings["files_per_episode"] = len(sample_files)
# Check naming conventions
for f in sample_files:
@@ -103,22 +117,27 @@ class PatternDetector:
parts = stem.split("_")
if parts[0].isdigit():
findings["has_numeric_prefix"] = True
if any(self.kb.is_known_lang_token(t.lower())
for t in stem.replace("_", ".").split(".")):
if any(
self.kb.is_known_lang_token(t.lower())
for t in stem.replace("_", ".").split(".")
):
findings["has_lang_tokens"] = True
else:
findings["subs_strategy"] = "flat"
findings["files_per_episode"] = len(sub_files)
for f in sub_files:
if any(self.kb.is_known_lang_token(t.lower())
for t in f.stem.replace("_", ".").split(".")):
if any(
self.kb.is_known_lang_token(t.lower())
for t in f.stem.replace("_", ".").split(".")
):
findings["has_lang_tokens"] = True
break
# Check adjacent subs (next to the video)
if not findings["has_subs_folder"]:
adjacent = [
p for p in sample_video.parent.iterdir()
p
for p in sample_video.parent.iterdir()
if p.is_file() and p.suffix.lower() in known_exts
]
if adjacent:
@@ -157,7 +176,9 @@ class PatternDetector:
total += 1
if findings.get("has_embedded"):
score += 1.0
if not findings.get("has_subs_folder") and not findings.get("adjacent_subs"):
if not findings.get("has_subs_folder") and not findings.get(
"adjacent_subs"
):
score += 0.5
total += 0.5
+31 -8
@@ -5,11 +5,32 @@ import os
from dataclasses import dataclass
from pathlib import Path
from ..entities import SubtitleTrack
from ..entities import SubtitleCandidate
from ..value_objects import SubtitleType
logger = logging.getLogger(__name__)
def _build_dest_name(track: SubtitleCandidate, video_stem: str) -> str:
"""
Build the destination filename for a subtitle track.
Format: {video_stem}.{lang}.{ext}
{video_stem}.{lang}.sdh.{ext}
{video_stem}.{lang}.forced.{ext}
"""
if not track.language or not track.format:
raise ValueError("Cannot compute destination name: language or format missing")
ext = track.format.extensions[0].lstrip(".")
parts = [video_stem, track.language.code]
if track.subtitle_type == SubtitleType.SDH:
parts.append("sdh")
elif track.subtitle_type == SubtitleType.FORCED:
parts.append("forced")
return ".".join(parts) + "." + ext
@dataclass
class PlacedTrack:
source: Path
@@ -20,7 +41,7 @@ class PlacedTrack:
@dataclass
class PlaceResult:
placed: list[PlacedTrack]
skipped: list[tuple[SubtitleTrack, str]] # (track, reason)
skipped: list[tuple[SubtitleCandidate, str]] # (track, reason)
@property
def placed_count(self) -> int:
@@ -33,7 +54,7 @@ class PlaceResult:
class SubtitlePlacer:
"""
Hard-links matched SubtitleTrack files next to a destination video.
Hard-links matched SubtitleCandidate files next to a destination video.
Uses the same hard-link strategy as FileManager.copy_file:
instant, no data duplication, qBittorrent keeps seeding.
@@ -43,11 +64,11 @@ class SubtitlePlacer:
def place(
self,
tracks: list[SubtitleTrack],
tracks: list[SubtitleCandidate],
destination_video: Path,
) -> PlaceResult:
placed: list[PlacedTrack] = []
skipped: list[tuple[SubtitleTrack, str]] = []
skipped: list[tuple[SubtitleCandidate, str]] = []
dest_dir = destination_video.parent
@@ -62,7 +83,7 @@ class SubtitlePlacer:
continue
try:
dest_name = track.destination_name
dest_name = _build_dest_name(track, destination_video.stem)
except ValueError as e:
skipped.append((track, str(e)))
continue
@@ -76,11 +97,13 @@ class SubtitlePlacer:
try:
os.link(track.file_path, dest_path)
placed.append(PlacedTrack(
placed.append(
PlacedTrack(
source=track.file_path,
destination=dest_path,
filename=dest_name,
))
)
)
logger.info(f"SubtitlePlacer: placed {dest_name}")
except OSError as e:
logger.warning(f"SubtitlePlacer: failed to place {dest_name}: {e}")
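Reviewer note: the naming scheme documented in `_build_dest_name` can be exercised in isolation. The sketch below mirrors the logic shown in the diff but uses a simplified stand-in (`Candidate`, with plain `lang_code` and `ext` fields) instead of the real `SubtitleCandidate`, whose full definition is not visible here. Those names are assumptions for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SubtitleType(Enum):
    STANDARD = "standard"
    SDH = "sdh"
    FORCED = "forced"


@dataclass
class Candidate:
    """Hypothetical stand-in for SubtitleCandidate."""

    lang_code: Optional[str]
    ext: Optional[str]
    subtitle_type: SubtitleType = SubtitleType.STANDARD


def build_dest_name(track: Candidate, video_stem: str) -> str:
    # Same guard as the diff: refuse to name a track we cannot describe.
    if not track.lang_code or not track.ext:
        raise ValueError("Cannot compute destination name: language or format missing")
    parts = [video_stem, track.lang_code]
    # Only SDH and FORCED add a type marker; STANDARD stays bare.
    if track.subtitle_type == SubtitleType.SDH:
        parts.append("sdh")
    elif track.subtitle_type == SubtitleType.FORCED:
        parts.append("forced")
    parts.append(track.ext.lstrip("."))
    return ".".join(parts)


print(build_dest_name(Candidate("fra", ".srt", SubtitleType.SDH), "Show.S01E01"))
```

Raising `ValueError` rather than returning a partial name matches how `SubtitlePlacer.place` treats the failure: the track is recorded in `skipped` with the error message as the reason.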
+3 -3
@@ -1,9 +1,9 @@
"""Subtitle service utilities."""
from ..entities import SubtitleTrack
from ..entities import SubtitleCandidate
def available_subtitles(tracks: list[SubtitleTrack]) -> list[SubtitleTrack]:
def available_subtitles(tracks: list[SubtitleCandidate]) -> list[SubtitleCandidate]:
"""
Return the distinct subtitle tracks available, deduped by (language, type).
@@ -11,7 +11,7 @@ def available_subtitles(tracks: list[SubtitleTrack]) -> list[SubtitleTrack]:
preferences e.g. eng, eng.sdh, fra all show up as separate entries.
"""
seen: set[tuple] = set()
result: list[SubtitleTrack] = []
result: list[SubtitleCandidate] = []
for track in tracks:
lang = track.language.code if track.language else None
key = (lang, track.subtitle_type)
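Reviewer note: the hunk ends before the loop body of `available_subtitles` is complete. The sketch below shows one plausible way to finish a dedupe keyed on `(language, type)`, keeping the first occurrence of each key. The `Track` tuple and the "first wins" rule are assumptions, not confirmed by the diff.

```python
from typing import NamedTuple, Optional


class Track(NamedTuple):
    """Hypothetical stand-in for SubtitleCandidate."""

    lang: Optional[str]
    subtitle_type: str


def distinct_tracks(tracks: list[Track]) -> list[Track]:
    seen: set[tuple] = set()
    result: list[Track] = []
    for track in tracks:
        key = (track.lang, track.subtitle_type)
        # Assumed rule: the first track seen for a key is kept, later ones dropped.
        if key not in seen:
            seen.add(key)
            result.append(track)
    return result


tracks = [
    Track("eng", "standard"),
    Track("eng", "sdh"),
    Track("eng", "standard"),  # duplicate key, dropped
    Track("fra", "standard"),
]
print(distinct_tracks(tracks))
```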
-2
@@ -2,8 +2,6 @@
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Any
class ScanStrategy(Enum):
+7 -3
@@ -2,18 +2,22 @@
from .entities import Episode, Season, TVShow
from .exceptions import InvalidEpisode, SeasonNotFound, TVShowNotFound
from .services import TVShowService
from .value_objects import EpisodeNumber, SeasonNumber, ShowStatus
from .value_objects import (
CollectionStatus,
EpisodeNumber,
SeasonNumber,
ShowStatus,
)
__all__ = [
"TVShow",
"Season",
"Episode",
"ShowStatus",
"CollectionStatus",
"SeasonNumber",
"EpisodeNumber",
"TVShowNotFound",
"InvalidEpisode",
"SeasonNotFound",
"TVShowService",
]
+366 -117
@@ -1,120 +1,270 @@
"""TV Show domain entities."""
"""TV Show domain entities.
This module implements the TVShow aggregate following DDD principles.
Aggregate ownership::
TVShow  <- aggregate root (the repo returns this)
+-- seasons: dict[SeasonNumber, Season]
    +-- Season
        +-- episodes: dict[EpisodeNumber, Episode]
            +-- Episode  <- file metadata + audio/subtitle tracks
Rules:
* ``TVShow`` is the aggregate **root**: the only entity exposed by the
repository.
* ``Season`` is owned by TVShow. ``Episode`` is owned by Season.
* Children do not back-reference the root (no ``show_imdb_id`` on
Season/Episode): they are only ever reached *through* TVShow.
* Mutation invariants are enforced through aggregate-root methods such as
``TVShow.add_episode()``. Never reach into ``show.seasons[...].episodes``
to mutate without going through the root; otherwise invariants are not
guaranteed.
"""
from __future__ import annotations
import re
from dataclasses import dataclass, field
from ..shared.value_objects import FilePath, FileSize, ImdbId
from .value_objects import EpisodeNumber, SeasonNumber, ShowStatus
from ..shared.media import AudioTrack, SubtitleTrack, track_lang_matches
from ..shared.value_objects import (
FilePath,
FileSize,
ImdbId,
Language,
to_dot_folder_name,
)
from .value_objects import (
CollectionStatus,
EpisodeNumber,
SeasonNumber,
ShowStatus,
)
# ════════════════════════════════════════════════════════════════════════════
# Episode
# ════════════════════════════════════════════════════════════════════════════
@dataclass
class TVShow:
class Episode:
"""
TV Show entity representing a TV show in the media library.
A single episode of a TV show: a leaf of the TVShow aggregate.
This is the main aggregate root for the TV shows domain.
Migrated from agent/models/tv_show.py
Carries the file metadata (path, size) and the discovered tracks
(audio + subtitle). Track lists are populated by the ffprobe + subtitle
scan pipeline; they may be empty when the episode is known but not yet
scanned, or when no file is downloaded yet.
"""
imdb_id: ImdbId
season_number: SeasonNumber
episode_number: EpisodeNumber
title: str
seasons_count: int
status: ShowStatus
tmdb_id: int | None = None
file_path: FilePath | None = None
file_size: FileSize | None = None
audio_tracks: list[AudioTrack] = field(default_factory=list)
subtitle_tracks: list[SubtitleTrack] = field(default_factory=list)
def __post_init__(self):
"""Validate TV show entity."""
# Ensure ImdbId is actually an ImdbId instance
if not isinstance(self.imdb_id, ImdbId):
if isinstance(self.imdb_id, str):
object.__setattr__(self, "imdb_id", ImdbId(self.imdb_id))
else:
raise ValueError(
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
)
def __post_init__(self) -> None:
# Coerce numbers if raw ints were passed
if not isinstance(self.season_number, SeasonNumber):
if isinstance(self.season_number, int):
self.season_number = SeasonNumber(self.season_number)
if not isinstance(self.episode_number, EpisodeNumber):
if isinstance(self.episode_number, int):
self.episode_number = EpisodeNumber(self.episode_number)
# Ensure ShowStatus is actually a ShowStatus instance
if not isinstance(self.status, ShowStatus):
if isinstance(self.status, str):
object.__setattr__(self, "status", ShowStatus.from_string(self.status))
else:
raise ValueError(
f"status must be ShowStatus or str, got {type(self.status)}"
)
# ── File presence ──────────────────────────────────────────────────────
# Validate seasons_count
if not isinstance(self.seasons_count, int) or self.seasons_count < 0:
raise ValueError(
f"seasons_count must be a non-negative integer, got {self.seasons_count}"
)
def has_file(self) -> bool:
"""True if a file path is set and the file actually exists on disk."""
return self.file_path is not None and self.file_path.exists()
def is_ongoing(self) -> bool:
"""Check if the show is still ongoing."""
return self.status == ShowStatus.ONGOING
def is_downloaded(self) -> bool:
"""Alias of ``has_file()`` — reads better in collection-status contexts."""
return self.has_file()
def is_ended(self) -> bool:
"""Check if the show has ended."""
return self.status == ShowStatus.ENDED
# ── Audio helpers ──────────────────────────────────────────────────────
def get_folder_name(self) -> str:
"""
Get the folder name for this TV show.
def has_audio_in(self, lang: str | Language) -> bool:
"""True if at least one audio track is in the given language."""
return any(track_lang_matches(t.language, lang) for t in self.audio_tracks)
Format: "Title"
Example: "Breaking.Bad"
"""
# Remove special characters and replace spaces with dots
cleaned = re.sub(r"[^\w\s\.\-]", "", self.title)
return cleaned.replace(" ", ".")
def audio_languages(self) -> list[str]:
"""Unique audio languages across all tracks, in track order."""
seen: set[str] = set()
result: list[str] = []
for t in self.audio_tracks:
if t.language and t.language not in seen:
seen.add(t.language)
result.append(t.language)
return result
# ── Subtitle helpers ───────────────────────────────────────────────────
def has_subtitles_in(self, lang: str | Language) -> bool:
"""True if at least one subtitle track is in the given language."""
return any(track_lang_matches(t.language, lang) for t in self.subtitle_tracks)
def has_forced_subs(self) -> bool:
"""True if at least one subtitle track is flagged as forced."""
return any(t.is_forced for t in self.subtitle_tracks)
def subtitle_languages(self) -> list[str]:
"""Unique subtitle languages across all tracks, in track order."""
seen: set[str] = set()
result: list[str] = []
for t in self.subtitle_tracks:
if t.language and t.language not in seen:
seen.add(t.language)
result.append(t.language)
return result
# ── Naming ─────────────────────────────────────────────────────────────
def get_filename(self) -> str:
"""Suggested filename: ``S01E05.Pilot``."""
season_str = f"S{self.season_number.value:02d}"
episode_str = f"E{self.episode_number.value:02d}"
clean_title = re.sub(r"[^\w\s\-]", "", self.title)
clean_title = clean_title.replace(" ", ".")
return f"{season_str}{episode_str}.{clean_title}"
def __str__(self) -> str:
return f"{self.title} ({self.status.value}, {self.seasons_count} seasons)"
return f"S{self.season_number.value:02d}E{self.episode_number.value:02d} - {self.title}"
def __repr__(self) -> str:
return f"TVShow(imdb_id={self.imdb_id}, title='{self.title}')"
return (
f"Episode(S{self.season_number.value:02d}E{self.episode_number.value:02d})"
)
# ════════════════════════════════════════════════════════════════════════════
# Season
# ════════════════════════════════════════════════════════════════════════════
@dataclass
class Season:
"""
Season entity representing a season of a TV show.
A season of a TV show, owned by ``TVShow``.
Owns its episodes via the ``episodes`` dict keyed by ``EpisodeNumber``.
Two TMDB-sourced counts shape the collection logic:
* ``expected_episodes``: total episodes planned for the season
(``None`` if unknown).
* ``aired_episodes``: episodes **already aired** as of the latest TMDB
refresh. ``None`` falls back to ``expected_episodes`` (best-effort).
The split matters: ``is_complete()`` checks owned against aired, so a season
in the middle of broadcasting can be "complete" today and become "partial"
later when new episodes air. That is correct behavior.
"""
show_imdb_id: ImdbId
season_number: SeasonNumber
episode_count: int
episodes: dict[EpisodeNumber, Episode] = field(default_factory=dict)
expected_episodes: int | None = None
aired_episodes: int | None = None
name: str | None = None
def __post_init__(self):
"""Validate season entity."""
# Ensure ImdbId is actually an ImdbId instance
if not isinstance(self.show_imdb_id, ImdbId):
if isinstance(self.show_imdb_id, str):
object.__setattr__(self, "show_imdb_id", ImdbId(self.show_imdb_id))
# Ensure SeasonNumber is actually a SeasonNumber instance
def __post_init__(self) -> None:
if not isinstance(self.season_number, SeasonNumber):
if isinstance(self.season_number, int):
object.__setattr__(
self, "season_number", SeasonNumber(self.season_number)
self.season_number = SeasonNumber(self.season_number)
if self.expected_episodes is not None and self.expected_episodes < 0:
raise ValueError(
f"expected_episodes must be >= 0, got {self.expected_episodes}"
)
if self.aired_episodes is not None and self.aired_episodes < 0:
raise ValueError(f"aired_episodes must be >= 0, got {self.aired_episodes}")
if (
self.expected_episodes is not None
and self.aired_episodes is not None
and self.aired_episodes > self.expected_episodes
):
raise ValueError(
f"aired_episodes ({self.aired_episodes}) cannot exceed "
f"expected_episodes ({self.expected_episodes})"
)
# Validate episode_count
if not isinstance(self.episode_count, int) or self.episode_count < 0:
raise ValueError(
f"episode_count must be a non-negative integer, got {self.episode_count}"
# ── Properties ─────────────────────────────────────────────────────────
@property
def episode_count(self) -> int:
"""Number of episodes currently owned in this season."""
return len(self.episodes)
# ── Collection state ───────────────────────────────────────────────────
def _effective_aired(self) -> int | None:
"""``aired_episodes`` if set, else fall back to ``expected_episodes``."""
return (
self.aired_episodes
if self.aired_episodes is not None
else self.expected_episodes
)
def is_complete(self) -> bool:
"""
True if every aired episode is owned.
Returns False (conservative) when the aired count is unknown: without
knowing how many episodes have aired, we cannot claim completeness.
"""
aired = self._effective_aired()
if aired is None:
return False
if aired == 0:
# No episode has aired yet → trivially "complete"
return True
return len(self.episodes) >= aired
def is_fully_aired(self) -> bool:
"""True if all planned episodes have already aired."""
if self.expected_episodes is None or self.aired_episodes is None:
return False
return self.aired_episodes >= self.expected_episodes
def missing_episodes(self) -> list[EpisodeNumber]:
"""
List of episode numbers that have aired but are not owned.
Episodes beyond ``aired_episodes`` are **not** considered missing
(they have not aired yet). When the aired count is unknown, returns
an empty list: we cannot reason about gaps without a target.
"""
aired = self._effective_aired()
if aired is None or aired <= 0:
return []
present = {ep.value for ep in self.episodes}
return [EpisodeNumber(n) for n in range(1, aired + 1) if n not in present]
# ── Mutation (called through the aggregate root) ───────────────────────
def add_episode(self, episode: Episode) -> None:
"""
Insert an episode into this season. Replaces any episode with the same
number; callers wishing to detect conflicts should check beforehand.
"""
if episode.season_number != self.season_number:
raise ValueError(
f"Episode season ({episode.season_number}) does not match season "
f"({self.season_number})"
)
self.episodes[episode.episode_number] = episode
# ── Naming ─────────────────────────────────────────────────────────────
def is_special(self) -> bool:
"""Check if this is the specials season."""
return self.season_number.is_special()
def get_folder_name(self) -> str:
"""
Get the folder name for this season.
Format: "Season 01" or "Specials" for season 0
"""
"""``Season 01`` or ``Specials`` for season 0."""
if self.is_special():
return "Specials"
return f"Season {self.season_number.value:02d}"
@@ -125,69 +275,168 @@ class Season:
return f"Season {self.season_number.value}"
def __repr__(self) -> str:
return f"Season(show={self.show_imdb_id}, number={self.season_number.value})"
return (
f"Season(number={self.season_number.value}, episodes={len(self.episodes)})"
)
# ════════════════════════════════════════════════════════════════════════════
# TVShow — aggregate root
# ════════════════════════════════════════════════════════════════════════════
@dataclass
class Episode:
class TVShow:
"""
Episode entity representing an episode of a TV show.
Aggregate root for the TV shows domain.
Owns its seasons via the ``seasons`` dict keyed by ``SeasonNumber``.
All mutations (adding episodes, creating seasons) MUST go through the
methods on this class; that is how invariants are preserved.
Two axes describe the show, kept deliberately orthogonal:
* ``status`` (``ShowStatus``): production state (TMDB-sourced).
* ``collection_status()``: what the user owns vs what has aired today.
A third axis (upcoming/scheduled) will be added later as a separate flag
when scheduling support is introduced; for now we make no claim about
future episodes.
"""
show_imdb_id: ImdbId
season_number: SeasonNumber
episode_number: EpisodeNumber
imdb_id: ImdbId
title: str
file_path: FilePath | None = None
file_size: FileSize | None = None
status: ShowStatus
seasons: dict[SeasonNumber, Season] = field(default_factory=dict)
expected_seasons: int | None = None
tmdb_id: int | None = None
def __post_init__(self):
"""Validate episode entity."""
# Ensure ImdbId is actually an ImdbId instance
if not isinstance(self.show_imdb_id, ImdbId):
if isinstance(self.show_imdb_id, str):
object.__setattr__(self, "show_imdb_id", ImdbId(self.show_imdb_id))
# Ensure SeasonNumber is actually a SeasonNumber instance
if not isinstance(self.season_number, SeasonNumber):
if isinstance(self.season_number, int):
object.__setattr__(
self, "season_number", SeasonNumber(self.season_number)
def __post_init__(self) -> None:
if not isinstance(self.imdb_id, ImdbId):
if isinstance(self.imdb_id, str):
self.imdb_id = ImdbId(self.imdb_id)
else:
raise ValueError(
f"imdb_id must be ImdbId or str, got {type(self.imdb_id)}"
)
# Ensure EpisodeNumber is actually an EpisodeNumber instance
if not isinstance(self.episode_number, EpisodeNumber):
if isinstance(self.episode_number, int):
object.__setattr__(
self, "episode_number", EpisodeNumber(self.episode_number)
if not isinstance(self.status, ShowStatus):
if isinstance(self.status, str):
self.status = ShowStatus.from_string(self.status)
else:
raise ValueError(
f"status must be ShowStatus or str, got {type(self.status)}"
)
def has_file(self) -> bool:
"""Check if the episode has an associated file."""
return self.file_path is not None and self.file_path.exists()
if self.expected_seasons is not None and self.expected_seasons < 0:
raise ValueError(
f"expected_seasons must be >= 0, got {self.expected_seasons}"
)
def is_downloaded(self) -> bool:
"""Check if the episode is downloaded."""
return self.has_file()
# ── Production-state queries ───────────────────────────────────────────
def get_filename(self) -> str:
def is_ongoing(self) -> bool:
return self.status == ShowStatus.ONGOING
def is_ended(self) -> bool:
return self.status == ShowStatus.ENDED
# ── Properties ─────────────────────────────────────────────────────────
@property
def seasons_count(self) -> int:
"""Number of seasons currently owned (any episode count, even 0)."""
return len(self.seasons)
@property
def episode_count(self) -> int:
"""Total episodes owned across all seasons."""
return sum(s.episode_count for s in self.seasons.values())
# ── Mutation — the sole entry point for adding content ─────────────────
def add_episode(self, episode: Episode) -> None:
"""
Get the suggested filename for this episode.
Add an episode to the appropriate season, creating the season if needed.
Format: "S01E01 - Episode Title.ext"
Example: "S01E05 - Pilot.mkv"
This is the **only** sanctioned way to add content to the aggregate:
it preserves the invariant that an episode is always reachable through
``show.seasons[s].episodes[e]``.
"""
season_str = f"S{self.season_number.value:02d}"
episode_str = f"E{self.episode_number.value:02d}"
season = self.seasons.get(episode.season_number)
if season is None:
season = Season(season_number=episode.season_number)
self.seasons[episode.season_number] = season
season.add_episode(episode)
# Clean title for filename
clean_title = re.sub(r"[^\w\s\-]", "", self.title)
clean_title = clean_title.replace(" ", ".")
def add_season(self, season: Season) -> None:
"""
Attach a (possibly already populated) Season to the show.
return f"{season_str}{episode_str}.{clean_title}"
Replaces any existing season with the same number.
"""
self.seasons[season.season_number] = season
# ── Collection state ───────────────────────────────────────────────────
def collection_status(self) -> CollectionStatus:
"""
High-level state of the user's collection for this show.
* ``EMPTY``: no episode owned
* ``COMPLETE``: every season is complete relative to its aired count
* ``PARTIAL``: at least one aired episode is missing
Seasons with an unknown aired count are treated conservatively: if no
season has any episode, the show is EMPTY; otherwise the unknown
seasons cannot prove completeness, so the show is PARTIAL.
"""
if self.episode_count == 0:
return CollectionStatus.EMPTY
# Check completeness across all seasons we know about
for season in self.seasons.values():
if not season.is_complete():
return CollectionStatus.PARTIAL
# We also need to consider whether seasons themselves are missing.
# If expected_seasons is known and we have fewer seasons than expected,
# the missing seasons may have aired episodes → cannot claim COMPLETE.
if (
self.expected_seasons is not None
and len(self.seasons) < self.expected_seasons
):
return CollectionStatus.PARTIAL
return CollectionStatus.COMPLETE
def is_complete_series(self) -> bool:
"""
True if the show is finished (ENDED) **and** the collection is complete.
This is the strongest "I own the entire series, no more to come" claim
we can make today, before scheduling/upcoming-episode awareness lands.
"""
return self.is_ended() and self.collection_status() == CollectionStatus.COMPLETE
def missing_episodes(self) -> list[tuple[SeasonNumber, EpisodeNumber]]:
"""All aired-but-not-owned ``(season, episode)`` pairs across the show."""
result: list[tuple[SeasonNumber, EpisodeNumber]] = []
for season_number, season in sorted(
self.seasons.items(), key=lambda kv: kv[0].value
):
for ep_number in season.missing_episodes():
result.append((season_number, ep_number))
return result
# ── Naming ─────────────────────────────────────────────────────────────
def get_folder_name(self) -> str:
"""Dot-separated folder name (e.g. ``Breaking.Bad``)."""
return to_dot_folder_name(self.title)
def __str__(self) -> str:
return f"S{self.season_number.value:02d}E{self.episode_number.value:02d} - {self.title}"
return f"{self.title} ({self.status.value}, {self.seasons_count} seasons)"
def __repr__(self) -> str:
return f"Episode(show={self.show_imdb_id}, S{self.season_number.value:02d}E{self.episode_number.value:02d})"
return f"TVShow(imdb_id={self.imdb_id}, title='{self.title}')"
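Reviewer note: the aired-versus-expected rules in `Season.is_complete()` and `Season.missing_episodes()` are easiest to check with plain integers. The sketch below restates them as free functions over a set of owned episode numbers. The function names and the `owned` set are simplifications for illustration, not the entity API.

```python
from typing import Optional


def effective_aired(aired: Optional[int], expected: Optional[int]) -> Optional[int]:
    # Mirrors Season._effective_aired(): prefer aired, fall back to expected.
    return aired if aired is not None else expected


def is_complete(owned: set[int], aired: Optional[int], expected: Optional[int]) -> bool:
    target = effective_aired(aired, expected)
    if target is None:
        return False  # unknown target: stay conservative
    if target == 0:
        return True  # nothing has aired yet
    return len(owned) >= target


def missing(owned: set[int], aired: Optional[int], expected: Optional[int]) -> list[int]:
    target = effective_aired(aired, expected)
    if target is None or target <= 0:
        return []
    # Episodes beyond the aired count are not reported as missing.
    return [n for n in range(1, target + 1) if n not in owned]


# Mid-broadcast season: 3 of 10 episodes aired, all 3 owned.
print(is_complete({1, 2, 3}, aired=3, expected=10))
# A fourth episode airs later: the same collection now has a gap.
print(missing({1, 2, 3}, aired=4, expected=10))
```

Note that `is_complete` compares counts, not membership, as in the diff: owning episodes 1, 2 and 5 with three aired would count as complete, while `missing` would still report episode 3.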
+15 -101
@@ -1,126 +1,40 @@
"""TV Show repository interfaces (abstract)."""
"""TV Show repository interface.
A single repository for the aggregate root only: Season and Episode are
**inside** the TVShow aggregate and are never persisted independently. The
aggregate is always loaded and saved as a whole.
"""
from abc import ABC, abstractmethod
from ..shared.value_objects import ImdbId
from .entities import Episode, Season, TVShow
from .value_objects import EpisodeNumber, SeasonNumber
from .entities import TVShow
class TVShowRepository(ABC):
"""
Abstract repository for TV show persistence.
Abstract repository for the TVShow aggregate.
This defines the interface that infrastructure implementations must follow.
Implementations are responsible for persisting the full aggregate graph
(TVShow + all its Seasons + all their Episodes) atomically.
"""
@abstractmethod
def save(self, show: TVShow) -> None:
"""
Save a TV show to the repository.
Args:
show: TVShow entity to save
"""
pass
"""Persist the full TVShow aggregate."""
@abstractmethod
def find_by_imdb_id(self, imdb_id: ImdbId) -> TVShow | None:
"""
Find a TV show by its IMDb ID.
Args:
imdb_id: IMDb ID to search for
Returns:
TVShow if found, None otherwise
"""
pass
"""Load the full TVShow aggregate by IMDb ID, or None if absent."""
@abstractmethod
def find_all(self) -> list[TVShow]:
"""
Get all TV shows in the repository.
Returns:
List of all TV shows
"""
pass
"""Load all TVShow aggregates."""
@abstractmethod
def delete(self, imdb_id: ImdbId) -> bool:
"""
Delete a TV show from the repository.
Args:
imdb_id: IMDb ID of the show to delete
Returns:
True if deleted, False if not found
"""
pass
"""Remove the aggregate. Returns True if it existed and was deleted."""
@abstractmethod
def exists(self, imdb_id: ImdbId) -> bool:
"""
Check if a TV show exists in the repository.
Args:
imdb_id: IMDb ID to check
Returns:
True if exists, False otherwise
"""
pass
class SeasonRepository(ABC):
"""Abstract repository for season persistence."""
@abstractmethod
def save(self, season: Season) -> None:
"""Save a season."""
pass
@abstractmethod
def find_by_show_and_number(
self, show_imdb_id: ImdbId, season_number: SeasonNumber
) -> Season | None:
"""Find a season by show and season number."""
pass
@abstractmethod
def find_all_by_show(self, show_imdb_id: ImdbId) -> list[Season]:
"""Get all seasons for a show."""
pass
class EpisodeRepository(ABC):
"""Abstract repository for episode persistence."""
@abstractmethod
def save(self, episode: Episode) -> None:
"""Save an episode."""
pass
@abstractmethod
def find_by_show_season_episode(
self,
show_imdb_id: ImdbId,
season_number: SeasonNumber,
episode_number: EpisodeNumber,
) -> Episode | None:
"""Find an episode by show, season, and episode number."""
pass
@abstractmethod
def find_all_by_season(
self, show_imdb_id: ImdbId, season_number: SeasonNumber
) -> list[Episode]:
"""Get all episodes for a season."""
pass
@abstractmethod
def find_all_by_show(self, show_imdb_id: ImdbId) -> list[Episode]:
"""Get all episodes for a show."""
pass
"""True if the aggregate exists in the store."""
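A dict-backed adapter is one way to see what "loaded and saved as a whole" asks of an implementation. The sketch below is hypothetical: `InMemoryTVShowRepository` does not appear in this diff, and `TVShow` is reduced to a stand-in dataclass with a plain string ID in place of `ImdbId`. Deep copies stand in for serialization, so callers cannot mutate stored state behind the repository's back.

```python
import copy
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TVShow:
    """Stand-in for the real aggregate root (hypothetical, illustration only)."""

    imdb_id: str
    title: str
    # season number -> owned episode numbers; the real aggregate holds Season objects
    seasons: Dict[int, List[int]] = field(default_factory=dict)


class TVShowRepository(ABC):
    """Same five methods as the interface in the diff, with simplified types."""

    @abstractmethod
    def save(self, show: TVShow) -> None: ...

    @abstractmethod
    def find_by_imdb_id(self, imdb_id: str) -> Optional[TVShow]: ...

    @abstractmethod
    def find_all(self) -> List[TVShow]: ...

    @abstractmethod
    def delete(self, imdb_id: str) -> bool: ...

    @abstractmethod
    def exists(self, imdb_id: str) -> bool: ...


class InMemoryTVShowRepository(TVShowRepository):
    """Hypothetical adapter: stores one deep copy per aggregate."""

    def __init__(self) -> None:
        self._shows: Dict[str, TVShow] = {}

    def save(self, show: TVShow) -> None:
        # The whole graph is copied in one step, never season by season.
        self._shows[show.imdb_id] = copy.deepcopy(show)

    def find_by_imdb_id(self, imdb_id: str) -> Optional[TVShow]:
        stored = self._shows.get(imdb_id)
        return copy.deepcopy(stored) if stored is not None else None

    def find_all(self) -> List[TVShow]:
        return [copy.deepcopy(show) for show in self._shows.values()]

    def delete(self, imdb_id: str) -> bool:
        return self._shows.pop(imdb_id, None) is not None

    def exists(self, imdb_id: str) -> bool:
        return imdb_id in self._shows
```

With this shape, changing an episode list after `save()` has no effect on the stored aggregate until `save()` is called again, which is the behaviour the new docstrings describe.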
-234
@@ -1,234 +0,0 @@
"""TV Show domain services - Business logic."""
import logging
import re
from ..shared.value_objects import ImdbId
from .entities import TVShow
from .exceptions import (
TVShowAlreadyExists,
TVShowNotFound,
)
from .repositories import EpisodeRepository, SeasonRepository, TVShowRepository
logger = logging.getLogger(__name__)
class TVShowService:
"""
Domain service for TV show-related business logic.
This service contains business rules that don't naturally fit
within a single entity.
"""
def __init__(
self,
show_repository: TVShowRepository,
season_repository: SeasonRepository | None = None,
episode_repository: EpisodeRepository | None = None,
):
"""
Initialize TV show service.
Args:
show_repository: TV show repository for persistence
season_repository: Optional season repository
episode_repository: Optional episode repository
"""
self.show_repository = show_repository
self.season_repository = season_repository
self.episode_repository = episode_repository
def track_show(self, show: TVShow) -> None:
"""
Start tracking a TV show.
Args:
show: TVShow entity to track
Raises:
TVShowAlreadyExists: If show is already being tracked
"""
if self.show_repository.exists(show.imdb_id):
raise TVShowAlreadyExists(
f"TV show with IMDb ID {show.imdb_id} is already tracked"
)
self.show_repository.save(show)
logger.info(f"Started tracking TV show: {show.title} ({show.imdb_id})")
def get_show(self, imdb_id: ImdbId) -> TVShow:
"""
Get a TV show by IMDb ID.
Args:
imdb_id: IMDb ID of the show
Returns:
TVShow entity
Raises:
TVShowNotFound: If show not found
"""
show = self.show_repository.find_by_imdb_id(imdb_id)
if not show:
raise TVShowNotFound(f"TV show with IMDb ID {imdb_id} not found")
return show
def get_all_shows(self) -> list[TVShow]:
"""
Get all tracked TV shows.
Returns:
List of all TV shows
"""
return self.show_repository.find_all()
def get_ongoing_shows(self) -> list[TVShow]:
"""
Get all ongoing TV shows.
Returns:
List of ongoing TV shows
"""
all_shows = self.show_repository.find_all()
return [show for show in all_shows if show.is_ongoing()]
def get_ended_shows(self) -> list[TVShow]:
"""
Get all ended TV shows.
Returns:
List of ended TV shows
"""
all_shows = self.show_repository.find_all()
return [show for show in all_shows if show.is_ended()]
def update_show(self, show: TVShow) -> None:
"""
Update an existing TV show.
Args:
show: TVShow entity with updated data
Raises:
TVShowNotFound: If show doesn't exist
"""
if not self.show_repository.exists(show.imdb_id):
raise TVShowNotFound(f"TV show with IMDb ID {show.imdb_id} not found")
self.show_repository.save(show)
logger.info(f"Updated TV show: {show.title} ({show.imdb_id})")
def untrack_show(self, imdb_id: ImdbId) -> None:
"""
Stop tracking a TV show.
Args:
imdb_id: IMDb ID of the show to untrack
Raises:
TVShowNotFound: If show not found
"""
if not self.show_repository.delete(imdb_id):
raise TVShowNotFound(f"TV show with IMDb ID {imdb_id} not found")
logger.info(f"Stopped tracking TV show with IMDb ID: {imdb_id}")
def parse_episode_from_filename(self, filename: str) -> tuple[int, int] | None:
"""
Parse season and episode numbers from filename.
Supports formats:
- S01E05
- 1x05
- Season 1 Episode 5
Args:
filename: Filename to parse
Returns:
Tuple of (season, episode) if found, None otherwise
"""
filename_lower = filename.lower()
# Pattern 1: S01E05
pattern1 = r"s(\d{1,2})e(\d{1,2})"
match = re.search(pattern1, filename_lower)
if match:
return (int(match.group(1)), int(match.group(2)))
# Pattern 2: 1x05
pattern2 = r"(\d{1,2})x(\d{1,2})"
match = re.search(pattern2, filename_lower)
if match:
return (int(match.group(1)), int(match.group(2)))
# Pattern 3: Season 1 Episode 5
pattern3 = r"season\s*(\d{1,2})\s*episode\s*(\d{1,2})"
match = re.search(pattern3, filename_lower)
if match:
return (int(match.group(1)), int(match.group(2)))
return None
def validate_episode_file(self, filename: str) -> bool:
"""
Validate that a file is a valid episode file.
Args:
filename: Filename to validate
Returns:
True if valid episode file, False otherwise
"""
# Check file extension
valid_extensions = {".mkv", ".mp4", ".avi", ".mov", ".wmv", ".flv", ".webm"}
extension = filename[filename.rfind(".") :].lower() if "." in filename else ""
if extension not in valid_extensions:
logger.warning(f"Invalid file extension: {extension}")
return False
# Check if we can parse episode info
episode_info = self.parse_episode_from_filename(filename)
if not episode_info:
logger.warning(f"Could not parse episode info from filename: {filename}")
return False
return True
def find_next_episode(
self, show: TVShow, last_season: int, last_episode: int
) -> tuple[int, int] | None:
"""
Find the next episode to download for a show.
Args:
show: TVShow entity
last_season: Last downloaded season number
last_episode: Last downloaded episode number
Returns:
Tuple of (season, episode) for next episode, or None if show is complete
"""
# If show has ended and we've watched all seasons, no next episode
if show.is_ended() and last_season >= show.seasons_count:
return None
# Simple logic: next episode in same season, or first episode of next season
# This could be enhanced with actual episode counts per season
next_episode = last_episode + 1
next_season = last_season
# Assume max 50 episodes per season (could be improved with actual data)
if next_episode > 50:
next_season += 1
next_episode = 1
# Don't go beyond known seasons
if next_season > show.seasons_count:
return None
return (next_season, next_episode)
+47 -8
@@ -1,5 +1,7 @@
"""TV Show domain value objects."""
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum
@@ -7,28 +9,48 @@ from ..shared.exceptions import ValidationError
class ShowStatus(Enum):
"""Status of a TV show - whether it's still airing or has ended."""
"""
Production status of a TV show (real-world, source of truth = TMDB).
Describes the **production** state of the show, independently of what
the user owns. Orthogonal to ``CollectionStatus``.
"""
ONGOING = "ongoing"
ENDED = "ended"
UNKNOWN = "unknown"
@classmethod
def from_string(cls, status_str: str) -> "ShowStatus":
def from_string(cls, status_str: str) -> ShowStatus:
"""
Parse status from string.
Parse a production status string into a ShowStatus.
Args:
status_str: Status string (e.g., "ongoing", "ended")
Accepts our internal vocabulary ("ongoing", "ended") as well as the
statuses returned by TMDB ("Returning Series", "In Production",
"Pilot", "Ended", "Canceled"). The mapping is intentionally binary:
Returns:
ShowStatus enum value
* ONGOING: any state where new episodes may still ship
* ENDED: production has stopped (naturally or cancelled)
* UNKNOWN: anything else / unrecognized
Comparison is case-insensitive and whitespace-trimmed.
"""
if not status_str:
return cls.UNKNOWN
key = status_str.strip().lower()
status_map = {
# Internal
"ongoing": cls.ONGOING,
"ended": cls.ENDED,
# TMDB
"returning series": cls.ONGOING,
"in production": cls.ONGOING,
"pilot": cls.ONGOING,
"planned": cls.ONGOING,
"canceled": cls.ENDED,
"cancelled": cls.ENDED,
}
return status_map.get(status_str.lower(), cls.UNKNOWN)
return status_map.get(key, cls.UNKNOWN)
@dataclass(frozen=True)
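To make the mapping concrete, here is a trimmed standalone copy of the new `from_string` with a few inputs run through it. The lookup table is copied from the hunk above; the sample inputs (including the unmapped "On Hiatus") are assumptions about what callers might pass, not values taken from the codebase.

```python
from enum import Enum


class ShowStatus(Enum):
    ONGOING = "ongoing"
    ENDED = "ended"
    UNKNOWN = "unknown"

    @classmethod
    def from_string(cls, status_str):
        # Empty or None input falls through to UNKNOWN.
        if not status_str:
            return cls.UNKNOWN
        key = status_str.strip().lower()
        status_map = {
            "ongoing": cls.ONGOING,
            "ended": cls.ENDED,
            "returning series": cls.ONGOING,
            "in production": cls.ONGOING,
            "pilot": cls.ONGOING,
            "planned": cls.ONGOING,
            "canceled": cls.ENDED,
            "cancelled": cls.ENDED,
        }
        return status_map.get(key, cls.UNKNOWN)


samples = ["Returning Series", "  Canceled ", "ended", "", "On Hiatus"]
parsed = [ShowStatus.from_string(s).name for s in samples]
print(parsed)  # ['ONGOING', 'ENDED', 'ENDED', 'UNKNOWN', 'UNKNOWN']
```

Note that the old implementation would have returned UNKNOWN for the padded "  Canceled " input, since it lower-cased the string without trimming it.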
@@ -70,6 +92,23 @@ class SeasonNumber:
return self.value
class CollectionStatus(Enum):
"""
State of the user's **collection** for a TV show (orthogonal to ShowStatus).
Compares possessed episodes against episodes **already aired**, never
against announced/upcoming ones. A returning show with all aired episodes
owned is ``COMPLETE``, not ``PARTIAL``, even if more seasons are upcoming.
Future scheduling info (upcoming seasons, next airing date) will live on
the TVShow aggregate as separate flags, not in this enum.
"""
EMPTY = "empty" # no episodes owned
PARTIAL = "partial" # some aired episodes are missing
COMPLETE = "complete" # all aired-to-date episodes are owned
@dataclass(frozen=True)
class EpisodeNumber:
"""
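The "aired, not announced" rule for `CollectionStatus` can be sketched as a set comparison. The helper `collection_status_for` below is hypothetical (the real computation lives on the aggregate and is not shown in this hunk), and treating a show with no aired episodes as EMPTY is an assumption of the sketch.

```python
from enum import Enum


class CollectionStatus(Enum):
    EMPTY = "empty"
    PARTIAL = "partial"
    COMPLETE = "complete"


def collection_status_for(aired, owned):
    """Classify a collection from sets of (season, episode) pairs.

    Hypothetical helper: `aired` holds episodes already broadcast,
    `owned` holds episodes present on disk.
    """
    # Anything owned but not yet aired is ignored, per the docstring's rule.
    owned_aired = owned & aired
    if not owned_aired:
        return CollectionStatus.EMPTY
    if owned_aired == aired:
        return CollectionStatus.COMPLETE
    return CollectionStatus.PARTIAL


aired = {(1, 1), (1, 2), (1, 3)}
print(collection_status_for(aired, {(1, 1), (1, 2), (1, 3)}).value)  # complete
print(collection_status_for(aired, {(1, 1)}).value)  # partial
print(collection_status_for(aired, set()).value)  # empty
```

A returning show whose announced season 2 is absent from `aired` still reports `complete` here, which matches the example given in the enum's docstring.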
@@ -1,6 +1,7 @@
"""qBittorrent Web API client."""
import logging
from pathlib import Path
from typing import Any
import requests
@@ -48,9 +49,9 @@ class QBittorrentClient:
"""
cfg = config or settings
self.host = host or "http://192.168.178.47:30024"
self.username = username or "admin"
self.password = password or "adminadmin"
self.host = host or cfg.qbittorrent_url
self.username = username or cfg.qbittorrent_username
self.password = password or cfg.qbittorrent_password
self.timeout = timeout or cfg.request_timeout
self.session = requests.Session()
@@ -336,6 +337,88 @@ class QBittorrentClient:
logger.error(f"Failed to resume torrent: {e}")
raise
def find_by_name(self, name: str) -> TorrentInfo | None:
"""
Find a torrent by release folder name.
Matching strategy (in order):
1. Exact name match (torrent.name == name)
2. Case-insensitive name match
3. save_path ends with the name (folder moved but name intact)
Args:
name: Release folder name (e.g. "Foundation.2021.S01.1080p...")
Returns:
TorrentInfo if found, None otherwise
"""
torrents = self.get_torrents()
# 1. Exact
for t in torrents:
if t.name == name:
return t
# 2. Case-insensitive
name_lower = name.lower()
for t in torrents:
if t.name.lower() == name_lower:
return t
# 3. save_path ends with the folder name
for t in torrents:
if t.save_path and Path(t.save_path).name.lower() == name_lower:
return t
return None
def set_location(self, torrent_hash: str, location: str) -> bool:
"""
Change the save path of a torrent.
Args:
torrent_hash: Hash of the torrent
location: New save path (must exist on the server)
Returns:
True if location changed successfully
"""
if not self._authenticated:
self.login()
data = {"hashes": torrent_hash, "location": location}
try:
self._make_request("POST", "/api/v2/torrents/setLocation", data=data)
logger.info(f"Set location for {torrent_hash} -> {location}")
return True
except QBittorrentAPIError as e:
logger.error(f"Failed to set location for {torrent_hash}: {e}")
raise
def recheck(self, torrent_hash: str) -> bool:
"""
Force recheck (hash verification) of a torrent.
Args:
torrent_hash: Hash of the torrent
Returns:
True if recheck triggered successfully
"""
if not self._authenticated:
self.login()
data = {"hashes": torrent_hash}
try:
self._make_request("POST", "/api/v2/torrents/recheck", data=data)
logger.info(f"Recheck triggered for {torrent_hash}")
return True
except QBittorrentAPIError as e:
logger.error(f"Failed to recheck {torrent_hash}: {e}")
raise
def get_torrent_properties(self, torrent_hash: str) -> dict[str, Any]:
"""
Get detailed properties of a torrent.
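The three-step matching order in `find_by_name` is easiest to check on a small list. The sketch below is a standalone restatement, not the client method itself: `Torrent` is a hypothetical two-field stand-in for `TorrentInfo`, and `PurePosixPath` is used instead of `Path` so the example behaves the same on any OS.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass
class Torrent:
    """Hypothetical stand-in for TorrentInfo (only the fields used here)."""

    name: str
    save_path: str


def find_by_name(torrents: List[Torrent], name: str) -> Optional[Torrent]:
    # 1. An exact match wins, even if a case-insensitive match comes earlier.
    for t in torrents:
        if t.name == name:
            return t
    # 2. Case-insensitive name match.
    lowered = name.lower()
    for t in torrents:
        if t.name.lower() == lowered:
            return t
    # 3. Last path component of save_path equals the name (case-insensitive).
    for t in torrents:
        if t.save_path and PurePosixPath(t.save_path).name.lower() == lowered:
            return t
    return None


torrents = [
    Torrent("foundation.2021.s01", "/downloads/a"),
    Torrent("Foundation.2021.S01", "/downloads/b"),
    Torrent("renamed-by-user", "/downloads/Other.Show.S02"),
]
print(find_by_name(torrents, "Foundation.2021.S01").save_path)  # /downloads/b
print(find_by_name(torrents, "other.show.s02").name)  # renamed-by-user
```

Running the passes in sequence rather than in one loop is what lets the exact match at index 1 beat the case-insensitive match at index 0.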
@@ -2,6 +2,7 @@
from .exceptions import FilesystemError, PathTraversalError
from .file_manager import FileManager
from .filesystem_operations import create_folder, move
from .organizer import MediaOrganizer
__all__ = [
@@ -9,4 +10,6 @@ __all__ = [
"MediaOrganizer",
"FilesystemError",
"PathTraversalError",
"create_folder",
"move",
]
+111
@@ -0,0 +1,111 @@
"""ffprobe — infrastructure adapter for extracting MediaInfo from a video file."""
from __future__ import annotations
import json
import logging
import subprocess
from pathlib import Path
from alfred.domain.shared.media import AudioTrack, MediaInfo, SubtitleTrack, VideoTrack
logger = logging.getLogger(__name__)
_FFPROBE_CMD = [
"ffprobe",
"-v",
"quiet",
"-print_format",
"json",
"-show_streams",
"-show_format",
]
def probe(path: Path) -> MediaInfo | None:
"""
Run ffprobe on path and return a MediaInfo.
Returns None if ffprobe is not available or the file cannot be probed.
"""
try:
result = subprocess.run(
[*_FFPROBE_CMD, str(path)],
capture_output=True,
text=True,
timeout=30,
check=False,
)
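except FileNotFoundError:
# ffprobe binary is not installed or not on PATH
logger.warning("ffprobe not found; cannot probe %s", path)
return None
except subprocess.TimeoutExpired:
logger.warning("ffprobe timed out on %s", path)
return None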
if result.returncode != 0:
logger.warning("ffprobe failed on %s: %s", path, result.stderr.strip())
return None
try:
data = json.loads(result.stdout)
except json.JSONDecodeError:
logger.warning("ffprobe returned invalid JSON for %s", path)
return None
return _parse(data)
def _parse(data: dict) -> MediaInfo:
streams = data.get("streams", [])
fmt = data.get("format", {})
info = MediaInfo()
# File-level duration/bitrate (ffprobe ``format`` block — independent of streams)
if "duration" in fmt:
try:
info.duration_seconds = float(fmt["duration"])
except (TypeError, ValueError):
pass
if "bit_rate" in fmt:
try:
info.bitrate_kbps = int(fmt["bit_rate"]) // 1000
except (TypeError, ValueError):
pass
for stream in streams:
codec_type = stream.get("codec_type")
if codec_type == "video":
info.video_tracks.append(
VideoTrack(
index=stream.get("index", len(info.video_tracks)),
codec=stream.get("codec_name"),
width=stream.get("width"),
height=stream.get("height"),
is_default=stream.get("disposition", {}).get("default", 0) == 1,
)
)
elif codec_type == "audio":
info.audio_tracks.append(
AudioTrack(
index=stream.get("index", len(info.audio_tracks)),
codec=stream.get("codec_name"),
channels=stream.get("channels"),
channel_layout=stream.get("channel_layout"),
language=stream.get("tags", {}).get("language"),
is_default=stream.get("disposition", {}).get("default", 0) == 1,
)
)
elif codec_type == "subtitle":
info.subtitle_tracks.append(
SubtitleTrack(
index=stream.get("index", len(info.subtitle_tracks)),
codec=stream.get("codec_name"),
language=stream.get("tags", {}).get("language"),
is_default=stream.get("disposition", {}).get("default", 0) == 1,
is_forced=stream.get("disposition", {}).get("forced", 0) == 1,
)
)
return info
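For reviewers without a media file at hand, the stream loop in `_parse` can be exercised against a hand-written dict. The sample below only imitates the shape of `ffprobe -print_format json -show_streams -show_format` output; it was not captured from a real run, and `summarize` is a hypothetical reduction that returns plain dicts instead of `MediaInfo`.

```python
def summarize(data):
    """Reduce ffprobe-style JSON to a small dict (hypothetical helper)."""
    fmt = data.get("format", {})
    summary = {"duration_seconds": None, "video": [], "audio": [], "subtitle": []}
    try:
        # ffprobe reports duration as a string, so it has to be converted.
        summary["duration_seconds"] = float(fmt["duration"])
    except (KeyError, TypeError, ValueError):
        pass
    for stream in data.get("streams", []):
        kind = stream.get("codec_type")
        if kind not in ("video", "audio", "subtitle"):
            continue  # attachments, data streams, etc. are skipped
        summary[kind].append(
            {
                "codec": stream.get("codec_name"),
                "language": stream.get("tags", {}).get("language"),
                "is_default": stream.get("disposition", {}).get("default", 0) == 1,
            }
        )
    return summary


# Hand-written sample, not real ffprobe output.
sample = {
    "format": {"duration": "3541.250000", "bit_rate": "4500000"},
    "streams": [
        {"index": 0, "codec_type": "video", "codec_name": "hevc",
         "disposition": {"default": 1}},
        {"index": 1, "codec_type": "audio", "codec_name": "eac3",
         "tags": {"language": "fre"}, "disposition": {"default": 1}},
        {"index": 2, "codec_type": "subtitle", "codec_name": "subrip",
         "tags": {"language": "eng"}, "disposition": {"default": 0, "forced": 0}},
        {"index": 3, "codec_type": "attachment"},
    ],
}

result = summarize(sample)
print(result["duration_seconds"], len(result["video"]), result["audio"][0]["language"])
```

Streams without `tags` or `disposition` fall back to `None` and `False`, mirroring the chained `.get()` calls in the adapter.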
@@ -2,6 +2,7 @@
import logging
import os
import shutil
from collections import namedtuple
from pathlib import Path
from typing import Any
@@ -89,13 +90,18 @@ class FileManager:
folder_path = memory.ltm.library_paths.get(folder_type)
if not folder_path:
return _err("folder_not_set", f"{folder_type.capitalize()} folder not configured.")
return _err(
"folder_not_set",
f"{folder_type.capitalize()} folder not configured.",
)
root = Path(folder_path)
target = root / safe_path
if not self._is_safe_path(root, target):
return _err("forbidden", "Access denied: path outside allowed directory")
return _err(
"forbidden", "Access denied: path outside allowed directory"
)
if not target.exists():
return _err("not_found", f"Path does not exist: {safe_path}")
@@ -153,10 +159,15 @@ class FileManager:
return _err("source_not_file", f"Source is not a file: {source}")
if not dest_path.parent.exists():
return _err("destination_dir_not_found", f"Destination directory does not exist: {dest_path.parent}")
return _err(
"destination_dir_not_found",
f"Destination directory does not exist: {dest_path.parent}",
)
if dest_path.exists():
return _err("destination_exists", f"Destination already exists: {destination}")
return _err(
"destination_exists", f"Destination already exists: {destination}"
)
os.link(source_path, dest_path)
@@ -197,7 +208,9 @@ class FileManager:
source_path.unlink()
logger.info(f"File moved: {source_path.name} -> {link_result['destination']}")
logger.info(
f"File moved: {source_path.name} -> {link_result['destination']}"
)
return {
"status": "ok",
"source": str(source_path),
@@ -237,11 +250,19 @@ class FileManager:
torrent_root = Path(torrent_folder).resolve()
if not lib_path.exists():
return _err("library_file_not_found", f"Library file not found: {library_file}")
return _err(
"library_file_not_found", f"Library file not found: {library_file}"
)
if not src_folder.exists():
return _err("source_folder_not_found", f"Download folder not found: {original_download_folder}")
return _err(
"source_folder_not_found",
f"Download folder not found: {original_download_folder}",
)
if not torrent_root.exists():
return _err("torrent_folder_not_found", f"Torrent folder not found: {torrent_folder}")
return _err(
"torrent_folder_not_found",
f"Torrent folder not found: {torrent_folder}",
)
dest_folder = torrent_root / src_folder.name
dest_folder.mkdir(parents=True, exist_ok=True)
@@ -265,7 +286,6 @@ class FileManager:
if dest_item.exists():
skipped.append(str(rel))
continue
import shutil
shutil.copy2(item, dest_item)
copied.append(str(rel))
logger.debug(f"Copied for seeding: {rel}")
@@ -0,0 +1,73 @@
"""Low-level filesystem operations — one responsibility per function."""
import logging
import subprocess
from pathlib import Path
from typing import Any
logger = logging.getLogger(__name__)
def _err(error: str, message: str) -> dict[str, Any]:
return {"status": "error", "error": error, "message": message}
def create_folder(path: str) -> dict[str, Any]:
"""
Create a directory and all missing parents.
Args:
path: Absolute path to the directory to create.
Returns:
Dict with status and path, or error details.
"""
try:
p = Path(path)
p.mkdir(parents=True, exist_ok=True)
logger.info(f"Folder ready: {p}")
return {"status": "ok", "path": str(p)}
except OSError as e:
logger.error(f"create_folder failed: {e}")
return _err("mkdir_failed", str(e))
def move(source: str, destination: str) -> dict[str, Any]:
"""
Move a file or folder to a destination path.
Uses the system mv command, which is instant on the same filesystem (ZFS rename).
Args:
source: Absolute path to the source file or folder.
destination: Absolute path to the destination.
Returns:
Dict with status, source, destination or error details.
"""
src = Path(source)
dst = Path(destination)
if not src.exists():
return _err("source_not_found", f"Source does not exist: {source}")
if dst.exists():
return _err("destination_exists", f"Destination already exists: {destination}")
try:
result = subprocess.run(
["mv", str(src), str(dst)],
capture_output=True,
text=True,
check=False,
)
if result.returncode != 0:
logger.error(f"mv failed: {result.stderr}")
return _err("move_failed", result.stderr.strip())
logger.info(f"Moved: {src} -> {dst}")
return {"status": "ok", "source": str(src), "destination": str(dst)}
except OSError as e:
logger.error(f"move failed: {e}")
return _err("move_failed", str(e))
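The result-dict contract of `move` (never raise, always return a `status`) can be tried out without shelling out. The variant below swaps the `mv` subprocess for `shutil.move`, which is a different technique from the one in the diff; `move_with_shutil` is a hypothetical name, and whether it matches `mv` performance on ZFS is not something this sketch claims.

```python
import shutil
import tempfile
from pathlib import Path


def _err(error, message):
    return {"status": "error", "error": error, "message": message}


def move_with_shutil(source, destination):
    """Same guards and return shape as move(), using shutil instead of mv."""
    src = Path(source)
    dst = Path(destination)
    if not src.exists():
        return _err("source_not_found", f"Source does not exist: {source}")
    if dst.exists():
        return _err("destination_exists", f"Destination already exists: {destination}")
    try:
        shutil.move(str(src), str(dst))
    except OSError as e:  # shutil.Error is a subclass of OSError
        return _err("move_failed", str(e))
    return {"status": "ok", "source": str(src), "destination": str(dst)}


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.mkv"
    src.write_text("x")
    first = move_with_shutil(str(src), str(Path(tmp) / "b.mkv"))
    # The source is gone now, so a second attempt reports an error dict.
    second = move_with_shutil(str(src), str(Path(tmp) / "c.mkv"))

print(first["status"], second["error"])  # ok source_not_found
```

Callers can branch on `result["status"]` alone; the `error` key is a stable machine-readable code and `message` is for humans.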
@@ -0,0 +1,25 @@
"""find_video — locate the first video file in a release folder."""
from __future__ import annotations
from pathlib import Path
from alfred.domain.release.value_objects import _VIDEO_EXTENSIONS
def find_video_file(path: Path) -> Path | None:
"""
Return the first video file found at path.
- If path is a file and is a video, return it directly.
- If path is a folder, scan recursively and return the first video found
(sorted by name for determinism, so S01E01 is picked before S01E02, etc.).
"""
if path.is_file():
return path if path.suffix.lower() in _VIDEO_EXTENSIONS else None
for candidate in sorted(path.rglob("*")):
if candidate.is_file() and candidate.suffix.lower() in _VIDEO_EXTENSIONS:
return candidate
return None
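The determinism claim in the docstring can be checked with a throwaway folder. In this sketch `VIDEO_EXTENSIONS` is a hypothetical three-entry stand-in for the real `_VIDEO_EXTENSIONS` set, whose contents are not shown in this diff; the function body is otherwise the same as above.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical subset; the real set comes from alfred.domain.release.value_objects.
VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi"}


def find_video_file(path: Path) -> Optional[Path]:
    if path.is_file():
        return path if path.suffix.lower() in VIDEO_EXTENSIONS else None
    # Sorting the full recursive listing makes the pick independent of
    # the order in which the filesystem happens to return entries.
    for candidate in sorted(path.rglob("*")):
        if candidate.is_file() and candidate.suffix.lower() in VIDEO_EXTENSIONS:
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "Subs").mkdir()
    # Created out of order on purpose; the upper-case suffix checks .lower().
    for name in ("S01E02.mkv", "S01E01.MKV", "notes.nfo", "Subs/eng.srt"):
        (root / name).write_text("")
    picked = find_video_file(root)
    picked_name = picked.name if picked else None

print(picked_name)  # S01E01.MKV
```

One consequence of sorting whole paths is that a video inside a subfolder whose name sorts early (for example `Extras/`) would be picked before a top-level episode file.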
+2 -10
@@ -75,11 +75,7 @@ class MediaOrganizer:
show_dir = self.tvshow_folder / show_folder_name
# Create season folder
season = Season(
show_imdb_id=show.imdb_id,
season_number=episode.season_number,
episode_count=0, # Not needed for folder name
)
season = Season(season_number=episode.season_number)
season_folder_name = season.get_folder_name()
season_dir = show_dir / season_folder_name
@@ -126,11 +122,7 @@ class MediaOrganizer:
show_folder_name = show.get_folder_name()
show_dir = self.tvshow_folder / show_folder_name
season = Season(
show_imdb_id=show.imdb_id,
season_number=SeasonNumber(season_number),
episode_count=0,
)
season = Season(season_number=SeasonNumber(season_number))
season_folder_name = season.get_folder_name()
season_dir = show_dir / season_folder_name
@@ -0,0 +1,5 @@
"""Per-release `.alfred/metadata.yaml` persistence."""
from .store import MetadataStore
__all__ = ["MetadataStore"]
+183
@@ -0,0 +1,183 @@
"""
MetadataStore: reads/writes the `.alfred/metadata.yaml` file colocated with
a release folder.
The store is intentionally domain-agnostic: it knows how to atomically
load/save the YAML and exposes typed update helpers for the broad facts a
release carries (parse, probe, TMDB lookup, detected pattern). Subtitle
history lives next to the same file but is appended through a dedicated
helper kept under `alfred/infrastructure/subtitle/` so the subtitle pipeline
keeps full ownership of its payload shape.
The file layout:
<release_root>/
.alfred/
metadata.yaml
The store never raises on a missing file; it returns empty defaults. Writes
are atomic (write to .tmp then rename).
"""
from __future__ import annotations
import logging
from datetime import UTC, datetime
from pathlib import Path
from typing import Any
import yaml
logger = logging.getLogger(__name__)
class MetadataStore:
"""Manages `.alfred/metadata.yaml` for one release folder."""
def __init__(self, release_root: str | Path):
self._root = Path(release_root)
self._alfred_dir = self._root / ".alfred"
self._metadata_path = self._alfred_dir / "metadata.yaml"
# ------------------------------------------------------------------
# Identity
# ------------------------------------------------------------------
@property
def release_root(self) -> Path:
return self._root
@property
def metadata_path(self) -> Path:
return self._metadata_path
def exists(self) -> bool:
return self._metadata_path.exists()
# ------------------------------------------------------------------
# Load / Save
# ------------------------------------------------------------------
def load(self) -> dict:
"""Return the full metadata dict. Empty dict if file absent."""
if not self._metadata_path.exists():
return {}
try:
with open(self._metadata_path, encoding="utf-8") as f:
return yaml.safe_load(f) or {}
except Exception as e:
logger.warning(f"MetadataStore: could not read {self._metadata_path}: {e}")
return {}
def save(self, data: dict) -> None:
"""Atomically write metadata.yaml. Creates .alfred/ if needed."""
self._alfred_dir.mkdir(parents=True, exist_ok=True)
tmp = self._metadata_path.with_suffix(".yaml.tmp")
try:
with open(tmp, "w", encoding="utf-8") as f:
yaml.safe_dump(
data,
f,
allow_unicode=True,
default_flow_style=False,
sort_keys=False,
)
tmp.replace(self._metadata_path)
except Exception as e:
logger.error(f"MetadataStore: could not write {self._metadata_path}: {e}")
tmp.unlink(missing_ok=True)
raise
# ------------------------------------------------------------------
# Generic update helper
# ------------------------------------------------------------------
def update_section(self, section: str, payload: dict[str, Any]) -> None:
"""
Write `payload` as the top-level `section` block and stamp it.
The section is replaced wholesale (not deep-merged) so the last
successful tool run reflects the current truth. A `_updated_at`
ISO-8601 timestamp is added inside the section.
"""
data = self.load()
stamped = dict(payload)
stamped["_updated_at"] = datetime.now(UTC).isoformat()
data[section] = stamped
self.save(data)
# ------------------------------------------------------------------
# Typed update helpers — one per inspector tool
# ------------------------------------------------------------------
def update_parse(self, parse_result: dict[str, Any]) -> None:
"""Persist the result of analyze_release."""
clean = {k: v for k, v in parse_result.items() if k != "status"}
self.update_section("parse", clean)
def update_probe(self, probe_result: dict[str, Any]) -> None:
"""Persist the result of probe_media."""
clean = {k: v for k, v in probe_result.items() if k != "status"}
self.update_section("probe", clean)
def update_tmdb(self, tmdb_result: dict[str, Any]) -> None:
"""Persist the result of find_media_imdb_id."""
clean = {k: v for k, v in tmdb_result.items() if k != "status"}
self.update_section("tmdb", clean)
# Also promote core identity fields to the top level so they are
# cheap to read without parsing the full tmdb block.
data = self.load()
for key in ("imdb_id", "tmdb_id", "media_type"):
if key in clean and clean[key] is not None:
data[key] = clean[key]
if "title" in clean and clean["title"]:
data.setdefault("title", clean["title"])
self.save(data)
# ------------------------------------------------------------------
# Pattern (used by the subtitle pipeline)
# ------------------------------------------------------------------
def confirmed_pattern(self) -> str | None:
"""Return the confirmed pattern_id, or None."""
data = self.load()
if data.get("pattern_confirmed"):
return data.get("detected_pattern")
return None
def mark_pattern_confirmed(
self, pattern_id: str, media_info: dict | None = None
) -> None:
"""Persist detected_pattern + pattern_confirmed=true."""
data = self.load()
data["detected_pattern"] = pattern_id
data["pattern_confirmed"] = True
if media_info:
data.setdefault("media_type", media_info.get("media_type"))
data.setdefault("imdb_id", media_info.get("imdb_id"))
data.setdefault("title", media_info.get("title"))
self.save(data)
logger.info(
f"MetadataStore: confirmed pattern '{pattern_id}' for {self._root.name}"
)
# ------------------------------------------------------------------
# Subtitle history (kept for backwards compatibility with the
# subtitle pipeline — payload shape is owned by the caller).
# ------------------------------------------------------------------
def append_subtitle_history_entry(self, entry: dict[str, Any]) -> None:
"""Append one entry (raw dict) to subtitle_history."""
data = self.load()
history = data.setdefault("subtitle_history", [])
history.append(entry)
rg = entry.get("release_group")
if rg:
groups = data.setdefault("release_groups", [])
if rg not in groups:
groups.append(rg)
self.save(data)
def subtitle_history(self) -> list[dict]:
"""Return the raw subtitle history list."""
return self.load().get("subtitle_history", [])
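The write-to-temp-then-rename pattern used by `save()` can be reproduced with the standard library alone. The sketch below uses JSON instead of YAML so it runs without PyYAML, and `atomic_write_json` is a hypothetical helper, not part of `MetadataStore`. It relies on `os.replace`, which overwrites an existing target in a single step when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write data next to path, then swap it into place (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    try:
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp, path)
    except Exception:
        tmp.unlink(missing_ok=True)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / ".alfred" / "metadata.json"
    atomic_write_json(target, {"pattern_confirmed": False})
    atomic_write_json(target, {"pattern_confirmed": True, "title": "Chérie"})
    reloaded = json.loads(target.read_text(encoding="utf-8"))
    leftover_tmp = target.with_name(target.name + ".tmp").exists()

print(reloaded["pattern_confirmed"], leftover_tmp)  # True False
```

Keeping the temporary file in the same directory as the target is what keeps the final swap on one filesystem; a temp file under `/tmp` could turn the rename into a copy.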
@@ -1,11 +1,8 @@
"""JSON-based repository implementations."""
"""Placeholder package — previously held JSON-based repository implementations.
from .movie_repository import JsonMovieRepository
from .subtitle_repository import JsonSubtitleRepository
from .tvshow_repository import JsonTVShowRepository
__all__ = [
"JsonMovieRepository",
"JsonTVShowRepository",
"JsonSubtitleRepository",
]
The Json{Movie,TVShow,Subtitle}Repository classes were removed during the
test-week cleanup: they had no live callers, the subtitle variant had broken
imports, and the live code paths in agent/application use the memory-backed
``LongTermMemory.library`` directly. Keep this empty package so the namespace
remains importable if anything stale references ``alfred.infrastructure.persistence.json``.
"""
@@ -1,144 +0,0 @@
"""JSON-based movie repository implementation."""
import logging
from datetime import datetime
from typing import Any
from alfred.domain.movies.entities import Movie
from alfred.domain.movies.repositories import MovieRepository
from alfred.domain.movies.value_objects import MovieTitle, Quality, ReleaseYear
from alfred.domain.shared.value_objects import FilePath, FileSize, ImdbId
from alfred.infrastructure.persistence import get_memory
logger = logging.getLogger(__name__)
class JsonMovieRepository(MovieRepository):
"""
JSON-based implementation of MovieRepository.
Stores movies in the LTM library using the memory context.
"""
def save(self, movie: Movie) -> None:
"""
Save a movie to the repository.
Updates existing movie if IMDb ID matches.
Args:
movie: Movie entity to save.
"""
memory = get_memory()
movies = memory.ltm.library.get("movies", [])
# Remove existing movie with same IMDb ID
movies = [m for m in movies if m.get("imdb_id") != str(movie.imdb_id)]
movies.append(self._to_dict(movie))
memory.ltm.library["movies"] = movies
memory.save()
logger.debug(f"Saved movie: {movie.imdb_id}")
def find_by_imdb_id(self, imdb_id: ImdbId) -> Movie | None:
"""
Find a movie by its IMDb ID.
Args:
imdb_id: IMDb ID to search for.
Returns:
Movie if found, None otherwise.
"""
memory = get_memory()
movies = memory.ltm.library.get("movies", [])
for movie_dict in movies:
if movie_dict.get("imdb_id") == str(imdb_id):
return self._from_dict(movie_dict)
return None
def find_all(self) -> list[Movie]:
"""
Get all movies in the repository.
Returns:
List of all Movie entities.
"""
memory = get_memory()
movies_dict = memory.ltm.library.get("movies", [])
return [self._from_dict(m) for m in movies_dict]
def delete(self, imdb_id: ImdbId) -> bool:
"""
Delete a movie from the repository.
Args:
imdb_id: IMDb ID of movie to delete.
Returns:
True if deleted, False if not found.
"""
memory = get_memory()
movies = memory.ltm.library.get("movies", [])
initial_count = len(movies)
movies = [m for m in movies if m.get("imdb_id") != str(imdb_id)]
if len(movies) < initial_count:
memory.ltm.library["movies"] = movies
memory.save()
logger.debug(f"Deleted movie: {imdb_id}")
return True
return False
def exists(self, imdb_id: ImdbId) -> bool:
"""
Check if a movie exists in the repository.
Args:
imdb_id: IMDb ID to check.
Returns:
True if exists, False otherwise.
"""
return self.find_by_imdb_id(imdb_id) is not None
def _to_dict(self, movie: Movie) -> dict[str, Any]:
"""Convert Movie entity to dict for storage."""
return {
"imdb_id": str(movie.imdb_id),
"title": movie.title.value,
"release_year": movie.release_year.value if movie.release_year else None,
"quality": movie.quality.value,
"file_path": str(movie.file_path) if movie.file_path else None,
"file_size": movie.file_size.bytes if movie.file_size else None,
"tmdb_id": movie.tmdb_id,
"added_at": movie.added_at.isoformat(),
}
def _from_dict(self, data: dict[str, Any]) -> Movie:
"""Convert dict from storage to Movie entity."""
# Parse quality string to enum
quality_str = data.get("quality", "unknown")
quality = Quality.from_string(quality_str)
return Movie(
imdb_id=ImdbId(data["imdb_id"]),
title=MovieTitle(data["title"]),
release_year=(
ReleaseYear(data["release_year"]) if data.get("release_year") else None
),
quality=quality,
file_path=FilePath(data["file_path"]) if data.get("file_path") else None,
file_size=FileSize(data["file_size"]) if data.get("file_size") else None,
tmdb_id=data.get("tmdb_id"),
added_at=(
datetime.fromisoformat(data["added_at"])
if data.get("added_at")
else datetime.now()
),
)

Some files were not shown because too many files have changed in this diff.