175 Commits

Author SHA1 Message Date
francwa 745dec39f5 FINAL COMMIT BEFORE REWRITE 2026-05-26 21:45:11 +02:00
francwa 42fa6139ed refactor(tools): wire filesystem tools to new use cases, drop broken ones
Updates alfred/agent/tools/filesystem.py to use the five free-function
use cases introduced in the previous commit:

  - list_folder            -> list_dir_use_case(Path(path), roots)
  - create_directory (new) -> create_dir_use_case(Path(path), roots)
  - move_media             -> move_file_use_case(src, dst, roots)
  - move_to_destination    -> create_dir_use_case(dst.parent) + move_file

A module-level _load_directory_roots() helper reads memory once per
call and builds the DirectoryRoots VO; missing roots produce an
explicit 'roots_not_configured' error.

Tools whose backing code was moved to *_OLD files are removed entirely
rather than left broken: manage_subtitles, set_path_for_folder,
create_seed_links, and the four resolve_*_destination tools. They will
come back when the matching application/domain code is rebuilt later
on this branch.

alfred/agent/tools/__init__.py shrinks accordingly. find_media_imdb_id
(already broken before this branch — name not exported by tools.api)
is dropped from the package re-exports so the package imports cleanly
again.
2026-05-26 19:46:49 +02:00
francwa 2df7843d8b refactor(filesystem): split into 5 atomic free-function ops + use cases
Replaces the monolithic FileManager class + scattered helpers in
alfred/infrastructure/filesystem with five free functions, each
single-responsibility and pathlib-native:

  list_dir / create_dir / link_file / move_file / move_dir

The infra layer now raises typed exceptions (FilesystemError base
+ SourceNotFound / DestinationExists / NotADirectory / NotAFile /
PermissionDenied / CrossDevice / FilesystemOSError) instead of
returning {status: ok|error} dicts. No more get_memory() reads
from infra.

Application layer mirrors the same split: five free use cases
(<op>_use_case) wrap each infra op, guard inputs against escaping
the new DirectoryRoots VO (downloads / torrents / movies /
tv_shows), catch infra exceptions, and return frozen DTOs. Roots
are injected — no global state.

Legacy files kept on disk with _OLD suffix for reference during
the follow-up rewiring (FileManager, MediaOrganizer,
create_folder/move helpers; CreateSeedLinks/ListFolder/MoveMedia/
ManageSubtitles use cases, resolve_destination). They are no
longer exported from __init__, which intentionally breaks current
agent tool wrappers and downstream tests — re-wiring is the next
chunk of work on the unfuck branch.
2026-05-26 19:22:09 +02:00
francwa 28304bb162 fix(releases): repair singular 'release' imports in parser
The CHOP CHOP CHOP pass left parser/{pipeline,scoring,services}.py
importing from alfred.domain.release.value_objects (singular), which
does not exist. parse_release was unimportable; all release tests
errored at collection.

Point the 3 imports at value_objects_old_question_mark.py, which still
holds ParsedRelease/ParseReport/MediaTypeToken/TokenizationRoute. The
file name is misleading (it is not 'old' — it is the active parser VO);
naming will be resolved when ParsedRelease itself is replaced. Tracked
in .claude/specs/unfuck_technical_debt.md #4.
2026-05-26 06:55:30 +02:00
francwa c62ae81275 refactor(tmdb): ACL pass — push VOs into DTOs, split search per media type
Anti-corruption boundary tightened on the TMDB adapter:

* TmdbMovieInfo / TmdbShowInfo now carry domain VOs (TmdbId, ImdbId,
  MovieTitle, ReleaseYear, ShowStatus) instead of raw scalars —
  validation happens at the boundary, not three layers later.
* ShowStatus enum added (domain/tv_shows/value_objects) with a
  from_tmdb() mapper that falls back to UNKNOWN + logs a warning on
  unrecognized values. TVShow.status is now ShowStatus, not str.
* MovieTitle cap raised from 100 to 150 chars.
* MediaResult / ExternalIds dropped. Replaced by per-media search
  DTOs: TmdbMovieSearchResult and TmdbShowSearchResult. Neither
  carries imdb_id — search no longer enriches with external_ids
  (callers needing imdb_id follow up with get_movie_info /
  get_tv_show_info on the chosen tmdb_id).
* TMDBClient: search_multi / search_media / _parse_result removed.
  search_movies (/search/movie) and search_shows (/search/tv) added,
  each parsing hits into VO-typed DTOs.
* SearchMovieUseCase returns a list of MovieHit (flattened to
  primitives for the agent). New symmetric SearchShowUseCase +
  ShowHit / SearchShowResponse DTOs.
* agent/tools/api.py: find_media_imdb_id → search_movies +
  search_shows wrappers.
* FileEntry moved from domain/shared/ports/filesystem_scanner.py to
  domain/shared/file_entry.py (it's a DTO, not a Protocol); size_kb
  (float) → size (int bytes). Scanner and SubtitleIdentifier
  updated.

Tests: 79/79 pass on tests/infrastructure/api/ +
tests/application/test_search_movie.py +
tests/application/test_search_show.py.
2026-05-26 05:54:58 +02:00
francwa cffafa2e60 CHOP CHOP CHOP 2026-05-26 05:45:07 +02:00
francwa b3abad4da4 chore(dot_alfred): Phase 5 cleanup + changelog
Delete the orphan MediaWithTracks mixin (and its only consumer,
track_lang_matches) from alfred.domain.shared.media — zero callers
since the v2 aggregates landed in Phase 3, parked for the Phase 5
decision in CHANGELOG.

Cleanup sweep across alfred/ and tests/ returns zero hits for:
* MediaWithTracks
* the v1 dot_alfred symbols
* the v1 sidecar names
* the alfred.application.library package
The v2 surface is the only one left.

CHANGELOG updated with:
* the Phase 5 sync orchestrators (sync_show / sync_movie),
* the Phase 4b PACK vs EPISODIC fix (Fixed section),
* the MediaWithTracks deletion in Removed,
* refreshed suite count (1277 passing).
2026-05-26 00:55:17 +02:00
francwa 7ff2e6bc4e feat(movies): sync_movie populates library index from TMDB
Parallel to sync_show. Calls TMDBClient.get_movie_info,
combines the TmdbMovieInfo with the on-disk MovieRelease loaded
via DotAlfredMovieReleaseRepository.load_by_tmdb_id, and upserts
into DotAlfredMovieLibraryIndex.

Policy mirrors sync_show with two adaptations specific to movies:
* placeholder signature is name == metadata.path (auto-heal writes
  them equal — the schema requires name to be non-empty so we can't
  use name == "" as the spec originally suggested),
* when the per-movie sidecar is gone but the index entry remains,
  sync warns and returns the existing entry unchanged (no upsert
  possible without a release: index.upsert requires folder/imdb_id
  from the MovieRelease itself).

Raises MovieNotFoundInLibrary when neither index nor sidecar
carry tmdb_id.
2026-05-26 00:51:43 +02:00
francwa 8f31f880aa feat(tv_shows): sync_show populates library index from TMDB
New orchestrator alfred.application.tv_shows.sync.sync_show calls
TMDBClient.get_tv_show_info, combines the response with the on-disk
release loaded via DotAlfredSeriesReleaseRepository.load_by_tmdb_id,
and upserts the result into DotAlfredTVShowLibraryIndex.

Policy:
* placeholders (auto-healed entries, status=="unknown") always
  refresh regardless of TTL,
* fresh entries within Settings.tmdb_cache_ttl_days are no-ops,
* stale entries past TTL refresh,
* force=True overrides both gates,
* indexed shows whose per-show sidecar is gone still get a fresh
  TMDB pass — slot map clears until rescan repopulates it,
* truly absent shows raise ShowNotFoundInLibrary from the new
  alfred.application.exceptions module.
2026-05-26 00:49:00 +02:00
francwa 1efe9a82c1 feat(dot_alfred): load_by_tmdb_id on release repos
Series repo returns (release, folder) so the upcoming sync
orchestrator can feed the library index's upsert(..., path=...).
Movie repo returns the release alone (folder is on release.folder
by the one-folder-one-file convention) — kept as a semantic alias
of find_by_tmdb_id for symmetry with the series side.
2026-05-26 00:45:14 +02:00
francwa 0dc053881a feat(tmdb): add TmdbMovieInfo DTO and get_movie_info
Symmetric to TmdbShowInfo / get_tv_show_info — gives the upcoming
sync_movie orchestrator a typed cache snapshot for the v2 movie
library index.

* TmdbMovieInfo(tmdb_id, imdb_id, title, release_year)
* parse_movie_info(details, external_ids) — pure builder, parses
  release_year from the first 4 chars of release_date (None on
  missing/empty/non-numeric)
* TMDBClient.get_movie_info(tmdb_id) — aggregates
  /movie/{id} + /movie/{id}/external_ids and feeds the parser

Tests cover happy path, missing/null/empty imdb_id, every
release_year edge (none/empty/short/non-numeric/missing key),
and the two required-field errors (id, title).
2026-05-26 00:35:42 +02:00
francwa 97dc799a26 fix(tv_shows): correct PACK vs EPISODIC classification model
The Phase 4 walker + rescan logic classified seasons by parser
output (does the filename carry Exx?), but PACK vs EPISODIC is a
structural distinction:

* PACK = season folder with N flat SxxEyy videos directly inside
* EPISODIC = season folder with N subfolders, each holding one video

Changes:
* walker.py: descends two levels under show_root and classifies
  each season folder by FS structure. SeasonFolder now carries
  mode: ReleaseMode | None. Mixed layouts (flat + subfolders) and
  EPISODIC subfolders with >1 video log a warning and report
  mode=None.
* rescan.py: trusts walker.mode; drops the bogus 'single un-
  numbered video → PACK with empty episodes' branch. A season
  with no parseable episodes is now skipped with a warning.
* Tests rewritten against the real model: PACK with flat numbered
  files, EPISODIC with one-video-per-subfolder, malformed mixed
  layout skipped, single-un-numbered-file skipped.

Suite: 1237 → 1245 passing.
2026-05-25 21:37:34 +02:00
francwa fe9857aaed docs(changelog): Phase 4 Step 5 — record dot_alfred v2 Phase 4 work
Append Phase 4 entry under [Unreleased]:
* Added: rescan_show v2 signature + new rescan_movie + PACK empty-
  episodes semantics + Settings.tmdb_cache_ttl_days + library-index
  anchor-mismatch warning
* Removed: v1 dot_alfred stack (bridge/repository/serializer/sidecar),
  abstract domain ports (TVShowRepository / MovieRepository),
  application/library/ package, two Phase-3 quarantine test files
* Internal: 1233 → 1237 passing, 10 → 8 skips; MediaWithTracks
  mixin parked for Phase 5

Phase 3 entries left intact (historically accurate at commit time).
2026-05-25 21:17:23 +02:00
francwa cc334a7951 feat(dot_alfred/v2): Phase 4 Step 4 — settings + anchor warning
Two small additions that close out Phase 4's loose ends.

Settings — tmdb_cache_ttl_days

    class Settings(BaseSettings):
        # --- DOT_ALFRED ---
        tmdb_cache_ttl_days: int = 14

Default 14 days, matching the dot_alfred_v2 master spec. Will drive
the Phase 5 TTL policy on TVShowLibraryIndexSidecar /
MovieLibraryIndexSidecar (decide when a TMDB-cached entry is stale
and triggers a refresh sync).

Anchor-mismatch warning

DotAlfredTVShowLibraryIndex._load_or_heal and DotAlfredMovieLibraryIndex
._load_or_heal now cross-check each indexed entry's metadata.path
against the on-disk folder layout right after a successful parse.
Drift (sidecar says folder X, X no longer exists under library_root)
is surfaced as a WARNING log — one per missing folder, with the
tmdb_id for cross-reference. No auto-heal on drift; the caller
decides (the heal path remains opt-in via index.heal()).

The warning fires only on the parsed-index path. The heal path
always synthesizes entries from real folder names, so it can never
drift — silent by construction.

Tests

* TestTVShowLibraryIndexAnchorWarning — 3 scenarios:
  warn-on-drift / no-warn-on-match / no-warn-on-heal.
* TestMovieLibraryIndexAnchorWarning — symmetric coverage.

Full suite: 1237 passed / 8 skipped / 4 xfailed.
2026-05-25 21:14:18 +02:00
francwa 86222d95d1 refactor(persistence): Phase 4 Step 3 — delete v1 dot_alfred + ports
Now that rescan_show + rescan_movie run on the v2 release repositories
(Phase 4 Steps 1-2), the v1 dot_alfred stack and its abstract domain
ports have zero callers. Delete them and lift the Phase 3 quarantines.

Deleted

* alfred/infrastructure/persistence/dot_alfred/bridge.py
* alfred/infrastructure/persistence/dot_alfred/repository.py     (v1)
* alfred/infrastructure/persistence/dot_alfred/serializer.py     (v1)
* alfred/infrastructure/persistence/dot_alfred/sidecar.py        (v1)
* alfred/domain/tv_shows/repositories.py     (TVShowRepository ABC)
* alfred/domain/movies/repositories.py       (MovieRepository ABC)
* tests/infrastructure/persistence/dot_alfred/test_repository.py
* tests/infrastructure/persistence/dot_alfred/test_serializer.py

Rewrite

alfred/infrastructure/persistence/dot_alfred/__init__.py now re-
exports only the v2 surface: the four concrete repositories
(DotAlfredSeriesReleaseRepository, DotAlfredMovieReleaseRepository,
DotAlfredTVShowLibraryIndex, DotAlfredMovieLibraryIndex) plus
ShowFolderUnknown. DTO-level imports go through
alfred.infrastructure.persistence.dot_alfred.v2 directly.

No backwards-compat shims (per CLAUDE.md): the v1 names are gone,
not aliased. Test suite drops from 10 → 8 skips (the two Phase 3
module-level skips disappear with the quarantined files).

Full suite: 1233 passed / 8 skipped / 4 xfailed.

The MediaWithTracks mixin in alfred.domain.shared.media is now
orphaned (Episode lost its tracks in Phase 3, MovieRelease doesn't
inherit it). Parked for Phase 5, which will either mount it on
MovieRelease / SeasonRelease or delete it for good.
2026-05-25 21:10:32 +02:00
francwa 9e48c70b8a feat(rescan): Phase 4 Step 2 — add rescan_movie orchestrator
Mirror rescan_show for the movies library. Locates the main video via
find_video_file, runs inspect_release once (movies are one-folder-one-
main-file by convention), and writes a v2 MovieRelease sidecar via
DotAlfredMovieReleaseRepository.

Signature

    rescan_movie(
        movie_dir,
        *,
        tmdb_id: TmdbId,
        imdb_id: ImdbId | None = None,
        movie_repo: DotAlfredMovieReleaseRepository,
        prober,
        kb,
    ) -> MovieRelease

Behavior

* added_at = datetime.now(UTC) — the v2 sidecar records when the
  release was last reconciled with disk, not filesystem mtime (which
  drifts across moves and hard-links). Phase 3 made this field
  required on MovieRelease.
* No TMDB call. Index auto-heals from the new sidecar on next read.
* MovieRescanFailed raised when no video is found inside movie_dir
  (only explicit failure mode; all other adapter errors degrade
  gracefully into empty / partial fields).
* file_path is recorded relative to movie_dir so the sidecar stays
  portable across library moves.

Tests

tests/application/movies/test_rescan.py: 8 scenarios on the real v2
movie repo + real KB + stubbed prober. Covers track flattening,
sidecar round-trip, prober returning None, video in subfolder,
explicit no-video failure, imdb_id optional.

Full suite: 1233 passed / 10 skipped / 4 xfailed.
2026-05-25 21:09:02 +02:00
francwa 7da0f887e7 refactor(rescan): Phase 4 Step 1 — rescan_show on v2 release repo
Rewrite rescan_show to build a SeriesRelease (Phase 1 v2 aggregate)
and persist it via DotAlfredSeriesReleaseRepository. The orchestrator
keeps reusing inspect_release as the single source of parse/probe
truth — only the assembly target changes (SeriesRelease/SeasonRelease/
EpisodeRelease instead of TVShow/Season/Episode).

New signature

    rescan_show(
        show_root,
        *,
        tmdb_id: TmdbId,
        imdb_id: ImdbId | None = None,
        series_repo: DotAlfredSeriesReleaseRepository,
        scanner,
        prober,
        kb,
    ) -> SeriesRelease

Identity is TMDB-anchored (tmdb_id required, no coercion); imdb_id is
optional. No TMDB call from rescan — the library index auto-heals
from the new sidecar on its next read.

PACK vs EPISODIC

* Single-video + season-parsed + no-episode → SeasonRelease(
  mode=PACK, folder=<season folder>, episodes=()). The slot map stays
  empty until the Phase 5 TMDB sync supplies episode_count. We do
  not fabricate an EpisodeRange we cannot prove on disk.
* Otherwise → EPISODIC: every file with (season, episode) becomes an
  EpisodeRelease with EpisodeRange(start, end) = (E, E). Multi-episode
  files (S01E01E02) still record only the first slot — Parser does
  not yet expose episode_end (existing tech debt, unchanged).

Package move

The orchestrator moves from alfred/application/library/ to
alfred/application/tv_shows/ for symmetry with alfred/application/
movies/ (Step 2). walker.py + its tests move with it. The empty
library/ package is deleted.

Tests

tests/application/tv_shows/test_rescan.py rewritten end-to-end on
the real v2 repository, real KB, real scanner, stubbed prober.
9 happy-path + edge-case scenarios cover EPISODIC track flattening,
PACK empty-episodes semantics, sidecar round-trip, imdb_id optional,
empty show root, season folder with no videos, prober returning None.
test_walker.py moved verbatim (import path updated).

Full suite: 1214 passed / 10 skipped / 4 xfailed. The three v1
dot_alfred quarantines from Phase 3 stay in place until Step 3.
2026-05-25 21:07:25 +02:00
francwa c22b2b78eb refactor(domain): Phase 3 — TVShow/Movie aggregates become TMDB-only
Filesystem-side concerns (file paths, tracks, quality, mode, added_at)
move to the releases/ domain added in Phase 1; the TMDB aggregates now
carry only identity + TMDB catalog facts.

Domain entities:
- TVShow: tmdb_id: TmdbId required (primary key), imdb_id: ImdbId | None
  optional, status: str = "unknown" added.
- Season: episode_count: int = 0 added (TMDB-cached); audio_tracks,
  subtitle_tracks, mode property removed.
- Episode: slimmed to identity + title. file_path/file_size/tracks
  removed. No longer inherits MediaWithTracks.
- Movie: tmdb_id required, imdb_id optional. file_path/file_size/quality/
  added_at/audio_tracks/subtitle_tracks removed. get_filename() now
  returns "Title.Year" — quality moves to MovieRelease.

Builders:
- TVShowBuilder requires tmdb_id: TmdbId; imdb_id/status optional.
- SeasonBuilder.set_episode_count(int) replaces set_audio_tracks /
  set_subtitle_tracks.

No-coercion contract: TVShow(tmdb_id=1396) raises — callers pass
TmdbId(1396). No ergonomic shim per the no-shims rule.

Cascade fixes:
- MediaOrganizer test fixtures updated to new Movie/TVShow shapes.
- Movie.get_filename() re-added (without Quality) so MediaOrganizer
  keeps working until Phase 4 rewires it through MovieRelease.

Quarantined (deleted in Phase 4 alongside v1 dot_alfred):
- tests/application/library/test_rescan.py — module-level skip.
- tests/infrastructure/persistence/dot_alfred/test_repository.py —
  module-level skip.
- tests/infrastructure/persistence/dot_alfred/test_serializer.py —
  module-level skip.

Suite: 1216 passed, 11 skipped (8 pre-existing + 3 Phase 3
quarantines), 4 xfailed. CHANGELOG updated under [Unreleased].
2026-05-25 19:54:35 +02:00
francwa 2f160644da feat(dot_alfred/v2): bump SCHEMA_VERSION to 2 — added_at on MovieRelease
Phase 3 prep: Movie aggregate is about to become TMDB-only (no
filesystem fields). added_at is a release-time observation, not a
TMDB-aggregate concern, so it moves to MovieRelease +
MovieReleaseSidecar.

- Add added_at: datetime (required) to MovieRelease with a
  type-check in __post_init__.
- Add added_at: datetime (required) to MovieReleaseSidecar.
- Bump SCHEMA_VERSION 1 → 2 with a version-history note.
- Bridge round-trips added_at via Pydantic mode="json" (datetime
  → ISO 8601 string).
- Tests: update MovieRelease fixtures, add a validator test, add
  an added_at round-trip test, switch hard-coded `1` assertions
  to SCHEMA_VERSION for future-proofing.

No v1 sidecars in the wild yet — no migration code needed.
2026-05-25 19:47:25 +02:00
francwa e65c1df229 feat(.alfred v2 — Phase 2): Pydantic sidecars, atomic repos, auto-heal index
Spec: specs/dot_alfred_v2.md (Phase 2).

New package alfred/infrastructure/persistence/dot_alfred/v2/:
  * sidecar_release.py / sidecar_root.py — Pydantic DTOs
    (extra="forbid", frozen=True) for per-item sidecars and the
    library-root index. schema_version enforced via model_validator.
  * serializer.py — read_yaml / atomic_write_yaml (.tmp + os.replace).
    SidecarSchemaError wraps YAML + Pydantic errors uniformly.
  * bridge.py — lossless domain <-> sidecar for SeriesRelease /
    MovieRelease; projection-only show_index_entry_from /
    movie_index_entry_from with multi-episode-file flattening.
  * repository.py — DotAlfredSeriesReleaseRepository /
    DotAlfredMovieReleaseRepository (log+skip on corruption),
    DotAlfredTVShowLibraryIndex / DotAlfredMovieLibraryIndex with
    silent auto-heal on missing/corrupt index reads. Writes never
    auto-heal (read paths handle that).

TMDB client extensions:
  * TmdbSeasonInfo / TmdbShowInfo DTOs + pure parse_tv_show_info.
  * TMDBClient.get_tv_show_info aggregates /tv/{id} +
    /tv/{id}/external_ids.

Domain change:
  * SubtitleTrack gains is_sdh: bool = False, populated from
    ffprobe's hearing_impaired disposition. Required for v2 sidecar
    parity (spec replaces v1's type: "sdh" with explicit flag).
    Default keeps every existing caller unchanged.

Tests: 37 new v2 integration tests on tmp_path (round-trips, atomic
writes, schema mismatch handling, anchor warnings, auto-heal paths)
plus 16 TMDB DTO tests. Full suite: 1240 -> 1277 passed.

Implementation notes filed in .claude/specs/dot_alfred_v2_notes.md
(strict=True trade-off, upsert signature deviation from spec, etc.).

Phases 3-5 (TVShow/Movie refactor to TMDB-only, rescan_show rewrite,
v1 deletion + wiring) are next.
2026-05-25 16:01:39 +02:00
francwa c0f6d01048 feat(releases): Phase 1 — new filesystem release domain + TmdbId VO
First step of specs/dot_alfred_v2.md. Introduces a separate bounded
context (alfred/domain/releases/) for the filesystem-side aggregates,
disjoint from TMDB identity which stays in tv_shows/ and movies/.
The link between the two worlds is TmdbId, used as the natural key
in the persistence layer (no domain-level reference).

New package alfred/domain/releases/:
- value_objects: EpisodeRange (covers SxxE01E02E03 multi-episode
  files via start/end inclusive range, with count/numbers/is_single
  helpers), ReleaseMode enum (PACK = N video files direct in the
  season folder, EPISODIC = N sub-folders).
- entities: TrackProfile, EpisodeRelease, SeasonRelease (with
  episode_count() summing each EpisodeRange.count()), SeriesRelease
  (tmdb_id primary anchor, optional imdb_id secondary), MovieRelease.
  All frozen dataclasses.
- builders: SeasonReleaseBuilder + SeriesReleaseBuilder mirroring
  the v1 TVShowBuilder pattern. Builders sort episodes by range
  start on emit and reject overlapping ranges (two files claiming
  the same TMDB slot). from_existing() seeds a builder from an
  existing frozen aggregate for round-trip edits.
- repositories: abstract ports (SeriesReleaseRepository,
  MovieReleaseRepository); concrete .alfred sidecar impls arrive
  in Phase 2.

New shared VO alfred/domain/shared/value_objects.py::TmdbId — positive
int, rejects bool/str/float, symmetric with the existing ImdbId VO.

73 unit tests cover VO validation, entity invariants, builder sort
+ overlap detection, and from_existing() round-trips.

v1 code paths are untouched at this stage; the new domain coexists
with the old TVShow aggregate until Phase 3 refactors it.
2026-05-25 15:19:23 +02:00
francwa de7030fa9c feat(library): add rescan_show orchestrator + walker (Step 4)
Step 4 of specs/dot_alfred.md — rebuild a TVShow aggregate from disk
by reusing the existing release pipeline (inspect_release) on every
video file in a show folder, then persist via the .alfred repository.

- alfred/application/library/walker.py — pure structural walk
  (season folders detected via \bS\d{1,2}\b regex, video files
  filtered against kb.video_extensions, no recursion).
- alfred/application/library/rescan.py — orchestrator that ingests
  each season folder, infers PACK vs EPISODIC from on-disk file
  count + parser output, and assembles via TVShowBuilder. Episode
  paths stored relative to show_root. Logs + skips corrupt input
  (no season parsed, mixed season numbers, unparseable episodes).
- Season now inherits MediaWithTracks: PACK seasons carry
  season-level audio_tracks / subtitle_tracks; EPISODIC seasons
  leave them empty (tracks live per-episode). SeasonBuilder gains
  set_audio_tracks / set_subtitle_tracks; bridge writes/reads them
  in the PACK branch via shared _synth_* helpers.

Out of scope, tracked as tech debt: adjacent .srt capture, multi-
episode (episode_end), TMDB-driven PACK detection (the current
heuristic '1 file == PACK' is a placeholder until ShowTracker lands).

18 new tests (11 walker + 7 rescan integration) on tmp_path with
the Foundation layout. Full suite: 1149 passed.
2026-05-24 15:22:18 +02:00
francwa 3622c95154 chore(lint): Lint the shit out of it 2026-05-24 15:21:58 +02:00
francwa c7c11180d9 feat(persistence): add DotAlfredTVShowRepository (filesystem-backed)
Step 3 of specs/dot_alfred.md. Concrete TVShowRepository
implementation reading and writing per-show .alfred YAML files under
a configurable library_root. Writes are atomic (.alfred.tmp +
os.replace), reads tolerate corrupted/wrong-schema sidecars (log +
skip), and the repo never invents a folder name — save(show)
requires the target folder to exist beforehand (raises
ShowFolderUnknown otherwise), matching the spec's
MediaOrganizer-then-sidecar split.

Cold folders without a sidecar are skipped by find_all and yield
None from find_by_imdb_id — the upcoming rescan_show tool (step 4)
will own the opt-in rebuild path.

A small bridge module translates between the rich domain TVShow
(AudioTrack/SubtitleTrack with full ffprobe minutiae) and the
compact sidecar shape (language-only audio, embedded-only subs with
type derived from is_forced). The bridge is intentionally lossy on
probe details the sidecar does not store, per the spec's
factual-only philosophy.

20 integration tests on tmp_path: round-trip save/find,
cold-folder/unknown-id returns, find_all skipping
(corrupted/schema-violating sidecars), delete/exists, atomic write
(no .alfred.tmp leftover), overwrite, and folder-name fallbacks
(get_folder_name guess + full-scan rescue when renamed).
2026-05-22 17:16:41 +02:00
francwa b0e275bd11 feat(persistence): add .alfred sidecar serializer (DTO ↔ dict)
Step 2 of the specs/dot_alfred.md plan. Pure-dict in/out
(serialize(sidecar) -> dict, deserialize(data) -> ShowSidecar);
YAML I/O lives in the repository layer (step 3) and is kept out
for trivial testability.

DTOs mirror the YAML schema field-for-field:
- ShowSidecar (root: imdb_id, tmdb_id, schema_version, seasons)
- SeasonSidecar (number, path, optional audio/subtitles, optional episodes)
- EpisodeSidecar (number, path, optional audio/subtitles)
- SubtitleEntry (language, source, type)

The sidecar acts as a scan cache: it stores only what is genuinely
costly to recompute — folder/file paths (skipping the FS walk) and
probed track metadata (skipping ffprobe). Release identifiers
(group, source, quality, codec) live in folder/file names and are
derived on demand by the parser; they are deliberately absent from
the schema and rejected as unknown keys on deserialize.

The serializer is strict on schema: unknown keys at any level raise
SidecarSchemaError, missing required fields raise clearly, and bool
cannot sneak in as a season/episode number. Optional fields
(tmdb_id, empty audio/subtitles/episodes) are omitted from the
output rather than emitted as null / [].

Tests cover round-trip equivalence (DTO → dict → DTO and DTO → YAML
text → DTO), the Foundation S01 PACK case (real-world fixture with
mixed sub types — superset captured at season scope), and a
Breaking Bad S05 EPISODIC case. An on-disk tmp_path fixture
recreates the Foundation folder structure with placeholder files,
ready to be reused by the upcoming repository walk tests in step 3.
2026-05-22 16:56:56 +02:00
francwa 6c12c18a27 refactor(tv_shows): freeze aggregate, builder-only construction, drop ShowTracker fields
The TVShow aggregate is now fully immutable. TVShow, Season and Episode
are @dataclass(frozen=True), children stored as ordered tuples sorted
by number. All construction goes through TVShowBuilder / SeasonBuilder
(new module), which expose from_existing() to seed from a current
frozen aggregate and apply modifications.

ShowTracker-territory fields are stripped from the domain: ShowStatus,
CollectionStatus, expected_seasons/episodes, aired_episodes,
collection_status(), is_complete_series(), missing_episodes(),
is_ongoing(), is_ended(), Season.name, the aired<=expected validation,
and the TMDB status string mapping. These will reappear in a dedicated
ShowTracker layer (to be designed) combining the .alfred sidecar with
live TMDB data.

New SeasonMode enum (PACK / EPISODIC) computed at read time from the
season's structural shape — never stored, the YAML sidecar encodes the
mode via presence/absence of the episodes: block.

Test suite for the domain entirely rewritten to cover frozen invariants,
builder ordering, last-write-wins, from_existing round-trip, and
SeasonMode derivation. Full suite still green (1078 passed).
2026-05-22 16:09:37 +02:00
francwa 1427c8a54b docs(specs): add dot_alfred sidecar design doc
First entry in the new specs/ directory. Specifies the layout and
semantics of the per-show .alfred/ sidecar that will back the future
concrete TVShowRepository:

- One .alfred/ directory per show, containing show.yaml + one
  season_NN.yaml per season (zero-padded, season_00 for Specials).
- Per-episode entries store file size + mtime so cache lookups skip
  a full ffprobe rescan when nothing changed.
- Self-healing on drift (file missing/modified/new) without raising.
- Atomic writes via temp file + os.replace().
- Phased implementation plan (builder + freeze first, then
  serializer, then cache validator, then repo, then wiring).

No code yet — spec only, awaiting review before the implementation
phases. Companion entry in CHANGELOG (Added).
2026-05-21 18:05:55 +02:00
francwa 8491edac22 infra(gitignore): track specs/ + carve out private .claude/
The repo-level .gitignore had a blanket *.md rule with only
CHANGELOG.md exempted. Two adjustments:

- Allow specs/ to be tracked (design docs / RFCs live here, public).
- Restrict the README.md exception to the root (/README.md) so that
  per-directory README files (e.g. tests/fixtures/releases/README.md)
  stay ignored as before — no unintended scope creep.
- Explicitly ignore /.claude/, the private dev-docs sub-repo that
  lives inside the working tree but is versioned and pushed
  separately.

CHANGELOG: Internal entry.
2026-05-21 18:05:33 +02:00
francwa 02e478a157 refactor(domain): freeze Movie and Episode, switch track collections to tuple
Movie and Episode become @dataclass(frozen=True, eq=False), with
audio_tracks/subtitle_tracks held as tuple[...] instead of list[...].
Identity-based equality is preserved via the existing __eq__/__hash__.
__post_init__ coercion (imdb_id, title, season_number, episode_number)
uses object.__setattr__ to stay compatible with frozen.

The MediaWithTracks mixin contract is updated to tuple accordingly.

Callers projecting enrichment results (probe output, file metadata) now
rebuild via dataclasses.replace(...) — same pattern recently adopted for
ParsedRelease.

Season and TVShow stay mutable for now: freezing the aggregate root
would cascade a full reconstruction on every add_episode, deferred.
2026-05-21 13:40:22 +02:00
francwa 3dc73a5214 feat(release): add fullwidth vertical bar | (U+FF5C) to separators
CJK release names sometimes use the fullwidth vertical bar as a token
separator, as do occasional decorative YouTube-style uploads. Adding
the codepoint to separators.yaml lets the tokenizer split on it
instead of leaving the wide pipe glued onto an adjacent token.

The tokenizer in alfred/domain/release/parser/pipeline.py iterates
the separator list as plain strings (no regex), so a multi-byte
UTF-8 separator works without any code change.
2026-05-21 08:05:56 +02:00
francwa 88f156b7a4 refactor(subtitles): rename SubtitleCandidate → SubtitleScanResult
The old name conflated 'might become a placed subtitle' with 'what a
scan pass produced'. The class is the output of a scan/identify pass —
language/format may still be None while classification is in progress,
confidence reflects classifier certainty, raw_tokens holds filename
fragments under analysis. SubtitleScanResult says that directly.

Pure rename + refreshed docstring; no behavior change. Touches the
domain entity, the matcher/identifier/utils services, the
manage_subtitles use case, the placer, the metadata store, the
shared-media cross-ref comment, and 7 test modules.
2026-05-21 08:05:46 +02:00
francwa 5107cb32c0 feat(release): InspectedResult.recommended_action centralizes exclusion decision
Add a derived 'recommended_action' property on InspectedResult that
collapses the orchestrator's go / wait / skip decision into one value:

- 'skip'      → no main_video, or media_type == 'other'
- 'ask_user'  → media_type == 'unknown', or road == 'path_of_pain'
- 'process'   → confident parse with a main video on disk

The ordering is part of the contract (skip > ask_user > process) —
documented in the property docstring.

Until now every consumer (workflows, the agent, the orchestrator
sketch) had to re-derive this from the road / media_type / main_video
triple, with subtle drift between sites. One place, one rule.

Exposed through the analyze_release tool so the LLM can route on it.
Spec YAML updated to describe the new field.

Suite: 1083 passed (+6 new tests in tests/application/test_inspect.py
covering the four branches and the precedence rules).
2026-05-21 07:54:17 +02:00
francwa b7979c0f8b refactor(release): freeze ParsedRelease + enrich_from_probe returns new instance
ParsedRelease is now @dataclass(frozen=True). The enrichment passes that
used to patch fields in place now produce new instances:

- enrich_from_probe(parsed, info, kb) returns a new ParsedRelease via
  dataclasses.replace (no allocation when no field changed).
- inspect_release rebinds 'parsed' after detect_media_type (wrapped in
  MediaTypeToken — the strict isinstance check now also runs on
  replace) and after enrich_from_probe.

languages becomes a tuple[str, ...] so the VO is properly immutable.
Parser pipeline packs languages as a tuple in the assemble dict.

Callers updated: inspect_release, testing/recognize_folders_in_downloads.py.
Tests updated: 22 enrich_from_probe call sites rebound, language
assertions switched to tuple literals, test_release_fixtures normalizes
result['languages'] back to list for YAML-fixture comparison.

Suite: 1077 passed.
2026-05-21 07:51:49 +02:00
francwa 9f1ce94690 refactor(application): inject kb/prober into resolve_destination use cases
Remove the module-level _KB / _PROBER singletons from
alfred/application/filesystem/resolve_destination.py. The four
resolve_{season,episode,movie,series}_destination use cases now take
kb: ReleaseKnowledge and prober: MediaProber as required arguments,
matching the shape of inspect_release.

The singletons now live at the agent-tools frontier
(alfred/agent/tools/filesystem.py), where the LLM-facing wrappers
instantiate YamlReleaseKnowledge / FfprobeMediaProber once and thread
them through. The wrappers' Python signatures are unchanged — the
inspect-based JSON-schema generator in agent/registry.py still sees the
same LLM-passable params.

analyze_release drops the dirty 'from ... import _KB' indirection.

Tests inject their own stubs by keyword (prober=_StubProber(...)) via
thin convenience wrappers, replacing the prior
monkeypatch.setattr(rd, '_PROBER', ...) pattern.

testing/debug_release.py: instantiate YamlReleaseKnowledge() /
FfprobeMediaProber() inline at the two call sites.

Suite: 1077 passed.
2026-05-21 07:46:13 +02:00
francwa 5e0ed11672 refactor(release): rename ParsePath enum to TokenizationRoute
ParsePath collided with pathlib.Path in mental models, and was one
letter from the parse_path attribute that stores its value — confusion
on confusion. Road (EASY/SHITTY/PATH_OF_PAIN) is the parser-confidence
axis; TokenizationRoute (DIRECT/SANITIZED/AI) is the tokenization-method
axis. They're orthogonal and the new name makes that obvious.

Field name parse_path stays — it's the right name for the attribute
that *holds* the route. String values ("direct", "sanitized", "ai")
stay too, so YAML fixtures and the analyze_release tool spec are
unchanged. Only the type symbol changes:

- value_objects.py: class rename + docstring spelling out orthogonality
  with Road.
- services.py: 3 call sites.
- scoring.py: docstring cross-reference updated.
- tests/domain/release/test_parser_v2_scoring.py: import + 3 call sites.
2026-05-21 07:39:42 +02:00
francwa 0246f85ef8 refactor(release): move codec mappings from code to YAML knowledge
The three module-level dicts in enrich_from_probe (ffprobe codec name
to scene token, channel count to layout) were exactly the kind of
domain lookup table CLAUDE.md says belongs in YAML, not in Python.
Move them to alfred/knowledge/release/probe_mappings.yaml, load
through a new ReleaseKnowledge.probe_mappings port field, and add a
kb parameter to enrich_from_probe so the consumer reads the maps via
the same injection pattern as everything else.

- New knowledge file: alfred/knowledge/release/probe_mappings.yaml
- New loader: load_probe_mappings() in infrastructure/knowledge/release.py
  (normalizes channel-count keys back to int).
- Port: ReleaseKnowledge gains probe_mappings: dict.
- Adapter: YamlReleaseKnowledge populates it at __init__.
- Consumer: enrich_from_probe(parsed, info, kb) reads the three sub-maps
  from kb.probe_mappings; unknown codecs still fall back to uppercase
  raw value, same behaviour as before.
- Call sites updated: inspect_release passes kb through; the testing
  script gets its kb wiring (it was already broken since the
  ReleaseKnowledge refactor); all 22 enrich_from_probe call sites in
  tests/application/test_enrich_from_probe.py pass _KB.
2026-05-21 07:37:42 +02:00
francwa e62dc90bd1 refactor(release): make tech_string a derived property
ParsedRelease.tech_string was a stored str field re-computed in two
places (assemble() at parse time, enrich_from_probe() after the probe).
The second site was a reactive fix (e79ca46) for filename builders that
saw a stale value. Turn it into an @property so it stays in sync with
quality/source/codec by construction.

- Drop the field from the dataclass + the key from assemble()'s dict.
- Drop tech_string="" from parse_release's malformed-name fallback.
- Drop the manual recomputation at the end of enrich_from_probe.
- Inject the property into asdict() result in the fixtures runner
  (same treatment as is_season_pack).
- Update tests that passed tech_string= to the constructor; rewrite the
  TestTechString case that mutated p.tech_string manually.
2026-05-21 07:33:53 +02:00
francwa 688c37bbec docs(changelog): recap session 2026-05-20 tech-debt cleanup
Consolidate the five domain-purity refactors of the session under
[Unreleased]: RuleScopeLevel enum, FilePath VO post_init, Language
strict + from_raw, ParsedRelease.normalised → clean, ParsedRelease
enum strictness. Removes the duplicate min_movie_size_bytes entry
(now sits under its proper Removed section).
2026-05-20 23:57:06 +02:00
francwa 757e4045ee refactor(release): ParsedRelease.media_type & parse_path are strict enums
The fields were already typed as MediaTypeToken / ParsePath, but a
tolerant __post_init__ coerced raw strings into their enum form. With
MediaTypeToken(str, Enum) (and ParsePath idem), the coercion served no
purpose — callers that pass '.value' got back the enum anyway, and
callers that pass an unknown string got a ValidationError just like
they would now.

Strict mode: constructor rejects non-enum values directly. The two
in-tree builders (parse_release() and the parser pipeline) already
produce enum values; all .value sites have been removed. Drops the
unused _VALID_MEDIA_TYPES / _VALID_PARSE_PATHS lookup tables.
2026-05-20 23:52:30 +02:00
francwa c3767aacb6 refactor(release): rename ParsedRelease.normalised → clean
Le champ s'appelait normalised mais ne faisait pas la normalisation
suggérée par son nom (dots instead of spaces). En pratique il contient
raw - site_tag - apostrophes, qui sert uniquement à season_folder_name()
via _strip_episode_from_normalized. Renommé en 'clean' qui décrit ce
qu'il contient réellement, docstring corrigée.
2026-05-20 23:50:05 +02:00
francwa 5bcf22b408 refactor(shared): Language VO is strict; from_raw() factory for un-normalized input
object.__setattr__ inside __post_init__ on a frozen dataclass is a
code smell — it bypasses the immutability guarantee to mutate fields
mid-construction. Split the responsibilities:

* Direct constructor is strict — rejects un-normalized input (uppercase
  iso, whitespace in aliases, etc.) so once a Language exists in the
  system, its fields are guaranteed canonical.
* Language.from_raw() factory handles arbitrary YAML/user input — it
  lowercases the iso, dedups/normalizes aliases, then constructs.

Only caller that built from raw data (LanguageRegistry loading YAML)
moves to from_raw(). Test fixtures already pass normalized data so
they keep using the direct constructor.
2026-05-20 23:48:30 +02:00
francwa cfa9f54d9f refactor(shared): FilePath VO uses __post_init__ instead of custom __init__
Custom __init__ on a @dataclass(frozen=True) is a code smell — it
bypasses the generated dataclass __init__ and re-implements the
str/Path coercion + frozen-aware setattr by hand. Replaced with a
single __post_init__ that performs the same normalization. Same
public API (FilePath(str) and FilePath(Path) both work), same
behavior, no callers touched.
2026-05-20 23:47:03 +02:00
francwa f0aaf50c97 refactor(subtitles): RuleScope.level → RuleScopeLevel enum
Six niveaux possibles (global, release_group, movie, show, season,
episode) étaient passés en str libre, le commentaire docstring servant
de seule documentation. Introduit RuleScopeLevel(str, Enum) — toujours
sérialisable en YAML, mais le set fixe est désormais imposé par le
typage. to_dict() sort explicitement .value pour rester safe côté
écrivains YAML.
2026-05-20 23:46:22 +02:00
francwa a09262b33f chore(settings): remove unused min_movie_size_bytes
Le champ + son validator étaient orphelins depuis la suppression
de MovieService.validate_movie_file. L'exclusion par extension
(application/release/supported_media.py) + le PoP couvrent désormais
la règle 'vrai film vs sample'. Si on a un jour besoin d'un seuil de
taille, il ira dans data/knowledge/, pas dans settings.
2026-05-20 23:41:41 +02:00
francwa 9c7cd66d2b Merge branch 'refactor/flatten-shared-media' 2026-05-20 23:35:52 +02:00
francwa 83dbed887b refactor(domain): flatten shared/media package into single module
Six small files (audio, video, subtitle, info, matching, tracks_mixin
+ __init__) collapsed into one ~250 LoC media.py module. Python treats
media.py and media/__init__.py interchangeably, so the 12 import sites
that read 'from alfred.domain.shared.media import ...' continue to work
without changes.

Reasoning: the whole bounded context fits on one screen; splitting into
sub-modules added more navigation friction than it saved. Tests stay
green (1077 passed).
2026-05-20 23:35:49 +02:00
francwa 0c9489e16b Merge branch 'feat/parser-phase-d' 2026-05-20 23:30:36 +02:00
francwa 621bb96995 fix(release/parser): pre-strip apostrophes so titles like Don't parse cleanly
Apostrophes are in the forbidden-chars list, which made any release
with a title like "Don't" or "L'avare" short-circuit to the AI
fallback (parse_path=ai, everything UNKNOWN). They are now stripped
up front from the name before the well-formed check and tokenize,
so the parse completes normally. The raw name is preserved on the
VO; only the title field loses its apostrophe.

parse_path becomes 'sanitized' when an apostrophe was stripped, to
surface that the parser cleaned something up.

Fixtures updated:
- shitty/honey_uhd_hdr/ — went from total UNKNOWN to a clean parse
  (title=Honey.Dont, year=2025, quality=2160p, source=WEBRip,
  codec=x265, group=Amen).
- path_of_pain/the_prodigy_full_chaos/ — went from total failure to
  partial success (title, year, source, codec extracted). Remaining
  gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked
  separately in tech debt.
2026-05-20 23:29:10 +02:00
francwa 448ef3b79c fix(release/parser): recognize Sxx-yy season range as tv_complete
`Der.Tatortreiniger.S01-06.GERMAN...` previously parsed as a movie
with 'S01-06' glued to the title. The parser now matches the
season-range form in _parse_season_episode (returning season=first,
episode=None), and the assemble step detects the range token to
promote media_type to 'tv_complete'.

The first season is exposed as `season` so `is_season_pack`
fires (season is not None and episode is None) — useful for routing
to a series root folder.

Fixture shitty/tatortreiniger_flat_multiseason/ updated:
- title: Der.Tatortreiniger.S01-06 → Der.Tatortreiniger
- season: null → 1
- media_type: movie → tv_complete
- is_season_pack: false → true
2026-05-20 23:26:40 +02:00
francwa b1c7f35ffb fix(release/parser): drop pure-punctuation TITLE tokens at assembly
Releases using ' - ' as a separator (Vinyl - 1x01 - FHD) tokenize to
['Vinyl', '-', '1x01', '-', 'FHD'] — the standalone '-' tokens were
ending up in title_parts and leaked into the joined title
('Vinyl.-'). We can't add '-' to the separator list (it would break
codec-GROUP), so we filter at assembly: a TITLE token with no
alphanumeric characters carries no title content.

Side win: same logic eliminates the UTF-8 wide-pipe '|' from the
khruangbin_yt_wide_pipe fixture title.

Fixtures updated:
- shitty/vinyl_1x01_format/expected.yaml (title: Vinyl.- → Vinyl)
- path_of_pain/khruangbin_yt_wide_pipe/expected.yaml (| dropped)
2026-05-20 23:24:40 +02:00
francwa 5bbdc9081f fix(release/parser): collapse chained multi-episode markers to full range
S14E09E10E11 previously parsed to episode=9, episode_end=10 — E11
was silently dropped. The parser now takes episodes[-1] as
episode_end so the full chain is captured (episode=9, episode_end=11).
Intermediate values stay implied.

Fixture shitty/archer_multi_episode/ updated from anti-regression of
the bug to anti-regression of the fix.
2026-05-20 23:23:08 +02:00
francwa 5d7b214af2 Merge branch 'refactor/language-port' 2026-05-20 23:20:18 +02:00
francwa 18267d0165 refactor(language): LanguageRepository port + SubtitleKnowledgeBase wired to it
Mirror the MediaProber / FilesystemScanner pattern for language lookup:

- New Protocol `LanguageRepository` in alfred.domain.shared.ports
  covering from_iso, from_any, all, __contains__, __len__ — the
  surface previously coupled to the concrete LanguageRegistry.
- SubtitleKnowledgeBase types its `language_registry` parameter
  against the Protocol; the concrete LanguageRegistry stays in
  infrastructure as the YAML-backed adapter and remains the default
  when no repository is injected.
- New unit tests in tests/infrastructure/test_language_registry.py
  cover the adapter surface (from_iso, from_any, membership,
  case-insensitivity, non-string inputs).

Behaviour is unchanged for existing callers. The split opens the
door to in-memory fakes in future tests without loading the full
ISO 639 YAML.
2026-05-20 23:18:25 +02:00
francwa 19fe8a519a Merge branch 'feat/release-inspect-orchestrator'
Inspection pipeline groundwork:
- MediaProber.probe() port extension (full media inspection on the port)
- inspect_release orchestrator + InspectedResult frozen VO
- enrich_from_probe now refreshes tech_string
- resolve_*_destination use cases consume inspect_release
- detect_media_type & enrich_from_probe moved to application/release
2026-05-20 09:31:22 +02:00
francwa a0d1846ff2 refactor(release): move detect_media_type & enrich_from_probe to application/release
Both helpers are inspection-pipeline pieces, not filesystem use cases —
they belong next to inspect_release, not next to move_media /
resolve_destination / list_folder.

The move also kills the lazy import that was hiding inside
_resolve_parsed: alfred.application.filesystem.resolve_destination
no longer triggers a cycle through alfred.application.filesystem
__init__ when loading inspect_release. Top-level import restored.

Call sites updated: inspect.py, test_detect_media_type.py,
test_enrich_from_probe.py, testing/recognize_folders_in_downloads.py.
Module docstrings + test-file docstrings updated to match the new
location.
2026-05-20 09:29:58 +02:00
francwa 0fb59a4581 feat(filesystem): wire inspect_release into resolve_destination
The four resolve_*_destination use cases now route through a private
_resolve_parsed helper that picks the right entry point:

  - source path provided AND it exists -> inspect_release(name, path)
    runs the full pipeline (parse + media-type refinement + probe
    + enrich), so missing tech tokens (quality, codec, ...) get
    filled by ffprobe and the refreshed tech_string lands in the
    destination folder / file names.

  - source path missing or absent       -> parse_release(name) only,
    same behavior as before. Back-compat: tests using fake /dl/*.mkv
    paths still pass unchanged.

resolve_episode_destination / resolve_movie_destination reuse their
existing source_file parameter as the inspection target. The two
folder-move use cases (season / series) gain a new OPTIONAL
source_path parameter — threaded through the agent tool wrappers
and documented in the YAML specs.

The lazy import inside _resolve_parsed avoids a circular import:
inspect_release imports detect_media_type / enrich_from_probe from
the same application.filesystem package whose __init__ re-exports
resolve_destination.

Three new tests in TestProbeEnrichmentWiring with a stub MediaProber
prove the wiring: movie picks up probe quality, season picks it up
via source_path, and a missing path correctly skips probe (back-compat
guard).
2026-05-20 09:26:30 +02:00
francwa e79ca462b8 fix(release): refresh tech_string after enrich_from_probe
enrich_from_probe fills None fields on ParsedRelease (quality, source,
codec, audio_*, languages) but left tech_string at its parser-time
value — so the filename builders (movie_folder_name, episode_filename,
…) saw stale tech tokens even after a successful probe.

Re-derive tech_string the same way the parser does — quality.source.codec
joined by dots, skipping None — at the end of enrich_from_probe. Token-
level values still win because enrich only fills None fields.

Four new tests in TestTechString cover: enrichment rebuilds it,
existing source survives, no-info input leaves it untouched, fully
empty parsed produces ''.
2026-05-20 09:26:09 +02:00
francwa 03aa844d7d feat(release): inspect_release orchestrator + InspectedResult VO
New application-layer entry point that composes the four inspection
layers in one call:

  1. parse_release(name, kb)              -> (ParsedRelease, ParseReport)
  2. detect_media_type(parsed, path, kb)  -> patch parsed.media_type
  3. find_main_video(path, kb)            -> Path | None (top-level scan)
  4. prober.probe(video) + enrich         -> when video exists and
                                             media_type not in
                                             {unknown, other}

Returns a frozen InspectedResult(parsed, report, source_path,
main_video, media_info, probe_used). kb and prober are injected — no
module-level singletons in inspect.py.

analyze_release tool now delegates to inspect_release; its output
gains two fields, confidence (0-100) and road (easy/shitty/path_of_pain),
surfaced from ParseReport so the LLM can route by confidence. Spec
updated to document them.

12 new tests covering happy paths, probe gating (no video, media_type
'other', probe failure), mutation contract (detect refining
parsed.media_type, enrich filling None fields), resilience
(nonexistent path), and frozen contract. Suite: 1058 passing.
2026-05-20 09:15:29 +02:00
francwa c303efea48 refactor(probe): consolidate full probe() into MediaProber port
Add probe(video) -> MediaInfo | None to the MediaProber Protocol and
implement it on FfprobeMediaProber. The standalone
alfred/infrastructure/filesystem/ffprobe.py module is removed; all
callers (analyze_release / probe_media tools, testing scripts) now go
through the adapter.

Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py
(patching subprocess.run at the adapter module level).

Unblocks the upcoming inspect_release orchestrator, which needs the
port — not a free function — to compose parse + main-video selection
+ probe in one shot.
2026-05-20 09:11:24 +02:00
francwa 5db350a1df Merge branch 'feat/release-parser-scoring' 2026-05-20 08:47:38 +02:00
francwa 12dc796ea2 docs(changelog): freeze confidence scoring + exclusion work block 2026-05-20 08:47:29 +02:00
francwa 9ddd85929e feat(release): pre-pipeline exclusion helpers
Add the application-layer helpers that decide which files are worth
parsing, sitting one notch above parse_release.

- is_supported_video(path, kb): extension-only check against
  kb.video_extensions. Lowercased suffix lookup. Directories and
  broken symlinks return False.
- find_main_video(folder, kb): top-level scan only (no recursion into
  subdirectories — releases that wrap their video in Sample/ are
  PATH_OF_PAIN territory). Lexicographically-first eligible file wins
  when several qualify (deterministic, no size-based ranking). A bare
  file as folder argument is supported for single-file releases.

No size threshold and no filename heuristics ('sample' / 'trailer'):
the parser's job is to extract structure, not to second-guess
non-standard release shapes. PoP catches the rest.

17 tests under tests/application/test_supported_media.py.
2026-05-20 01:34:32 +02:00
francwa ed7680b58f docs(changelog): log parse-confidence scoring + ParseReport tuple 2026-05-20 01:21:47 +02:00
francwa b4c9efd13b feat(release): parse_release returns (ParsedRelease, ParseReport)
Wire the scoring foundations into the parser entry point. parse_release
now returns a tuple — the structural ParsedRelease and a diagnostic
ParseReport carrying confidence (0-100), road
(EASY / SHITTY / PATH_OF_PAIN), the residual UNKNOWN tokens, and the
list of critical fields that couldn't be filled.

EASY is decided structurally (a group schema matched), independently
of the score. SHITTY vs PATH_OF_PAIN is decided by score against the
60 cutoff from scoring.yaml. Malformed names (forbidden chars) emit a
zero-confidence PoP report and short-circuit to parse_path=AI as
before.

ParsePath stays as-is (DIRECT / SANITIZED / AI) — it records *how* we
tokenized, not how confident we are. The two dimensions are now
properly separated.

Call sites propagated:
- alfred/application/filesystem/resolve_destination.py (4 occurrences)
- alfred/agent/tools/filesystem.py
- tests/domain/test_release.py
- tests/domain/test_release_fixtures.py
- tests/application/test_detect_media_type.py

New tests/domain/release/test_parser_v2_scoring.py (22 cases) locks
ParseReport validation, compute_score arithmetic, decide_road
thresholding, the collector helpers, and the end-to-end tuple contract.
2026-05-20 01:21:30 +02:00
francwa 98c688f29b feat(release): foundations for parse-confidence scoring
Add the building blocks for Phase A scoring without yet wiring them
into parse_release. Nothing changes at runtime — parse_release still
returns a single ParsedRelease — but the pieces needed to upgrade it
in a follow-up commit are now in place.

- alfred/knowledge/release/scoring.yaml: weights / penalties /
  thresholds. Title and media_type are heavy (30 / 20), structural
  fields medium (year 15, season 10), tech fields light (5 each).
  Unknown-token penalty 5 capped at -30. SHITTY/PoP cutoff at 60.
- load_scoring() loader with safe defaults baked in: a missing or
  partial YAML only de-tunes, never breaks.
- ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge
  populates it from load_scoring().
- New parser/scoring.py module with Road enum (EASY / SHITTY /
  PATH_OF_PAIN, distinct from ParsePath which records the tokenization
  route), and pure functions: compute_score, decide_road,
  collect_unknown_tokens, collect_missing_critical.
- ParseReport frozen VO in value_objects.py — exported alongside
  ParsedRelease.
2026-05-20 01:21:17 +02:00
francwa fcd80763e2 Merge branch 'refactor/release-parser-v2' 2026-05-20 01:08:20 +02:00
francwa 629387591f docs(changelog): freeze release parser v2 work block (2026-05-20) 2026-05-20 01:08:17 +02:00
francwa 230a7ab88a docs(changelog): log SHITTY simplification + distributor split 2026-05-20 01:03:52 +02:00
francwa 3737f66851 refactor(release): simplify SHITTY to dict-driven token tagging
Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty: each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, the
leftmost contiguous UNKNOWN run becomes the title, done.

SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.

- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
  (deutschland franchise box, sleaford YT slug, super_mario bilingual,
  predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows xfail_reason; the parametrized suite wires the
  pytest.mark.xfail(strict=False) automatically
2026-05-20 01:03:25 +02:00
francwa fd3bd1ad8c feat(release): distinguish streaming distributors from sources
Introduce a separate dimension for streaming-platform tags (NF, AMZN,
DSNP, HMAX, ATVP, …) so they stop polluting the encoding-source field.
WEB-DL is the source; the platform that released it is the distributor.

- new distributors.yaml knowledge file
- ReleaseKnowledge port exposes distributors set
- TokenRole.DISTRIBUTOR + ParsedRelease.distributor field
- removed NF/AMZN/DSNP/HMAX/ATVP from sources.yaml
- notre_planete fixture now records distributor: NF
2026-05-20 01:03:11 +02:00
francwa 7dc7f0c241 feat(release): v2 enricher pass for audio/video-meta/edition/language
The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.

Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
  body to be fully consumed. Tokens passed over between schema chunks
  remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
  a given role, skipping already-annotated tokens. Lets optional and
  mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
  and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
  LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
  kb.editions are matched first (longest-first ordering preserved from
  the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
  of a matched sequence with extra['sequence']=<canonical value> and
  trailing members with extra['sequence_member']='True' so assemble
  skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
  '.' separator splits the layout into two tokens. Strips a trailing
  '-GROUP' suffix on the second before joining.

Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
  bit_depth, hdr_format, edition. Each role-handler skips
  sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
  COLLECTION} + no season → tv_complete (mirrors legacy).

Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
  HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
  with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed,
  8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:26:05 +02:00
francwa 075a827b0e feat(release): wire v2 EASY path for known release groups
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup
  via the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
  *.yaml at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with
  optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:21:11 +02:00
francwa a2c917618f feat(release): scaffold v2 parser package (annotate-based pipeline)
New package alfred/domain/release/parser/ lays the foundation for the
release parser refactor (specs in memory). Exposes:

- Token: frozen VO carrying text + stream index + TokenRole + extra dict.
  with_role() returns a new instance (no mutation).
- TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/
  GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/
  EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families.
- pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix.
- pipeline.tokenize(): release name -> list[Token] (all UNKNOWN),
  string-ops split on kb.separators (no regex, per CLAUDE.md).
- pipeline.annotate(): documented stub. Walk order recorded in docstring
  (group right-to-left, then season/episode, year, tech, title).

Legacy parse_release in release.services remains the live implementation
until the annotate step lands. Scaffolding tests verify Token API,
site-tag stripping (prefix/suffix), and tokenize output shape.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:12:33 +02:00
francwa 9f10f4e0ad Merge branch 'refactor/domain-release-knowledge'
Final DDD purification of the release parser. Domain layer no longer
imports anything from infrastructure, no YAML at import time, and
ParsedRelease's filesystem-builders are pure (Option B).

- ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter
- parse_release(name, kb) explicit injection
- ParsedRelease.title_sanitized field; builders accept already-safe strings
- Callers (resolve_destination, detect_media_type, find_video,
  analyze_release) thread the kb through
- 987 tests pass
2026-05-19 22:05:36 +02:00
francwa cd814c7922 docs(changelog): log refactor/domain-release-knowledge work block 2026-05-19 22:05:29 +02:00
francwa 6802933acd test(release): adapt suite to explicit ReleaseKnowledge injection
- test_release.py / test_release_fixtures.py: module-level
  _KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it
  into parse_release. test_show_folder_name_strips_windows_chars renamed
  to test_show_folder_name_uses_already_safe_title to reflect the
  Option B contract (caller sanitizes via kb.sanitize_for_fs).
- test_detect_media_type.py: same _KB pattern, all
  detect_media_type(parsed, path) calls now pass kb.
- test_filesystem_extras.py: find_video_file(path) calls now pass kb.
- test_enrich_from_probe.py: _bare() helper adds the new
  title_sanitized field.
- test_resolve_destination.py: drop _sanitize import + TestSanitize
  class (helper deleted), add tmdb_title_safe arg to
  _resolve_series_folder calls.

987 passed, 8 skipped.
2026-05-19 22:05:26 +02:00
francwa bf37a9d09e refactor(release): thread ReleaseKnowledge through callers
Wires the new explicit-kb signatures into every caller of the release
parser and the filesystem-extension helpers.

- application/filesystem/resolve_destination.py: module-level singleton
  _KB: ReleaseKnowledge = YamlReleaseKnowledge(); each use case now calls
  parse_release(release_name, _KB) and sanitizes TMDB strings via
  _KB.sanitize_for_fs(...) before passing them to the pure ParsedRelease
  builders. Local _sanitize helper + _WIN_FORBIDDEN regex dropped.
- application/filesystem/detect_media_type.py: signature is now
  detect_media_type(parsed, source_path, kb); uses kb.metadata_extensions,
  kb.video_extensions, kb.non_video_extensions.
- infrastructure/filesystem/find_video.py: find_video_file(path, kb) uses
  kb.video_extensions instead of an imported constant.
- agent/tools/filesystem.py::analyze_release imports the application _KB
  singleton and passes it through to parse_release / detect_media_type /
  find_video_file.
2026-05-19 22:05:19 +02:00
francwa 4a74fff9cc refactor(release): purify domain — parse_release(name, kb) + ParsedRelease Option B
Removes the last domain → infrastructure leak in the release parser.

services.py:
- parse_release(name, kb) takes the knowledge as an explicit parameter.
- Every helper (_tokenize, _is_well_formed, _extract_tech,
  _extract_languages, _extract_audio, _extract_video_meta,
  _extract_edition, _extract_title, _infer_media_type) takes kb.
- No more module-level YAML loading.

value_objects.py — Option B:
- Sanitization happens once at parse time; ParsedRelease now carries
  a title_sanitized: str field alongside title.
- Builder methods (show_folder_name, episode_filename, movie_folder_name,
  movie_filename) become pure: they accept already-sanitized
  tmdb_title_safe / tmdb_episode_title_safe arguments. Callers at the
  use-case boundary sanitize via kb.sanitize_for_fs(...) before passing in.
- All domain-knowledge constants removed (_RESOLUTIONS, _SOURCES, _CODECS,
  _AUDIO, _VIDEO_META, _EDITIONS, _HDR_EXTRA, _MEDIA_TYPE_TOKENS,
  _LANGUAGE_TOKENS, _FORBIDDEN_CHARS, _*_EXTENSIONS, _WIN_FORBIDDEN_TABLE,
  _sanitize_for_fs). The module is now pure DDD.
2026-05-19 22:05:10 +02:00
francwa c3a3cb50c9 refactor(release): introduce ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter
Adds the port/adapter pair that lets the release domain consume parsing
knowledge without importing infrastructure or loading YAML at import time.

- alfred/domain/release/ports/knowledge.py declares the read-only query
  surface: token sets (resolutions, sources, codecs, language_tokens,
  forbidden_chars, hdr_extra), structured dicts (audio, video_meta,
  editions, media_type_tokens), separators list, file-extension sets,
  and sanitize_for_fs(text).
- alfred/infrastructure/knowledge/release_kb.py loads every YAML once
  at construction and exposes them as attributes, with an immutable
  str.maketrans table backing sanitize_for_fs.

No domain code is wired to the port yet — that lands in the next commit.
2026-05-19 22:05:01 +02:00
francwa 14941d47c0 Merge branch 'refactor/domain-io-extraction'
Extract all I/O (subprocess, filesystem, YAML loading) from the domain
layer via ports/adapters. domain/subtitles/ now has zero imports from
infrastructure/. The remaining domain → infra leak (release knowledge
loaded at import time) is documented in tech-debt for a dedicated branch.
2026-05-19 15:16:59 +02:00
francwa df798f55cc refactor(subtitles): introduce SubtitleKnowledge Protocol port
Domain services (SubtitleIdentifier, PatternDetector) used to import the
concrete SubtitleKnowledgeBase class directly from infrastructure for
their type hint. With this commit they depend on a structural Protocol
in alfred/domain/subtitles/ports/knowledge.py declaring just the 7
read-only query methods the domain actually consumes.

The concrete YAML-backed SubtitleKnowledgeBase in infrastructure remains
the sole adapter — no rename, no shim. With this change
alfred/domain/subtitles/ has zero imports from alfred/infrastructure/.

Also extend the changelog entry covering the full domain-io-extraction
branch.
2026-05-19 15:15:43 +02:00
francwa 535935cc73 docs(changelog): summarize refactor/domain-io-extraction work block 2026-05-19 15:11:17 +02:00
francwa 6e252d1e81 refactor(subtitles): inject default rules into SubtitleRuleSet.resolve()
aggregates.py used to call SubtitleKnowledgeBase().default_rules() via a
DEFAULT_RULES() helper, which silently pulled the infrastructure layer
(YAML loader) into the domain on every resolve.

Make the dependency explicit: resolve() now takes the default rules as
a parameter, and the caller (the ManageSubtitles use case) loads them
from the KB once and passes them in. Domain stays I/O-free.

- Drop DEFAULT_RULES helper and the SubtitleKnowledgeBase import from
  alfred/domain/subtitles/aggregates.py
- SubtitleRuleSet.resolve(default_rules: SubtitleMatchingRules)
- manage_subtitles use case passes kb.default_rules() at the call site
- Tests use a local SubtitleMatchingRules stand-in instead of relying
  on KB defaults
2026-05-19 15:10:06 +02:00
francwa 903e9e7117 refactor(subtitles): move SubtitlePlacer to application layer
The placer performs filesystem I/O (os.link) — it belongs in the
application layer, not the domain. Domain services should be pure.

- Move alfred/domain/subtitles/services/placer.py to
  alfred/application/subtitles/placer.py
- Move tests/domain/test_subtitle_placer.py to
  tests/application/test_subtitle_placer.py
- Update all callers (manage_subtitles use case, metadata store, tests)
- Drop placer re-exports from domain.subtitles.services.__init__
2026-05-19 15:07:39 +02:00
francwa 9556bf9e08 refactor(domain): strip live filesystem I/O from VOs and entities
DDD-pure cleanup — entities and value objects no longer query the world
at read time.

  FilePath: drop .exists() / .is_file() / .is_dir(). The VO is now a
    pure address; ask the injected FilesystemScanner for live state.
  Movie:    drop .has_file() / .is_downloaded(). Invariant: when the
    application sets file_path, it has already constated the file
    exists; downstream readers trust the snapshot.
  Episode:  same — drop .has_file() / .is_downloaded().
  SubtitlePlacer: drop the pre-check .exists() calls. The placer now
    attempts os.link() and reports FileNotFoundError / FileExistsError
    as skip reasons. Removes a TOCTOU race as a bonus.

Tests adjusted: the FilePath VO method tests are gone (the methods are
gone), test_has_file_false_when_no_path replaced by a plain assertion
on file_path is None. Placer tests are unchanged — the skip-reason
strings ('not found', 'already exists') match the new try/except paths.

The 'snapshot value objects' pattern (ProbedMediaInfo, TmdbMovieInfo)
that this cleanup enables is documented in refactor_domain_io.md, to
be applied when a future use case actually needs richer metadata —
not now, no speculative VOs.
2026-05-19 14:58:59 +02:00
francwa e6ee700825 refactor(subtitles): inject MediaProber/FilesystemScanner ports into domain services
Domain services no longer call subprocess or pathlib directly. Introduces
two Protocol ports in domain/shared/ports/:

  MediaProber.list_subtitle_streams(video) -> list[SubtitleStreamInfo]
  FilesystemScanner.scan_dir / stat / read_text  -> list[FileEntry] | ...

Concrete adapters live in infrastructure/:

  FfprobeMediaProber          (wraps subprocess + ffprobe + JSON)
  PathlibFilesystemScanner    (wraps pathlib + os reads)

SubtitleIdentifier and PatternDetector now take (kb, prober, scanner) at
construction time. Their internals work over FileEntry snapshots and
SubtitleStreamInfo records — no more ad-hoc Path.is_file/iterdir/stat or
embedded subprocess.run loops. _count_entries now takes raw SRT text
(returned by scanner.read_text) so SRT-only entry counting stays out of
the FS layer.

manage_subtitles use case instantiates the two adapters once and injects
them into both services. Tests pass real adapters and patch
`alfred.infrastructure.probe.ffprobe_prober.subprocess.run` for the
ffprobe-failure cases. _classify_single tests build FileEntry via a
small helper.

Domain is now free of subprocess / direct filesystem reads in the
subtitle pipeline. The only remaining I/O hooks are FilePath VO
convenience methods (exists/is_file/is_dir) which stay as a deliberate
affordance on the value object.
2026-05-19 14:52:24 +02:00
francwa ced72547f7 refactor(knowledge): extract YAML loaders from domain to infrastructure
The domain layer no longer reads YAML files. All knowledge loaders move
from `alfred/domain/*/knowledge/` to `alfred/infrastructure/knowledge/`:

  domain/release/knowledge.py
    → infrastructure/knowledge/release.py
  domain/shared/knowledge/language_registry.py
    → infrastructure/knowledge/language_registry.py
  domain/subtitles/knowledge/{loader,base}.py
    → infrastructure/knowledge/subtitles/{loader,base}.py

Callers in domain/release/{services,value_objects}.py,
domain/subtitles/{aggregates,services/*}.py, and
application/filesystem/manage_subtitles.py updated to absolute imports.
Re-exports of KnowledgeLoader/SubtitleKnowledgeBase from
domain/subtitles/__init__.py dropped (no shim per project convention).
Tests follow the moved targets.
2026-05-19 14:35:18 +02:00
francwa f338b08706 refactor(release): type media_type/parse_path as true enums
ParsedRelease.media_type is now MediaTypeToken (not str) and parse_path
is ParsePath (not str). __post_init__ keeps a tolerant constructor that
coerces raw strings via the enum, so callers passing 'movie'/'direct'
still work transparently. Since both enums inherit from str, existing
string comparisons and JSON serialization remain unchanged.
2026-05-19 14:21:27 +02:00
francwa da484d7474 refactor(release): typed enums + __post_init__ validation on ParsedRelease
ParsedRelease accepted any string for media_type/parse_path and had no
validation on numeric ranges (season=-5 was silently accepted). Tighten
both ends:

- New str-backed Enums MediaTypeToken and ParsePath. Inherit from str so
  every existing comparison ('== "movie"'), JSON serialization, and TMDB
  DTO interop keeps working unchanged.
- ParsedRelease.__post_init__ now validates: raw/group non-empty, year in
  1888-2100, season 0-100, episode 0-9999, episode_end >= episode,
  media_type/parse_path against the enum allowlist.
- services.py uses the enum .value members everywhere instead of bare
  string literals — kills the typo risk.
2026-05-19 14:17:56 +02:00
francwa 481eeb5afd refactor(domain): identity-based equality + dedup track helpers
Two related DDD fixes for Movie and Episode entities:

- Identity equality: @dataclass(eq=False) with custom __eq__/__hash__.
  Movie is identified by imdb_id, Episode by (season, episode) within
  the TVShow aggregate. Auto-generated field-by-field equality was
  incorrectly making two Movie instances with the same imdb_id but
  different audio_tracks compare unequal — breaks dedup/caching.

- MediaWithTracks mixin: the 5 audio/subtitle helpers
  (has_audio_in / audio_languages / has_subtitles_in / has_forced_subs /
  subtitle_languages) were duplicated verbatim between Movie and Episode.
  Extracted to shared/media/tracks_mixin.py; both entities now inherit.

Bonus: dropped the object.__setattr__ coercion dance in Movie.__post_init__
— the class isn't frozen so plain assignment is the right call.
2026-05-19 14:17:47 +02:00
francwa 7cd24f3a31 refactor(domain): freeze media track value objects
AudioTrack, VideoTrack, SubtitleTrack and MediaInfo are snapshots of a
single ffprobe run — model them as proper immutable value objects.

- @dataclass(frozen=True) on all four
- MediaInfo track collections become tuple[...] instead of list[...]
- ffprobe adapter rewritten to build tuples up-front instead of
  appending/setattr'ing on a constructed instance
2026-05-19 14:17:27 +02:00
francwa eb8995cfc3 refactor(subtitles): drop dead scanner module
SubtitleScanner was an earlier iteration superseded by SubtitleIdentifier
and never imported in production code (only by its own tests). Removing
both keeps the bounded context clean and shrinks the surface.
2026-05-19 14:17:15 +02:00
francwa f6eef59fca refactor: tech debt mini-pass (items 5, 6, 7, 20)
Low-risk cleanup items, no functional change to the parser. The
philosophy remains: keep the parser simple, the AI handles edge cases.

- Extract duplicated 'fs-safe title → dot-folder-name' regex into
  to_dot_folder_name() in domain/shared/value_objects.py. Used by both
  MovieTitle.normalized() and TVShow.get_folder_name() (item #5).
- ParsedRelease.languages now uses field(default_factory=list) instead
  of a manual __post_init__ assigning [] via object.__setattr__ (#6).
- tv_shows/entities.py module docstring: prepend ASCII ownership tree
  for quicker visual scan of the aggregate hierarchy (#7).
- file_extensions.yaml: split subtitle sidecars (.srt/.sub/.idx/.ass/.ssa)
  into a dedicated 'subtitle:' category instead of lumping them under
  'metadata:'. _METADATA_EXTENSIONS at the value_objects.py level remains
  the union of both — detect_media_type behavior unchanged. New loader
  load_subtitle_extensions() exposes the distinct subtitle set for future
  callers in the subtitles domain (#20).

Suite: 1020 passed, 8 skipped.
2026-05-18 16:24:28 +02:00
francwa 273510dff8 test(fixtures): seed PATH OF PAIN bucket with 10 worst-case fixtures
10 pathological release names mined from the real downloads folder.
Each fixture locks in the current parse_release output (including
its silent losses and false positives) so future parser improvements
are intentional, not silent drift.

Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '|', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio
  = full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception:
  COMPLETE token correctly drives tv_complete + REPACK + 10bit)

Also adds shitty + path_of_pain to the per-bucket sanity assertion.

Suite: 1020 passed, 8 skipped.
2026-05-18 15:57:56 +02:00
francwa c1831e3f46 test(fixtures): drop derry_duplicate_naming (was a copy-paste artifact)
The release name mixed two distinct releases — not a real-world case
worth anti-regression. SHITTY bucket now holds 14 fixtures (down from 15).
2026-05-18 15:51:11 +02:00
francwa aa182458b8 test(fixtures): seed SHITTY release bucket with 15 anti-regression cases
Add 15 expected.yaml fixtures under tests/fixtures/releases/shitty/
covering the awkward but real-world release names from the downloads
folder. Each fixture locks in the current parse_release behavior so
future parser changes are intentional, not silent drift.

Cases captured:
- Angel INTEGRALE 3-level hierarchy (tv_complete media_type)
- Buffy custom French title with dots preserved
- Archer S14E09E10E11 multi-episode (E11 lost — tech debt)
- Notre Planète lowercase s01e01
- Vinyl ' - 1x01 - FHD' (stray dash artifact — tech debt)
- Deutschland.83 (year-suffix as part of title)
- Tatortreiniger S01-06 range (falls to movie — tech debt)
- Derry Girls duplicated title
- Jurassic Park bare folder (media_type=unknown)
- La Nuit au Musée bilingual MULTI
- Chérie j'ai agrandi (ASCII-stripped apostrophe, parses fine)
- Honey Don't (unescaped apostrophe — full AI-path degeneration)
- Hook MULTi.SUBS movie with Subs/ folder
- Predator Badlands space separators (group=UNKNOWN — tech debt)
- Westworld S04 Subs.Only (no video file)

Each fixture also captures the future 3-flow routing (library /
torrents / seed_hardlinks) ahead of the organize_media refactor.

Suite: 1011 passed, 8 skipped.
2026-05-18 15:48:41 +02:00
francwa 774f71c8cc chore(gitignore): track CHANGELOG.md explicitly
The blanket *.md ignore was hiding CHANGELOG.md, forcing 'git add -f' on
every update. Allow-list it so the file lives under normal git tracking.
CLAUDE.md stays local (user keeps it personal until a dedicated repo).
2026-05-18 15:39:04 +02:00
francwa 7bc50fd5b8 test: add real-world release fixtures (EASY bucket)
Captures 5 canonical releases from /mnt/testipool/downloads as parametrized
fixtures under tests/fixtures/releases/easy/. Each fixture declares the
release name, expected ParsedRelease fields, original tree, and the future
routing (library / torrents / seed_hardlinks) for the upcoming organize_media
refactor.

Today only the 'parsed' section is asserted; tree is materialized into a
tmp_path to catch typos. Routing is captured ahead of the planner work — it
becomes verifiable once organize_media lands.

Cases: back_in_action (movie), slow_horses_single_ep (TV single),
foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie +
KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir).

Also tracks CHANGELOG.md under [Unreleased] / Added.
2026-05-18 15:36:19 +02:00
francwa f17abdbaec chore: cleanup — remove shims, fix ruff warnings, ignore noisy rules
- Removed backward-compat shims _sanitise_for_fs / _strip_episode_from_normalised
  in domain/release/value_objects.py (zero callers).
- Fixed ruff warnings across the codebase:
    * PLW1510: explicit check=False on subprocess.run calls
    * PLC0415: promoted lazy imports to module top where no cycle exists
      (manage_subtitles, placer, qbittorrent/client, file_manager)
    * E402: fixed module-level import ordering in language_registry.py and
      subtitles/knowledge/loader.py
    * F841 / B007: removed unused locals (identifier.py)
    * C416: replaced unnecessary set comprehension with set() in
      release/knowledge.py
- Ruff config: ignore PLR0911/PLR0912 globally (noisy on mappers and
  orchestrator use-cases) and PLW0603 (intentional for the memory singleton).
- Updated tech debt memory: P1 done, ShowStatus actually complete (was a
  stale note).
2026-05-18 00:02:45 +02:00
francwa 1d50b63af2 Merge branch 'dev/sprint-cleanup'
Multi-week sprint: ISO 639-2/B language unification, release parser
unification + data-driven tokenizer, removal of fossil services
(movies/tv_shows/subtitles), subtitle services split into a package,
MediaInfo split, test suite expansion (990 passing).

See CHANGELOG.md [Unreleased] for the user-facing summary.
2026-05-17 23:42:05 +02:00
francwa 891ba502a2 chore: apply pre-commit auto-fixes (trim trailing whitespace, EOF) 2026-05-17 23:41:54 +02:00
francwa e07c9ec77b chore: sprint cleanup — language unification, parser unification, fossils removal
Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.

Highlights
----------

P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
  only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
  {iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
  SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).

P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
  by the tokenizer (., space, [, ], (, ), _). New conventions can be added
  without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
  Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
  the direct path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects truly
  forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.

Domain cleanup
- Deleted fossil services with zero production callers:
    alfred/domain/movies/services.py
    alfred/domain/tv_shows/services.py
    alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
    alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
  placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
  audio, video, subtitle, info, matching).

Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).

Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.

Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
  infrastructure/filesystem to align with the new domain layout.
2026-05-17 23:38:00 +02:00
francwa ba6f016d49 feat: generic MetadataStore + read_release_metadata + query_library
- Extract MetadataStore from SubtitleMetadataStore (alfred/infrastructure/metadata/).
  Generic load/save + typed update helpers (update_parse, update_probe, update_tmdb)
  for the per-release .alfred/metadata.yaml.
- SubtitleMetadataStore becomes a thin facade — owns subtitle_history shape,
  delegates I/O to MetadataStore.
- Agent._execute_tool_call auto-persists successful analyze_release / probe_media /
  find_media_imdb_id results to the release's .alfred file. find_media_imdb_id
  follows release_focus when it has no path argument.
- New tools:
  · read_release_metadata(release_path) — cacheable, key=release_path.
    Returns the .alfred content or has_metadata=false.
  · query_library(name) — substring scan across configured library roots.
- Both new tools added to CORE_TOOLS (always visible).
2026-05-15 11:02:25 +02:00
francwa 3c7c6695f2 feat(memory): Phase 1 — STM ToolResultsCache + ReleaseFocus + cache flag in YAML specs
Adds two STM components and a transparent cache hook in the agent loop so
read-only tools don't re-do work the agent already did in this session.

New STM components:
  - ToolResultsCache  — {tool_name: {key: result}}, session-scoped.
    to_dict() exposes only the key inventory (not payloads) to keep the
    prompt cheap.
  - ReleaseFocus      — current_release_path + working_set list, updated
    automatically when a path-keyed inspector runs.

YAML spec layer:
  - New optional 'cache: { key: <param_name> }' block in ToolSpec.
  - Validated at load time: cache.key must be a declared parameter.
  - Surfaced on Tool dataclass as cache_key: str | None.

Agent._execute_tool_call:
  - Pre-exec cache lookup; hit short-circuits and adds _from_cache=true.
  - Post-exec: stores successful results, updates release_focus for
    path-keyed tools, refreshes episodic.last_search_results when
    find_torrent's hit served the response (so get_torrent_by_index
    keeps pointing at the right list).

Cacheable tools (5): analyze_release, probe_media, list_folder,
find_media_imdb_id, find_torrent.
2026-05-15 10:44:14 +02:00
francwa 2db3198ef2 feat(agent): migrate all remaining tools to YAML specs (21/21 covered)
Adds YAML specs for the 14 tools that were still description-from-docstring:

  filesystem:
    - set_path_for_folder, list_folder, analyze_release, probe_media,
      move_media, manage_subtitles, create_seed_links, learn
  api:
    - find_media_imdb_id, find_torrent, get_torrent_by_index,
      add_torrent_to_qbittorrent, add_torrent_by_index
  language:
    - set_language

Each spec follows the established shape (summary / description /
when_to_use / when_not_to_use / next_steps / parameters with
why_needed + example / returns) and the Python function docstring is
slimmed to a one-line pointer.

Registry now reports: 21 tools, 21 with YAML spec, 0 doc-only.
2026-05-14 21:18:43 +02:00
francwa 23a9dd7990 refactor(memory): rename workflow.target -> params, type -> name
The Workflow STM component stored an active workflow as
{type, target, stage, started_at}. Now that start_workflow takes a
workflow_name and a params dict, those keys match what they actually
hold:

  type   -> name    (the YAML workflow name, e.g. media.organize_media)
  target -> params  (the dict passed to start_workflow)

ShortTermMemory.start_workflow parameters renamed accordingly. All
consumers (prompt builder workflow scope + STM context, start/end
workflow tools) updated.
2026-05-14 21:11:23 +02:00
francwa 74a52ba6a3 feat(agent): workflow-scoped tool catalog + start/end_workflow meta-tools
Introduce a scope-aware agent so the LLM never sees the full 21-tool
catalog at once. The system prompt now describes either:
  - idle mode: core noyau (5 tools: set_language, set_path_for_folder,
    list_folder, start_workflow, end_workflow) + a list of available
    workflows with their goals;
  - active mode: the noyau plus the tools declared by the active
    workflow's YAML, with the step plan inlined into the prompt.

Pieces:
- alfred/agent/tools/workflow.py: start_workflow / end_workflow tools
  (with YAML specs under tools/specs/) that drive memory.stm.workflow.
- alfred/agent/prompt.py: CORE_TOOLS constant, visible_tool_names(),
  filtered build_tools_spec() / _format_tools_description(), and a new
  _format_workflow_scope() section in the system prompt.
- alfred/agent/agent.py: WorkflowLoader wired into Agent, defensive
  out-of-scope check in _execute_tool_call.
- alfred/agent/registry.py: registers the two new meta-tools (21 total,
  7 with YAML spec).
- workflows/media.organize_media.yaml: tools/steps list refreshed to
  match the current resolver split (analyze_release, probe_media,
  resolve_*_destination, move_to_destination).
2026-05-14 21:07:36 +02:00
francwa 97adfbda45 refactor(workflows): adopt media.* naming convention
Rename workflow files and their 'name' field with a 'media.' domain
prefix to anticipate future multi-domain expansion (mail.*, calendar.*, ...).

- organize_media -> media.organize_media
- manage_subtitles -> media.manage_subtitles

WorkflowLoader picks them up unchanged (uses data['name']).
2026-05-14 20:55:35 +02:00
francwa 239fce9e4e chore(agent): remove dead parameters.py
The ParameterSchema / REQUIRED_PARAMETERS / get_missing_required_parameters
machinery in alfred/agent/parameters.py was used in early prototypes for
the prompt-required-params check but has been unwired from production for
several refactors. The new YAML tool-spec layer (alfred/agent/tools/specs/)
covers the same need (rich, LLM-facing parameter descriptions) without
the parallel registration plumbing.

Tests in tests/test_config_edge_cases.py still reference the deleted
module — left untouched per the project policy of treating test sync
as a dedicated end-of-week task.
2026-05-14 18:06:34 +02:00
francwa 99c95af64e feat(agent): YAML tool specs as the LLM-facing semantic layer
Introduce a first-class semantic layer for tool descriptions, separated
from Python signatures (which stay the source of truth for types and
required-ness).

New
- alfred/agent/tools/spec.py — ToolSpec / ParameterSpec / ReturnsSpec
  dataclasses with strict YAML validation (ToolSpecError on malformed
  or inconsistent specs). compile_description() builds the rich text
  passed to the LLM as Tool.description, with sections for summary,
  description, when_to_use, when_not_to_use, next_steps, and returns.
  compile_parameter_description() injects the 'why_needed' field next
  to each parameter so the LLM sees the *intent* of each argument.
- alfred/agent/tools/spec_loader.py — discovers tools/specs/*.yaml,
  enforces filename ↔ spec.name match, rejects duplicates.
- alfred/agent/tools/specs/ — one YAML per tool:
    * resolve_season_destination.yaml
    * resolve_episode_destination.yaml
    * resolve_movie_destination.yaml
    * resolve_series_destination.yaml
    * move_to_destination.yaml

Refactor
- alfred/agent/registry.py
    * _create_tool_from_function now takes an optional ToolSpec.
      When provided, the long description + per-parameter descriptions
      come from the spec; types and required-ness still come from the
      Python signature.
    * Cross-validates spec.parameters against the function signature —
      crashes on missing or extra entries.
    * make_tools() loads all specs at startup and hands the right one
      to each tool. Tools without a spec fall back to the old
      docstring-only behaviour, so the 14 not-yet-migrated tools keep
      working unchanged.
    * Adds 'array' and 'object' to the Python→JSON type mapping and
      handles Optional[X] / X | None annotations.

- alfred/agent/tools/filesystem.py
    * Drops the '_tool' suffix on the 4 resolve_* wrappers (option 1:
      alias the use-case imports as _resolve_*). Tool names exposed to
      the LLM now match the underlying use case verbatim.
    * Wrapper docstrings shrink to a one-liner pointing to the YAML
      spec — no more duplicated when_to_use/Args/Returns in Python.

Verified
- make_tools() loads 19 tools (5 with YAML spec, 14 doc-only).
- Compiled descriptions render cleanly with all sections.
2026-05-14 18:06:27 +02:00
francwa b5025bb5f8 refactor(resolve_destination): factor shared series-folder resolution + DTO base
- New _Clarification sentinel and _resolve_series_folder() helper —
  the three TV use cases now share one matching/clarification path
  instead of triplicating the same if/elif/else block.
- New _ResolvedDestinationBase carrying status/question/options/error/
  message plus a _base_dict() helper; the four concrete DTOs only
  declare their own ok-state fields and a slim to_dict().
- No behaviour change: same outputs for ok/needs_clarification/error
  cases (verified by import + DTO smoke tests).
2026-05-14 16:09:33 +02:00
francwa e45465d52d feat: split resolve_destination, persona-driven prompts, qBittorrent relocation
Destination resolution
- Replace the single ResolveDestinationUseCase with four dedicated
  functions, one per release type:
    resolve_season_destination    (pack season, folder move)
    resolve_episode_destination   (single episode, file move)
    resolve_movie_destination     (movie, file move)
    resolve_series_destination    (multi-season pack, folder move)
- Each returns a dedicated DTO carrying only the fields relevant to
  that release type — no more polymorphic ResolvedDestination with
  half the fields unused depending on the case.
- Looser series folder matching: exact computed-name match is reused
  silently; any deviation (different group, multiple candidates) now
  prompts the user with all options including the computed name.

Agent tools
- Four new tools wrapping the use cases above; old resolve_destination
  removed from the registry.
- New move_to_destination tool: create_folder + move, chained — used
  after a resolve_* call to perform the actual relocation.
- Low-level filesystem_operations module (create_folder, move via mv)
  for instant same-FS renames (ZFS).

Prompt & persona
- New PromptBuilder (alfred/agent/prompt.py) replacing prompts.py:
  identity + personality block, situational expressions, memory
  schema, episodic/STM/config context, tool catalogue.
- Per-user expression system: knowledge/users/common.yaml +
  {username}.yaml are merged at runtime; one phrase per situation
  (greeting/success/error/...) is sampled into the system prompt.

qBittorrent integration
- Credentials now come from settings (qbittorrent_url/username/password)
  instead of hardcoded defaults.
- New client methods: find_by_name, set_location, recheck — the trio
  needed to update a torrent's save path and re-verify after a move.
- Host→container path translation settings (qbittorrent_host_path /
  qbittorrent_container_path) for docker-mounted setups.

Subtitles
- Identifier: strip parenthesized qualifiers (simplified, brazil…) at
  tokenization; new _tokenize_suffix used for the episode_subfolder
  pattern so episode-stem tokens no longer pollute language detection.
- Placer: extract _build_dest_name so it can be reused by the new
  dry_run path in ManageSubtitlesUseCase.
- Knowledge: add yue, ell, ind, msa, rus, vie, heb, tam, tel, tha,
  hin, ukr; add 'fre' to fra; add 'simplified'/'traditional' to zho.

Misc
- LTM workspace: add 'trash' folder slot.
- Default LLM provider switched to deepseek.
- testing/debug_release.py: CLI to parse a release, hit TMDB, and
  dry-run the destination resolution end-to-end.
2026-05-14 05:01:59 +02:00
francwa 1723b9fa53 feat: release parser, media type detection, ffprobe integration
Replace the old domain/media release parser with a full rewrite under
domain/release/:
- ParsedRelease with media_type ("movie" | "tv_show" | "tv_complete" |
  "documentary" | "concert" | "other" | "unknown"), site_tag, parse_path,
  languages, audio_codec, audio_channels, bit_depth, hdr_format, edition
- Well-formedness check + sanitize pipeline (_is_well_formed, _sanitize,
  _strip_site_tag) before token-level parsing
- Multi-token sequence matching for audio (DTS-HD.MA, TrueHD.Atmos…),
  HDR (DV.HDR10…) and editions (DIRECTORS.CUT…)
- Knowledge YAML: file_extensions, release_format, languages, audio,
  video, editions, sites/c411

New infrastructure:
- ffprobe.py — single-pass probe returning MediaInfo (video, audio
  tracks, subtitle tracks)
- find_video.py — locate first video file in a release folder

New application helpers:
- detect_media_type — filesystem-based type refinement
- enrich_from_probe — fill missing ParsedRelease fields from MediaInfo

New agent tools:
- analyze_release — parse + detect type + ffprobe in one call
- probe_media — standalone ffprobe for a specific file

New domain value object:
- MediaInfo + AudioTrack + SubtitleTrack (domain/shared/media_info.py)

Testing CLIs:
- recognize_folders_in_downloads.py — full pipeline with colored output
- probe_video.py — display MediaInfo for a video file
2026-05-12 16:14:20 +02:00
francwa 249c5de76a feat: major architectural refactor
- Refactor memory system (episodic/STM/LTM with components)
- Implement complete subtitle domain (scanner, matcher, placer)
- Add YAML workflow infrastructure
- Externalize knowledge base (patterns, release groups)
- Add comprehensive testing suite
- Create manual testing CLIs
2026-05-11 21:55:06 +02:00
francwa 62b5d0b998 Settings + fix startup 2026-04-30 12:41:42 +02:00
francwa 610dee365c mess: UV + settings KISS + fixes 2026-04-24 18:10:55 +02:00
francwa 58408d0dbe fix: fixed vectordb loneliness 2026-01-06 04:39:42 +01:00
francwa 2f1ac3c758 infra: simplified mongodb healthcheck 2026-01-06 04:36:52 +01:00
francwa d3b69f7459 feat: enabled logging for alfred(-core) 2026-01-06 04:33:59 +01:00
francwa 50c8204fa0 infra: added granular logging configuration for mongodb 2026-01-06 02:50:49 +01:00
francwa 507fe0f40e chore: updated dependencies 2026-01-06 02:41:18 +01:00
francwa b7b40eada1 fix: set proper database name for mongodb 2026-01-06 02:33:35 +01:00
francwa 9765386405 feat: named docker image to avoid docker picking the wrong one 2026-01-06 02:19:00 +01:00
francwa aa89a3fb00 doc: updated README.md
Renovate Bot / renovate (push) Successful in 20s
2026-01-04 13:30:54 +01:00
francwa 64aeb5fc80 chore: removed deprecated cli.py file 2026-01-04 13:29:38 +01:00
francwa 9540520dc4 feat: updated default start mode from core to full 2026-01-03 05:49:51 +01:00
francwa 300ed387f5 fix: fixed build vars generation not being called 2026-01-01 06:00:55 +01:00
francwa dea81de5b5 fix: updated CI workflow and added .env.make generation for CI 2026-01-01 05:33:39 +01:00
francwa 01a00a12af chore: bump version 0.1.6 → 0.1.7
CI/CD Awesome Pipeline / Test (push) Successful in 5m54s
CI/CD Awesome Pipeline / Build & Push to Registry (push) Failing after 1m9s
2026-01-01 05:04:37 +01:00
francwa 504d0162bb infra: added proper settings handling & orchestration and app bootstrap (.env)
Reviewed-on: #18
2026-01-01 03:55:35 +00:00
francwa cda23d074f feat: added current alfred version from pyproject.py to healthcheck 2026-01-01 04:51:45 +01:00
francwa 0357108077 infra: added orchestration and app bootstrap (.env) 2026-01-01 04:48:32 +01:00
francwa ab1df3dd0f fix: forgot to lint/format 2026-01-01 04:48:32 +01:00
francwa c50091f6bf feat: added proper settings handling 2026-01-01 04:48:32 +01:00
francwa 8b406370f1 chore: updated some dependencies 2026-01-01 03:08:22 +01:00
francwa c56bf2b92c feat: added Claude.AI to available providers 2025-12-29 02:13:11 +01:00
francwa b1507db4d0 fix: fixed test image not being built before running tests
Renovate Bot / renovate (push) Successful in 1m26s
2025-12-28 13:24:03 +01:00
francwa 3074962314 infra: proper librechat integration & improved configuration handling
Reviewed-on: #11
2025-12-28 11:56:53 +00:00
francwa 84799879bb fix: added data dir to .dockerignore 2025-12-28 12:02:44 +01:00
francwa 1052c1b619 infra: added modules version to .env.example 2025-12-28 06:07:34 +01:00
francwa 9958b8e848 feat: created placeholders for various API models 2025-12-28 06:06:28 +01:00
francwa b15161dad7 fix: provided OpenAI API key and fixed docker-compose configuration to enable RAG service 2025-12-28 06:04:08 +01:00
francwa 52f025ae32 chore: ran linter & formatter again 2025-12-27 20:07:48 +01:00
francwa 2cfe7a035b infra: moved makefile logic to python and added .env setup 2025-12-27 19:51:40 +01:00
francwa 2441c2dc29 infra: rewrote docker-compose for proper integration of librechat 2025-12-27 19:49:43 +01:00
francwa 261a1f3918 fix: fixed real data directory being used in tests 2025-12-27 19:48:13 +01:00
francwa 253903a1e5 infra: made pyproject the SSOT 2025-12-27 19:46:27 +01:00
francwa 20a113e335 infra: updated .env.example 2025-12-27 19:43:31 +01:00
francwa fed83e7d79 chore: bumped fastapi version 2025-12-27 19:42:16 +01:00
francwa 3880a4ec49 chore: ran linter and formatter 2025-12-27 19:41:22 +01:00
francwa 6195abbaa5 chore: fixed imports and tests configuration 2025-12-27 19:39:36 +01:00
francwa b132554631 infra: updated .gitignore 2025-12-27 02:30:14 +01:00
francwa 561796cec1 infra: removed CORE_DIR variable from Makefile (not relevant anymore) 2025-12-24 11:28:28 +01:00
francwa 26d90acc16 infra: moved manifest files in librechat dir 2025-12-24 09:46:25 +01:00
francwa d8234b2958 chore: cleaned up .env.example 2025-12-24 08:17:47 +01:00
francwa 156d1fe567 fix: fixed Dockerfile 2025-12-24 08:17:12 +01:00
francwa f8eee120cf infra: backed up old deploy-compose 2025-12-24 08:16:30 +01:00
francwa c5e4a5e1a7 infra: removed occurences of alfred_api_key (not implemented after all) 2025-12-24 08:11:30 +01:00
francwa d10c9160f3 infra: renamed broken references to alfred 2025-12-24 08:09:26 +01:00
francwa 1f88e99e8b infra: reorganized repo 2025-12-24 07:50:09 +01:00
francwa e097a13221 chore: upgraded to python 3.14 and some dependencies
Reviewed-on: francwa/agent_media#8
2025-12-22 14:04:46 +00:00
francwa 086fff803d feat: configured CI/CD pipeline to allow build for feature branches 2025-12-22 14:48:50 +01:00
francwa 45fbf975b3 feat: improved rules for renovate 2025-12-22 14:25:16 +01:00
francwa b8f2798e29 chore: updated fastapi and uvicorn 2025-12-22 14:20:22 +01:00
francwa c762d91eb1 chore: updated to python 3.14 2025-12-22 14:08:27 +01:00
francwa 35a68387ab infra: use python version from pyproject.toml 2025-12-22 13:46:40 +01:00
francwa 9b13c69631 chore: updated meilisearch 2025-12-22 13:25:54 +01:00
francwa 2ca1ea29b2 infra: added meilisearch to renovate exclusion list 2025-12-22 13:22:55 +01:00
francwa 5e86615bde fix: added renovate token with proper permissions 2025-12-22 12:59:24 +01:00
francwa 6701a4b392 infra: added Renovate bot 2025-12-22 12:50:59 +01:00
francwa 68372405d6 fix: downgraded upload-artifact action to v3 from v4 2025-12-22 12:13:50 +01:00
francwa f1ea0de247 fix: fixed indentation error 2025-12-22 12:04:47 +01:00
francwa 974d008825 feat: finalized CI/CD pipeline setup 2025-12-22 11:59:36 +01:00
francwa 8a87d94e6d fix: use docker image for trivy vulnerability scanner
CI/CD Awesome Pipeline / Test (push) Successful in 1m23s
CI/CD Awesome Pipeline / Build & Push to Registry (push) Failing after 5m9s
2025-12-22 11:38:35 +01:00
francwa ec99a501fc fix! added directive to Dockerfile 2025-12-22 11:37:48 +01:00
434 changed files with 43188 additions and 10699 deletions
@@ -1,5 +1,5 @@
[tool.bumpversion] [tool.bumpversion]
current_version = "0.1.6" current_version = "0.1.7"
parse = "(?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)" parse = "(?P<major>\\d+)\\.(?P<minor>\\d+)\\.(?P<patch>\\d+)"
serialize = ["{major}.{minor}.{patch}"] serialize = ["{major}.{minor}.{patch}"]
search = "{current_version}" search = "{current_version}"
+3 -7
View File
@@ -22,8 +22,7 @@ venv
.venv .venv
env env
.env .env
.env.* .env-
# IDE # IDE
.vscode .vscode
.idea .idea
@@ -41,11 +40,8 @@ docs/
*.md *.md
!README.md !README.md
# Tests # Data
tests/ data/
pytest.ini
# Data (will be mounted as volumes)
memory_data/ memory_data/
logs/ logs/
*.log *.log
+80
View File
@@ -0,0 +1,80 @@
# --- IMPORTANT ---
# Settings are split across multiple files for clarity.
# Files (loaded in this order, last wins):
# .env.alfred — app config and service addresses (safe to commit)
# .env.secrets — generated secrets, passwords, URIs and API keys (DO NOT COMMIT)
# .env.make — build metadata synced from pyproject.toml (safe to commit)
#
# To customize: edit .env.alfred for config, .env.secrets for secrets.
# --- Alfred ---
MAX_HISTORY_MESSAGES=10
MAX_TOOL_ITERATIONS=10
REQUEST_TIMEOUT=30
# LLM Settings
LLM_TEMPERATURE=0.2
# Persistence
DATA_STORAGE_DIR=data
# Network
HOST=0.0.0.0
PORT=3080
# --- DATABASES ---
# Passwords and connection URIs are auto-generated in .env.secrets.
# Edit host/port/user/dbname here if needed.
# MongoDB (Application Data)
MONGO_HOST=mongodb
MONGO_PORT=27017
MONGO_USER=alfred
MONGO_DB_NAME=alfred
# PostgreSQL (Vector Database / RAG)
POSTGRES_HOST=vectordb
POSTGRES_PORT=5432
POSTGRES_USER=alfred
POSTGRES_DB_NAME=alfred
# --- EXTERNAL SERVICES ---
# TMDB — Media metadata (required). Get your key at https://www.themoviedb.org/
# → TMDB_API_KEY goes in .env.secrets
TMDB_BASE_URL=https://api.themoviedb.org/3
# qBittorrent
# → QBITTORRENT_PASSWORD goes in .env.secrets
QBITTORRENT_URL=https://qb.lan.anustart.top
QBITTORRENT_USERNAME=letmein
QBITTORRENT_PORT=16140
# Path translation: host-side prefix → container-side prefix
QBITTORRENT_HOST_PATH=/mnt/testipool
QBITTORRENT_CONTAINER_PATH=/mnt/data
# Meilisearch
# → MEILI_MASTER_KEY goes in .env.secrets
# MEILI_ENABLED=false # KEY DOESN'T EXISTS => SEARCH IS THE PROPER KEY
SEARCH=false
MEILI_NO_ANALYTICS=true
MEILI_HOST=http://meilisearch:7700
# --- LLM CONFIGURATION ---
# Providers: local, openai, anthropic, deepseek, google, kimi
# → API keys go in .env.secrets
DEFAULT_LLM_PROVIDER=deepseek
# Local LLM (Ollama)
#OLLAMA_BASE_URL=http://ollama:11434
#OLLAMA_MODEL=llama3.3:latest
OLLAMA_BASE_URL=http://10.0.0.11:11434
OLLAMA_MODEL=glm-4.7-flash:latest
# --- RAG ENGINE ---
RAG_ENABLED=TRUE
RAG_API_URL=http://rag_api:8000
RAG_API_PORT=8000
EMBEDDINGS_PROVIDER=ollama
EMBEDDINGS_MODEL=nomic-embed-text
+61 -57
View File
@@ -1,69 +1,73 @@
# Agent Media - Environment Variables # --- IMPORTANT ---
# Settings are split across multiple files for clarity.
# Files (loaded in this order, last wins):
# .env.alfred — app config and service addresses (safe to commit)
# .env.secrets — generated secrets, passwords, URIs and API keys (DO NOT COMMIT)
# .env.make — build metadata synced from pyproject.toml (safe to commit)
#
# To customize: edit .env.alfred for config, .env.secrets for secrets.
# LibreChat Security Keys # --- Alfred ---
# Generate secure keys with: openssl rand -base64 32 MAX_HISTORY_MESSAGES=10
JWT_SECRET=your-super-secret-jwt-key-change-this-in-production MAX_TOOL_ITERATIONS=10
JWT_REFRESH_SECRET=your-super-secret-refresh-key-change-this-too REQUEST_TIMEOUT=30
# Generate with: openssl rand -hex 16 (for CREDS_KEY) # LLM Settings
CREDS_KEY=your-32-character-secret-key-here LLM_TEMPERATURE=0.2
# Generate with: openssl rand -hex 8 (for CREDS_IV) # Persistence
CREDS_IV=your-16-character-iv-here DATA_STORAGE_DIR=data
# LibreChat Configuration # Network
DOMAIN_CLIENT=http://localhost:3080 HOST=0.0.0.0
DOMAIN_SERVER=http://localhost:3080 PORT=3080
# Session expiry (in milliseconds) # --- DATABASES ---
# Default: 15 minutes # Passwords and connection URIs are auto-generated in .env.secrets.
SESSION_EXPIRY=900000 # Edit host/port/user/dbname here if needed.
# Refresh token expiry (in milliseconds) # MongoDB (Application Data)
# Default: 7 days MONGO_HOST=mongodb
REFRESH_TOKEN_EXPIRY=604800000 MONGO_PORT=27017
MONGO_USER=alfred
MONGO_DB_NAME=LibreChat
# Meilisearch Configuration # PostgreSQL (Vector Database / RAG)
# Master key for Meilisearch (generate with: openssl rand -base64 32) POSTGRES_HOST=vectordb
MEILI_MASTER_KEY=DrhYf7zENyR6AlUCKmnz0eYASOQdl6zxH7s7MKFSfFU POSTGRES_PORT=5432
POSTGRES_USER=alfred
POSTGRES_DB_NAME=alfred
# PostgreSQL Configuration (for RAG API) # --- EXTERNAL SERVICES ---
POSTGRES_DB=librechat_rag
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
# RAG API Configuration (Vector Database) # TMDB — Media metadata (required). Get your key at https://www.themoviedb.org/
RAG_COLLECTION_NAME=testcollection # → TMDB_API_KEY goes in .env.secrets
RAG_EMBEDDINGS_PROVIDER=openai TMDB_BASE_URL=https://api.themoviedb.org/3
RAG_EMBEDDINGS_MODEL=text-embedding-3-small
# API Keys # qBittorrent
# OpenAI API Key (required for RAG embeddings) # → QBITTORRENT_PASSWORD goes in .env.secrets
OPENAI_API_KEY=your-openai-api-key-here QBITTORRENT_URL=http://qbittorrent:16140
# Deepseek API Key (for LLM in agent-brain)
DEEPSEEK_API_KEY=your-deepseek-api-key-here
# Agent Brain Configuration
# LLM Provider (deepseek or ollama)
LLM_PROVIDER=deepseek
# Memory storage directory (inside container)
MEMORY_STORAGE_DIR=/data/memory
# API Key for agent-brain (used by LibreChat custom endpoint)
AGENT_BRAIN_API_KEY=agent-brain-secret-key
# External Services (Optional)
# TMDB API Key (for movie metadata)
TMDB_API_KEY=your-tmdb-key
# qBittorrent Configuration
QBITTORRENT_URL=http://localhost:8080
QBITTORRENT_USERNAME=admin QBITTORRENT_USERNAME=admin
QBITTORRENT_PASSWORD=adminpass QBITTORRENT_PORT=16140
# Debug Options # Meilisearch
DEBUG_LOGGING=false # → MEILI_MASTER_KEY goes in .env.secrets
DEBUG_CONSOLE=false MEILI_ENABLED=FALSE
MEILI_NO_ANALYTICS=TRUE
MEILI_HOST=http://meilisearch:7700
# --- LLM CONFIGURATION ---
# Providers: local, openai, anthropic, deepseek, google, kimi
# → API keys go in .env.secrets
DEFAULT_LLM_PROVIDER=local
# Local LLM (Ollama)
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=llama3.3:latest
# --- RAG ENGINE ---
RAG_ENABLED=TRUE
RAG_API_URL=http://rag_api:8000
RAG_API_PORT=8000
EMBEDDINGS_PROVIDER=ollama
EMBEDDINGS_MODEL=nomic-embed-text
+878
View File
@@ -0,0 +1,878 @@
#=====================================================================#
# LibreChat Configuration #
#=====================================================================#
# Please refer to the reference documentation for assistance #
# with configuring your LibreChat environment. #
# #
# https://www.librechat.ai/docs/configuration/dotenv #
#=====================================================================#
#==================================================#
# Server Configuration #
#==================================================#
HOST=localhost
PORT=3080
MONGO_URI=mongodb://127.0.0.1:27017/LibreChat
#The maximum number of connections in the connection pool. */
MONGO_MAX_POOL_SIZE=
#The minimum number of connections in the connection pool. */
MONGO_MIN_POOL_SIZE=
#The maximum number of connections that may be in the process of being established concurrently by the connection pool. */
MONGO_MAX_CONNECTING=
#The maximum number of milliseconds that a connection can remain idle in the pool before being removed and closed. */
MONGO_MAX_IDLE_TIME_MS=
#The maximum time in milliseconds that a thread can wait for a connection to become available. */
MONGO_WAIT_QUEUE_TIMEOUT_MS=
# Set to false to disable automatic index creation for all models associated with this connection. */
MONGO_AUTO_INDEX=
# Set to `false` to disable Mongoose automatically calling `createCollection()` on every model created on this connection. */
MONGO_AUTO_CREATE=
DOMAIN_CLIENT=http://localhost:3080
DOMAIN_SERVER=http://localhost:3080
NO_INDEX=true
# Use the address that is at most n number of hops away from the Express application.
# req.socket.remoteAddress is the first hop, and the rest are looked for in the X-Forwarded-For header from right to left.
# A value of 0 means that the first untrusted address would be req.socket.remoteAddress, i.e. there is no reverse proxy.
# Defaulted to 1.
TRUST_PROXY=1
# Minimum password length for user authentication
# Default: 8
# Note: When using LDAP authentication, you may want to set this to 1
# to bypass local password validation, as LDAP servers handle their own
# password policies.
# MIN_PASSWORD_LENGTH=8
# When enabled, the app will continue running after encountering uncaught exceptions
# instead of exiting the process. Not recommended for production unless necessary.
# CONTINUE_ON_UNCAUGHT_EXCEPTION=false
#===============#
# JSON Logging #
#===============#
# Use when process console logs in cloud deployment like GCP/AWS
CONSOLE_JSON=false
#===============#
# Debug Logging #
#===============#
DEBUG_LOGGING=true
DEBUG_CONSOLE=false
# Set to true to enable agent debug logging
AGENT_DEBUG_LOGGING=false
# Enable memory diagnostics (logs heap/RSS snapshots every 60s, auto-enabled with --inspect)
# MEM_DIAG=true
#=============#
# Permissions #
#=============#
# UID=1000
# GID=1000
#==============#
# Node Options #
#==============#
# NOTE: NODE_MAX_OLD_SPACE_SIZE is NOT recognized by Node.js directly.
# This variable is used as a build argument for Docker or CI/CD workflows,
# and is NOT used by Node.js to set the heap size at runtime.
# To configure Node.js memory, use NODE_OPTIONS, e.g.:
# NODE_OPTIONS="--max-old-space-size=6144"
# See: https://nodejs.org/api/cli.html#--max-old-space-sizesize-in-mib
NODE_MAX_OLD_SPACE_SIZE=6144
#===============#
# Configuration #
#===============#
# Use an absolute path, a relative path, or a URL
# CONFIG_PATH="/alternative/path/to/librechat.yaml"
#==================#
# Langfuse Tracing #
#==================#
# Get Langfuse API keys for your project from the project settings page: https://cloud.langfuse.com
# LANGFUSE_PUBLIC_KEY=
# LANGFUSE_SECRET_KEY=
# LANGFUSE_BASE_URL=
#===================================================#
# Endpoints #
#===================================================#
# ENDPOINTS=openAI,assistants,azureOpenAI,google,anthropic
PROXY=
#===================================#
# Known Endpoints - librechat.yaml #
#===================================#
# https://www.librechat.ai/docs/configuration/librechat_yaml/ai_endpoints
# ANYSCALE_API_KEY=
# APIPIE_API_KEY=
# COHERE_API_KEY=
# DEEPSEEK_API_KEY=
# DATABRICKS_API_KEY=
# FIREWORKS_API_KEY=
# GROQ_API_KEY=
# HUGGINGFACE_TOKEN=
# MISTRAL_API_KEY=
# OPENROUTER_KEY=
# PERPLEXITY_API_KEY=
# SHUTTLEAI_API_KEY=
# TOGETHERAI_API_KEY=
# UNIFY_API_KEY=
# XAI_API_KEY=
#============#
# Anthropic #
#============#
ANTHROPIC_API_KEY=user_provided
# ANTHROPIC_MODELS=claude-sonnet-4-6,claude-opus-4-6,claude-opus-4-20250514,claude-sonnet-4-20250514,claude-3-7-sonnet-20250219,claude-3-5-sonnet-20241022,claude-3-5-haiku-20241022,claude-3-opus-20240229,claude-3-sonnet-20240229,claude-3-haiku-20240307
# ANTHROPIC_REVERSE_PROXY=
# Set to true to use Anthropic models through Google Vertex AI instead of direct API
# ANTHROPIC_USE_VERTEX=
# ANTHROPIC_VERTEX_REGION=us-east5
#============#
# Azure #
#============#
# Note: these variables are DEPRECATED
# Use the `librechat.yaml` configuration for `azureOpenAI` instead
# You may also continue to use them if you opt out of using the `librechat.yaml` configuration
# AZURE_OPENAI_DEFAULT_MODEL=gpt-3.5-turbo # Deprecated
# AZURE_OPENAI_MODELS=gpt-3.5-turbo,gpt-4 # Deprecated
# AZURE_USE_MODEL_AS_DEPLOYMENT_NAME=TRUE # Deprecated
# AZURE_API_KEY= # Deprecated
# AZURE_OPENAI_API_INSTANCE_NAME= # Deprecated
# AZURE_OPENAI_API_DEPLOYMENT_NAME= # Deprecated
# AZURE_OPENAI_API_VERSION= # Deprecated
# AZURE_OPENAI_API_COMPLETIONS_DEPLOYMENT_NAME= # Deprecated
# AZURE_OPENAI_API_EMBEDDINGS_DEPLOYMENT_NAME= # Deprecated
#=================#
# AWS Bedrock #
#=================#
# BEDROCK_AWS_DEFAULT_REGION=us-east-1 # A default region must be provided
# BEDROCK_AWS_ACCESS_KEY_ID=someAccessKey
# BEDROCK_AWS_SECRET_ACCESS_KEY=someSecretAccessKey
# BEDROCK_AWS_SESSION_TOKEN=someSessionToken
# Note: This example list is not meant to be exhaustive. If omitted, all known, supported model IDs will be included for you.
# BEDROCK_AWS_MODELS=anthropic.claude-sonnet-4-6,anthropic.claude-opus-4-6-v1,anthropic.claude-3-5-sonnet-20240620-v1:0,meta.llama3-1-8b-instruct-v1:0
# Cross-region inference model IDs: us.anthropic.claude-sonnet-4-6,us.anthropic.claude-opus-4-6-v1,global.anthropic.claude-opus-4-6-v1
# See all Bedrock model IDs here: https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html#model-ids-arns
# Notes on specific models:
# The following models are not support due to not supporting streaming:
# ai21.j2-mid-v1
# The following models are not support due to not supporting conversation history:
# ai21.j2-ultra-v1, cohere.command-text-v14, cohere.command-light-text-v14
#============#
# Google #
#============#
GOOGLE_KEY=user_provided
# GOOGLE_REVERSE_PROXY=
# Some reverse proxies do not support the X-goog-api-key header, uncomment to pass the API key in Authorization header instead.
# GOOGLE_AUTH_HEADER=true
# Gemini API (AI Studio)
# GOOGLE_MODELS=gemini-3.1-pro-preview,gemini-3.1-pro-preview-customtools,gemini-3.1-flash-lite-preview,gemini-2.5-pro,gemini-2.5-flash,gemini-2.5-flash-lite,gemini-2.0-flash,gemini-2.0-flash-lite
# Vertex AI
# GOOGLE_MODELS=gemini-3.1-pro-preview,gemini-3.1-pro-preview-customtools,gemini-3.1-flash-lite-preview,gemini-2.5-pro,gemini-2.5-flash,gemini-2.5-flash-lite,gemini-2.0-flash-001,gemini-2.0-flash-lite-001
# GOOGLE_TITLE_MODEL=gemini-2.0-flash-lite-001
# Google Cloud region for Vertex AI (used by both chat and image generation)
# GOOGLE_LOC=us-central1
# Alternative region env var for Gemini Image Generation
# GOOGLE_CLOUD_LOCATION=global
# Vertex AI Service Account Configuration
# Path to your Google Cloud service account JSON file
# GOOGLE_SERVICE_KEY_FILE=/path/to/service-account.json
# Google Safety Settings
# NOTE: These settings apply to both Vertex AI and Gemini API (AI Studio)
#
# For Vertex AI:
# To use the BLOCK_NONE setting, you need either:
# (a) Access through an allowlist via your Google account team, or
# (b) Switch to monthly invoiced billing: https://cloud.google.com/billing/docs/how-to/invoiced-billing
#
# For Gemini API (AI Studio):
# BLOCK_NONE is available by default, no special account requirements.
#
# Available options: BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE, BLOCK_LOW_AND_ABOVE
#
# GOOGLE_SAFETY_SEXUALLY_EXPLICIT=BLOCK_ONLY_HIGH
# GOOGLE_SAFETY_HATE_SPEECH=BLOCK_ONLY_HIGH
# GOOGLE_SAFETY_HARASSMENT=BLOCK_ONLY_HIGH
# GOOGLE_SAFETY_DANGEROUS_CONTENT=BLOCK_ONLY_HIGH
# GOOGLE_SAFETY_CIVIC_INTEGRITY=BLOCK_ONLY_HIGH
#========================#
# Gemini Image Generation #
#========================#
# Gemini Image Generation Tool (for Agents)
# Supports multiple authentication methods in priority order:
# 1. User-provided API key (via GUI)
# 2. GEMINI_API_KEY env var (admin-configured)
# 3. GOOGLE_KEY env var (shared with Google chat endpoint)
# 4. Vertex AI service account (via GOOGLE_SERVICE_KEY_FILE)
# Option A: Use dedicated Gemini API key for image generation
# GEMINI_API_KEY=your-gemini-api-key
# Vertex AI model for image generation (defaults to gemini-2.5-flash-image)
# GEMINI_IMAGE_MODEL=gemini-2.5-flash-image
#============#
# OpenAI #
#============#
OPENAI_API_KEY=user_provided
# OPENAI_MODELS=gpt-5,gpt-5-codex,gpt-5-mini,gpt-5-nano,o3-pro,o3,o4-mini,gpt-4.1,gpt-4.1-mini,gpt-4.1-nano,o3-mini,o1-pro,o1,gpt-4o,gpt-4o-mini
DEBUG_OPENAI=false
# TITLE_CONVO=false
# OPENAI_TITLE_MODEL=gpt-4o-mini
# OPENAI_SUMMARIZE=true
# OPENAI_SUMMARY_MODEL=gpt-4o-mini
# OPENAI_FORCE_PROMPT=true
# OPENAI_REVERSE_PROXY=
# OPENAI_ORGANIZATION=
#====================#
# Assistants API #
#====================#
ASSISTANTS_API_KEY=user_provided
# ASSISTANTS_BASE_URL=
# ASSISTANTS_MODELS=gpt-4o,gpt-4o-mini,gpt-3.5-turbo-0125,gpt-3.5-turbo-16k-0613,gpt-3.5-turbo-16k,gpt-3.5-turbo,gpt-4,gpt-4-0314,gpt-4-32k-0314,gpt-4-0613,gpt-3.5-turbo-0613,gpt-3.5-turbo-1106,gpt-4-0125-preview,gpt-4-turbo-preview,gpt-4-1106-preview
#==========================#
# Azure Assistants API #
#==========================#
# Note: You should map your credentials with custom variables according to your Azure OpenAI Configuration
# The models for Azure Assistants are also determined by your Azure OpenAI configuration.
# More info, including how to enable use of Assistants with Azure here:
# https://www.librechat.ai/docs/configuration/librechat_yaml/ai_endpoints/azure#using-assistants-with-azure
CREDS_KEY=f34be427ebb29de8d88c107a71546019685ed8b241d8f2ed00c3df97ad2566f0
CREDS_IV=e2341419ec3dd3d19b13a1a87fafcbfb
# Azure AI Search
#-----------------
AZURE_AI_SEARCH_SERVICE_ENDPOINT=
AZURE_AI_SEARCH_INDEX_NAME=
AZURE_AI_SEARCH_API_KEY=
AZURE_AI_SEARCH_API_VERSION=
AZURE_AI_SEARCH_SEARCH_OPTION_QUERY_TYPE=
AZURE_AI_SEARCH_SEARCH_OPTION_TOP=
AZURE_AI_SEARCH_SEARCH_OPTION_SELECT=
# OpenAI Image Tools Customization
#----------------
# IMAGE_GEN_OAI_API_KEY= # Create or reuse OpenAI API key for image generation tool
# IMAGE_GEN_OAI_BASEURL= # Custom OpenAI base URL for image generation tool
# IMAGE_GEN_OAI_AZURE_API_VERSION= # Custom Azure OpenAI deployments
# IMAGE_GEN_OAI_MODEL=gpt-image-1 # OpenAI image model (e.g., gpt-image-1, gpt-image-1.5)
# IMAGE_GEN_OAI_DESCRIPTION=
# IMAGE_GEN_OAI_DESCRIPTION_WITH_FILES=Custom description for image generation tool when files are present
# IMAGE_GEN_OAI_DESCRIPTION_NO_FILES=Custom description for image generation tool when no files are present
# IMAGE_EDIT_OAI_DESCRIPTION=Custom description for image editing tool
# IMAGE_GEN_OAI_PROMPT_DESCRIPTION=Custom prompt description for image generation tool
# IMAGE_EDIT_OAI_PROMPT_DESCRIPTION=Custom prompt description for image editing tool
# DALL·E
#----------------
# DALLE_API_KEY=
# DALLE3_API_KEY=
# DALLE2_API_KEY=
# DALLE3_SYSTEM_PROMPT=
# DALLE2_SYSTEM_PROMPT=
# DALLE_REVERSE_PROXY=
# DALLE3_BASEURL=
# DALLE2_BASEURL=
# DALL·E (via Azure OpenAI)
# Note: requires some of the variables above to be set
#----------------
# DALLE3_AZURE_API_VERSION=
# DALLE2_AZURE_API_VERSION=
# Flux
#-----------------
FLUX_API_BASE_URL=https://api.us1.bfl.ai
# FLUX_API_BASE_URL = 'https://api.bfl.ml';
# Get your API key at https://api.us1.bfl.ai/auth/profile
# FLUX_API_KEY=
# Google
#-----------------
GOOGLE_SEARCH_API_KEY=
GOOGLE_CSE_ID=
# Stable Diffusion
#-----------------
SD_WEBUI_URL=http://host.docker.internal:7860
# Tavily
#-----------------
TAVILY_API_KEY=
# Traversaal
#-----------------
TRAVERSAAL_API_KEY=
# WolframAlpha
#-----------------
WOLFRAM_APP_ID=
# Zapier
#-----------------
ZAPIER_NLA_API_KEY=
#==================================================#
# Search #
#==================================================#
SEARCH=true
MEILI_NO_ANALYTICS=true
MEILI_HOST=http://0.0.0.0:7700
MEILI_MASTER_KEY=DrhYf7zENyR6AlUCKmnz0eYASOQdl6zxH7s7MKFSfFCt
# Optional: Disable indexing, useful in a multi-node setup
# where only one instance should perform an index sync.
# MEILI_NO_SYNC=true
#==================================================#
# Speech to Text & Text to Speech #
#==================================================#
STT_API_KEY=
TTS_API_KEY=
#==================================================#
# RAG #
#==================================================#
# More info: https://www.librechat.ai/docs/configuration/rag_api
# RAG_OPENAI_BASEURL=
# RAG_OPENAI_API_KEY=
# RAG_USE_FULL_CONTEXT=
# EMBEDDINGS_PROVIDER=openai
# EMBEDDINGS_MODEL=text-embedding-3-small
#===================================================#
# User System #
#===================================================#
#========================#
# Moderation #
#========================#
OPENAI_MODERATION=false
OPENAI_MODERATION_API_KEY=
# OPENAI_MODERATION_REVERSE_PROXY=
BAN_VIOLATIONS=true
BAN_DURATION=1000 * 60 * 60 * 2
BAN_INTERVAL=20
LOGIN_VIOLATION_SCORE=1
REGISTRATION_VIOLATION_SCORE=1
CONCURRENT_VIOLATION_SCORE=1
MESSAGE_VIOLATION_SCORE=1
NON_BROWSER_VIOLATION_SCORE=20
TTS_VIOLATION_SCORE=0
STT_VIOLATION_SCORE=0
FORK_VIOLATION_SCORE=0
IMPORT_VIOLATION_SCORE=0
FILE_UPLOAD_VIOLATION_SCORE=0
LOGIN_MAX=7
LOGIN_WINDOW=5
REGISTER_MAX=5
REGISTER_WINDOW=60
LIMIT_CONCURRENT_MESSAGES=true
CONCURRENT_MESSAGE_MAX=2
LIMIT_MESSAGE_IP=true
MESSAGE_IP_MAX=40
MESSAGE_IP_WINDOW=1
LIMIT_MESSAGE_USER=false
MESSAGE_USER_MAX=40
MESSAGE_USER_WINDOW=1
ILLEGAL_MODEL_REQ_SCORE=5
#========================#
# Balance #
#========================#
# CHECK_BALANCE=false
# START_BALANCE=20000 # note: the number of tokens that will be credited after registration.
#========================#
# Registration and Login #
#========================#
ALLOW_EMAIL_LOGIN=true
ALLOW_REGISTRATION=true
ALLOW_SOCIAL_LOGIN=false
ALLOW_SOCIAL_REGISTRATION=false
ALLOW_PASSWORD_RESET=false
# ALLOW_ACCOUNT_DELETION=true # note: enabled by default if omitted/commented out
ALLOW_UNVERIFIED_EMAIL_LOGIN=true
SESSION_EXPIRY=1000 * 60 * 15
REFRESH_TOKEN_EXPIRY=(1000 * 60 * 60 * 24) * 7
JWT_SECRET=16f8c0ef4a5d391b26034086c628469d3f9f497f08163ab9b40137092f2909ef
JWT_REFRESH_SECRET=eaa5191f2914e30b9387fd84e254e4ba6fc51b4654968a9b0803b456a54b8418
# Discord
DISCORD_CLIENT_ID=
DISCORD_CLIENT_SECRET=
DISCORD_CALLBACK_URL=/oauth/discord/callback
# Facebook
FACEBOOK_CLIENT_ID=
FACEBOOK_CLIENT_SECRET=
FACEBOOK_CALLBACK_URL=/oauth/facebook/callback
# GitHub
GITHUB_CLIENT_ID=
GITHUB_CLIENT_SECRET=
GITHUB_CALLBACK_URL=/oauth/github/callback
# GitHub Enterprise
# GITHUB_ENTERPRISE_BASE_URL=
# GITHUB_ENTERPRISE_USER_AGENT=
# Google
GOOGLE_CLIENT_ID=
GOOGLE_CLIENT_SECRET=
GOOGLE_CALLBACK_URL=/oauth/google/callback
# Apple
APPLE_CLIENT_ID=
APPLE_TEAM_ID=
APPLE_KEY_ID=
APPLE_PRIVATE_KEY_PATH=
APPLE_CALLBACK_URL=/oauth/apple/callback
# OpenID
OPENID_CLIENT_ID=
OPENID_CLIENT_SECRET=
OPENID_ISSUER=
OPENID_SESSION_SECRET=
OPENID_SCOPE="openid profile email"
OPENID_CALLBACK_URL=/oauth/openid/callback
OPENID_REQUIRED_ROLE=
OPENID_REQUIRED_ROLE_TOKEN_KIND=
OPENID_REQUIRED_ROLE_PARAMETER_PATH=
OPENID_ADMIN_ROLE=
OPENID_ADMIN_ROLE_PARAMETER_PATH=
OPENID_ADMIN_ROLE_TOKEN_KIND=
# Set to determine which user info property returned from OpenID Provider to store as the User's username
OPENID_USERNAME_CLAIM=
# Set to determine which user info property returned from OpenID Provider to store as the User's name
OPENID_NAME_CLAIM=
# Set to determine which user info claim to use as the email/identifier for user matching (e.g., "upn" for Entra ID)
# When not set, defaults to: email -> preferred_username -> upn
OPENID_EMAIL_CLAIM=
# Optional audience parameter for OpenID authorization requests
OPENID_AUDIENCE=
OPENID_BUTTON_LABEL=
OPENID_IMAGE_URL=
# Set to true to automatically redirect to the OpenID provider when a user visits the login page
# This will bypass the login form completely for users, only use this if OpenID is your only authentication method
OPENID_AUTO_REDIRECT=false
# Set to true to use PKCE (Proof Key for Code Exchange) for OpenID authentication
OPENID_USE_PKCE=false
#Set to true to reuse openid tokens for authentication management instead of using the mongodb session and the custom refresh token.
OPENID_REUSE_TOKENS=
#By default, signing key verification results are cached in order to prevent excessive HTTP requests to the JWKS endpoint.
#If a signing key matching the kid is found, this will be cached and the next time this kid is requested the signing key will be served from the cache.
#Default is true.
OPENID_JWKS_URL_CACHE_ENABLED=
OPENID_JWKS_URL_CACHE_TIME= # 600000 ms eq to 10 minutes leave empty to disable caching
#Set to true to trigger token exchange flow to acquire access token for the userinfo endpoint.
OPENID_ON_BEHALF_FLOW_FOR_USERINFO_REQUIRED=
OPENID_ON_BEHALF_FLOW_USERINFO_SCOPE="user.read" # example for Scope Needed for Microsoft Graph API
# Set to true to use the OpenID Connect end session endpoint for logout
OPENID_USE_END_SESSION_ENDPOINT=
# URL to redirect to after OpenID logout (defaults to ${DOMAIN_CLIENT}/login)
OPENID_POST_LOGOUT_REDIRECT_URI=
# Maximum logout URL length before using logout_hint instead of id_token_hint (default: 2000)
OPENID_MAX_LOGOUT_URL_LENGTH=
#========================#
# SharePoint Integration #
#========================#
# Requires Entra ID (OpenID) authentication to be configured
# Enable SharePoint file picker in chat and agent panels
# ENABLE_SHAREPOINT_FILEPICKER=true
# SharePoint tenant base URL (e.g., https://yourtenant.sharepoint.com)
# SHAREPOINT_BASE_URL=https://yourtenant.sharepoint.com
# Microsoft Graph API And SharePoint scopes for file picker
# SHAREPOINT_PICKER_SHAREPOINT_SCOPE==https://yourtenant.sharepoint.com/AllSites.Read
# SHAREPOINT_PICKER_GRAPH_SCOPE=Files.Read.All
#========================#
# SAML
# Note: If OpenID is enabled, SAML authentication will be automatically disabled.
SAML_ENTRY_POINT=
SAML_ISSUER=
SAML_CERT=
SAML_CALLBACK_URL=/oauth/saml/callback
SAML_SESSION_SECRET=
# Attribute mappings (optional)
SAML_EMAIL_CLAIM=
SAML_USERNAME_CLAIM=
SAML_GIVEN_NAME_CLAIM=
SAML_FAMILY_NAME_CLAIM=
SAML_PICTURE_CLAIM=
SAML_NAME_CLAIM=
# Logint buttion settings (optional)
SAML_BUTTON_LABEL=
SAML_IMAGE_URL=
# Whether the SAML Response should be signed.
# - If "true", the entire `SAML Response` will be signed.
# - If "false" or unset, only the `SAML Assertion` will be signed (default behavior).
# SAML_USE_AUTHN_RESPONSE_SIGNED=
#===============================================#
# Microsoft Graph API / Entra ID Integration #
#===============================================#
# Enable Entra ID people search integration in permissions/sharing system
# When enabled, the people picker will search both local database and Entra ID
USE_ENTRA_ID_FOR_PEOPLE_SEARCH=false
# When enabled, entra id groups owners will be considered as members of the group
ENTRA_ID_INCLUDE_OWNERS_AS_MEMBERS=false
# Microsoft Graph API scopes needed for people/group search
# Default scopes provide access to user profiles and group memberships
OPENID_GRAPH_SCOPES=User.Read,People.Read,GroupMember.Read.All
# LDAP
LDAP_URL=
LDAP_BIND_DN=
LDAP_BIND_CREDENTIALS=
LDAP_USER_SEARCH_BASE=
#LDAP_SEARCH_FILTER="mail="
LDAP_CA_CERT_PATH=
# LDAP_TLS_REJECT_UNAUTHORIZED=
# LDAP_STARTTLS=
# LDAP_LOGIN_USES_USERNAME=true
# LDAP_ID=
# LDAP_USERNAME=
# LDAP_EMAIL=
# LDAP_FULL_NAME=
#========================#
# Email Password Reset #
#========================#
EMAIL_SERVICE=
EMAIL_HOST=
EMAIL_PORT=25
EMAIL_ENCRYPTION=
EMAIL_ENCRYPTION_HOSTNAME=
EMAIL_ALLOW_SELFSIGNED=
# Leave both empty for SMTP servers that do not require authentication
EMAIL_USERNAME=
EMAIL_PASSWORD=
EMAIL_FROM_NAME=
EMAIL_FROM=noreply@librechat.ai
#========================#
# Mailgun API #
#========================#
# MAILGUN_API_KEY=your-mailgun-api-key
# MAILGUN_DOMAIN=mg.yourdomain.com
# EMAIL_FROM=noreply@yourdomain.com
# EMAIL_FROM_NAME="LibreChat"
# # Optional: For EU region
# MAILGUN_HOST=https://api.eu.mailgun.net
#========================#
# Firebase CDN #
#========================#
FIREBASE_API_KEY=
FIREBASE_AUTH_DOMAIN=
FIREBASE_PROJECT_ID=
FIREBASE_STORAGE_BUCKET=
FIREBASE_MESSAGING_SENDER_ID=
FIREBASE_APP_ID=
#========================#
# S3 AWS Bucket #
#========================#
AWS_ENDPOINT_URL=
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
AWS_BUCKET_NAME=
# Required for path-style S3-compatible providers (MinIO, Hetzner, Backblaze B2, etc.)
# that don't support virtual-hosted-style URLs (bucket.endpoint). Not needed for AWS S3.
# AWS_FORCE_PATH_STYLE=false
#========================#
# Azure Blob Storage #
#========================#
AZURE_STORAGE_CONNECTION_STRING=
AZURE_STORAGE_PUBLIC_ACCESS=false
AZURE_CONTAINER_NAME=files
#========================#
# Shared Links #
#========================#
ALLOW_SHARED_LINKS=true
# Allows unauthenticated access to shared links. Defaults to false (auth required) if not set.
ALLOW_SHARED_LINKS_PUBLIC=false
#==============================#
# Static File Cache Control #
#==============================#
# Leave commented out to use defaults: 1 day (86400 seconds) for s-maxage and 2 days (172800 seconds) for max-age
# NODE_ENV must be set to production for these to take effect
# STATIC_CACHE_MAX_AGE=172800
# STATIC_CACHE_S_MAX_AGE=86400
# If you have another service in front of your LibreChat doing compression, disable express based compression here
# DISABLE_COMPRESSION=true
# If you have gzipped version of uploaded image images in the same folder, this will enable gzip scan and serving of these images
# Note: The images folder will be scanned on startup and a ma kept in memory. Be careful for large number of images.
# ENABLE_IMAGE_OUTPUT_GZIP_SCAN=true
#===================================================#
# UI #
#===================================================#
APP_TITLE=LibreChat
# CUSTOM_FOOTER="My custom footer"
HELP_AND_FAQ_URL=https://librechat.ai
# SHOW_BIRTHDAY_ICON=true
# Google tag manager id
#ANALYTICS_GTM_ID=user provided google tag manager id
# limit conversation file imports to a certain number of bytes in size to avoid the container
# maxing out memory limitations by unremarking this line and supplying a file size in bytes
# such as the below example of 250 mib
# CONVERSATION_IMPORT_MAX_FILE_SIZE_BYTES=262144000
#===============#
# REDIS Options #
#===============#
# Enable Redis for caching and session storage
# USE_REDIS=true
# Enable Redis for resumable LLM streams (defaults to USE_REDIS value if not set)
# Set to false to use in-memory storage for streams while keeping Redis for other caches
# USE_REDIS_STREAMS=true
# Single Redis instance
# REDIS_URI=redis://127.0.0.1:6379
# Redis cluster (multiple nodes)
# REDIS_URI=redis://127.0.0.1:7001,redis://127.0.0.1:7002,redis://127.0.0.1:7003
# Redis with TLS/SSL encryption and CA certificate
# REDIS_URI=rediss://127.0.0.1:6380
# REDIS_CA=/path/to/ca-cert.pem
# Elasticache may need to use an alternate dnsLookup for TLS connections. see "Special Note: Aws Elasticache Clusters with TLS" on this webpage: https://www.npmjs.com/package/ioredis
# Enable alternative dnsLookup for redis
# REDIS_USE_ALTERNATIVE_DNS_LOOKUP=true
# Redis authentication (if required)
# REDIS_USERNAME=your_redis_username
# REDIS_PASSWORD=your_redis_password
# Redis key prefix configuration
# Use environment variable name for dynamic prefix (recommended for cloud deployments)
# REDIS_KEY_PREFIX_VAR=K_REVISION
# Or use static prefix directly
# REDIS_KEY_PREFIX=librechat
# Redis connection limits
# REDIS_MAX_LISTENERS=40
# Redis ping interval in seconds (0 = disabled, >0 = enabled)
# When set to a positive integer, Redis clients will ping the server at this interval to keep connections alive
# When unset or 0, no pinging is performed (recommended for most use cases)
# REDIS_PING_INTERVAL=300
# Force specific cache namespaces to use in-memory storage even when Redis is enabled
# Comma-separated list of CacheKeys
# Defaults to CONFIG_STORE,APP_CONFIG so YAML-derived config stays per-container (safe for blue/green deployments)
# Set to empty string to force all namespaces through Redis: FORCED_IN_MEMORY_CACHE_NAMESPACES=
# FORCED_IN_MEMORY_CACHE_NAMESPACES=CONFIG_STORE,APP_CONFIG
# Leader Election Configuration (for multi-instance deployments with Redis)
# Duration in seconds that the leader lease is valid before it expires (default: 25)
# LEADER_LEASE_DURATION=25
# Interval in seconds at which the leader renews its lease (default: 10)
# LEADER_RENEW_INTERVAL=10
# Maximum number of retry attempts when renewing the lease fails (default: 3)
# LEADER_RENEW_ATTEMPTS=3
# Delay in seconds between retry attempts when renewing the lease (default: 0.5)
# LEADER_RENEW_RETRY_DELAY=0.5
#==================================================#
# Others #
#==================================================#
# You should leave the following commented out #
# NODE_ENV=
# E2E_USER_EMAIL=
# E2E_USER_PASSWORD=
#=====================================================#
# Cache Headers #
#=====================================================#
# Headers that control caching of the index.html #
# Default configuration prevents caching to ensure #
# users always get the latest version. Customize #
# only if you understand caching implications. #
# INDEX_CACHE_CONTROL=no-cache, no-store, must-revalidate
# INDEX_PRAGMA=no-cache
# INDEX_EXPIRES=0
# no-cache: Forces validation with server before using cached version
# no-store: Prevents storing the response entirely
# must-revalidate: Prevents using stale content when offline
#=====================================================#
# OpenWeather #
#=====================================================#
OPENWEATHER_API_KEY=
#====================================#
# LibreChat Code Interpreter API #
#====================================#
# https://code.librechat.ai
# LIBRECHAT_CODE_API_KEY=your-key
#======================#
# Web Search #
#======================#
# Note: All of the following variable names can be customized.
# Omit values to allow user to provide them.
# For more information on configuration values, see:
# https://librechat.ai/docs/features/web_search
# Search Provider (Required)
# SERPER_API_KEY=your_serper_api_key
# Scraper (Required)
# FIRECRAWL_API_KEY=your_firecrawl_api_key
# Optional: Custom Firecrawl API URL
# FIRECRAWL_API_URL=your_firecrawl_api_url
# Reranker (Required)
# JINA_API_KEY=your_jina_api_key
# or
# COHERE_API_KEY=your_cohere_api_key
#======================#
# MCP Configuration #
#======================#
# Treat 401/403 responses as OAuth requirement when no oauth metadata found
# MCP_OAUTH_ON_AUTH_ERROR=true
# Timeout for OAuth detection requests in milliseconds
# MCP_OAUTH_DETECTION_TIMEOUT=5000
# Cache connection status checks for this many milliseconds to avoid expensive verification
# MCP_CONNECTION_CHECK_TTL=60000
# Skip code challenge method validation (e.g., for AWS Cognito that supports S256 but doesn't advertise it)
# When set to true, forces S256 code challenge even if not advertised in .well-known/openid-configuration
# MCP_SKIP_CODE_CHALLENGE_CHECK=false
# Circuit breaker: max connect/disconnect cycles before tripping (per server)
# MCP_CB_MAX_CYCLES=7
# Circuit breaker: sliding window (ms) for counting cycles
# MCP_CB_CYCLE_WINDOW_MS=45000
# Circuit breaker: cooldown (ms) after the cycle breaker trips
# MCP_CB_CYCLE_COOLDOWN_MS=15000
# Circuit breaker: max consecutive failed connection rounds before backoff
# MCP_CB_MAX_FAILED_ROUNDS=3
# Circuit breaker: sliding window (ms) for counting failed rounds
# MCP_CB_FAILED_WINDOW_MS=120000
# Circuit breaker: base backoff (ms) after failed round threshold is reached
# MCP_CB_BASE_BACKOFF_MS=30000
# Circuit breaker: max backoff cap (ms) for exponential backoff
# MCP_CB_MAX_BACKOFF_MS=300000
+8
View File
@@ -0,0 +1,8 @@
# Auto-generated from pyproject.toml — do not edit manually
ALFRED_VERSION=0.1.7
PYTHON_VERSION=3.14.3
IMAGE_NAME=alfred_media_organizer
SERVICE_NAME=alfred
LIBRECHAT_VERSION=v0.8.4
RAG_VERSION=v0.7.3
UV_VERSION=0.11.6
+20 -14
View File
@@ -2,11 +2,10 @@ name: CI/CD Awesome Pipeline
on: on:
push: push:
branches: [main]
tags: tags:
- 'v*.*.*' - 'v*.*.*'
pull_request:
branches: [main] workflow_dispatch:
env: env:
REGISTRY_URL: ${{ vars.REGISTRY_URL || 'gitea.iswearihadsomethingforthis.net' }} REGISTRY_URL: ${{ vars.REGISTRY_URL || 'gitea.iswearihadsomethingforthis.net' }}
@@ -35,6 +34,9 @@ jobs:
- name: Checkout code - name: Checkout code
uses: actions/checkout@v4 uses: actions/checkout@v4
- name: Generate build variables
run: python scripts/generate_build_vars.py
- name: Load config from Makefile - name: Load config from Makefile
id: config id: config
run: make -s _ci-dump-config >> $GITHUB_OUTPUT run: make -s _ci-dump-config >> $GITHUB_OUTPUT
@@ -45,12 +47,12 @@ jobs:
with: with:
images: gitea.iswearihadsomethingforthis.net/francwa/${{ steps.config.outputs.image_name }} images: gitea.iswearihadsomethingforthis.net/francwa/${{ steps.config.outputs.image_name }}
tags: | tags: |
# Case 1 - Git Tag (v1.2.3) # Tagged (v1.2.3)
type=semver,pattern={{ version }} type=semver,pattern={{ version }}
# Case 2 - Push on main # Latest (main)
type=raw,value=latest,enable={{ is_default_branch }} type=raw,value=latest,enable={{ is_default_branch }}
# Both case - Commit sha # Feature branches
type=sha type=ref,event=branch
- name: Login to Gitea Registry - name: Login to Gitea Registry
uses: docker/login-action@v3 uses: docker/login-action@v3
@@ -64,7 +66,6 @@ jobs:
uses: docker/build-push-action@v5 uses: docker/build-push-action@v5
with: with:
context: . context: .
file: ./brain/Dockerfile
push: true push: true
tags: ${{ steps.meta.outputs.tags }} tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }} labels: ${{ steps.meta.outputs.labels }}
@@ -74,13 +75,18 @@ jobs:
RUNNER=${{ steps.config.outputs.runner }} RUNNER=${{ steps.config.outputs.runner }}
- name: 🛡️ Run Trivy Vulnerability Scanner - name: 🛡️ Run Trivy Vulnerability Scanner
uses: aquasecurity/trivy-action@master uses: docker://aquasec/trivy:latest
env: env:
TRIVY_USERNAME: ${{ gitea.actor }}
TRIVY_PASSWORD: ${{ secrets.G1T34_TOKEN }}
# Unset the fake GITHUB_TOKEN injected by Gitea # Unset the fake GITHUB_TOKEN injected by Gitea
GITHUB_TOKEN: "" GITHUB_TOKEN: ""
with: with:
image-ref: ${{ steps.meta.outputs.tags }} args: image --format table --output trivy-report.txt --exit-code 0 --ignore-unfixed --severity CRITICAL,HIGH gitea.iswearihadsomethingforthis.net/francwa/${{ steps.config.outputs.image_name }}:latest
format: 'table'
exit-code: '1' - name: 📤 Upload Security Report
ignore-unfixed: true uses: actions/upload-artifact@v3
severity: 'CRITAL, HIGH' with:
name: security-report
path: trivy-report.txt
retention-days: 7
+22
View File
@@ -0,0 +1,22 @@
name: Renovate Bot
on:
schedule:
# Every Monday 4AM
- cron: '0 4 * * 1'
workflow_dispatch:
jobs:
renovate:
runs-on: ubuntu-latest
steps:
- name: Run Renovate
uses: docker://renovate/renovate:latest
env:
RENOVATE_PLATFORM: "gitea"
RENOVATE_ENDPOINT: "https://gitea.iswearihadsomethingforthis.net/api/v1"
RENOVATE_TOKEN: "${{ secrets.RENOVATE_TOKEN }}"
RENOVATE_REPOSITORIES: '["${{ gitea.repository }}"]'
RENOVATE_GIT_AUTHOR: "Renovate Bot <renovate@bot.local>"
# Might need a free github token if lots of depencies
# RENOVATE_GITHUB_TOKEN: "${{ secrets.GITHUB_COM_TOKEN }}"
+24 -1
View File
@@ -55,7 +55,30 @@ coverage.xml
Thumbs.db Thumbs.db
# Secrets # Secrets
.env .env.secrets
# Backup files # Backup files
*.backup *.backup
*.bak
env_backup/
# Application data dir
data/*
# Application logs
logs/*
# Documentation folder
docs/
# .md files (project-level Markdown is brol-y; allow-list the ones we track)
*.md
!CHANGELOG.md
!/README.md
!specs/
!specs/**/*.md
# Private dev docs (separate git repo inside; see .claude/CLAUDE.md)
/.claude/
#
+1224
View File
File diff suppressed because it is too large Load Diff
+91
View File
@@ -0,0 +1,91 @@
# syntax=docker/dockerfile:1
# check=skip=InvalidDefaultArgInFrom
ARG PYTHON_VERSION
ARG UV_VERSION
# Stage 0: uv binary (workaround — --from doesn't support ARG expansion)
FROM ghcr.io/astral-sh/uv:${UV_VERSION} AS uv-bin
# ===========================================
# Stage 1: Builder
# ===========================================
FROM python:${PYTHON_VERSION}-slim-bookworm AS builder
ENV DEBIAN_FRONTEND=noninteractive \
PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
UV_PROJECT_ENVIRONMENT=/venv
# Install build dependencies
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update \
&& apt-get install -y --no-install-recommends build-essential
# Install uv globally
COPY --from=uv-bin /uv /usr/local/bin/uv
WORKDIR /tmp
COPY pyproject.toml uv.lock Makefile ./
# Install dependencies into /venv
RUN --mount=type=cache,target=/root/.cache/uv uv sync
COPY scripts/ ./scripts/
COPY .env.example ./
# ===========================================
# Stage 2: Testing
# ===========================================
FROM builder AS test
RUN --mount=type=cache,target=/root/.cache/uv uv sync --group dev
COPY alfred/ ./alfred
COPY scripts ./scripts
COPY tests/ ./tests
# ===========================================
# Stage 3: Runtime
# ===========================================
FROM python:${PYTHON_VERSION}-slim-bookworm AS runtime
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PYTHONPATH=/home/appuser \
PATH="/venv/bin:$PATH"
# Install runtime dependencies
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
apt-get update \
&& apt-get install -y --no-install-recommends ca-certificates
# Create non-root user
RUN useradd -m -u 1000 -s /bin/bash appuser
# Create data directories
RUN mkdir -p /data /logs \
&& chown -R appuser:appuser /data /logs
USER appuser
WORKDIR /home/appuser
# Copy venv from builder stage
COPY --from=builder /venv /venv
# Copy application code
COPY --chown=appuser:appuser alfred/ ./alfred
COPY --chown=appuser:appuser scripts/ ./scripts
COPY --chown=appuser:appuser .env.example ./
COPY --chown=appuser:appuser pyproject.toml ./
VOLUME ["/data", "/logs"]
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD python -c "import requests; requests.get('http://localhost:8000/health', timeout=5).raise_for_status()" || exit 1
CMD ["python", "-m", "uvicorn", "alfred.app:app", "--host", "0.0.0.0", "--port", "8000"]
+157 -229
View File
@@ -1,259 +1,187 @@
.POSIX:
.SUFFIXES:
.DEFAULT_GOAL := help .DEFAULT_GOAL := help
# --- SETTINGS --- # --- Load Config from pyproject.toml ---
CORE_DIR = brain export
IMAGE_NAME = agent_media -include .env.make
PYTHON_VERSION = 3.12.7
PYTHON_VERSION_SHORT = $(shell echo $(PYTHON_VERSION) | cut -d. -f1,2)
# Change to 'uv' when ready.
RUNNER ?= poetry
SERVICE_NAME = agent_media
export IMAGE_NAME # --- Profiles management ---
export PYTHON_VERSION # Usage: make up p=rag,meili
export PYTHON_VERSION_SHORT p ?= full
export RUNNER PROFILES_PARAM := COMPOSE_PROFILES=$(p)
# --- ADAPTERS --- # --- Commands ---
# UV uses "sync", Poetry uses "install". Both install DEV deps by default. DOCKER_COMPOSE := docker compose \
INSTALL_CMD = $(if $(filter uv,$(RUNNER)),sync,install) --env-file .env.alfred \
--env-file .env.secrets \
# --- MACROS --- --env-file .env.make
ARGS = $(filter-out $@,$(MAKECMDGOALS)) DOCKER_BUILD := DOCKER_BUILDKIT=1 docker build \
BUMP_CMD = cd $(CORE_DIR) && $(RUNNER) run bump-my-version bump
COMPOSE_CMD = docker-compose
DOCKER_CMD = docker build \
--build-arg PYTHON_VERSION=$(PYTHON_VERSION) \ --build-arg PYTHON_VERSION=$(PYTHON_VERSION) \
--build-arg PYTHON_VERSION_SHORT=$(PYTHON_VERSION_SHORT) \ --build-arg UV_VERSION=$(UV_VERSION)
--build-arg RUNNER=$(RUNNER) \
-f $(CORE_DIR)/Dockerfile \
-t $(IMAGE_NAME):latest .
RUNNER_ADD = cd $(CORE_DIR) && $(RUNNER) add # --- Phony ---
RUNNER_HOOKS = cd $(CORE_DIR) && $(RUNNER) run pre-commit install -c ../.pre-commit-config.yaml .PHONY: bootstrap up down restart logs ps shell build build-test install \
RUNNER_INSTALL = cd $(CORE_DIR) && $(RUNNER) $(INSTALL_CMD) update install-hooks test coverage lint format clean major minor patch help
RUNNER_RUN = cd $(CORE_DIR) && $(RUNNER) run
RUNNER_UPDATE = cd $(CORE_DIR) && $(RUNNER) update
# --- STYLES --- # --- Setup ---
B = \033[1m .env.alfred .env.librechat .env.secrets .env.make:
G = \033[32m @echo "Initializing environment..."
T = \033[36m @uv run python scripts/bootstrap.py \
R = \033[0m && echo "✓ Environment ready" \
|| (echo "✗ Environment setup failed" && exit 1)
# --- TARGETS --- bootstrap: .env.alfred .env.librechat .env.secrets .env.make
.PHONY: add build build-test check-docker check-runner clean coverage down format help init-dotenv install install-hooks lint logs major minor patch prune ps restart run shell test up update _check_branch _ci-dump-config _ci-run-tests _push_tag
# Catch-all for args # --- Docker ---
%: up: .env.alfred .env.secrets
@: @echo "Starting containers with profiles: [full]..."
@$(PROFILES_PARAM) $(DOCKER_COMPOSE) up -d --remove-orphans \
&& echo "✓ Containers started" \
|| (echo "✗ Failed to start containers" && exit 1)
add: check-runner down:
@echo "$(T) Adding dependency ($(RUNNER)): $(ARGS)$(R)" @echo "Stopping containers..."
$(RUNNER_ADD) $(ARGS) @$(PROFILES_PARAM) $(DOCKER_COMPOSE) down \
&& echo "✓ Containers stopped" \
|| (echo "✗ Failed to stop containers" && exit 1)
build: check-docker restart:
@echo "$(T)🐳 Building Docker image...$(R)" @echo "Restarting containers..."
$(DOCKER_CMD) @$(PROFILES_PARAM) $(DOCKER_COMPOSE) restart \
@echo "✅ Image $(IMAGE_NAME):latest ready." && echo "✓ Containers restarted" \
|| (echo "✗ Failed to restart containers" && exit 1)
build-test: check-docker logs:
@echo "$(T)🐳 Building test image (with dev deps)...$(R)" @echo "Following logs (Ctrl+C to exit)..."
docker build \ @$(PROFILES_PARAM) $(DOCKER_COMPOSE) logs -f
--build-arg RUNNER=$(RUNNER) \
--build-arg PYTHON_VERSION=$(PYTHON_VERSION) \
--build-arg PYTHON_VERSION_SHORT=$(PYTHON_VERSION_SHORT) \
-f $(CORE_DIR)/Dockerfile \
--target test \
-t $(IMAGE_NAME):test .
@echo "✅ Test image $(IMAGE_NAME):test ready."
check-docker: ps:
@command -v docker >/dev/null 2>&1 || { echo "$(R)❌ Docker not installed$(R)"; exit 1; } @echo "Container status:"
@docker info >/dev/null 2>&1 || { echo "$(R)❌ Docker daemon not running$(R)"; exit 1; } @$(PROFILES_PARAM) $(DOCKER_COMPOSE) ps
check-runner: shell:
@command -v $(RUNNER) >/dev/null 2>&1 || { echo "$(R)$(RUNNER) not installed$(R)"; exit 1; } @echo "Opening shell in $(SERVICE_NAME)..."
@$(DOCKER_COMPOSE) exec $(SERVICE_NAME) /bin/bash
# --- Build ---
build: .env.make
@echo "Building image $(IMAGE_NAME):latest ..."
@$(DOCKER_BUILD) -t $(IMAGE_NAME):latest . \
&& echo "✓ Build complete" \
|| (echo "✗ Build failed" && exit 1)
build-test: .env.make
@echo "Building test image $(IMAGE_NAME):test..."
@$(DOCKER_BUILD) --target test -t $(IMAGE_NAME):test . \
&& echo "✓ Test image built" \
|| (echo "✗ Build failed" && exit 1)
# --- Dependencies ---
install:
@echo "Installing dependencies with uv..."
@uv install \
&& echo "✓ Dependencies installed" \
|| (echo "✗ Installation failed" && exit 1)
install-hooks:
@echo "Installing pre-commit hooks..."
@uv run pre-commit install \
&& echo "✓ Hooks installed" \
|| (echo "✗ Hook installation failed" && exit 1)
update:
@echo "Updating dependencies with uv..."
@uv update \
&& echo "✓ Dependencies updated" \
|| (echo "✗ Update failed" && exit 1)
# --- Quality ---
test:
@echo "Running tests..."
@uv run pytest \
&& echo "✓ Tests passed" \
|| (echo "✗ Tests failed" && exit 1)
coverage:
@echo "Running tests with coverage..."
@uv run pytest --cov=. --cov-report=html --cov-report=term \
&& echo "✓ Coverage report generated" \
|| (echo "✗ Coverage failed" && exit 1)
lint:
@echo "Linting code..."
@uv run ruff check --fix . \
&& echo "✓ Linting complete" \
|| (echo "✗ Linting failed" && exit 1)
format:
@echo "Formatting code..."
@uv run ruff format . && uv run ruff check --fix . \
&& echo "✓ Code formatted" \
|| (echo "✗ Formatting failed" && exit 1)
clean: clean:
@echo "$(T)🧹 Cleaning caches...$(R)" @echo "Cleaning build artifacts..."
cd $(CORE_DIR) && rm -rf .ruff_cache __pycache__ .pytest_cache @rm -rf .ruff_cache __pycache__ .pytest_cache htmlcov .coverage
find $(CORE_DIR) -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true @find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
find $(CORE_DIR) -type d -name ".pytest_cache" -exec rm -rf {} + 2>/dev/null || true @echo "✓ Cleanup complete"
find $(CORE_DIR) -type f -name "*.pyc" -delete 2>/dev/null || true
@echo "✅ Caches cleaned."
coverage: check-runner # --- Versioning ---
@echo "$(T)📊 Running tests with coverage...$(R)" major minor patch: _check-main
$(RUNNER_RUN) pytest --cov=. --cov-report=html --cov-report=term $(ARGS) @echo "Bumping $@ version..."
@echo "✅ Report generated in htmlcov/" @uv run bump-my-version bump $@ \
&& echo "✓ Version bumped" \
|| (echo "✗ Version bump failed" && exit 1)
down: check-docker @echo "Pushing tags..."
@echo "$(T)🛑 Stopping containers...$(R)" @git push --tags \
$(COMPOSE_CMD) down && echo "✓ Tags pushed" \
@echo "✅ System stopped." || (echo "✗ Push failed" && exit 1)
format: check-runner
@echo "$(T)✨ Formatting with Ruff...$(R)"
$(RUNNER_RUN) ruff format .
$(RUNNER_RUN) ruff check --fix .
@echo "✅ Code cleaned."
help:
@echo "$(B)Available commands:$(R)"
@echo ""
@echo "$(G)Setup:$(R)"
@echo " $(T)check-docker $(R) Verify Docker is installed and running."
@echo " $(T)check-runner $(R) Verify package manager ($(RUNNER))."
@echo " $(T)init-dotenv $(R) Create .env from .env.example with generated secrets."
@echo " $(T)install $(R) Install ALL dependencies (Prod + Dev)."
@echo " $(T)install-hooks $(R) Install git pre-commit hooks."
@echo ""
@echo "$(G)Docker:$(R)"
@echo " $(T)build $(R) Build the docker image (production)."
@echo " $(T)build-test $(R) Build the docker image (with dev deps for testing)."
@echo " $(T)down $(R) Stop and remove containers."
@echo " $(T)logs $(R) Follow logs."
@echo " $(T)prune $(R) Clean Docker system."
@echo " $(T)ps $(R) Show container status."
@echo " $(T)restart $(R) Restart all containers."
@echo " $(T)shell $(R) Open shell in container."
@echo " $(T)up $(R) Start the agent."
@echo ""
@echo "$(G)Development:$(R)"
@echo " $(T)add ... $(R) Add dependency (use --group dev or --dev if needed)."
@echo " $(T)clean $(R) Clean caches."
@echo " $(T)coverage $(R) Run tests with coverage."
@echo " $(T)format $(R) Format code (Ruff)."
@echo " $(T)lint $(R) Lint code without fixing."
@echo " $(T)test ... $(R) Run tests (local with $(RUNNER))."
@echo " $(T)update $(R) Update dependencies."
@echo ""
@echo "$(G)Versioning:$(R)"
@echo " $(T)major/minor/patch $(R) Bump version and push tag (triggers CI/CD)."
init-dotenv:
@echo "$(T)🔑 Initializing .env file...$(R)"
@if [ -f .env ]; then \
echo "$(R)⚠️ .env already exists. Skipping.$(R)"; \
exit 0; \
fi
@if [ ! -f .env.example ]; then \
echo "$(R)❌ .env.example not found$(R)"; \
exit 1; \
fi
@if ! command -v openssl >/dev/null 2>&1; then \
echo "$(R)❌ openssl not found. Please install it first.$(R)"; \
exit 1; \
fi
@echo "$(T) → Copying .env.example...$(R)"
@cp .env.example .env
@echo "$(T) → Generating secrets...$(R)"
@sed -i.bak "s|JWT_SECRET=.*|JWT_SECRET=$$(openssl rand -base64 32)|" .env
@sed -i.bak "s|JWT_REFRESH_SECRET=.*|JWT_REFRESH_SECRET=$$(openssl rand -base64 32)|" .env
@sed -i.bak "s|CREDS_KEY=.*|CREDS_KEY=$$(openssl rand -hex 16)|" .env
@sed -i.bak "s|CREDS_IV=.*|CREDS_IV=$$(openssl rand -hex 8)|" .env
@sed -i.bak "s|MEILI_MASTER_KEY=.*|MEILI_MASTER_KEY=$$(openssl rand -base64 32)|" .env
@sed -i.bak "s|AGENT_BRAIN_API_KEY=.*|AGENT_BRAIN_API_KEY=$$(openssl rand -base64 24)|" .env
@rm -f .env.bak
@echo "$(G)✅ .env created with generated secrets!$(R)"
@echo "$(T)⚠️ Don't forget to add your API keys:$(R)"
@echo " - OPENAI_API_KEY"
@echo " - DEEPSEEK_API_KEY"
@echo " - TMDB_API_KEY (optional)"
install: check-runner
@echo "$(T)📦 Installing FULL environment ($(RUNNER))...$(R)"
$(RUNNER_INSTALL)
@echo "✅ Environment ready (Prod + Dev)."
install-hooks: check-runner
@echo "$(T)🔧 Installing hooks...$(R)"
$(RUNNER_HOOKS)
@echo "✅ Hooks ready."
lint: check-runner
@echo "$(T)🔍 Linting code...$(R)"
$(RUNNER_RUN) ruff check .
logs: check-docker
@echo "$(T)📋 Following logs...$(R)"
$(COMPOSE_CMD) logs -f
major: _check_branch
@echo "$(T)💥 Bumping major...$(R)"
SKIP=all $(BUMP_CMD) major
@$(MAKE) -s _push_tag
minor: _check_branch
@echo "$(T)✨ Bumping minor...$(R)"
SKIP=all $(BUMP_CMD) minor
@$(MAKE) -s _push_tag
patch: _check_branch
@echo "$(T)🚀 Bumping patch...$(R)"
SKIP=all $(BUMP_CMD) patch
@$(MAKE) -s _push_tag
prune: check-docker
@echo "$(T)🗑️ Pruning Docker resources...$(R)"
docker system prune -af
@echo "✅ Docker cleaned."
ps: check-docker
@echo "$(T)📋 Container status:$(R)"
@$(COMPOSE_CMD) ps
restart: check-docker
@echo "$(T)🔄 Restarting containers...$(R)"
$(COMPOSE_CMD) restart
@echo "✅ Containers restarted."
run: check-runner
$(RUNNER_RUN) $(ARGS)
shell: check-docker
@echo "$(T)🐚 Opening shell in $(SERVICE_NAME)...$(R)"
$(COMPOSE_CMD) exec $(SERVICE_NAME) /bin/sh
test: check-runner
@echo "$(T)🧪 Running tests...$(R)"
$(RUNNER_RUN) pytest $(ARGS)
up: check-docker
@echo "$(T)🚀 Starting Agent Media...$(R)"
$(COMPOSE_CMD) up -d
@echo "✅ System is up."
update: check-runner
@echo "$(T)🔄 Updating dependencies...$(R)"
$(RUNNER_UPDATE)
@echo "✅ All packages up to date."
_check_branch:
@curr=$$(git rev-parse --abbrev-ref HEAD); \
if [ "$$curr" != "main" ]; then \
echo "❌ Error: not on the main branch"; exit 1; \
fi
# CI/CD helpers
_ci-dump-config: _ci-dump-config:
@echo "image_name=$(IMAGE_NAME)" @echo "image_name=$(IMAGE_NAME)"
@echo "python_version=$(PYTHON_VERSION)" @echo "python_version=$(PYTHON_VERSION)"
@echo "python_version_short=$(PYTHON_VERSION_SHORT)" @echo "uv_version=$(UV_VERSION)"
@echo "runner=$(RUNNER)"
@echo "service_name=$(SERVICE_NAME)" @echo "service_name=$(SERVICE_NAME)"
_ci-run-tests:build-test _ci-run-tests:build-test
@echo "$(T)🧪 Running tests in Docker...$(R)" @echo "Running tests in Docker..."
docker run --rm \ docker run --rm \
-e DEEPSEEK_API_KEY \ -e DEEPSEEK_API_KEY \
-e TMDB_API_KEY \ -e TMDB_API_KEY \
-e QBITTORRENT_URL \
$(IMAGE_NAME):test pytest $(IMAGE_NAME):test pytest
@echo " Tests passed." @echo " Tests passed."
_push_tag: _check-main:
@echo "$(T)📦 Pushing tag...$(R)" @test "$$(git rev-parse --abbrev-ref HEAD)" = "main" \
git push --tags || (echo "✗ ERROR: Not on main branch" && exit 1)
@echo "✅ Tag pushed. Check CI for build status."
# --- Help ---
help:
@echo "Cleverly Crafted Unawareness - Management Commands"
@echo ""
@echo "Usage: make [target] [p=profile1,profile2]"
@echo ""
@echo "Setup:"
@echo " bootstrap Generate .env.alfred, .env.librechat, .env.secrets and .env.make"
@echo ""
@echo "Docker:"
@echo " up Start containers (default profile: core)"
@echo " Example: make up p=rag,meili"
@echo " down Stop all containers"
@echo " restart Restart containers (supports p=...)"
@echo " logs Follow logs (supports p=...)"
@echo " ps Status of containers"
@echo " shell Open bash in the core container"
@echo " build Build the production Docker image"
@echo ""
@echo "Dev & Quality:"
@echo " setup Bootstrap .env and security keys"
@echo " install Install dependencies via uv"
@echo " test Run pytest suite"
@echo " coverage Run tests and generate HTML report"
@echo " lint/format Quality and style checks"
@echo ""
@echo "Release:"
@echo " major|minor|patch Bump version and push tags (main branch only)"
+433
View File
@@ -0,0 +1,433 @@
# Alfred Media Organizer 🎬
An AI-powered agent for managing your local media library with natural language. Search, download, and organize movies and TV shows effortlessly through a conversational interface.
[![Python 3.14](https://img.shields.io/badge/python-3.14-blue.svg)](https://www.python.org/downloads/)
[![uv](https://img.shields.io/badge/dependency%20manager-uv-purple)](https://github.com/astral-sh/uv)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
## ✨ Features
- 🤖 **Natural Language Interface** — Talk to your media library in plain language
- 🔍 **Smart Search** — Find movies and TV shows via TMDB with rich metadata
- 📥 **Torrent Integration** — Search and download via qBittorrent
- 🧠 **Contextual Memory** — Remembers your preferences and conversation history
- 📁 **Auto-Organization** — Moves and renames media files, resolves destinations, handles subtitles
- 🎞️ **Subtitle Pipeline** — Identifies, matches, and places subtitle tracks automatically
- 🔄 **Workflow Engine** — YAML-defined multi-step workflows (e.g. `organize_media`)
- 🌐 **OpenAI-Compatible API** — Works with any OpenAI-compatible client (LibreChat, OpenWebUI, etc.)
- 🔒 **Secure by Default** — Auto-generated secrets and encrypted credentials
## 🏗️ Architecture
Built with **Domain-Driven Design (DDD)** principles for clean separation of concerns:
```
alfred/
├── agent/ # AI agent orchestration
│ ├── llm/ # LLM clients (Ollama, DeepSeek)
│ ├── tools/ # Tool implementations (api, filesystem, language)
│ └── workflows/ # YAML-defined multi-step workflows
├── application/ # Use cases & DTOs
│ ├── movies/ # Movie search
│ ├── torrents/ # Torrent management
│ └── filesystem/ # File operations (move, list, subtitles, seed links)
├── domain/ # Business logic & entities
│ ├── media/ # Release parsing
│ ├── movies/ # Movie entities
│ ├── tv_shows/ # TV show entities & value objects
│ ├── subtitles/ # Subtitle scanner, services, knowledge base
│ └── shared/ # Common value objects (ImdbId, FilePath, FileSize)
└── infrastructure/ # External services & persistence
├── api/ # External API clients (TMDB, qBittorrent, Knaben)
├── filesystem/ # File manager (hard-link based, path-traversal safe)
├── persistence/ # Three-tier memory (LTM/STM/Episodic) + JSON repositories
└── subtitle/ # Subtitle infrastructure
```
### Key flows
**Agent execution:** `agent.step(user_input)` → LLM call → if tool_calls, execute each via registry → loop until no tool calls or `max_tool_iterations` → return final response.
**Media organization workflow:**
1. `resolve_destination` — Determines target folder/filename from release name
2. `move_media` — Hard-links file to library, deletes source
3. `manage_subtitles` — Scans, classifies, and places subtitle tracks
4. `create_seed_links` — Hard-links library file back to torrents/ for continued seeding
**Memory tiers:**
- **LTM** (`data/memory/ltm.json`) — Persisted config, media library, watchlist
- **STM** — Conversation history (capped at `MAX_HISTORY_MESSAGES`)
- **Episodic** — Transient search results, active downloads, recent errors
## 🚀 Quick Start
### Prerequisites
- **Python 3.14+**
- **uv** (dependency manager)
- **Docker & Docker Compose** (recommended for full stack)
- **API Keys:**
- TMDB API key ([get one here](https://www.themoviedb.org/settings/api))
- Optional: DeepSeek or other LLM provider keys
### Installation
```bash
# Clone the repository
git clone https://github.com/francwa/alfred_media_organizer.git
cd alfred_media_organizer
# Install dependencies
make install
# Install pre-commit hooks
make install-hooks
# Bootstrap environment (generates .env with secure secrets)
make bootstrap
# Validate your .env against the schema
make validate
# Edit .env with your API keys
nano .env
```
### Running with Docker (Recommended)
```bash
# Start all services (LibreChat + Alfred + MongoDB + Ollama)
make up
# Or start with specific profiles
make up p=rag,meili # Include RAG and Meilisearch
make up p=qbittorrent # Include qBittorrent
make up p=full # Everything
# View logs
make logs
# Stop all services
make down
```
The web interface will be available at **http://localhost:3080**
### Running Locally (Development)
```bash
uv run uvicorn alfred.app:app --reload --port 8000
```
## ⚙️ Configuration
### Settings system
`settings.toml` is the single source of truth. The schema flows:
```
settings.toml → settings_schema.py → settings_bootstrap.py → .env + .env.make → settings.py
```
To add a setting: define it in `settings.toml`, run `make bootstrap`, then access via `settings.my_new_setting`.
```bash
# First time setup
make bootstrap
# Validate existing .env against schema
make validate
# Re-run after settings.toml changes (existing secrets preserved)
make bootstrap
```
**Never commit `.env` or `.env.make`** — both are gitignored and auto-generated.
### Key settings (.env)
```bash
# --- CORE ---
MAX_HISTORY_MESSAGES=10
MAX_TOOL_ITERATIONS=10
# --- LLM ---
DEFAULT_LLM_PROVIDER=local # local (Ollama) | deepseek
OLLAMA_BASE_URL=http://ollama:11434
OLLAMA_MODEL=llama3.3:latest
LLM_TEMPERATURE=0.2
# --- API KEYS ---
TMDB_API_KEY=your-tmdb-key # Required for movie/show search
DEEPSEEK_API_KEY= # Optional
# --- SECURITY (auto-generated) ---
JWT_SECRET=<auto>
CREDS_KEY=<auto>
MONGO_PASSWORD=<auto>
```
## 🐳 Docker Services
### Docker Profiles
| Profile | Extra services | Use case |
|---------|---------------|----------|
| (default) | — | LibreChat + Alfred + MongoDB + Ollama |
| `meili` | Meilisearch | Fast full-text search |
| `rag` | RAG API + VectorDB (PostgreSQL) | Document retrieval |
| `qbittorrent` | qBittorrent | Torrent downloads |
| `full` | All of the above | Complete setup |
```bash
make up # Start (default profile)
make up p=full # Start with all services
make down # Stop
make restart # Restart
make logs # Follow logs
make ps # Container status
```
## 🛠️ Available Tools
| Tool | Description |
|------|-------------|
| `find_media_imdb_id` | Search for movies/TV shows on TMDB by title |
| `find_torrent` | Search for torrents across multiple indexers |
| `get_torrent_by_index` | Get detailed info about a specific result |
| `add_torrent_by_index` | Download a torrent from search results |
| `add_torrent_to_qbittorrent` | Add a torrent via magnet link directly |
| `resolve_destination` | Compute the target library path for a release |
| `move_media` | Hard-link a file to its library destination |
| `manage_subtitles` | Scan, classify, and place subtitle tracks |
| `create_seed_links` | Prepare torrent folder so qBittorrent keeps seeding |
| `learn` | Teach Alfred a new pattern (release group, naming convention) |
| `set_path_for_folder` | Configure folder paths |
| `list_folder` | List contents of a configured folder |
| `set_language` | Set preferred language for the session |
## 💬 Usage Examples
### Via Web Interface (LibreChat)
Navigate to **http://localhost:3080** and start chatting:
```
You: Find Inception in 1080p
Alfred: I found 3 torrents for Inception (2010):
1. Inception.2010.1080p.BluRay.x264 (150 seeders) - 2.1 GB
2. Inception.2010.1080p.WEB-DL.x265 (80 seeders) - 1.8 GB
3. Inception.2010.1080p.REMUX (45 seeders) - 25 GB
You: Download the first one
Alfred: ✓ Added to qBittorrent! Download started.
You: Organize the Breaking Bad S01 download
Alfred: ✓ Resolved destination: /tv_shows/Breaking.Bad/Season 01/
✓ Moved 6 episode files
✓ Placed 6 subtitle tracks (fr, en)
✓ Seed links created in /torrents/
```
### Via API
```bash
# Health check
curl http://localhost:8000/health
# Chat (OpenAI-compatible)
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "alfred",
"messages": [{"role": "user", "content": "Find The Matrix 4K"}]
}'
# List models
curl http://localhost:8000/v1/models
# View memory state
curl http://localhost:8000/memory/state
```
Alfred is compatible with any OpenAI-compatible client. Point it at `http://localhost:8000/v1`, model `alfred`.
## 🧠 Memory System
Alfred uses a three-tier memory system:
| Tier | Storage | Contents | Lifetime |
|------|---------|----------|----------|
| **LTM** | JSON file (`data/memory/ltm.json`) | Config, library, watchlist, learned patterns | Permanent |
| **STM** | RAM | Conversation history (capped) | Session |
| **Episodic** | RAM | Search results, active downloads, errors | Short-lived |
## 🧪 Development
### Running Tests
```bash
# Run full suite (parallel)
make test
# Run with coverage report
make coverage
# Run a single file
uv run pytest tests/test_agent.py -v
# Run a single class
uv run pytest tests/test_agent.py::TestAgentInit -v
# Skip slow tests
uv run pytest -m "not slow"
```
### Test coverage
The suite covers:
- **Agent loop** — tool execution, history, max iterations, error handling
- **Tool registry** — OpenAI schema format, parameter extraction
- **Prompts** — system prompt building, tool inclusion
- **Memory** — LTM/STM/Episodic operations, persistence
- **Filesystem tools** — path traversal security, folder listing
- **File manager** — hard-link, move, seed links (real filesystem, no mocks)
- **Application use cases** — `resolve_destination`, `create_seed_links`, `list_folder`, `move_media`
- **Domain** — TV show/movie entities, shared value objects (`ImdbId`, `FilePath`, `FileSize`), subtitle scanner
- **Repositories** — JSON-backed movie, TV show, subtitle repos
- **Bootstrap** — secret generation, idempotency, URI construction
- **Workflows** — YAML loading, structure validation
- **Configuration** — boundary validation for all settings
### Code Quality
```bash
make lint # Ruff check --fix
make format # Ruff format + check --fix
```
### Adding a New Tool
1. Implement the function in `alfred/agent/tools/`:
```python
# alfred/agent/tools/api.py
def my_new_tool(param: str) -> dict[str, Any]:
"""Short description shown to the LLM to decide when to call this tool."""
memory = get_memory()
# ...
return {"status": "ok", "data": result}
```
2. Register it in `alfred/agent/registry.py`:
```python
tool_functions = [
# ... existing tools ...
api_tools.my_new_tool,
]
```
The registry auto-generates the JSON schema from the function signature and docstring.
### Adding a Workflow
Create a YAML file in `alfred/agent/workflows/`:
```yaml
name: my_workflow
description: What this workflow does
steps:
- tool: resolve_destination
description: Find where the file should go
- tool: move_media
description: Move the file
```
Workflows are loaded automatically at startup.
### Version Management
```bash
# Must be on main branch
make patch # 0.1.7 → 0.1.8
make minor # 0.1.7 → 0.2.0
make major # 0.1.7 → 1.0.0
```
## 📚 API Reference
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/health` | Health check |
| `GET` | `/v1/models` | List models (OpenAI-compatible) |
| `POST` | `/v1/chat/completions` | Chat (OpenAI-compatible, streaming supported) |
| `GET` | `/memory/state` | Full memory dump (debug) |
| `POST` | `/memory/clear-session` | Clear STM + Episodic |
| `GET` | `/memory/episodic/search-results` | Current search results |
## 🔧 Troubleshooting
### Agent doesn't respond
1. Check API keys in `.env`
2. Verify the LLM is running:
```bash
docker logs alfred-ollama
docker exec alfred-ollama ollama list
```
3. Check Alfred logs: `docker logs alfred-core`
### qBittorrent connection failed
1. Verify qBittorrent is running: `docker ps | grep qbittorrent`
2. Check credentials in `.env` (`QBITTORRENT_URL`, `QBITTORRENT_USERNAME`, `QBITTORRENT_PASSWORD`)
### Memory not persisting
1. Check `data/` directory is writable
2. Verify volume mounts in `docker-compose.yaml`
### Bootstrap fails
```bash
make validate # Check what's wrong with .env
make bootstrap # Regenerate (preserves existing secrets)
```
### Tests failing
```bash
uv run pytest tests/test_failing.py -v --tb=long
```
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feat/my-feature`
3. Make your changes + add tests
4. Run `make test && make lint && make format`
5. Commit with [Conventional Commits](https://www.conventionalcommits.org/): `feat:`, `fix:`, `docs:`, `refactor:`, `test:`, `chore:`, `infra:`
6. Open a Pull Request
## 📄 License
MIT License — see [LICENSE](LICENSE) file for details.
## 🙏 Acknowledgments
- [LibreChat](https://github.com/danny-avila/LibreChat) — Chat interface
- [Ollama](https://ollama.ai/) — Local LLM runtime
- [DeepSeek](https://www.deepseek.com/) — LLM provider
- [TMDB](https://www.themoviedb.org/) — Movie & TV database
- [qBittorrent](https://www.qbittorrent.org/) — Torrent client
- [FastAPI](https://fastapi.tiangolo.com/) — Web framework
- [uv](https://github.com/astral-sh/uv) — Fast Python package manager
---
<p align="center">Made with ❤️ by <a href="https://github.com/francwa">Francwa</a></p>
View File
@@ -1,6 +1,7 @@
"""Agent module for media library management.""" """Agent module for media library management."""
from alfred.settings import settings
from .agent import Agent from .agent import Agent
from .config import settings
__all__ = ["Agent", "settings"] __all__ = ["Agent", "settings"]
+153 -13
View File
@@ -3,13 +3,16 @@
import json import json
import logging import logging
from collections.abc import AsyncGenerator from collections.abc import AsyncGenerator
from pathlib import Path
from typing import Any from typing import Any
from infrastructure.persistence import get_memory from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
from alfred.infrastructure.persistence_TO_CHECK import get_memory
from alfred.settings import settings
from .config import settings from .prompt import PromptBuilder
from .prompts import PromptBuilder
from .registry import Tool, make_tools from .registry import Tool, make_tools
from .workflows_TO_CHECK import WorkflowLoader
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -21,17 +24,20 @@ class Agent:
Uses OpenAI-compatible tool calling API. Uses OpenAI-compatible tool calling API.
""" """
def __init__(self, llm, max_tool_iterations: int = 5): def __init__(self, settings, llm, max_tool_iterations: int = 5):
""" """
Initialize the agent. Initialize the agent.
Args: Args:
settings: Application settings instance
llm: LLM client with complete() method llm: LLM client with complete() method
max_tool_iterations: Maximum number of tool execution iterations max_tool_iterations: Maximum number of tool execution iterations
""" """
self.settings = settings
self.llm = llm self.llm = llm
self.tools: dict[str, Tool] = make_tools() self.tools: dict[str, Tool] = make_tools(settings)
self.prompt_builder = PromptBuilder(self.tools) self.workflow_loader = WorkflowLoader()
self.prompt_builder = PromptBuilder(self.tools, self.workflow_loader)
self.max_tool_iterations = max_tool_iterations self.max_tool_iterations = max_tool_iterations
def step(self, user_input: str) -> str: def step(self, user_input: str) -> str:
@@ -78,7 +84,7 @@ class Agent:
tools_spec = self.prompt_builder.build_tools_spec() tools_spec = self.prompt_builder.build_tools_spec()
# Tool execution loop # Tool execution loop
for _iteration in range(self.max_tool_iterations): for _iteration in range(self.settings.max_tool_iterations):
# Call LLM with tools # Call LLM with tools
llm_result = self.llm.complete(messages, tools=tools_spec) llm_result = self.llm.complete(messages, tools=tools_spec)
@@ -136,7 +142,7 @@ class Agent:
memory.save() memory.save()
return final_response return final_response
def _execute_tool_call(self, tool_call: dict[str, Any]) -> dict[str, Any]: def _execute_tool_call(self, tool_call: dict[str, Any]) -> dict[str, Any]: # noqa: PLR0911
""" """
Execute a single tool call. Execute a single tool call.
@@ -165,29 +171,163 @@ class Agent:
"available_tools": available, "available_tools": available,
} }
# Defensive: reject calls to tools that are not currently in scope.
visible = set(self.prompt_builder.visible_tool_names())
if tool_name not in visible:
return {
"error": "tool_out_of_scope",
"message": (
f"Tool '{tool_name}' is not available in the current "
"workflow scope. Call end_workflow first or start the "
"appropriate workflow."
),
"available_tools": sorted(visible),
}
tool = self.tools[tool_name] tool = self.tools[tool_name]
memory = get_memory()
# Cache lookup — for tools flagged cacheable, short-circuit on hit.
cache_key_value = self._cache_key_for(tool, args)
if cache_key_value is not None:
cached = memory.stm.tool_results.get(tool_name, cache_key_value)
if cached is not None:
logger.info(f"Tool cache HIT: {tool_name}[{cache_key_value}]")
self._post_tool_side_effects(tool_name, args, cached, from_cache=True)
return {**cached, "_from_cache": True}
# Execute tool # Execute tool
try: try:
result = tool.func(**args) result = tool.func(**args)
return result
except KeyboardInterrupt: except KeyboardInterrupt:
# Don't catch KeyboardInterrupt - let it propagate # Don't catch KeyboardInterrupt - let it propagate
raise raise
except TypeError as e: except TypeError as e:
# Bad arguments # Bad arguments
memory = get_memory()
memory.episodic.add_error(tool_name, f"bad_args: {e}") memory.episodic.add_error(tool_name, f"bad_args: {e}")
return {"error": "bad_args", "message": str(e), "tool": tool_name} return {"error": "bad_args", "message": str(e), "tool": tool_name}
except Exception as e: except Exception as e:
# Other errors # Other errors
memory = get_memory()
memory.episodic.add_error(tool_name, str(e)) memory.episodic.add_error(tool_name, str(e))
return {"error": "execution_failed", "message": str(e), "tool": tool_name} return {"error": "execution_failed", "message": str(e), "tool": tool_name}
# Persist + side effects only on successful results.
if isinstance(result, dict) and result.get("status") == "ok":
if cache_key_value is not None:
memory.stm.tool_results.put(tool_name, cache_key_value, result)
self._post_tool_side_effects(tool_name, args, result, from_cache=False)
memory.save()
return result
@staticmethod
def _cache_key_for(tool: Tool, args: dict[str, Any]) -> str | None:
"""Return the cache key value for this call, or None if not cacheable."""
if tool.cache_key is None:
return None
value = args.get(tool.cache_key)
if value is None:
return None
return str(value)
def _post_tool_side_effects(
self,
tool_name: str,
args: dict[str, Any],
result: dict[str, Any],
*,
from_cache: bool,
) -> None:
"""
Tool-agnostic side effects applied after a successful run or cache hit.
Today:
- Update release_focus when a path-keyed inspector runs.
- Persist inspector results into the release's `.alfred/metadata.yaml`.
- Refresh episodic.last_search_results on find_torrent cache hits so
get_torrent_by_index keeps pointing at the right list.
"""
memory = get_memory()
tool = self.tools.get(tool_name)
# Release focus: any path-keyed inspector updates current_release_path.
if tool is not None and tool.cache_key in {"source_path"}:
path = args.get(tool.cache_key)
if isinstance(path, str) and path:
memory.stm.release_focus.focus(path)
# Persist inspector results to .alfred/metadata.yaml (skip on cache
# hit — the file is already up to date from the original run).
if not from_cache:
self._maybe_update_alfred(tool_name, args, result)
# Episodic refresh when find_torrent's cache short-circuits the call.
if from_cache and tool_name == "find_torrent":
torrents = result.get("torrents") or []
query = args.get("media_title") or ""
memory.episodic.store_search_results(
query=query, results=torrents, search_type="torrent"
)
def _maybe_update_alfred(
self,
tool_name: str,
args: dict[str, Any],
result: dict[str, Any],
) -> None:
"""
Persist a successful inspector result into the release's
`.alfred/metadata.yaml`. No-op when the release root can't be resolved.
"""
if tool_name not in {"analyze_release", "probe_media", "find_media_imdb_id"}:
return
release_root = self._resolve_release_root(tool_name, args)
if release_root is None:
return
try:
store = MetadataStore(release_root)
if tool_name == "analyze_release":
store.update_parse(result)
elif tool_name == "probe_media":
store.update_probe(result)
elif tool_name == "find_media_imdb_id":
store.update_tmdb(result)
except Exception as e:
logger.warning(
f"Failed to update .alfred for {tool_name} at {release_root}: {e}"
)
@staticmethod
def _resolve_release_root(
tool_name: str,
args: dict[str, Any],
) -> Path | None:
"""
Figure out which release folder owns this call.
- analyze_release / probe_media: derived from source_path
(folder kept as-is, file walked up to its parent).
- find_media_imdb_id: follow the current release focus in STM.
"""
if tool_name in {"analyze_release", "probe_media"}:
raw = args.get("source_path")
if not isinstance(raw, str) or not raw:
return None
path = Path(raw)
return path if path.is_dir() else path.parent
# find_media_imdb_id has no path arg — rely on release focus.
focus = get_memory().stm.release_focus.current_release_path
if not focus:
return None
path = Path(focus)
return path if path.is_dir() else path.parent
async def step_streaming( async def step_streaming(
self, user_input: str, completion_id: str, created_ts: int, model: str self, user_input: str, completion_id: str, created_ts: int, model: str
) -> AsyncGenerator[dict[str, Any], None]: ) -> AsyncGenerator[dict[str, Any]]:
""" """
Execute agent step with streaming support for LibreChat. Execute agent step with streaming support for LibreChat.
@@ -230,7 +370,7 @@ class Agent:
tools_spec = self.prompt_builder.build_tools_spec() tools_spec = self.prompt_builder.build_tools_spec()
# Tool execution loop # Tool execution loop
for _iteration in range(self.max_tool_iterations): for _iteration in range(self.settings.max_tool_iterations):
# Call LLM with tools # Call LLM with tools
llm_result = self.llm.complete(messages, tools=tools_spec) llm_result = self.llm.complete(messages, tools=tools_spec)
+79
View File
@@ -0,0 +1,79 @@
"""Expression loader — charge et merge les fichiers YAML d'expressions par user."""
import random
from pathlib import Path
import yaml
_USERS_DIR = Path(__file__).parent.parent / "knowledge" / "users"
def _load_yaml(path: Path) -> dict:
if not path.exists():
return {}
return yaml.safe_load(path.read_text(encoding="utf-8")) or {}
def load_expressions(username: str | None) -> dict:
"""
Charge common.yaml et le merge avec {username}.yaml.
Retourne un dict avec :
- nickname: str (surnom de l'user, ou username en fallback)
- expressions: dict[situation -> list[str]]
"""
common = _load_yaml(_USERS_DIR / "common.yaml")
user_data = _load_yaml(_USERS_DIR / f"{username}.yaml") if username else {}
# Merge expressions : common + user (les phrases user s'ajoutent)
common_exprs: dict[str, list] = common.get("expressions", {})
user_exprs: dict[str, list] = user_data.get("expressions", {})
merged: dict[str, list] = {}
all_situations = set(common_exprs) | set(user_exprs)
for situation in all_situations:
base = list(common_exprs.get(situation, []))
extra = list(user_exprs.get(situation, []))
merged[situation] = base + extra
nickname = user_data.get("user", {}).get("nickname") or username or "mec"
return {
"nickname": nickname,
"expressions": merged,
}
def pick(expressions: dict, situation: str, nickname: str | None = None) -> str:
"""
Pioche une expression aléatoire pour une situation donnée.
Résout {user} avec le nickname si fourni.
Retourne une string vide si la situation n'existe pas.
"""
options = expressions.get("expressions", {}).get(situation, [])
if not options:
return ""
chosen = random.choice(options)
if nickname:
chosen = chosen.replace("{user}", nickname)
return chosen
def build_expressions_context(username: str | None) -> dict:
"""
Point d'entrée principal.
Retourne :
- nickname: str
- samples: dict[situation -> une phrase résolue] — une seule par situation
"""
data = load_expressions(username)
nickname = data["nickname"]
samples = {
situation: pick(data, situation, nickname) for situation in data["expressions"]
}
return {
"nickname": nickname,
"samples": samples,
}
@@ -6,7 +6,9 @@ from typing import Any
import requests import requests
from requests.exceptions import HTTPError, RequestException, Timeout from requests.exceptions import HTTPError, RequestException, Timeout
from ..config import settings from alfred.settings import Settings
from alfred.settings import settings as default_settings
from .exceptions import LLMAPIError, LLMConfigurationError from .exceptions import LLMAPIError, LLMConfigurationError
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -21,6 +23,7 @@ class DeepSeekClient:
base_url: str | None = None, base_url: str | None = None,
model: str | None = None, model: str | None = None,
timeout: int | None = None, timeout: int | None = None,
settings: Settings | None = None,
): ):
""" """
Initialize DeepSeek client. Initialize DeepSeek client.
@@ -34,10 +37,11 @@ class DeepSeekClient:
Raises: Raises:
LLMConfigurationError: If API key is missing LLMConfigurationError: If API key is missing
""" """
self.api_key = api_key or settings.deepseek_api_key self.settings = settings or default_settings
self.base_url = base_url or settings.deepseek_base_url self.api_key = api_key or self.settings.deepseek_api_key
self.model = model or settings.model self.base_url = base_url or self.settings.deepseek_base_url
self.timeout = timeout or settings.request_timeout self.model = model or self.settings.deepseek_model
self.timeout = timeout or self.settings.request_timeout
if not self.api_key: if not self.api_key:
raise LLMConfigurationError( raise LLMConfigurationError(
@@ -94,7 +98,7 @@ class DeepSeekClient:
payload = { payload = {
"model": self.model, "model": self.model,
"messages": messages, "messages": messages,
"temperature": settings.temperature, "temperature": self.settings.llm_temperature,
} }
# Add tools if provided # Add tools if provided
@@ -1,13 +1,14 @@
"""Ollama LLM client with robust error handling.""" """Ollama LLM client with robust error handling."""
import logging import logging
import os
from typing import Any from typing import Any
import requests import requests
from requests.exceptions import HTTPError, RequestException, Timeout from requests.exceptions import HTTPError, RequestException, Timeout
from ..config import settings from alfred.settings import Settings
from alfred.settings import settings as default_settings
from .exceptions import LLMAPIError, LLMConfigurationError from .exceptions import LLMAPIError, LLMConfigurationError
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -32,6 +33,7 @@ class OllamaClient:
model: str | None = None, model: str | None = None,
timeout: int | None = None, timeout: int | None = None,
temperature: float | None = None, temperature: float | None = None,
settings: Settings | None = None,
): ):
""" """
Initialize Ollama client. Initialize Ollama client.
@@ -45,13 +47,12 @@ class OllamaClient:
Raises: Raises:
LLMConfigurationError: If configuration is invalid LLMConfigurationError: If configuration is invalid
""" """
self.base_url = base_url or os.getenv( self.settings = settings or default_settings
"OLLAMA_BASE_URL", "http://localhost:11434" self.base_url = base_url or self.settings.ollama_base_url
) self.model = model or self.settings.ollama_model
self.model = model or os.getenv("OLLAMA_MODEL", "llama3.2") self.timeout = timeout or self.settings.request_timeout
self.timeout = timeout or settings.request_timeout
self.temperature = ( self.temperature = (
temperature if temperature is not None else settings.temperature temperature if temperature is not None else self.settings.llm_temperature
) )
if not self.base_url: if not self.base_url:
+333
View File
@@ -0,0 +1,333 @@
"""Prompt builder for the agent system."""
import json
from typing import Any
from alfred.infrastructure.persistence_TO_CHECK import get_memory
from alfred.infrastructure.persistence_TO_CHECK.memory import MemoryRegistry
from .expressions import build_expressions_context
from .registry import Tool
from .workflows_TO_CHECK import WorkflowLoader
# Tools that are always available, regardless of workflow scope.
# Kept small on purpose — the noyau is what the agent uses to either
# answer trivially or pivot into a workflow.
CORE_TOOLS: tuple[str, ...] = (
"set_language",
"set_path_for_folder",
"list_folder",
"read_release_metadata",
"query_library",
"start_workflow",
"end_workflow",
)
class PromptBuilder:
"""Builds system prompts for the agent with memory context."""
def __init__(
self,
tools: dict[str, Tool],
workflow_loader: WorkflowLoader | None = None,
):
self.tools = tools
self.workflow_loader = workflow_loader or WorkflowLoader()
self._memory_registry = MemoryRegistry()
def _active_workflow(self, memory) -> dict | None:
"""Return the YAML definition of the active workflow, or None."""
current = memory.stm.workflow.current
if current is None:
return None
return self.workflow_loader.get(current.get("name"))
def visible_tool_names(self) -> list[str]:
"""
Return the names of the tools currently in scope.
- Idle (no workflow): core noyau only. The LLM enters a workflow
via start_workflow to access more tools.
- Workflow active: core noyau + the workflow's declared tools.
"""
memory = get_memory()
visible = set(CORE_TOOLS)
workflow = self._active_workflow(memory)
if workflow is not None:
for name in workflow.get("tools", []):
visible.add(name)
# Only return tools that actually exist in the registry.
return [name for name in self.tools if name in visible]
def _format_identity(self, memory) -> str:
"""Build Alfred's identity and personality section."""
username = memory.stm.get_entity("username")
expr = build_expressions_context(username)
nickname = expr["nickname"]
samples = expr["samples"]
# Format expressions as situational guidance for the LLM
expr_lines = []
situation_labels = {
"greeting": "Salutation",
"success": "Succès",
"working": "En cours",
"error": "Erreur",
"unclear": "Demande floue",
"warning": "Avertissement",
"not_found": "Introuvable",
}
for situation, label in situation_labels.items():
phrase = samples.get(situation, "")
if phrase:
expr_lines.append(f' {label}: "{phrase}"')
expressions_block = "\n".join(expr_lines)
return f"""Tu t'appelles Alfred. Tu es un assistant d'organisation de médiathèque — direct, opérationnel, légèrement impertinent.
PERSONNALITÉ:
- Tu parles franglais : français avec des touches d'anglais quand ça colle mieux
- Tu es sarcastique et n'as pas peur de te moquer si l'user fait une connerie
- Tu envoies chier poliment (mais clairement) quand la demande est trop floue
- Tu ne fais pas de blabla inutile. Si "ok c'est fait" suffit, c'est tout ce que tu dis
- Tu peux jurer (putain, merde, con, ...) — c'est naturel, pas du remplissage
- Jamais de "Great question!" ou de politesse creuse
USER COURANT: {nickname}
EXPRESSIONS À UTILISER (une par situation, naturellement intégrées dans ta réponse) :
{expressions_block}"""
def build_tools_spec(self) -> list[dict[str, Any]]:
"""Build the tool specification for the LLM API (scope-filtered)."""
visible = set(self.visible_tool_names())
tool_specs = []
for tool in self.tools.values():
if tool.name not in visible:
continue
spec = {
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.parameters,
},
}
tool_specs.append(spec)
return tool_specs
def _format_tools_description(self) -> str:
"""Format the currently-visible tools with description + params."""
visible = set(self.visible_tool_names())
visible_tools = [t for t in self.tools.values() if t.name in visible]
if not visible_tools:
return ""
return "\n".join(
f"- {tool.name}: {tool.description}\n"
f" Parameters: {json.dumps(tool.parameters, ensure_ascii=False)}"
for tool in visible_tools
)
def _format_workflow_scope(self, memory) -> str:
"""Describe the current workflow scope so the LLM has a plan."""
workflow = self._active_workflow(memory)
if workflow is None:
available = self.workflow_loader.names()
if not available:
return ""
lines = ["WORKFLOW SCOPE: idle (broad catalog narrowed to core noyau)."]
lines.append(
" Call start_workflow(workflow_name, params) to enter a scope."
)
lines.append(" Available workflows:")
for name in available:
wf = self.workflow_loader.get(name) or {}
desc = (wf.get("description") or "").strip().splitlines()
summary = desc[0] if desc else ""
lines.append(f" - {name}: {summary}")
return "\n".join(lines)
current = memory.stm.workflow.current or {}
lines = [
f"WORKFLOW SCOPE: active — {current.get('name')} "
f"(stage: {current.get('stage')})",
]
params = current.get("params")
if params:
lines.append(f" Params: {params}")
wf_desc = (workflow.get("description") or "").strip()
if wf_desc:
lines.append(f" Goal: {wf_desc}")
steps = workflow.get("steps", [])
if steps:
lines.append(" Steps:")
for step in steps:
step_id = step.get("id", "?")
step_tool = step.get("tool") or (
"ask_user" if step.get("ask_user") else ""
)
lines.append(f" - {step_id} ({step_tool})")
lines.append(" Call end_workflow(reason) when done, cancelled, or off-topic.")
return "\n".join(lines)
def _format_episodic_context(self, memory) -> str:
"""Format episodic memory context for the prompt."""
lines = []
if memory.episodic.last_search_results:
results = memory.episodic.last_search_results
result_list = results.get("results", [])
lines.append(
f"\nLAST SEARCH: '{results.get('query')}' ({len(result_list)} results)"
)
# Show first 5 results
for i, result in enumerate(result_list[:5]):
name = result.get("name", "Unknown")
lines.append(f" {i + 1}. {name}")
if len(result_list) > 5:
lines.append(f" ... and {len(result_list) - 5} more")
if memory.episodic.pending_question:
question = memory.episodic.pending_question
lines.append(f"\nPENDING QUESTION: {question.get('question')}")
lines.append(f" Type: {question.get('type')}")
if question.get("options"):
lines.append(f" Options: {len(question.get('options'))}")
if memory.episodic.active_downloads:
lines.append(f"\nACTIVE DOWNLOADS: {len(memory.episodic.active_downloads)}")
for dl in memory.episodic.active_downloads[:3]:
lines.append(f" - {dl.get('name')}: {dl.get('progress', 0)}%")
if memory.episodic.recent_errors:
lines.append("\nRECENT ERRORS (up to 3):")
for error in memory.episodic.recent_errors[-3:]:
lines.append(
f" - Action '{error.get('action')}' failed: {error.get('error')}"
)
# Unread events
unread = [e for e in memory.episodic.background_events if not e.get("read")]
if unread:
lines.append(f"\nUNREAD EVENTS: {len(unread)}")
for event in unread[:3]:
lines.append(f" - {event.get('type')}: {event.get('data')}")
return "\n".join(lines)
def _format_stm_context(self, memory) -> str:
"""Format short-term memory context for the prompt."""
lines = []
if memory.stm.current_workflow:
workflow = memory.stm.current_workflow
lines.append(
f"CURRENT WORKFLOW: {workflow.get('name')} (stage: {workflow.get('stage')})"
)
if workflow.get("params"):
lines.append(f" Params: {workflow.get('params')}")
if memory.stm.current_topic:
lines.append(f"CURRENT TOPIC: {memory.stm.current_topic}")
if memory.stm.extracted_entities:
lines.append("EXTRACTED ENTITIES:")
for key, value in memory.stm.extracted_entities.items():
lines.append(f" - {key}: {value}")
if memory.stm.language:
lines.append(f"CONVERSATION LANGUAGE: {memory.stm.language}")
return "\n".join(lines)
def _format_memory_schema(self) -> str:
"""Describe available memory components so the agent knows what to read/write and when."""
schema = self._memory_registry.schema()
tier_labels = {
"ltm": "LONG-TERM (persisted)",
"stm": "SHORT-TERM (session)",
"episodic": "EPISODIC (volatile)",
}
lines = ["MEMORY COMPONENTS:"]
for tier, components in schema.items():
if not components:
continue
lines.append(f"\n [{tier_labels.get(tier, tier.upper())}]")
for c in components:
access = c.get("access", "read")
lines.append(f" {c['name']} ({access}): {c['description']}")
for field_name, field_desc in c.get("fields", {}).items():
lines.append(f" · {field_name}: {field_desc}")
return "\n".join(lines)
def _format_config_context(self, memory) -> str:
"""Format configuration context."""
lines = ["CURRENT CONFIGURATION:"]
folders = {
**memory.ltm.workspace.as_dict(),
**memory.ltm.library_paths.to_dict(),
}
if folders:
for key, value in folders.items():
lines.append(f" - {key}: {value}")
else:
lines.append(" (no configuration set)")
return "\n".join(lines)
def build_system_prompt(self) -> str:
"""Build the complete system prompt."""
memory = get_memory()
# Identity + personality
identity = self._format_identity(memory)
# Language instruction
language_instruction = (
"Si la langue de l'user est différente de la langue courante en STM, "
"appelle `set_language` en premier avant de répondre."
)
# Configuration
config_section = self._format_config_context(memory)
# STM context
stm_context = self._format_stm_context(memory)
# Episodic context
episodic_context = self._format_episodic_context(memory)
# Memory schema
memory_schema = self._format_memory_schema()
# Workflow scope (active workflow plan or list of options)
workflow_section = self._format_workflow_scope(memory)
# Available tools (already filtered by scope)
tools_desc = self._format_tools_description()
tools_section = f"\nOUTILS DISPONIBLES:\n{tools_desc}" if tools_desc else ""
rules = """
RÈGLES:
- Utilise les outils pour accomplir les tâches, pas pour décorer
- Si des résultats de recherche sont dispo en mémoire épisodique, référence-les par index
- Confirme toujours avant une opération destructive (move, delete, overwrite)
- Réponses courtes — si c'est fait, dis-le en une ligne
- Si la demande est floue, demande un éclaircissement AVANT de lancer quoi que ce soit
"""
sections = [
identity,
language_instruction,
config_section,
stm_context,
episodic_context,
memory_schema,
workflow_section,
tools_section,
rules,
]
return "\n\n".join(s for s in sections if s and s.strip())
+178
View File
@@ -0,0 +1,178 @@
"""Tool registry — defines and registers all available tools for the agent."""
import inspect
import logging
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any
from .tools_TO_CHECK.spec import ToolSpec, ToolSpecError
from .tools_TO_CHECK.spec_loader import load_tool_specs
logger = logging.getLogger(__name__)
@dataclass
class Tool:
"""Represents a tool that can be used by the agent."""
name: str
description: str
func: Callable[..., dict[str, Any]]
parameters: dict[str, Any]
cache_key: str | None = None # Parameter name to use as STM cache key.
_PY_TYPE_TO_JSON = {
str: "string",
int: "integer",
float: "number",
bool: "boolean",
list: "array",
dict: "object",
}
def _json_type_for(annotation) -> str:
"""Map a Python type annotation to a JSON Schema 'type' string."""
if annotation is inspect.Parameter.empty:
return "string"
# Strip Optional[X] / X | None to X.
args = getattr(annotation, "__args__", None)
if args:
non_none = [a for a in args if a is not type(None)]
if len(non_none) == 1:
annotation = non_none[0]
return _PY_TYPE_TO_JSON.get(annotation, "string")
def _create_tool_from_function(func: Callable, spec: ToolSpec | None = None) -> Tool:
"""
Create a Tool object from a function, optionally enriched with a spec.
Types and required-ness always come from the Python signature (source of
truth for the API contract). When a spec is provided, the description
and per-parameter docs come from the YAML spec instead of the docstring.
"""
sig = inspect.signature(func)
sig_params = {name: p for name, p in sig.parameters.items() if name != "self"}
if spec is not None:
_validate_spec_matches_signature(func.__name__, sig_params, spec)
description = spec.compile_description()
param_descriptions = {
name: spec.compile_parameter_description(name) for name in sig_params
}
else:
doc = inspect.getdoc(func)
description = doc.strip().split("\n")[0] if doc else func.__name__
param_descriptions = {name: f"Parameter {name}" for name in sig_params}
properties: dict[str, dict[str, Any]] = {}
required: list[str] = []
for param_name, param in sig_params.items():
properties[param_name] = {
"type": _json_type_for(param.annotation),
"description": param_descriptions[param_name],
}
if param.default is inspect.Parameter.empty:
required.append(param_name)
parameters = {
"type": "object",
"properties": properties,
"required": required,
}
cache_key = spec.cache.key if spec is not None and spec.cache is not None else None
return Tool(
name=func.__name__,
description=description,
func=func,
parameters=parameters,
cache_key=cache_key,
)
def _validate_spec_matches_signature(
func_name: str,
sig_params: dict[str, inspect.Parameter],
spec: ToolSpec,
) -> None:
"""Ensure every signature param has a spec entry and vice versa."""
sig_names = set(sig_params.keys())
spec_names = set(spec.parameters.keys())
missing_in_spec = sig_names - spec_names
if missing_in_spec:
raise ToolSpecError(
f"tool '{func_name}': spec is missing entries for parameter(s) "
f"{sorted(missing_in_spec)}"
)
extra_in_spec = spec_names - sig_names
if extra_in_spec:
raise ToolSpecError(
f"tool '{func_name}': spec has entries for unknown parameter(s) "
f"{sorted(extra_in_spec)} (not in function signature)"
)
def make_tools(settings) -> dict[str, Tool]:
"""
Create and register all available tools.
Args:
settings: Application settings instance.
Returns:
Dictionary mapping tool names to Tool objects.
"""
from .tools_TO_CHECK import api as api_tools # noqa: PLC0415
from .tools_TO_CHECK import filesystem as fs_tools # noqa: PLC0415
from .tools_TO_CHECK import language as lang_tools # noqa: PLC0415
from .tools_TO_CHECK import workflow as wf_tools # noqa: PLC0415
tool_functions = [
fs_tools.set_path_for_folder,
fs_tools.list_folder,
fs_tools.read_release_metadata,
fs_tools.query_library,
fs_tools.analyze_release,
fs_tools.probe_media,
fs_tools.resolve_season_destination,
fs_tools.resolve_episode_destination,
fs_tools.resolve_movie_destination,
fs_tools.resolve_series_destination,
fs_tools.move_media,
fs_tools.move_to_destination,
fs_tools.manage_subtitles,
fs_tools.create_seed_links,
fs_tools.learn,
api_tools.find_media_imdb_id,
api_tools.find_torrent,
api_tools.add_torrent_by_index,
api_tools.add_torrent_to_qbittorrent,
api_tools.get_torrent_by_index,
lang_tools.set_language,
wf_tools.start_workflow,
wf_tools.end_workflow,
]
specs = load_tool_specs()
tools: dict[str, Tool] = {}
for func in tool_functions:
spec = specs.get(func.__name__)
tool = _create_tool_from_function(func, spec=spec)
tools[tool.name] = tool
with_spec = sum(1 for fn in tool_functions if fn.__name__ in specs)
logger.info(
f"Registered {len(tools)} tools "
f"({with_spec} with YAML spec, {len(tools) - with_spec} doc-only): "
f"{list(tools.keys())}"
)
return tools
+23
View File
@@ -0,0 +1,23 @@
"""Tools module — agent-exposed wrappers.
Re-exports are intentionally minimal during the ``unfuck`` refactor.
Tool wiring (registry / specs / LLM-facing surface) is the last
chunk of work on this branch; until then, importers should reach
into the submodules directly (``alfred.agent.tools.filesystem``, …).
"""
from .api import (
add_torrent_by_index,
add_torrent_to_qbittorrent,
find_torrent,
get_torrent_by_index,
)
from .language import set_language
__all__ = [
"find_torrent",
"get_torrent_by_index",
"add_torrent_to_qbittorrent",
"add_torrent_by_index",
"set_language",
]
@@ -3,60 +3,53 @@
import logging import logging
from typing import Any from typing import Any
from application.movies import SearchMovieUseCase from alfred.application.movies_TO_CHECK import SearchMovieUseCase
from application.torrents import AddTorrentUseCase, SearchTorrentsUseCase from alfred.application.torrents_TO_CHECK import AddTorrentUseCase, SearchTorrentsUseCase
from infrastructure.api.knaben import knaben_client from alfred.application.tv_shows_TO_CHECK import SearchShowUseCase
from infrastructure.api.qbittorrent import qbittorrent_client from alfred.infrastructure.api_TO_CHECK.knaben import knaben_client
from infrastructure.api.tmdb import tmdb_client from alfred.infrastructure.api_TO_CHECK.qbittorrent import qbittorrent_client
from infrastructure.persistence import get_memory from alfred.infrastructure.api_TO_CHECK.tmdb import tmdb_client
from alfred.infrastructure.persistence_TO_CHECK import get_memory
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def find_media_imdb_id(media_title: str) -> dict[str, Any]: def search_movies(media_title: str) -> dict[str, Any]:
""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_movies.yaml."""
Find the IMDb ID for a given media title using TMDB API.
Args:
media_title: Title of the media to search for.
Returns:
Dict with IMDb ID and media info, or error details.
"""
use_case = SearchMovieUseCase(tmdb_client) use_case = SearchMovieUseCase(tmdb_client)
response = use_case.execute(media_title) response = use_case.execute(media_title)
result = response.to_dict() result = response.to_dict()
if result.get("status") == "ok": if result.get("status") == "ok":
memory = get_memory() memory = get_memory()
memory.stm.set_entity( memory.stm.set_entity("last_movie_search", {"hits": result.get("hits", [])})
"last_media_search", memory.stm.set_topic("searching_movie")
{ logger.debug(
"title": result.get("title"), f"Stored movie search result in STM: {len(result.get('hits', []))} hits"
"imdb_id": result.get("imdb_id"), )
"media_type": result.get("media_type"),
"tmdb_id": result.get("tmdb_id"), return result
},
def search_shows(show_title: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/search_shows.yaml."""
use_case = SearchShowUseCase(tmdb_client)
response = use_case.execute(show_title)
result = response.to_dict()
if result.get("status") == "ok":
memory = get_memory()
memory.stm.set_entity("last_show_search", {"hits": result.get("hits", [])})
memory.stm.set_topic("searching_show")
logger.debug(
f"Stored show search result in STM: {len(result.get('hits', []))} hits"
) )
memory.stm.set_topic("searching_media")
logger.debug(f"Stored media search result in STM: {result.get('title')}")
return result return result
def find_torrent(media_title: str) -> dict[str, Any]: def find_torrent(media_title: str) -> dict[str, Any]:
""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/find_torrent.yaml."""
Find torrents for a given media title using Knaben API.
Results are stored in episodic memory so the user can reference them
by index (e.g., "download the 3rd one").
Args:
media_title: Title of the media to search for.
Returns:
Dict with torrent list or error details.
"""
logger.info(f"Searching torrents for: {media_title}") logger.info(f"Searching torrents for: {media_title}")
use_case = SearchTorrentsUseCase(knaben_client) use_case = SearchTorrentsUseCase(knaben_client)
@@ -76,17 +69,7 @@ def find_torrent(media_title: str) -> dict[str, Any]:
def get_torrent_by_index(index: int) -> dict[str, Any]: def get_torrent_by_index(index: int) -> dict[str, Any]:
""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/get_torrent_by_index.yaml."""
Get a torrent from the last search results by its index.
Allows the user to reference results by number after a search.
Args:
index: 1-based index of the torrent in the search results.
Returns:
Dict with torrent data or error if not found.
"""
logger.info(f"Getting torrent at index: {index}") logger.info(f"Getting torrent at index: {index}")
memory = get_memory() memory = get_memory()
@@ -113,15 +96,7 @@ def get_torrent_by_index(index: int) -> dict[str, Any]:
def add_torrent_to_qbittorrent(magnet_link: str) -> dict[str, Any]: def add_torrent_to_qbittorrent(magnet_link: str) -> dict[str, Any]:
""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/add_torrent_to_qbittorrent.yaml."""
Add a torrent to qBittorrent using a magnet link.
Args:
magnet_link: Magnet link of the torrent to add.
Returns:
Dict with success status or error details.
"""
logger.info("Adding torrent to qBittorrent") logger.info("Adding torrent to qBittorrent")
use_case = AddTorrentUseCase(qbittorrent_client) use_case = AddTorrentUseCase(qbittorrent_client)
@@ -157,17 +132,7 @@ def add_torrent_to_qbittorrent(magnet_link: str) -> dict[str, Any]:
def add_torrent_by_index(index: int) -> dict[str, Any]: def add_torrent_by_index(index: int) -> dict[str, Any]:
""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/add_torrent_by_index.yaml."""
Add a torrent from the last search results by its index.
Combines get_torrent_by_index and add_torrent_to_qbittorrent.
Args:
index: 1-based index of the torrent in the search results.
Returns:
Dict with success status or error details.
"""
logger.info(f"Adding torrent by index: {index}") logger.info(f"Adding torrent by index: {index}")
torrent_result = get_torrent_by_index(index) torrent_result = get_torrent_by_index(index)
+373
View File
@@ -0,0 +1,373 @@
"""Filesystem tools for folder management.
Thin wrappers around the 5 atomic filesystem use cases
(``alfred.application.filesystem``) plus a few self-contained tools
(``analyze_release``, ``probe_media``, ``learn``, …).
Tools removed during the ``unfuck`` filesystem refactor — to be
rewired in a later step:
- ``manage_subtitles`` (depends on the rewritten subtitle services)
- ``set_path_for_folder`` (no replacement use case yet)
- ``create_seed_links`` (flow has changed: hard-link straight to
library, no copy back; will be re-introduced per-file when the
organize-release workflow lands)
- ``resolve_season_destination`` / ``resolve_episode_destination``
/ ``resolve_movie_destination`` / ``resolve_series_destination``
(their use cases moved to ``_OLD`` files pending a rewrite)
"""
from pathlib import Path
from typing import Any
import yaml
import alfred as _alfred_pkg
from alfred.application.filesystem import (
DirectoryRoots,
create_dir_use_case,
list_dir_use_case,
move_file_use_case,
)
from alfred.infrastructure.knowledge_TO_CHECK.release_kb import YamlReleaseKnowledge
from alfred.infrastructure.metadata_TO_CHECK import MetadataStore
from alfred.infrastructure.persistence_TO_CHECK import get_memory
from alfred.infrastructure.probe_TO_CHECK import FfprobeMediaProber
# Agent-tools frontier: this is the legitimate home for the singletons that
# back every LLM-exposed wrapper. The use cases below take ``kb`` / ``prober``
# as required params; tests inject their own stubs.
_KB = YamlReleaseKnowledge()
_PROBER = FfprobeMediaProber()
_LEARNED_ROOT = Path(_alfred_pkg.__file__).parent.parent / "data" / "knowledge"
class _RootsNotConfigured(Exception):
"""Raised when one of the 4 expected roots is missing from memory."""
def __init__(self, missing: list[str]):
super().__init__(f"Roots not configured: {missing}")
self.missing = missing
def _load_directory_roots() -> DirectoryRoots:
"""Build :class:`DirectoryRoots` from the persisted memory.
Reads:
- ``ltm.workspace.download`` → ``downloads``
- ``ltm.workspace.torrent`` → ``torrents``
- ``ltm.library_paths['movies']`` → ``movies``
- ``ltm.library_paths['tv_shows']`` → ``tv_shows``
Raises:
_RootsNotConfigured: if any of the four paths is unset.
"""
memory = get_memory()
downloads = memory.ltm.workspace.download
torrents = memory.ltm.workspace.torrent
movies = memory.ltm.library_paths.get("movies")
tv_shows = memory.ltm.library_paths.get("tv_shows")
missing: list[str] = []
if not downloads:
missing.append("downloads")
if not torrents:
missing.append("torrents")
if not movies:
missing.append("movies")
if not tv_shows:
missing.append("tv_shows")
if missing:
raise _RootsNotConfigured(missing)
return DirectoryRoots(
downloads=Path(downloads),
torrents=Path(torrents),
movies=Path(movies),
tv_shows=Path(tv_shows),
)
def _roots_error(exc: _RootsNotConfigured) -> dict[str, Any]:
return {
"status": "error",
"error": "roots_not_configured",
"message": (
f"Missing roots: {exc.missing}. "
"Configure them via /set_path before using filesystem tools."
),
}
# ---------------------------------------------------------------------------
# 5 atomic filesystem tools — thin wrappers over the use cases.
# ---------------------------------------------------------------------------
def list_folder(path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/list_folder.yaml."""
try:
roots = _load_directory_roots()
except _RootsNotConfigured as e:
return _roots_error(e)
return list_dir_use_case(Path(path), roots).to_dict()
def create_directory(path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/create_directory.yaml."""
try:
roots = _load_directory_roots()
except _RootsNotConfigured as e:
return _roots_error(e)
return create_dir_use_case(Path(path), roots).to_dict()
def move_media(source: str, destination: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_media.yaml."""
try:
roots = _load_directory_roots()
except _RootsNotConfigured as e:
return _roots_error(e)
return move_file_use_case(Path(source), Path(destination), roots).to_dict()
def move_to_destination(source: str, destination: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/move_to_destination.yaml.
Convenience tool that creates the destination's parent directory
if missing, then moves the file. Saves the LLM from having to
chain ``create_directory`` + ``move_media`` explicitly.
"""
try:
roots = _load_directory_roots()
except _RootsNotConfigured as e:
return _roots_error(e)
dst = Path(destination)
mkdir_resp = create_dir_use_case(dst.parent, roots)
if mkdir_resp.status != "ok":
return mkdir_resp.to_dict()
return move_file_use_case(Path(source), dst, roots).to_dict()
# ---------------------------------------------------------------------------
# Self-contained tools — not impacted by the filesystem refactor.
# ---------------------------------------------------------------------------
def learn(pack: str, category: str, key: str, values: list[str]) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/learn.yaml."""
_VALID_PACKS = {"subtitles"}
_VALID_CATEGORIES = {"languages", "types", "formats"}
if pack not in _VALID_PACKS:
return {
"status": "error",
"error": "unknown_pack",
"message": f"Unknown pack '{pack}'. Valid: {sorted(_VALID_PACKS)}",
}
if category not in _VALID_CATEGORIES:
return {
"status": "error",
"error": "unknown_category",
"message": f"Unknown category '{category}'. Valid: {sorted(_VALID_CATEGORIES)}",
}
learned_path = _LEARNED_ROOT / "subtitles_learned.yaml"
_LEARNED_ROOT.mkdir(parents=True, exist_ok=True)
data: dict = {}
if learned_path.exists():
try:
with open(learned_path, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
except Exception as e:
return {"status": "error", "error": "read_failed", "message": str(e)}
cat_data = data.setdefault(category, {})
entry = cat_data.setdefault(key, {"tokens": []})
existing = entry.get("tokens", [])
new_tokens = [v for v in values if v not in existing]
entry["tokens"] = existing + new_tokens
tmp = learned_path.with_suffix(".yaml.tmp")
try:
with open(tmp, "w", encoding="utf-8") as f:
yaml.safe_dump(
data, f, allow_unicode=True, default_flow_style=False, sort_keys=False
)
tmp.rename(learned_path)
except Exception as e:
tmp.unlink(missing_ok=True)
return {"status": "error", "error": "write_failed", "message": str(e)}
return {
"status": "ok",
"pack": pack,
"category": category,
"key": key,
"added_count": len(new_tokens),
"tokens": entry["tokens"],
}
def analyze_release(release_name: str, source_path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/analyze_release.yaml."""
from alfred.application.release_TO_CHECK import inspect_release # noqa: PLC0415
result = inspect_release(release_name, Path(source_path), _KB, _PROBER)
parsed = result.parsed
return {
"status": "ok",
"media_type": parsed.media_type,
"parse_path": parsed.parse_path,
"title": parsed.title,
"year": parsed.year,
"season": parsed.season,
"episode": parsed.episode,
"episode_end": parsed.episode_end,
"quality": parsed.quality,
"source": parsed.source,
"codec": parsed.codec,
"group": parsed.group,
"languages": parsed.languages,
"audio_codec": parsed.audio_codec,
"audio_channels": parsed.audio_channels,
"bit_depth": parsed.bit_depth,
"hdr_format": parsed.hdr_format,
"edition": parsed.edition,
"site_tag": parsed.site_tag,
"is_season_pack": parsed.is_season_pack,
"probe_used": result.probe_used,
"confidence": result.report.confidence,
"road": result.report.road,
"recommended_action": result.recommended_action,
}
def probe_media(source_path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/probe_media.yaml."""
path = Path(source_path)
if not path.exists():
return {
"status": "error",
"error": "not_found",
"message": f"{source_path} does not exist",
}
media_info = _PROBER.probe(path)
if media_info is None:
return {
"status": "error",
"error": "probe_failed",
"message": "ffprobe failed to read the file",
}
return {
"status": "ok",
"video": {
"codec": media_info.video_codec,
"resolution": media_info.resolution,
"width": media_info.width,
"height": media_info.height,
"duration_seconds": media_info.duration_seconds,
"bitrate_kbps": media_info.bitrate_kbps,
},
"audio_tracks": [
{
"index": t.index,
"codec": t.codec,
"channels": t.channels,
"channel_layout": t.channel_layout,
"language": t.language,
"is_default": t.is_default,
}
for t in media_info.audio_tracks
],
"subtitle_tracks": [
{
"index": t.index,
"codec": t.codec,
"language": t.language,
"is_default": t.is_default,
"is_forced": t.is_forced,
}
for t in media_info.subtitle_tracks
],
"audio_languages": media_info.audio_languages,
"is_multi_audio": media_info.is_multi_audio,
}
def read_release_metadata(release_path: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/read_release_metadata.yaml."""
path = Path(release_path)
if not path.exists():
return {
"status": "error",
"error": "not_found",
"message": f"{release_path} does not exist",
}
root = path if path.is_dir() else path.parent
store = MetadataStore(root)
if not store.exists():
return {
"status": "ok",
"release_path": str(root),
"has_metadata": False,
"metadata": {},
}
return {
"status": "ok",
"release_path": str(root),
"has_metadata": True,
"metadata": store.load(),
}
def query_library(name: str) -> dict[str, Any]:
"""Thin tool wrapper — semantics live in alfred/agent/tools/specs/query_library.yaml."""
needle = name.strip().lower()
if not needle:
return {
"status": "error",
"error": "empty_name",
"message": "name must be a non-empty string",
}
memory = get_memory()
roots = memory.ltm.library_paths.to_dict() or {}
if not roots:
return {
"status": "error",
"error": "no_libraries",
"message": "No library paths configured — call set_path_for_folder first.",
}
matches: list[dict[str, Any]] = []
for collection, root in roots.items():
root_path = Path(root)
if not root_path.is_dir():
continue
for entry in root_path.iterdir():
if not entry.is_dir():
continue
if needle not in entry.name.lower():
continue
store = MetadataStore(entry)
matches.append(
{
"collection": collection,
"name": entry.name,
"path": str(entry),
"has_metadata": store.exists(),
}
)
return {
"status": "ok",
"query": name,
"match_count": len(matches),
"matches": matches,
}
@@ -3,21 +3,13 @@
import logging import logging
from typing import Any from typing import Any
from infrastructure.persistence import get_memory from alfred.infrastructure.persistence_TO_CHECK import get_memory
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
def set_language(language: str) -> dict[str, Any]: def set_language(language: str) -> dict[str, Any]:
""" """Thin tool wrapper — semantics live in alfred/agent/tools/specs/set_language.yaml."""
Set the conversation language.
Args:
language: Language code (e.g., 'en', 'fr', 'es', 'de')
Returns:
Status dictionary
"""
try: try:
memory = get_memory() memory = get_memory()
memory.stm.set_language(language) memory.stm.set_language(language)
+221
View File
@@ -0,0 +1,221 @@
"""
ToolSpec — semantic description of a tool, loaded from YAML.
Each tool exposed to the agent has a matching YAML spec under
alfred/agent/tools/specs/{tool_name}.yaml. The spec carries everything the
LLM needs to decide *when* and *why* to call the tool — separated from the
Python signature, which remains the source of truth for *how* (types,
required-ness).
The YAML structure is documented in the dataclasses below. Loading a spec
validates its shape; missing or unexpected fields raise ToolSpecError.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
import yaml
class ToolSpecError(ValueError):
"""Raised when a YAML tool spec is malformed or inconsistent."""
@dataclass(frozen=True)
class ParameterSpec:
"""Semantic description of a single tool parameter."""
description: str # Short: what the value represents.
why_needed: str # Why the tool needs this — drives LLM reasoning.
example: str | None = None # Concrete example value, shown to the LLM.
@classmethod
def from_dict(cls, name: str, data: dict) -> ParameterSpec:
_require(data, "description", f"parameter '{name}'")
_require(data, "why_needed", f"parameter '{name}'")
return cls(
description=str(data["description"]).strip(),
why_needed=str(data["why_needed"]).strip(),
example=str(data["example"]).strip()
if data.get("example") is not None
else None,
)
@dataclass(frozen=True)
class ReturnsSpec:
"""Description of one possible return shape (ok / needs_clarification / error / ...)."""
description: str
fields: dict[str, str] = field(default_factory=dict)
@classmethod
def from_dict(cls, key: str, data: dict) -> ReturnsSpec:
_require(data, "description", f"returns.{key}")
fields = data.get("fields") or {}
if not isinstance(fields, dict):
raise ToolSpecError(
f"returns.{key}.fields must be a dict, got {type(fields).__name__}"
)
return cls(
description=str(data["description"]).strip(),
fields={str(k): str(v).strip() for k, v in fields.items()},
)
@dataclass(frozen=True)
class CacheSpec:
"""Marks a tool as cacheable in STM.tool_results, keyed by one of its parameters."""
key: str # Name of the parameter whose value is the cache key.
@classmethod
def from_dict(cls, data: dict) -> CacheSpec:
_require(data, "key", "cache")
return cls(key=str(data["key"]).strip())
@dataclass(frozen=True)
class ToolSpec:
"""Full semantic spec for one tool."""
name: str
summary: str # One-liner — becomes Tool.description.
description: str # Longer paragraph.
when_to_use: str
when_not_to_use: str | None
next_steps: str | None
parameters: dict[str, ParameterSpec] # name -> ParameterSpec
returns: dict[str, ReturnsSpec] # status_key -> ReturnsSpec
cache: CacheSpec | None = None # If present, tool is cached.
@classmethod
def from_yaml_path(cls, path: Path) -> ToolSpec:
with open(path, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
if not isinstance(data, dict):
raise ToolSpecError(f"{path}: top-level must be a mapping")
try:
return cls.from_dict(data)
except ToolSpecError as e:
raise ToolSpecError(f"{path}: {e}") from e
@classmethod
def from_dict(cls, data: dict) -> ToolSpec:
_require(data, "name", "spec")
_require(data, "summary", "spec")
_require(data, "description", "spec")
_require(data, "when_to_use", "spec")
params_raw = data.get("parameters") or {}
if not isinstance(params_raw, dict):
raise ToolSpecError("parameters must be a mapping")
parameters = {
pname: ParameterSpec.from_dict(pname, pdata or {})
for pname, pdata in params_raw.items()
}
returns_raw = data.get("returns") or {}
if not isinstance(returns_raw, dict):
raise ToolSpecError("returns must be a mapping")
returns = {
rkey: ReturnsSpec.from_dict(rkey, rdata or {})
for rkey, rdata in returns_raw.items()
}
cache_raw = data.get("cache")
if cache_raw is not None and not isinstance(cache_raw, dict):
raise ToolSpecError("cache must be a mapping")
cache = CacheSpec.from_dict(cache_raw) if cache_raw else None
spec = cls(
name=str(data["name"]).strip(),
summary=str(data["summary"]).strip(),
description=str(data["description"]).strip(),
when_to_use=str(data["when_to_use"]).strip(),
when_not_to_use=_strip_or_none(data.get("when_not_to_use")),
next_steps=_strip_or_none(data.get("next_steps")),
parameters=parameters,
returns=returns,
cache=cache,
)
if cache is not None and cache.key not in parameters:
raise ToolSpecError(
f"cache.key '{cache.key}' is not a declared parameter "
f"(declared: {sorted(parameters)})"
)
return spec
def compile_description(self) -> str:
"""
Build the long description text passed to the LLM as Tool.description.
Layout:
<summary>
<description>
When to use:
<when_to_use>
When NOT to use: (if present)
<when_not_to_use>
Next steps: (if present)
<next_steps>
Returns:
<status>: <description>
· <field>: <desc>
"""
parts = [self.summary, "", self.description]
parts += ["", "When to use:", _indent(self.when_to_use)]
if self.when_not_to_use:
parts += ["", "When NOT to use:", _indent(self.when_not_to_use)]
if self.next_steps:
parts += ["", "Next steps:", _indent(self.next_steps)]
if self.returns:
parts += ["", "Returns:"]
for status, ret in self.returns.items():
parts.append(f" {status}: {ret.description}")
for fname, fdesc in ret.fields.items():
parts.append(f" · {fname}: {fdesc}")
return "\n".join(parts)
def compile_parameter_description(self, name: str) -> str:
"""Build the JSON Schema 'description' field for one parameter."""
p = self.parameters.get(name)
if p is None:
raise ToolSpecError(f"tool '{self.name}': no spec for parameter '{name}'")
text = f"{p.description} (Why: {p.why_needed})"
if p.example:
text += f" Example: {p.example}"
return text
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _require(data: dict, key: str, where: str) -> None:
if data.get(key) is None or (isinstance(data[key], str) and not data[key].strip()):
raise ToolSpecError(f"{where}: missing required field '{key}'")
def _strip_or_none(value) -> str | None:
if value is None:
return None
s = str(value).strip()
return s or None
def _indent(text: str, prefix: str = " ") -> str:
return "\n".join(prefix + line for line in text.splitlines())
@@ -0,0 +1,53 @@
"""
ToolSpecLoader — discover and load all YAML tool specs from a directory.
Convention: one YAML file per tool, named exactly like the Python function
that implements it (e.g. resolve_season_destination.yaml).
"""
from __future__ import annotations
import logging
from pathlib import Path
from .spec import ToolSpec, ToolSpecError
logger = logging.getLogger(__name__)
_DEFAULT_SPECS_DIR = Path(__file__).parent / "specs"
def load_tool_specs(specs_dir: Path | None = None) -> dict[str, ToolSpec]:
"""
Load every {tool}.yaml under specs_dir into a {name -> ToolSpec} mapping.
Args:
specs_dir: Directory to scan. Defaults to alfred/agent/tools/specs/.
Returns:
Mapping from tool name to its parsed ToolSpec.
Raises:
ToolSpecError: if a spec is malformed, or if the filename doesn't
match the 'name' field inside the YAML.
"""
root = specs_dir or _DEFAULT_SPECS_DIR
if not root.exists():
logger.warning(f"Tool specs directory not found: {root}")
return {}
specs: dict[str, ToolSpec] = {}
for path in sorted(root.glob("*.yaml")):
spec = ToolSpec.from_yaml_path(path)
expected_name = path.stem
if spec.name != expected_name:
raise ToolSpecError(
f"{path}: filename stem '{expected_name}' "
f"does not match spec.name '{spec.name}'"
)
if spec.name in specs:
raise ToolSpecError(f"duplicate tool spec name: '{spec.name}'")
specs[spec.name] = spec
logger.info(f"Loaded {len(specs)} tool spec(s) from {root}")
return specs
@@ -0,0 +1,53 @@
name: add_torrent_by_index
summary: >
Pick a torrent from the last find_torrent results by index and add
it to qBittorrent in one call.
description: |
Convenience wrapper that combines get_torrent_by_index +
add_torrent_to_qbittorrent. Looks up the torrent at the given
1-based index, extracts its magnet link, and sends it to
qBittorrent. The result mirrors add_torrent_to_qbittorrent's, with
the chosen torrent's name appended on success.
when_to_use: |
The default action after find_torrent when the user picks a hit by
number ("download the second one"). One call, two side effects:
episodic memory updated + download started.
when_not_to_use: |
- When the user only wants to inspect, not download — use
get_torrent_by_index.
- When the magnet comes from outside the search results — use
add_torrent_to_qbittorrent directly.
next_steps: |
- On status=ok: confirm the download started and end the workflow
if not already ended.
- On status=error (not_found): the index is out of range; show the
available count from episodic memory.
- On status=error (no_magnet): the search result was malformed —
suggest re-running find_torrent.
parameters:
index:
description: 1-based position of the torrent in the last find_torrent results.
why_needed: |
Identifies which torrent to add. Out-of-range indices return
not_found.
example: 3
returns:
ok:
description: Torrent was added to qBittorrent.
fields:
status: "'ok'"
message: Confirmation message.
torrent_name: Name of the torrent that was added.
error:
description: Failed to add.
fields:
error: Short error code (not_found, no_magnet, ...).
message: Human-readable explanation.
@@ -0,0 +1,48 @@
name: add_torrent_to_qbittorrent
summary: >
Send a magnet link to qBittorrent and start the download.
description: |
Adds a torrent to qBittorrent using its WebUI API. On success, the
download is also recorded in episodic memory as an active_download
so the agent can track its progress later, the STM topic is set to
"downloading", and the current workflow is ended (the user typically
leaves the find-and-download scope at this point).
when_to_use: |
When the user provides a raw magnet link, or when chaining manually
after get_torrent_by_index. For the common "user picked search hit
N" case, prefer add_torrent_by_index — one call instead of two.
when_not_to_use: |
- For .torrent files (not supported by this tool — magnet only).
- When qBittorrent is not configured / reachable — the call will
fail and the user has to fix the config first.
next_steps: |
- On status=ok: the workflow is already ended; confirm to the user
that the download has started.
- On status=error: surface the message; common causes are auth
failure or qBittorrent being unreachable.
parameters:
magnet_link:
description: Magnet URI of the torrent to add (magnet:?xt=urn:btih:...).
why_needed: |
The actual payload sent to qBittorrent. Must be a full magnet
URI, not a hash alone.
example: "magnet:?xt=urn:btih:abc123..."
returns:
ok:
description: Torrent accepted by qBittorrent.
fields:
status: "'ok'"
message: Confirmation message.
error:
description: qBittorrent rejected the request or is unreachable.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,85 @@
name: analyze_release
summary: >
One-shot analyzer that parses a release name, detects its media type
from the folder layout, and enriches the result with ffprobe data.
description: |
Combines three steps in a single call so the agent gets a complete
picture before routing:
1. parse_release(release_name) — extracts title, year, season,
episode, quality, source, codec, group, languages, audio info,
HDR, edition, site tag.
2. detect_media_type(parsed, path) — uses the on-disk layout
(single file vs. folder, presence of S01 dirs, episode count)
to choose: movie / tv_episode / tv_season / tv_complete /
other / unknown.
3. ffprobe enrichment — when the media type is recognised, runs
ffprobe on the first video file found and fills in audio
codec/channels, bit depth, HDR format. Sets probe_used=true.
when_to_use: |
As the very first step of any organize workflow, right after
list_folder, on each release the user wants to handle. The output
drives which resolve_*_destination to call next.
when_not_to_use: |
- When you only need codec/audio info on a specific video file:
use probe_media (no parsing, no media-type detection).
- For releases the user has already analyzed earlier in the same
workflow — the parse is deterministic, no need to re-run.
next_steps: |
- media_type == movie → resolve_movie_destination
- media_type == tv_season → resolve_season_destination
- media_type == tv_episode → resolve_episode_destination
- media_type == tv_complete → resolve_series_destination
- media_type in (other, unknown) → ask the user what to do; do not
auto-route.
cache:
key: source_path
parameters:
release_name:
description: Raw release folder or file name as it appears on disk.
why_needed: |
Source of all the parsed tokens (quality, codec, group, ...).
Don't sanitise it — the parser relies on the exact spelling.
example: Breaking.Bad.S01.1080p.BluRay.x265-GROUP
source_path:
description: Absolute path to the release folder or file on disk.
why_needed: |
Required for layout-based media-type detection and for ffprobe
to find a video file inside the release.
example: /downloads/Breaking.Bad.S01.1080p.BluRay.x265-GROUP
returns:
ok:
description: Release analyzed.
fields:
status: "'ok'"
media_type: "One of: movie, tv_episode, tv_season, tv_complete, other, unknown."
parse_path: "Which parser branch was taken (debug)."
title: Parsed title.
year: Parsed year (int) or null.
season: Season number (int) or null.
episode: Episode number (int) or null.
episode_end: Range end episode (multi-episode releases) or null.
quality: Resolution token (e.g. 1080p, 2160p).
source: Source token (BluRay, WEB-DL, ...).
codec: Video codec token (x264, x265, ...).
group: Release group name or null.
languages: List of detected language tokens.
audio_codec: Audio codec from ffprobe (when probe_used=true).
audio_channels: Audio channel count from ffprobe.
bit_depth: Bit depth from ffprobe.
hdr_format: HDR format from ffprobe (HDR10, DV, ...) or null.
edition: Edition tag (Extended, Director's Cut, ...) or null.
site_tag: Source-site tag if present.
is_season_pack: True when the folder contains a full season.
probe_used: True when ffprobe successfully enriched the result.
confidence: Parser confidence score, 0100 (higher = more reliable).
road: "Parser road: 'easy' (group schema matched), 'shitty' (heuristic but acceptable), or 'path_of_pain' (low confidence — ask the user before auto-routing)."
recommended_action: "Orchestrator hint: 'process' (go straight to resolve_*_destination), 'ask_user' (media_type unknown or road=path_of_pain — confirm with the user first), or 'skip' (no main video, or media_type=other — nothing to organize)."
@@ -0,0 +1,59 @@
name: create_seed_links
summary: >
Recreate the original torrent folder structure with hard-links so
qBittorrent can keep seeding after the library move.
description: |
Hard-links the library video file back into torrents/<original_folder_name>/
and copies all remaining files from the original download folder
(subtitles, .nfo, .jpg, .txt, …) so the torrent data is complete on
disk. qBittorrent then sees the same content at the location it
expects and can keep seeding without rehashing the whole torrent.
when_to_use: |
Only when the user has confirmed they want to keep seeding after a
move. Call right after manage_subtitles (or after move_media if there
are no subs).
when_not_to_use: |
- When the user explicitly answered "no" to "keep seeding?".
- When the download was not from a torrent (e.g. direct download).
- Before the library file is in place — this tool reads it.
next_steps: |
- After success: optionally call qBittorrent to update the torrent's
save path / force a recheck (not yet covered by a tool).
- End the workflow.
parameters:
library_file:
description: Absolute path to the video file now in the library.
why_needed: |
The source for the hard-link — same inode means qBittorrent sees
identical bytes at the seeding path.
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Season 03/Oz.S03E01.mkv
original_download_folder:
description: Absolute path to the original download folder.
why_needed: |
Provides the folder name to recreate under torrents/ and the
auxiliary files (subs, nfo, ...) to copy over.
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Seeding folder rebuilt.
fields:
status: "'ok'"
torrent_subfolder: Absolute path of the recreated folder under torrents/.
linked_file: Absolute path of the hard-linked video.
copied_files: List of auxiliary files that were copied.
copied_count: Number of auxiliary files copied.
skipped: List of files skipped (already present, unreadable, ...).
error:
description: Failed to rebuild the seeding folder.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,48 @@
name: end_workflow
summary: >
Leave the current workflow scope and return to the broad-catalog mode.
description: |
Clears the active workflow from STM. After this call the visible tool
catalog returns to the core noyau plus start_workflow, so the agent is
ready to handle a different request.
when_to_use: |
- When all the workflow's steps have completed successfully.
- When the user explicitly cancels the current task.
- When the user changes subject mid-conversation and the active
workflow is no longer relevant.
- When an unrecoverable error makes continuing pointless — explain
in 'reason'.
when_not_to_use: |
- Do not call when there is no active workflow — it will return an
error. Just call start_workflow for the new request instead.
- Do not call mid-step just to "free up tools"; finish the step
or fail it explicitly first.
next_steps: |
- After ending, you can either call start_workflow for a new task or
answer the user directly from the broad catalog.
parameters:
reason:
description: Short reason for ending — completed, cancelled, changed_subject, error, ...
why_needed: |
Recorded in episodic memory for debugging and future audits. A
structured short string is more useful than a long sentence.
example: completed
returns:
ok:
description: Workflow ended; catalog is back to the broad noyau.
fields:
workflow: Name of the workflow that just ended.
reason: The reason that was passed in.
error:
description: Could not end — typically because nothing was active.
fields:
error: Short error code (no_active_workflow).
message: Human-readable explanation.
@@ -0,0 +1,56 @@
name: find_media_imdb_id
summary: >
Search TMDB for a media title and return its canonical title, year,
IMDb id, and TMDB id.
description: |
Looks up a title on TMDB and returns the canonical metadata needed by
the resolve_*_destination tools. On success, the result is also
stashed in short-term memory under "last_media_search" so later steps
in the workflow can read it without re-calling TMDB. The STM topic
is set to "searching_media".
when_to_use: |
Right after analyze_release, before calling resolve_*_destination —
the resolvers need the canonical title + year and refuse to guess
them from the raw release name.
when_not_to_use: |
- When you already have the IMDb id in STM from an earlier step in
the same workflow.
- For torrent search — use find_torrent instead.
next_steps: |
- On status=ok: call the appropriate resolve_*_destination with
tmdb_title and tmdb_year from the result.
- On status=error (not_found): show the error and ask the user for
a more precise title.
cache:
key: media_title
parameters:
media_title:
description: Title to search for. Free-form — TMDB does the matching.
why_needed: |
Drives the TMDB query. Pass a sanitized version (no resolution
tokens, no group name) for best results.
example: Breaking Bad
returns:
ok:
description: Match found.
fields:
status: "'ok'"
title: Canonical title as returned by TMDB.
year: Release year (movies) or first-air year (series).
media_type: "'movie' or 'tv'."
imdb_id: IMDb identifier (ttXXXXXXX) or null.
tmdb_id: TMDB numeric id.
error:
description: No match or API failure.
fields:
error: Short error code (not_found, api_error, ...).
message: Human-readable explanation.
@@ -0,0 +1,52 @@
name: find_torrent
summary: >
Search Knaben for torrents matching a media title; cache results in
episodic memory.
description: |
Queries the Knaben aggregator for up to 10 torrents matching the
given title, then stores the result list in episodic memory under
"last_search_results". The user can then refer to a torrent by
1-based index ("download the 3rd one") via get_torrent_by_index or
add_torrent_by_index. The STM topic is set to "selecting_torrent".
when_to_use: |
When the user wants to download something new — typically the first
step of a "find + download" sub-task. The agent should usually
pre-filter the title (canonical name + year) before searching for
cleaner results.
when_not_to_use: |
- For TMDB metadata lookup — use find_media_imdb_id.
- When a search was already performed in the same session and the
user is just picking from the existing list.
next_steps: |
- Present the indexed results to the user.
- Once chosen: call add_torrent_by_index(N) — that wraps
get_torrent_by_index + add_torrent_to_qbittorrent.
cache:
key: media_title
parameters:
media_title:
description: Title to search for on Knaben. Free-form.
why_needed: |
Drives the search query. Use the canonical title (from
find_media_imdb_id) plus quality preferences for better hits.
example: Inception 2010 1080p
returns:
ok:
description: Search returned a list of torrents.
fields:
status: "'ok'"
torrents: "List of {name, size, seeders, leechers, magnet, ...}, up to 10."
error:
description: Search failed.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,48 @@
name: get_torrent_by_index
summary: >
Retrieve a torrent from the last find_torrent search by its 1-based
index.
description: |
Reads episodic memory's last_search_results and returns the entry at
the given 1-based position. Pure lookup — does not start a download.
Fails when the search results are missing or the index is out of
range.
when_to_use: |
When the user references a search hit by number ("show me the second
one") but doesn't yet want to download — e.g. inspection, sharing
the magnet, ...
when_not_to_use: |
- When the user wants to start downloading: use add_torrent_by_index
instead (one call instead of two).
- When no search has been performed yet — the result will be
not_found.
next_steps: |
- Display the torrent to the user.
- If they then say "add it", call add_torrent_to_qbittorrent with the
magnet, or add_torrent_by_index with the same index.
parameters:
index:
description: 1-based position in the last find_torrent result list.
why_needed: |
Maps to a specific torrent entry. Out-of-range values return an
error, not a wraparound.
example: 3
returns:
ok:
description: Torrent found at that index.
fields:
status: "'ok'"
torrent: "Full torrent dict (name, size, seeders, leechers, magnet, ...)."
error:
description: No torrent at that index.
fields:
error: Short error code (not_found).
message: Human-readable explanation, e.g. "Search for torrents first."
@@ -0,0 +1,76 @@
name: learn
summary: >
Teach Alfred a new token mapping and persist it to the learned
knowledge pack so future scans recognise it.
description: |
Appends a new token (or list of tokens) to a key inside a knowledge
pack and writes the result to `data/knowledge/<pack>_learned.yaml`.
The change is persisted atomically (write-tmp + rename) so a crash
cannot corrupt the file. Currently only the `subtitles` pack is
supported.
when_to_use: |
When manage_subtitles returns needs_clarification with unresolved
tokens, after confirming with the user what the tokens mean. Call
once per (category, key) — multiple values can be added in a single
call.
when_not_to_use: |
- Without explicit user confirmation of what the token means.
- For knowledge that belongs in the static pack
(alfred/knowledge/<pack>.yaml) — that's editor territory, not
runtime learning.
next_steps: |
- After success: re-run the workflow step that triggered the
clarification (typically manage_subtitles) so the new mapping is
applied.
parameters:
pack:
description: Knowledge pack name. Currently only "subtitles" is supported.
why_needed: |
Decides which `*_learned.yaml` file under data/knowledge/ gets
written. The pack name is namespaced to avoid collisions across
domains.
example: subtitles
category:
description: Category within the pack — "languages", "types", or "formats".
why_needed: |
Different categories use different lookup tables at scan time.
A wrong category silently has no effect.
example: languages
key:
description: Canonical entry id — ISO 639-1 code, type name, format name.
why_needed: |
The destination bucket for the new tokens. Existing tokens under
this key are kept; only new values are appended.
example: es
values:
description: List of token spellings to add.
why_needed: |
Release groups use many spellings for the same language/type;
pass them all in one call instead of multiple round-trips.
example: '["spanish", "espanol", "spa"]'
returns:
ok:
description: Mapping saved.
fields:
status: "'ok'"
pack: Name of the pack that was written to.
category: Category that was updated.
key: Key that was updated.
added_count: Number of values that were actually new (deduplicated).
tokens: Full updated token list for that key.
error:
description: Save failed.
fields:
error: Short error code (unknown_pack, unknown_category, read_failed, write_failed).
message: Human-readable explanation.
@@ -0,0 +1,63 @@
name: list_folder
summary: >
List the contents of a configured folder, optionally below a
relative subpath.
description: |
Reads a folder previously configured via set_path_for_folder and
returns its entries (files + directories). A relative `path` lets you
drill down without re-specifying the absolute root each time. Path
traversal is rejected (no `..`, no absolute paths) so the agent
cannot escape the configured root.
when_to_use: |
- At the start of an organize workflow to discover what's available
in the download folder.
- To browse a library collection ("what tv shows do I have?").
- As a sanity check before any move to confirm the target exists.
when_not_to_use: |
- For folders that are not configured — call set_path_for_folder
first.
- To list arbitrary system paths — this tool is intentionally scoped
to the known roots.
next_steps: |
- After listing the download folder: typically call analyze_release
on a specific entry.
- After listing a library folder: use the result to disambiguate a
destination during resolve_*_destination.
cache:
key: path
parameters:
folder_type:
description: Logical folder key (download, torrent, movie, tv_show, ...).
why_needed: |
Resolves to an absolute root through LTM. Must have been set via
set_path_for_folder beforehand.
example: download
path:
description: Relative subpath inside the root (default ".").
why_needed: |
Lets you drill into a subfolder without expanding the root. No
".." or absolute path is allowed.
example: Breaking.Bad.S01.1080p.BluRay.x265-GROUP
returns:
ok:
description: Listing returned.
fields:
status: "'ok'"
folder_type: The key that was listed.
path: The relative path that was listed.
entries: List of {name, type, size?} for each entry.
error:
description: Could not list the folder.
fields:
error: Short error code (folder_not_configured, path_not_found, path_traversal, ...).
message: Human-readable explanation.
@@ -0,0 +1,67 @@
name: manage_subtitles
summary: >
Detect, filter, and place subtitle tracks next to a video that has just
been organised into the library.
description: |
Scans the source video's surroundings for subtitle files
(.srt, .ass, .ssa, .vtt, .sub), classifies them by language and type
(standard / SDH / forced), filters by the user's SubtitlePreferences
(languages, min size, keep_sdh, keep_forced), and hard-links the
passing files next to the destination video using the convention
`<lang>.<ext>`, `<lang>.sdh.<ext>`, `<lang>.forced.<ext>`.
If no subtitles are found, returns status=ok with placed_count=0 — not
an error.
when_to_use: |
Always after a successful move_media / move_to_destination, before
closing the workflow. Pass the original source path (where subs live)
and the new library path (where they should land).
when_not_to_use: |
- Do not call before the video itself has been moved — the destination
must exist for hard-links to make sense.
- Skip when the user explicitly asks not to handle subtitles.
next_steps: |
- On status=ok: continue with create_seed_links (if seeding) or end
the workflow.
- On status=needs_clarification: ask the user about the unresolved
tokens, then optionally call learn() to teach the new mapping.
parameters:
source_video:
description: Absolute path to the original video file (in the download folder).
why_needed: |
Subtitles typically live next to the source, either as siblings or
in a Subs/ subfolder. The scanner walks from this path.
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST/Oz.S03E01.mkv
destination_video:
description: Absolute path to the video file in its library location.
why_needed: |
Subtitles are hard-linked next to this file so media players pick
them up automatically.
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Season 03/Oz.S03E01.mkv
returns:
ok:
description: Subtitles scanned (and possibly placed).
fields:
status: "'ok'"
placed: List of {source, destination, filename} for each linked file.
placed_count: Number of subtitle files placed.
skipped_count: Number of subtitle files filtered out.
needs_clarification:
description: One or more tokens could not be classified.
fields:
unresolved: List of unrecognised tokens with their context.
question: Human-readable question to relay to the user.
error:
description: Scan or placement failed.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,58 @@
name: move_media
summary: >
Safely move a media file with copy + integrity check + delete source.
description: |
Copies the source file to the destination with an integrity check,
then deletes the source. Slower than move_to_destination (which is a
plain rename) but safer across filesystems where rename is not atomic
or when you want a checksum verification.
when_to_use: |
Use to move a single file across filesystems or when paranoia about
data integrity is justified — e.g. moving a finished download from a
scratch disk to the main library array.
when_not_to_use: |
- For same-filesystem moves where speed matters: use move_to_destination
(instant rename on ZFS/ext4 within the same dataset).
- For folder-level moves of complete packs: use move_to_destination —
move_media is a single-file operation.
next_steps: |
- After a successful move: call manage_subtitles to place any subtitle
tracks, then create_seed_links if the user wants to keep seeding.
- On error: surface the error code (file_not_found, destination_exists,
integrity_check_failed) and ask the user how to proceed.
parameters:
source:
description: Absolute path to the source video file.
why_needed: |
The file being moved. Typically lives under the downloads folder
after a torrent completes.
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
destination:
description: Absolute path of the destination file — must not already exist.
why_needed: |
Where the file lands in the library. Comes from a resolve_*_destination
call so the naming convention is respected.
example: /movies/Inception.2010.1080p.BluRay.x265-GROUP/Inception.2010.1080p.BluRay.x265-GROUP.mkv
returns:
ok:
description: Move succeeded.
fields:
status: "'ok'"
source: Absolute path of the source (now gone).
destination: Absolute path of the destination (now in place).
filename: Basename of the destination file.
size: Size in bytes.
error:
description: Move failed.
fields:
error: Short error code (file_not_found, destination_exists, integrity_check_failed, ...).
message: Human-readable explanation.
@@ -0,0 +1,55 @@
name: move_to_destination
summary: >
Move a file or folder to a destination, creating parent directories as needed.
description: |
Performs an actual move on disk. Uses the system 'mv' command, so on the
same filesystem (e.g. ZFS) this is an instant rename. Creates the parent
directory of the destination if it doesn't exist yet, then moves. Returns
before/after paths on success, or an error if the destination already
exists or the source can't be moved.
when_to_use: |
Use after one of the resolve_*_destination tools returned status=ok, to
perform the move it described. The 'source' and 'destination' arguments
come directly from the resolved paths.
when_not_to_use: |
- Never move when status was not 'ok' (clarification still pending or
error happened) — that would leave the library in a half-broken state.
- Don't use this for the seed-link step; use create_seed_links for that.
next_steps: |
- After a successful move: call manage_subtitles to place any subtitle
tracks, then create_seed_links to keep qBittorrent seeding.
- On error: surface the message; do not retry blindly — check whether
the destination already exists or the source path is correct.
parameters:
source:
description: Absolute path to the source file or folder to move.
why_needed: |
The thing being moved. Comes from the user's download folder or from
a previous tool's output.
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
destination:
description: Absolute path of the destination — must not already exist.
why_needed: |
Where to put the source. Comes from a resolve_*_destination call so
that the path matches the library's naming convention.
example: /tv_shows/Oz.1997.1080p.WEBRip.x265-KONTRAST/Oz.S03.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Move succeeded.
fields:
source: Absolute path of the source (now gone).
destination: Absolute path of the destination (now in place).
error:
description: Move failed.
fields:
error: Short error code (source_not_found, destination_exists, mkdir_failed, move_failed).
message: Human-readable explanation of what went wrong.
@@ -0,0 +1,56 @@
name: probe_media
summary: >
Run ffprobe on a single video file and return its technical details.
description: |
Inspects a specific video file with ffprobe and returns codec,
resolution, duration, bitrate, the list of audio tracks (with
language and channel layout), and the list of embedded subtitle
tracks. Independent of any release-name parsing — works on any file
you can point at.
when_to_use: |
- To inspect a file's audio/subtitle tracks before deciding what to
do (e.g. choose a default audio language).
- To verify a video's resolution / codec when the release name is
unreliable.
- As a building block when analyze_release is overkill.
when_not_to_use: |
- For full release routing — analyze_release does parsing + media
type detection + probe in one call.
- On non-video files — ffprobe will return probe_failed.
next_steps: |
- The returned info typically feeds a user-facing decision (e.g.
"this is 7.1 DTS, want to keep it?"); rarely chained directly to
another tool.
cache:
key: source_path
parameters:
source_path:
description: Absolute path to the video file to probe.
why_needed: |
ffprobe needs the exact file (not a folder). For releases use
analyze_release; for a known file path, pass it here.
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
returns:
ok:
description: Probe succeeded.
fields:
status: "'ok'"
video: "Dict with codec, resolution, width, height, duration_seconds, bitrate_kbps."
audio_tracks: "List of {index, codec, channels, channel_layout, language, is_default}."
subtitle_tracks: "List of {index, codec, language, is_default, is_forced}."
audio_languages: List of language codes present in audio tracks.
is_multi_audio: True when more than one audio language is present.
error:
description: Probe failed.
fields:
error: Short error code (not_found, probe_failed).
message: Human-readable explanation.
@@ -0,0 +1,54 @@
name: query_library
summary: >
Find release folders across all configured library roots whose name
contains a substring (case-insensitive).
description: |
Scans every configured library root (movies, tv_shows, …) at depth 1
and returns folders whose name contains the query. For each match,
reports whether a `.alfred/metadata.yaml` exists — handy to spot
releases that have not been inspected yet. Does not recurse into
seasons / episodes; one entry per release folder.
when_to_use: |
- To answer "do I already have X?" without listing whole library
roots one by one.
- To pick the release_path to feed read_release_metadata or any
inspector tool.
when_not_to_use: |
- To list the *whole* library — that scan should live behind a
dedicated tool (not implemented yet).
- To browse a single root — use list_folder instead, it's cheaper
and doesn't open every library.
next_steps: |
- When one match is found: feed its path to read_release_metadata or
analyze_release.
- When several match: surface the indexed list to the user and ask
which one they mean.
parameters:
name:
description: Case-insensitive substring of the release name to look for.
why_needed: |
Library folders are named after the release (Title.Year.... or
Title (Year)). A substring is enough to catch typical user
phrasings ("foundation", "inception 2010").
example: foundation
returns:
ok:
description: Scan completed (possibly zero matches).
fields:
status: "'ok'"
query: The query string as received.
match_count: Number of matching folders.
matches: "List of {collection, name, path, has_metadata}."
error:
description: Scan could not run.
fields:
error: Short error code (no_libraries, empty_name).
message: Human-readable explanation.
@@ -0,0 +1,55 @@
name: read_release_metadata
summary: >
Read the `.alfred/metadata.yaml` file for a release folder.
description: |
Returns whatever has been previously persisted by inspector tools
(analyze_release, probe_media, find_media_imdb_id) and by the subtitle
pipeline. Works for any folder — download or library — as long as the
release has been touched at least once. Missing metadata is not an
error: the tool returns `has_metadata=false` with an empty dict.
when_to_use: |
- Before re-running analyze_release / probe_media on a release you
might have already seen — saves a full re-inspection.
- To answer "what do we know about X?" without scanning.
- To list which releases in a library have no `.alfred` yet (loop +
`has_metadata`).
when_not_to_use: |
- To search a library by name — use query_library.
- When you need a fresh probe/parse — call the inspector directly,
the result will be persisted automatically.
next_steps: |
- If `has_metadata=false`, decide whether to inspect now
(analyze_release / probe_media).
- If `has_metadata=true`, read `metadata.parse`, `metadata.probe`,
`metadata.tmdb` blocks before deciding next actions.
cache:
key: release_path
parameters:
release_path:
description: Absolute path to the release folder (or any file inside it).
why_needed: |
The store lives at `<release_root>/.alfred/metadata.yaml`. A file
path is auto-resolved to its parent folder.
example: /mnt/library/tv_shows/Foundation.2021.1080p.WEBRip.x265-RARBG
returns:
ok:
description: Release inspected (file may or may not exist).
fields:
status: "'ok'"
release_path: Absolute path of the release folder.
has_metadata: True if `.alfred/metadata.yaml` exists.
metadata: Full content of the file, or empty dict.
error:
description: Path does not exist on disk.
fields:
error: Short error code (not_found).
message: Human-readable explanation.
@@ -0,0 +1,93 @@
name: resolve_episode_destination
summary: >
Compute destination paths for a single TV episode file (file move).
description: |
Resolves the target series folder, season subfolder, and full destination
filename for a single-episode release. Returns paths only — does not move
anything. If a series folder with a different name already exists, returns
needs_clarification.
when_to_use: |
Use after analyze_release has identified the release as a single episode
(media_type=tv_show, season AND episode both set). TMDB must already be
queried for the canonical title/year, and optionally the episode title.
when_not_to_use: |
- Season packs (folder containing many episodes): use resolve_season_destination.
- Multi-season packs: use resolve_series_destination.
- Movies: use resolve_movie_destination.
next_steps: |
- On status=ok: call move_to_destination with the source video file and
destination=library_file.
- On status=needs_clarification: present question/options to the user,
then re-call with confirmed_folder set.
- On status=error: surface the message; do not move.
parameters:
release_name:
description: Raw release file name (with extension).
why_needed: |
Drives extraction of quality/source/codec/group, which become part of
the destination filename so each file is self-describing.
example: Oz.S03E01.1080p.WEBRip.x265-KONTRAST.mkv
source_file:
description: Absolute path to the source video file on disk.
why_needed: |
Used to read the source file extension (.mkv, .mp4, .avi…) for the
destination filename — release names don't always carry the extension.
example: /downloads/Oz.S03E01.1080p.WEBRip.x265-KONTRAST/file.mkv
tmdb_title:
description: Canonical show title from TMDB.
why_needed: |
Title prefix for both the series folder and the destination filename;
ensures consistent naming across all episodes of the show.
example: Oz
tmdb_year:
description: Show start year from TMDB.
why_needed: |
Disambiguates remakes/reboots sharing a title; year is part of the
series folder identity.
example: "1997"
tmdb_episode_title:
description: Episode title from TMDB. Optional.
why_needed: |
When present, the destination filename embeds the episode title for
human-readability (e.g. Oz.S01E01.The.Routine...).
example: The Routine
confirmed_folder:
description: Folder name the user picked after needs_clarification.
why_needed: |
Forces the use case to skip detection and use this exact folder name.
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Paths resolved; ready to move the episode file.
fields:
series_folder: Absolute path to the series root folder.
season_folder: Absolute path to the season subfolder.
library_file: Absolute path to the destination .mkv file (move target).
series_folder_name: Series folder name for display.
season_folder_name: Season folder name for display.
filename: Destination filename for display.
is_new_series_folder: True if the series folder doesn't exist yet.
needs_clarification:
description: A folder exists with a different name; user must choose.
fields:
question: Human-readable question.
options: List of folder names to pick from.
error:
description: Resolution failed.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,72 @@
name: resolve_movie_destination
summary: >
Compute destination paths for a movie file (file move).
description: |
Resolves the target movie folder and full destination filename for a movie
release. Returns paths only — does not move anything. Movies do not have
the existing-folder disambiguation problem that TV shows have (each
release lands in its own folder named after the canonical title + year +
tech).
when_to_use: |
Use after analyze_release has identified the release as a movie
(media_type=movie). TMDB must already be queried for the canonical title
and release year.
when_not_to_use: |
- TV shows in any form: use resolve_season_destination /
resolve_episode_destination / resolve_series_destination.
- Documentaries when they're treated as series rather than standalone
films: route them through the TV-show resolvers.
next_steps: |
- On status=ok: call move_to_destination with the source video file and
destination=library_file.
- On status=error: surface the message; do not move.
parameters:
release_name:
description: Raw release folder or file name.
why_needed: |
Drives extraction of quality/source/codec/group/edition tokens, which
become part of both the movie folder and filename so each release is
self-describing on disk.
example: Inception.2010.1080p.BluRay.x265-GROUP
source_file:
description: Absolute path to the source video file on disk.
why_needed: |
Used to read the file extension for the destination filename.
example: /downloads/Inception.2010.1080p.BluRay.x265-GROUP/movie.mkv
tmdb_title:
description: Canonical movie title from TMDB.
why_needed: |
Title prefix for the destination folder/file; ensures the library
uses the canonical title and not a sanitized release-name title.
example: Inception
tmdb_year:
description: Movie release year from TMDB.
why_needed: |
Disambiguates remakes that share a title (Dune 1984 vs Dune 2021)
and locks the folder identity in time.
example: "2010"
returns:
ok:
description: Paths resolved; ready to move.
fields:
movie_folder: Absolute path to the movie folder.
library_file: Absolute path to the destination .mkv file (move target).
movie_folder_name: Folder name for display.
filename: Destination filename for display.
is_new_folder: True if the movie folder doesn't exist yet.
error:
description: Resolution failed.
fields:
error: Short error code (e.g. library_not_set).
message: Human-readable explanation.
@@ -0,0 +1,95 @@
name: resolve_season_destination
summary: >
Compute destination paths for a season pack (folder move) in the TV library.
description: |
Resolves the target series folder and season subfolder for a complete-season
download. Returns the paths only — does not perform any move. If a series
folder for this show already exists in the library with a different name
(different group/quality/source), returns needs_clarification so the user
can decide whether to merge into the existing folder or create a new one.
when_to_use: |
Use after analyze_release has identified the release as a season pack
(media_type=tv_show, season set, episode unset). TMDB must already be
queried so tmdb_title and tmdb_year are canonical values, not raw tokens
from the release name.
when_not_to_use: |
- Single-episode files: use resolve_episode_destination instead.
- Multi-season packs (S01-S05 etc.): use resolve_series_destination.
- Movies: use resolve_movie_destination.
next_steps: |
- On status=ok: call move_to_destination with source=<download folder> and
destination=season_folder.
- On status=needs_clarification: present the question and options to the
user, then re-call this tool with confirmed_folder set to the user's pick.
- On status=error: surface the message to the user; do not move anything.
parameters:
release_name:
description: Raw release folder name as it appears on disk.
why_needed: |
Drives extraction of quality/source/codec/group tokens — these are
embedded in the target folder name (Title.Year.Quality.Source.Codec-GROUP)
to make releases self-describing on the filesystem.
example: Oz.S03.1080p.WEBRip.x265-KONTRAST
tmdb_title:
description: Canonical show title from TMDB.
why_needed: |
Builds the title prefix of the folder name. Must come from TMDB to
avoid typos and variant spellings present in the raw release name.
example: Oz
tmdb_year:
description: Show start year from TMDB.
why_needed: |
Disambiguates shows that share a title across decades (e.g. multiple
remakes of "The Office") and locks the folder identity.
example: "1997"
confirmed_folder:
description: |
Folder name chosen by the user after a previous needs_clarification
response.
why_needed: |
Short-circuits the existing-folder detection and forces the use case
to use this exact folder name, even if it doesn't match the computed
one.
example: Oz.1997.1080p.WEBRip.x265-KONTRAST
source_path:
description: |
Absolute path to the release folder on disk. Optional.
why_needed: |
When provided, the tool runs ffprobe on the main video inside the
folder and uses the probe data to fill quality/codec tokens that
may be missing from the release name. The enriched tech tokens
end up in the destination folder name, so providing source_path
gives more accurate names for releases with sparse metadata.
example: /downloads/Oz.S03.1080p.WEBRip.x265-KONTRAST
returns:
ok:
description: Paths resolved unambiguously; ready to move.
fields:
series_folder: Absolute path to the series root folder.
season_folder: Absolute path to the season subfolder (move target).
series_folder_name: Just the series folder name, for display.
season_folder_name: Just the season folder name, for display.
is_new_series_folder: True if the series folder doesn't exist yet.
needs_clarification:
description: A folder already exists with a different name; ask the user.
fields:
question: Human-readable question for the user.
options: List of folder names the user can pick from.
error:
description: Resolution failed (config missing, invalid release name, etc.).
fields:
error: Short error code (e.g. library_not_set).
message: Human-readable explanation.
@@ -0,0 +1,87 @@
name: resolve_series_destination
summary: >
Compute the destination path for a complete multi-season series pack (folder move).
description: |
Resolves the target series folder for a pack that contains multiple seasons
(e.g. S01-S05 in a single release). Returns only the series folder — the
whole source folder is moved as-is into the library, no per-season
restructuring. If a folder with a different name already exists for this
show, returns needs_clarification.
when_to_use: |
Use after analyze_release has identified the release as a complete-series
pack (media_type=tv_complete, or multi-season indicators). TMDB must
already be queried for canonical title/year.
when_not_to_use: |
- Single-season packs: use resolve_season_destination.
- Single episodes: use resolve_episode_destination.
- Movies: use resolve_movie_destination.
next_steps: |
- On status=ok: call move_to_destination with source=<download folder> and
destination=series_folder.
- On status=needs_clarification: ask the user, re-call with
confirmed_folder set.
- On status=error: surface the message; do not move.
parameters:
release_name:
description: Raw release folder name as it appears on disk.
why_needed: |
Drives extraction of quality/source/codec/group tokens for the target
folder name, even though the multi-season structure inside is kept
as-is.
example: The.Wire.S01-S05.1080p.BluRay.x265-GROUP
tmdb_title:
description: Canonical show title from TMDB.
why_needed: |
Title prefix of the series folder; comes from TMDB to avoid raw
release-name spellings.
example: The Wire
tmdb_year:
description: Show start year from TMDB.
why_needed: |
Disambiguates shows that share a title across eras and locks the
folder identity.
example: "2002"
confirmed_folder:
description: Folder name chosen by the user after needs_clarification.
why_needed: |
Forces the use case to use this exact folder name and skip detection.
example: The.Wire.2002.1080p.BluRay.x265-GROUP
source_path:
description: |
Absolute path to the release folder on disk. Optional.
why_needed: |
When provided, the tool runs ffprobe on the main video inside the
folder and uses probe data to fill quality/codec tokens that may
be missing from the release name, producing a more accurate
destination folder name.
example: /downloads/The.Wire.S01-S05.1080p.BluRay.x265-GROUP
returns:
ok:
description: Path resolved; ready to move the pack.
fields:
series_folder: Absolute path to the destination series folder.
series_folder_name: Folder name for display.
is_new_series_folder: True if the folder doesn't exist yet.
needs_clarification:
description: A folder exists with a different name; ask the user.
fields:
question: Human-readable question.
options: List of folder names to pick from.
error:
description: Resolution failed.
fields:
error: Short error code.
message: Human-readable explanation.
@@ -0,0 +1,47 @@
name: set_language
summary: >
Set the conversation language so all subsequent assistant messages
match it.
description: |
Persists an ISO 639-1 language code in short-term memory under
conversation.language. Read by the prompt builder and any tool that
needs to localise output. Does not validate the code against an ISO
list — the LLM is trusted to pass a sensible value.
when_to_use: |
As the very first call when the user writes in a language different
from the current STM language. Doing it before answering avoids a
mid-reply switch.
when_not_to_use: |
- On every turn — only when the language actually changes.
- To pick a subtitle language — that lives in SubtitlePreferences,
not the conversation language.
next_steps: |
- After success: continue the user's request in the newly set
language.
parameters:
language:
description: ISO 639-1 language code (en, fr, es, de, ...).
why_needed: |
Identifies the target language unambiguously across the UI and
any localisation logic.
example: fr
returns:
ok:
description: Language saved.
fields:
status: "'ok'"
message: Confirmation message.
language: The language code that was saved.
error:
description: Could not save the language.
fields:
status: "'error'"
error: Short error code or exception message.
@@ -0,0 +1,58 @@
name: set_path_for_folder
summary: >
Configure where a known folder lives on disk (download, torrent, or
any library collection).
description: |
Stores an absolute path in long-term memory under a folder key. Two
classes of folders exist:
- Workspace paths: "download", "torrent" — single-valued each, used
by the organize workflows.
- Library paths: any other key (e.g. "movie", "tv_show",
"documentary") — these are the collections you organise into.
The path must exist and be a directory; otherwise the call fails
without changing memory.
when_to_use: |
On first run, or when the user moves a folder, or when introducing a
new library collection (e.g. "set the documentaries folder to ...").
when_not_to_use: |
- For one-off listings — list_folder works without configuration only
if the folder is already set.
- To rename or delete an existing folder — this only sets paths.
next_steps: |
- After success: typical follow-ups are list_folder on the same key,
or starting a workflow that needs the path.
parameters:
folder_name:
description: Logical name of the folder (download, torrent, movie, tv_show, ...).
why_needed: |
The key the agent uses everywhere afterwards. "download" and
"torrent" are reserved for workspace; anything else becomes a
library collection.
example: tv_show
path_value:
description: Absolute path to the folder on disk.
why_needed: |
Must exist and be readable. Stored verbatim in LTM — relative
paths are rejected.
example: /tank/library/tv_shows
returns:
ok:
description: Path saved to long-term memory.
fields:
status: "'ok'"
folder_name: The logical name that was set.
path_value: The absolute path that was saved.
error:
description: Could not set the path.
fields:
error: Short error code (path_not_found, not_a_directory, invalid_path, ...).
message: Human-readable explanation.
@@ -0,0 +1,64 @@
name: start_workflow
summary: >
Enter a workflow scope — narrows the visible tool catalog and gives the
agent a clear multi-step plan to follow.
description: |
Activates a named workflow defined in YAML under agent/workflows/.
Once active, only the workflow's declared tools (plus the core noyau)
are exposed to the LLM, which keeps the decision space small and
focused. The returned plan (description + steps) is the script the
agent should execute until end_workflow is called.
when_to_use: |
Use as the very first action whenever the user request maps to a
known workflow (e.g. "organize Breaking Bad" → media.organize_media).
Pass any parameters you already know (release name, target media,
flags) in 'params' so later steps can read them from STM.
when_not_to_use: |
- Do not start a workflow for purely conversational replies or
one-shot lookups that need a single tool call.
- Do not start a new workflow while one is already active — call
end_workflow first.
next_steps: |
- On status=ok: follow the returned 'steps' list, calling the tools
in order. The visible tool catalog has already been narrowed.
- On status=error (unknown_workflow): surface the available list to
the user and ask which one they meant.
- On status=error (workflow_already_active): either continue the
active workflow or call end_workflow first.
parameters:
workflow_name:
description: Fully-qualified name of the workflow to start (e.g. media.organize_media).
why_needed: |
Identifies which YAML definition to load. Names use the
'domain.action' convention (media.*, mail.*, ...).
example: media.organize_media
params:
description: Initial parameters to seed the workflow with (release name, target, flags).
why_needed: |
Later steps read these from STM instead of asking the user again.
Pass whatever you already extracted from the user's message.
example: '{"release_name": "Breaking.Bad.S01.1080p.BluRay.x265-GROUP", "keep_seeding": true}'
returns:
ok:
description: Workflow activated; catalog has been narrowed.
fields:
workflow: Name of the activated workflow.
description: Human-readable description of what the workflow does.
steps: Ordered list of steps to execute.
tools: Tools that are now visible (in addition to the core noyau).
error:
description: Could not activate the workflow.
fields:
error: Short error code (unknown_workflow, workflow_already_active).
message: Human-readable explanation.
available_workflows: List of valid workflow names (only on unknown_workflow).
active_workflow: Name of the currently active workflow (only on workflow_already_active).
+86
View File
@@ -0,0 +1,86 @@
"""Workflow scoping tools — start_workflow / end_workflow meta-tools.
These tools let the agent enter and leave a workflow scope. While a
workflow is active, the PromptBuilder narrows the visible tool catalog
to the noyau + the workflow's declared tools, so the LLM doesn't have
to reason over the full set.
"""
import logging
from typing import Any
from alfred.infrastructure.persistence_TO_CHECK import get_memory
from ..workflows_TO_CHECK import WorkflowLoader
logger = logging.getLogger(__name__)
_loader_cache: list[WorkflowLoader] = []
def _get_loader() -> WorkflowLoader:
"""Lazily build the module-level WorkflowLoader."""
if not _loader_cache:
_loader_cache.append(WorkflowLoader())
return _loader_cache[0]
def start_workflow(workflow_name: str, params: dict) -> dict[str, Any]:
"""See specs/start_workflow.yaml for full description."""
loader = _get_loader()
workflow = loader.get(workflow_name)
if workflow is None:
return {
"status": "error",
"error": "unknown_workflow",
"message": f"Workflow '{workflow_name}' not found",
"available_workflows": loader.names(),
}
memory = get_memory()
current = memory.stm.workflow.current
if current is not None:
return {
"status": "error",
"error": "workflow_already_active",
"message": (
f"Workflow '{current.get('name')}' is already active. "
"Call end_workflow before starting a new one."
),
"active_workflow": current.get("name"),
}
memory.stm.start_workflow(workflow_name, params or {})
memory.save()
logger.info(f"start_workflow: '{workflow_name}' with params={params}")
return {
"status": "ok",
"workflow": workflow_name,
"description": workflow.get("description", ""),
"steps": workflow.get("steps", []),
"tools": workflow.get("tools", []),
}
def end_workflow(reason: str) -> dict[str, Any]:
"""See specs/end_workflow.yaml for full description."""
memory = get_memory()
current = memory.stm.workflow.current
if current is None:
return {
"status": "error",
"error": "no_active_workflow",
"message": "No workflow is currently active.",
}
workflow_name = current.get("name")
memory.stm.end_workflow()
memory.save()
logger.info(f"end_workflow: '{workflow_name}' reason={reason!r}")
return {
"status": "ok",
"workflow": workflow_name,
"reason": reason,
}
@@ -0,0 +1,3 @@
from .loader import WorkflowLoader
__all__ = ["WorkflowLoader"]
+52
View File
@@ -0,0 +1,52 @@
"""WorkflowLoader — autodiscovers and loads workflow YAML files.
Scans the workflows/ directory for all .yaml files and exposes them
as dicts. No manual registration needed — drop a new .yaml file and
it will be picked up automatically.
"""
import logging
from pathlib import Path
import yaml
logger = logging.getLogger(__name__)
_WORKFLOWS_DIR = Path(__file__).parent
class WorkflowLoader:
"""
Loads all workflow definitions from the workflows/ directory.
Usage:
loader = WorkflowLoader()
all_workflows = loader.all()
workflow = loader.get("media.organize_media")
"""
def __init__(self):
self._workflows: dict[str, dict] = {}
self._load()
def _load(self) -> None:
for path in sorted(_WORKFLOWS_DIR.glob("*.yaml")):
try:
data = yaml.safe_load(path.read_text(encoding="utf-8"))
name = data.get("name") or path.stem
self._workflows[name] = data
logger.info(f"WorkflowLoader: Loaded '{name}' from {path.name}")
except Exception as e:
logger.warning(f"WorkflowLoader: Could not load {path.name}: {e}")
def all(self) -> dict[str, dict]:
"""Return all loaded workflows keyed by name."""
return self._workflows
def get(self, name: str) -> dict | None:
"""Return a specific workflow by name, or None if not found."""
return self._workflows.get(name)
def names(self) -> list[str]:
"""Return all available workflow names."""
return list(self._workflows.keys())
@@ -0,0 +1,69 @@
name: media.manage_subtitles
description: >
Place subtitle files alongside a video that has just been organised into the library.
Detects the release pattern automatically, identifies and classifies all tracks,
filters by user rules, and hard-links matching files to the destination.
If any tracks are unrecognised, asks the user and optionally teaches Alfred.
trigger:
examples:
- "handle subtitles for The X-Files S01E01"
- "place the subs next to the file"
- "subtitles are in the Subs/ folder"
- "add subtitles"
tools:
- manage_subtitles
- learn
memory:
SubtitlePreferences: read
Workflow: read-write
steps:
- id: place_subtitles
tool: manage_subtitles
description: >
Detect release pattern, identify and classify all subtitle tracks,
filter by rules, hard-link matching files next to the destination video.
Reads SubtitlePreferences from LTM for language/type/format filtering.
params:
source_video: "{source_video}"
destination_video: "{destination_video}"
imdb_id: "{imdb_id}"
media_type: "{media_type}"
release_group: "{release_group}"
season: "{season}"
episode: "{episode}"
on_result:
ok_placed_zero: skip # no subtitles found — not an error
needs_clarification: ask_user # unrecognised tokens found
- id: ask_user
description: >
Some tracks could not be classified. Show the user the unresolved tokens
and ask if they want to teach Alfred what they mean.
If yes → go to learn_tokens. If no → end workflow.
ask_user:
question: >
I could not identify some tokens in the subtitle files: {unresolved}.
Do you want to teach me what they mean?
answers:
yes: { next_step: learn_tokens }
no: { next_step: end }
- id: learn_tokens
tool: learn
description: >
Persist a new token mapping to the learned knowledge pack so Alfred
recognises it in future scans without asking again.
params:
pack: "subtitles"
category: "{token_category}" # "languages" or "types"
key: "{token_key}" # e.g. "es", "de"
values: "{token_values}" # e.g. ["spanish", "espanol"]
subtitle_naming:
standard: "{lang}.{ext}"
sdh: "{lang}.sdh.{ext}"
forced: "{lang}.forced.{ext}"
@@ -0,0 +1,92 @@
name: media.organize_media
description: >
Organise a downloaded series or movie into the media library.
Triggered when the user asks to move/organize a specific title.
Always moves the video file. Optionally creates seed links in the
torrents folder so qBittorrent can keep seeding.
trigger:
examples:
- "organize Breaking Bad"
- "organise Severance season 2"
- "move Inception to my library"
- "organize Breaking Bad season 1, keep seeding"
tools:
- list_folder
- analyze_release
- probe_media
- find_media_imdb_id
- resolve_season_destination
- resolve_episode_destination
- resolve_movie_destination
- resolve_series_destination
- move_to_destination
- manage_subtitles
- create_seed_links
memory:
WorkspacePaths: read
LibraryPaths: read
Library: read-write
Workflow: read-write
Entities: read-write
steps:
- id: list_downloads
tool: list_folder
description: List the download folder to find the target files.
params:
folder_type: download
- id: analyze
tool: analyze_release
description: >
Parse the release name to detect media_type (movie / tv_season /
tv_episode / tv_complete) and extract season/episode info.
- id: identify_media
tool: find_media_imdb_id
description: Confirm canonical title and year via TMDB.
- id: resolve_destination
description: >
Call the resolver that matches media_type from analyze_release:
movie → resolve_movie_destination
tv_season → resolve_season_destination
tv_episode → resolve_episode_destination
tv_complete → resolve_series_destination
If the resolver returns needs_clarification, ask the user and
re-call with confirmed_folder.
- id: move_file
tool: move_to_destination
description: >
Move the video file/folder to the destination returned by the
resolver above.
- id: handle_subtitles
tool: manage_subtitles
description: >
Place subtitle files alongside the video in the library.
Pass the original source path and the new library destination path.
on_missing: skip
- id: ask_seeding
ask_user:
question: "Do you want to keep seeding this torrent?"
answers:
"yes": { next_step: create_seed_links }
"no": { next_step: end }
- id: create_seed_links
tool: create_seed_links
description: >
Hard-link the library video file back into torrents/<original_folder>/
and copy all remaining files from the original download folder
(subs, nfo, jpg, …) so the torrent stays complete for seeding.
naming_convention:
# Resolved by domain entities (Movie, Episode) — not hardcoded here
tv_show: "{title}/Season {season:02d}/{title}.S{season:02d}E{episode:02d}.{ext}"
movie: "{title} ({year})/{title}.{year}.{ext}"
+43 -31
View File
@@ -2,22 +2,21 @@
import json import json
import logging import logging
import os
import time import time
import uuid import uuid
from pathlib import Path
from typing import Any from typing import Any
from fastapi import FastAPI, HTTPException from fastapi import FastAPI, HTTPException
from fastapi.responses import JSONResponse, StreamingResponse from fastapi.responses import JSONResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
from pydantic import BaseModel, Field, validator from pydantic import BaseModel, Field, validator
from agent.agent import Agent from alfred.agent.agent import Agent
from agent.config import settings from alfred.agent.llm.deepseek import DeepSeekClient
from agent.llm.deepseek import DeepSeekClient from alfred.agent.llm.exceptions import LLMAPIError, LLMConfigurationError
from agent.llm.exceptions import LLMAPIError, LLMConfigurationError from alfred.agent.llm.ollama import OllamaClient
from agent.llm.ollama import OllamaClient from alfred.infrastructure.persistence_TO_CHECK import get_memory, init_memory
from infrastructure.persistence import get_memory, init_memory from alfred.settings import settings
logging.basicConfig( logging.basicConfig(
level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s" level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
@@ -30,38 +29,51 @@ app = FastAPI(
version="0.2.0", version="0.2.0",
) )
# TODO: Make a variable memory_path = Path(settings.data_storage_dir) / "memory"
manifests = "manifests" init_memory(storage_dir=str(memory_path))
# Sécurité : on vérifie que le dossier existe pour ne pas faire planter l'app au démarrage logger.info(f"Memory context initialized (path: {memory_path})")
if os.path.exists(manifests):
app.mount("/manifests", StaticFiles(directory=manifests), name="manifests")
else:
print(
f"⚠️ ATTENTION : Le dossier '{manifests}' est introuvable. Le plugin ne marchera pas."
)
# Initialize memory context at startup
# Use /data/memory in Docker, fallback to memory_data for local dev
storage_dir = os.getenv("MEMORY_STORAGE_DIR", "memory_data")
init_memory(storage_dir=storage_dir)
logger.info(f"Memory context initialized (storage: {storage_dir})")
# Initialize LLM based on environment variable # Initialize LLM based on environment variable
llm_provider = os.getenv("LLM_PROVIDER", "deepseek").lower() llm_provider = settings.default_llm_provider.lower()
class _UnconfiguredLLM:
"""Placeholder LLM used when no provider could be configured at import time.
Importing the FastAPI app must not fail just because credentials are
absent (e.g. during test collection). Any actual call surfaces a clear
503 error at request time via the handlers below.
"""
def __init__(self, reason: str):
self.reason = reason
def complete(self, *args, **kwargs):
raise LLMAPIError(f"LLM is not configured: {self.reason}")
try: try:
if llm_provider == "ollama": if llm_provider == "local":
logger.info("Using Ollama LLM") logger.info("Using local Ollama LLM")
llm = OllamaClient() llm = OllamaClient(settings=settings)
else: elif llm_provider == "deepseek":
logger.info("Using DeepSeek LLM") logger.info("Using DeepSeek LLM")
llm = DeepSeekClient() llm = DeepSeekClient()
elif llm_provider == "claude":
raise ValueError(f"LLM provider not fully implemented: {llm_provider}")
else:
raise ValueError(f"Unknown LLM provider: {llm_provider}")
except LLMConfigurationError as e: except LLMConfigurationError as e:
# Degrade gracefully: keep the app importable so tests can patch agent.step
# and so missing credentials surface as a 503 at the endpoint, not as an
# import error.
logger.error(f"Failed to initialize LLM: {e}") logger.error(f"Failed to initialize LLM: {e}")
raise llm = _UnconfiguredLLM(str(e))
# Initialize agent # Initialize agent
agent = Agent(llm=llm, max_tool_iterations=settings.max_tool_iterations) agent = Agent(
settings=settings, llm=llm, max_tool_iterations=settings.max_tool_iterations
)
logger.info("Agent Media API initialized") logger.info("Agent Media API initialized")
@@ -116,7 +128,7 @@ def extract_last_user_content(messages: list[dict[str, Any]]) -> str:
@app.get("/health") @app.get("/health")
async def health_check(): async def health_check():
"""Health check endpoint.""" """Health check endpoint."""
return {"status": "healthy", "version": "0.2.0"} return {"status": "healthy", "version": f"v{settings.alfred_version}"}
@app.get("/v1/models") @app.get("/v1/models")
+26
View File
@@ -0,0 +1,26 @@
"""Application-layer exceptions shared across orchestrators.
Kept in a dedicated module (rather than inside each orchestrator's
file) because the sync flows for TV shows and movies raise structurally
identical "not found in library" errors — pulling them out makes the
shared semantics explicit and avoids cross-imports between the
``tv_shows`` and ``movies`` packages.
"""
from __future__ import annotations
class ShowNotFoundInLibrary(LookupError):
"""Raised when no on-disk TV show carries the requested ``tmdb_id``.
The sync orchestrator raises this when both the library index and
the per-show release repository return ``None`` for a lookup —
there is nothing on disk to refresh TMDB facts against.
"""
class MovieNotFoundInLibrary(LookupError):
"""Raised when no on-disk movie carries the requested ``tmdb_id``.
Symmetric to :class:`ShowNotFoundInLibrary` for the movies library.
"""
+42
View File
@@ -0,0 +1,42 @@
"""Filesystem application layer — 5 atomic use cases as free functions.
Each use case:
- accepts :class:`pathlib.Path` inputs plus a :class:`DirectoryRoots` VO,
- guards inputs against escaping configured roots,
- calls the matching infra op,
- catches :class:`~alfred.infrastructure.filesystem.FilesystemError` and
returns a frozen DTO with a normalized error code.
No global state, no ``get_memory()``. Roots are injected.
"""
from .create_dir import create_dir_use_case
from .directory_roots import DirectoryRoots
from .dto import (
CreateDirResponse,
LinkFileResponse,
ListDirResponse,
MoveDirResponse,
MoveFileResponse,
)
from .link_file import link_file_use_case
from .list_dir import list_dir_use_case
from .move_dir import move_dir_use_case
from .move_file import move_file_use_case
__all__ = [
# use cases
"list_dir_use_case",
"create_dir_use_case",
"link_file_use_case",
"move_file_use_case",
"move_dir_use_case",
# VO
"DirectoryRoots",
# DTOs
"ListDirResponse",
"CreateDirResponse",
"LinkFileResponse",
"MoveFileResponse",
"MoveDirResponse",
]
+41
View File
@@ -0,0 +1,41 @@
"""Internal helpers: mapping infra exceptions → error codes.
Kept private (``_errors``) — only the 5 use cases in this package use
it. Centralizes the exception → code translation so every use case
returns consistent error payloads.
"""
from __future__ import annotations
from alfred.infrastructure.filesystem import (
CrossDevice,
DestinationExists,
FilesystemError,
FilesystemOSError,
NotADirectory,
NotAFile,
PermissionDenied,
SourceNotFound,
)
# Application-layer error codes (guard violations, not infra).
PATH_NOT_ALLOWED = "path_not_allowed"
def code_for(exc: FilesystemError) -> str:
"""Return the snake-case error code for an infra exception."""
if isinstance(exc, SourceNotFound):
return "source_not_found"
if isinstance(exc, DestinationExists):
return "destination_exists"
if isinstance(exc, NotADirectory):
return "not_a_directory"
if isinstance(exc, NotAFile):
return "not_a_file"
if isinstance(exc, PermissionDenied):
return "permission_denied"
if isinstance(exc, CrossDevice):
return "cross_device"
if isinstance(exc, FilesystemOSError):
return "filesystem_os_error"
return "filesystem_error"
@@ -0,0 +1,33 @@
"""create_dir use case — create a directory under one of the configured roots."""
from __future__ import annotations
from pathlib import Path
from alfred.infrastructure.filesystem import FilesystemError, create_dir
from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import CreateDirResponse
def create_dir_use_case(path: Path, roots: DirectoryRoots) -> CreateDirResponse:
"""Create directory ``path`` (and any missing parents) provided it
lives under one of the configured roots.
Idempotent on the infra side: re-running on an existing directory
returns ``status="ok"``.
"""
if not roots.contains(path):
return CreateDirResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Path is outside configured roots: {path}",
)
try:
create_dir(path)
except FilesystemError as e:
return CreateDirResponse(status="error", error=code_for(e), message=str(e))
return CreateDirResponse(status="ok", path=path)
@@ -0,0 +1,54 @@
"""CreateSeedLinksUseCase — prepares a torrent folder for continued seeding."""
import logging
from alfred.infrastructure.filesystem import FileManager
from alfred.infrastructure.persistence_TO_CHECK import get_memory
from .dto import CreateSeedLinksResponse
logger = logging.getLogger(__name__)
class CreateSeedLinksUseCase:
"""
Prepares a torrent subfolder so qBittorrent can keep seeding after a move.
Hard-links the video file from the library back into torrents/<original_folder>/,
then copies all remaining files from the original download folder (subs, nfo, …).
"""
def __init__(self, file_manager: FileManager):
self.file_manager = file_manager
def execute(
self, library_file: str, original_download_folder: str
) -> CreateSeedLinksResponse:
memory = get_memory()
torrent_folder = memory.ltm.workspace.torrent
if not torrent_folder:
return CreateSeedLinksResponse(
status="error",
error="torrent_folder_not_set",
message="Torrent folder is not configured. Use set_path_for_folder to set it.",
)
result = self.file_manager.create_seed_links(
library_file, original_download_folder, torrent_folder
)
if result.get("status") == "ok":
return CreateSeedLinksResponse(
status="ok",
torrent_subfolder=result.get("torrent_subfolder"),
linked_file=result.get("linked_file"),
copied_files=result.get("copied_files"),
copied_count=result.get("copied_count", 0),
skipped=result.get("skipped"),
)
return CreateSeedLinksResponse(
status="error",
error=result.get("error"),
message=result.get("message"),
)
@@ -0,0 +1,56 @@
"""DirectoryRoots — VO carrying the configured filesystem roots.
Replaces the ad-hoc ``get_memory().ltm.workspace.<x>`` lookups that were
sprinkled across the filesystem use cases. By making roots an explicit
input, use cases become pure (no global state read) and easy to test.
The roots are read once at the tool wrapper boundary (where the agent
config lives) and threaded through the use cases.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
@dataclass(frozen=True)
class DirectoryRoots:
"""Configured roots of Alfred's filesystem.
All paths must be absolute and existing directories — validation is
expected at the boundary that builds this VO.
Attributes:
downloads: where qBittorrent drops finished torrents.
torrents: where seeding hard-links live (mirrors downloads/).
movies: library root for movies.
tv_shows: library root for TV shows.
"""
downloads: Path
torrents: Path
movies: Path
tv_shows: Path
def all(self) -> tuple[Path, ...]:
"""Return every configured root, in declaration order."""
return (self.downloads, self.torrents, self.movies, self.tv_shows)
def contains(self, path: Path) -> bool:
"""Return True if ``path`` is inside one of the configured roots.
Uses ``Path.resolve()`` to handle symlinks and ``..`` segments,
then ``relative_to`` for an exact within-root check.
"""
try:
resolved = path.resolve()
except OSError:
return False
for root in self.all():
try:
resolved.relative_to(root.resolve())
return True
except (ValueError, OSError):
continue
return False
+111
View File
@@ -0,0 +1,111 @@
"""DTOs for the 5 atomic filesystem use cases.
Each use case returns a small frozen dataclass tagged with a ``status``
field. On error, ``error`` (machine-readable code) and ``message``
(human-readable) are populated; on success, the relevant payload
fields are.
Error codes mirror the infrastructure exception types (lowercased,
snake-cased) — e.g. ``SourceNotFound`` → ``"source_not_found"`` — plus
the application-layer ``"path_not_allowed"`` for guard violations.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from pathlib import Path
@dataclass(frozen=True)
class ListDirResponse:
"""Response from ``list_dir_use_case``."""
status: str # "ok" | "error"
path: Path | None = None
entries: tuple[Path, ...] = ()
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"path": str(self.path) if self.path else None,
"entries": [str(p) for p in self.entries],
}
@dataclass(frozen=True)
class CreateDirResponse:
"""Response from ``create_dir_use_case``."""
status: str
path: Path | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {"status": self.status, "path": str(self.path) if self.path else None}
@dataclass(frozen=True)
class LinkFileResponse:
"""Response from ``link_file_use_case``."""
status: str
source: Path | None = None
destination: Path | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"source": str(self.source) if self.source else None,
"destination": str(self.destination) if self.destination else None,
}
@dataclass(frozen=True)
class MoveFileResponse:
"""Response from ``move_file_use_case``."""
status: str
source: Path | None = None
destination: Path | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"source": str(self.source) if self.source else None,
"destination": str(self.destination) if self.destination else None,
}
@dataclass(frozen=True)
class MoveDirResponse:
"""Response from ``move_dir_use_case``."""
status: str
source: Path | None = None
destination: Path | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"source": str(self.source) if self.source else None,
"destination": str(self.destination) if self.destination else None,
}
+188
View File
@@ -0,0 +1,188 @@
"""Filesystem application DTOs."""
from __future__ import annotations
from dataclasses import dataclass
@dataclass
class CopyMediaResponse:
"""Response from copying a media file."""
status: str
source: str | None = None
destination: str | None = None
filename: str | None = None
size: int | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"source": self.source,
"destination": self.destination,
"filename": self.filename,
"size": self.size,
}
@dataclass
class MoveMediaResponse:
"""Response from moving a media file."""
status: str
source: str | None = None
destination: str | None = None
filename: str | None = None
size: int | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"source": self.source,
"destination": self.destination,
"filename": self.filename,
"size": self.size,
}
@dataclass
class PlacedSubtitle:
"""One subtitle file successfully placed."""
source: str
destination: str
filename: str
def to_dict(self) -> dict:
return {
"source": self.source,
"destination": self.destination,
"filename": self.filename,
}
@dataclass
class UnresolvedTrack:
"""A subtitle track that needs agent clarification before placement."""
raw_tokens: list[str]
file_path: str | None = None
file_size_kb: float | None = None
reason: str = "" # "unknown_language" | "low_confidence"
def to_dict(self) -> dict:
return {
"raw_tokens": self.raw_tokens,
"file_path": self.file_path,
"file_size_kb": self.file_size_kb,
"reason": self.reason,
}
@dataclass
class AvailableSubtitle:
"""One subtitle track available on an embedded media item."""
language: str # ISO 639-2 code
subtitle_type: str # "standard" | "sdh" | "forced" | "unknown"
def to_dict(self) -> dict:
return {"language": self.language, "type": self.subtitle_type}
@dataclass
class ManageSubtitlesResponse:
"""Response from the manage_subtitles use case."""
status: str # "ok" | "needs_clarification" | "error"
video_path: str | None = None
placed: list[PlacedSubtitle] | None = None
skipped_count: int = 0
unresolved: list[UnresolvedTrack] | None = None
available: list[AvailableSubtitle] | None = None # embedded tracks summary
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
result = {
"status": self.status,
"video_path": self.video_path,
"placed": [p.to_dict() for p in (self.placed or [])],
"placed_count": len(self.placed or []),
"skipped_count": self.skipped_count,
}
if self.unresolved:
result["unresolved"] = [u.to_dict() for u in self.unresolved]
result["unresolved_count"] = len(self.unresolved)
if self.available:
result["available"] = [a.to_dict() for a in self.available]
return result
@dataclass
class CreateSeedLinksResponse:
"""Response from creating seed links for a torrent."""
status: str
torrent_subfolder: str | None = None
linked_file: str | None = None
copied_files: list[str] | None = None
copied_count: int = 0
skipped: list[str] | None = None
error: str | None = None
message: str | None = None
def to_dict(self) -> dict:
if self.error:
return {"status": self.status, "error": self.error, "message": self.message}
return {
"status": self.status,
"torrent_subfolder": self.torrent_subfolder,
"linked_file": self.linked_file,
"copied_files": self.copied_files or [],
"copied_count": self.copied_count,
"skipped": self.skipped or [],
}
@dataclass
class ListFolderResponse:
"""Response from listing a folder."""
status: str
folder_type: str | None = None # SHOULD BE A PROPERTY
path: str | None = None # NOT NONE - Should be path
entries: list[str] | None = None # NOT NONE - Empty list of path
count: int | None = None # USELESS
error: str | None = None
message: str | None = None
def to_dict(self):
"""Convert to dict for agent compatibility."""
result = {"status": self.status}
if self.error:
result["error"] = self.error
result["message"] = self.message
else:
if self.folder_type:
result["folder_type"] = self.folder_type
if self.path:
result["path"] = self.path
if self.entries is not None:
result["entries"] = self.entries
if self.count is not None:
result["count"] = self.count
return result
@@ -0,0 +1,40 @@
"""link_file use case — hard-link a file from one root to another."""
from __future__ import annotations
from pathlib import Path
from alfred.infrastructure.filesystem import FilesystemError, link_file
from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import LinkFileResponse
def link_file_use_case(
src: Path, dst: Path, roots: DirectoryRoots
) -> LinkFileResponse:
"""Hard-link ``src`` to ``dst``. Both must be under configured roots.
The destination parent must already exist — the caller is expected
to have created it via ``create_dir_use_case`` if needed.
"""
if not roots.contains(src):
return LinkFileResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Source is outside configured roots: {src}",
)
if not roots.contains(dst):
return LinkFileResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Destination is outside configured roots: {dst}",
)
try:
link_file(src, dst)
except FilesystemError as e:
return LinkFileResponse(status="error", error=code_for(e), message=str(e))
return LinkFileResponse(status="ok", source=src, destination=dst)
+34
View File
@@ -0,0 +1,34 @@
"""list_dir use case — list a directory after guarding it within roots."""
from __future__ import annotations
from pathlib import Path
from alfred.infrastructure.filesystem import FilesystemError, list_dir
from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import ListDirResponse
def list_dir_use_case(path: Path, roots: DirectoryRoots) -> ListDirResponse:
"""List the immediate children of ``path`` if it lives under one of
the configured roots.
Returns a :class:`ListDirResponse`. On guard failure, status is
``"error"`` with ``error="path_not_allowed"``. On infra failure,
status is ``"error"`` with a code mapped from the raised exception.
"""
if not roots.contains(path):
return ListDirResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Path is outside configured roots: {path}",
)
try:
entries = list_dir(path)
except FilesystemError as e:
return ListDirResponse(status="error", error=code_for(e), message=str(e))
return ListDirResponse(status="ok", path=path, entries=tuple(entries))
@@ -2,7 +2,7 @@
import logging import logging
from infrastructure.filesystem import FileManager from alfred.infrastructure.filesystem import FileManager
from .dto import ListFolderResponse from .dto import ListFolderResponse
@@ -0,0 +1,308 @@
"""ManageSubtitlesUseCase — orchestrates the full subtitle pipeline for a video file."""
import logging
from pathlib import Path
from alfred.application.subtitles_TO_CHECK.placer import (
PlacedTrack,
SubtitlePlacer,
_build_dest_name,
)
from alfred.domain.shared_TO_CHECK.value_objects import ImdbId
from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
from alfred.domain.subtitles_TO_CHECK.services.identifier import SubtitleIdentifier
from alfred.domain.subtitles_TO_CHECK.services.matcher import SubtitleMatcher
from alfred.domain.subtitles_TO_CHECK.services.pattern_detector import PatternDetector
from alfred.domain.subtitles_TO_CHECK.services.utils import available_subtitles
from alfred.domain.subtitles_TO_CHECK.value_objects import ScanStrategy
from alfred.infrastructure.filesystem.scanner import PathlibFilesystemScanner
from alfred.infrastructure.knowledge_TO_CHECK.subtitles.base import SubtitleKnowledgeBase
from alfred.infrastructure.knowledge_TO_CHECK.subtitles.loader import KnowledgeLoader
from alfred.infrastructure.persistence_TO_CHECK.context import get_memory
from alfred.infrastructure.probe_TO_CHECK.ffprobe_prober import FfprobeMediaProber
from alfred.infrastructure.subtitle_TO_CHECK.metadata_store import SubtitleMetadataStore
from alfred.infrastructure.subtitle_TO_CHECK.rule_repository import RuleSetRepository
from .dto import (
AvailableSubtitle,
ManageSubtitlesResponse,
PlacedSubtitle,
UnresolvedTrack,
)
logger = logging.getLogger(__name__)
def _infer_library_root(dest_video: Path, media_type: str) -> Path:
"""
Infer the media library root folder from the destination video path.
TV show: video → Season 01 → The X-Files (3 levels up)
Movie: video → Inception (2010) (1 level up)
"""
if media_type == "tv_show":
return dest_video.parent.parent
return dest_video.parent
def _to_imdb_id(raw: str | None) -> ImdbId | None:
if not raw:
return None
try:
return ImdbId(raw)
except Exception:
return None
class ManageSubtitlesUseCase:
"""
Full subtitle pipeline:
1. Load knowledge base
2. Detect (or confirm) the release pattern
3. Identify all tracks (ffprobe + filesystem scan)
4. Load + resolve rules for this media
5. Match tracks against rules
6. If any tracks are unresolved → return needs_clarification (don't place yet)
7. Place matched tracks via hard-link
8. Persist to .alfred/metadata.yaml
The use case is stateless — all dependencies are instantiated inline.
"""
def execute(
self,
source_video: str,
destination_video: str,
imdb_id: str | None = None,
media_type: str = "tv_show",
release_group: str | None = None,
season: int | None = None,
episode: int | None = None,
confirmed_pattern_id: str | None = None,
dry_run: bool = False,
) -> ManageSubtitlesResponse:
source_path = Path(source_video)
dest_path = Path(destination_video)
if not source_path.exists() and not source_path.parent.exists():
return ManageSubtitlesResponse(
status="error",
error="source_not_found",
message=f"Source video not found: {source_video}",
)
kb = SubtitleKnowledgeBase(KnowledgeLoader())
prober = FfprobeMediaProber()
scanner = PathlibFilesystemScanner()
library_root = _infer_library_root(dest_path, media_type)
store = SubtitleMetadataStore(library_root)
repo = RuleSetRepository(library_root)
# --- Pattern resolution ---
pattern = self._resolve_pattern(
kb,
prober,
scanner,
store,
source_path,
confirmed_pattern_id,
release_group,
)
if pattern is None:
return ManageSubtitlesResponse(
status="error",
error="pattern_not_found",
message="Could not determine subtitle pattern for this release.",
)
# --- Identify ---
media_id = _to_imdb_id(imdb_id)
identifier = SubtitleIdentifier(kb, prober, scanner)
metadata = identifier.identify(
video_path=source_path,
pattern=pattern,
media_id=media_id,
media_type=media_type,
release_group=release_group,
)
if metadata.total_count == 0:
logger.info(
f"ManageSubtitles: no subtitle tracks found for {source_path.name}"
)
return ManageSubtitlesResponse(
status="ok",
video_path=destination_video,
placed=[],
skipped_count=0,
)
# --- Embedded short-circuit ---
if pattern.scan_strategy == ScanStrategy.EMBEDDED:
logger.info("ManageSubtitles: embedded pattern — skipping matcher")
available = [
AvailableSubtitle(
language=t.language.code if t.language else "?",
subtitle_type=t.subtitle_type.value,
)
for t in available_subtitles(metadata.embedded_tracks)
]
return ManageSubtitlesResponse(
status="ok",
video_path=destination_video,
placed=[],
skipped_count=0,
available=available,
)
# --- Match (external only) ---
subtitle_prefs = None
try:
memory = get_memory()
subtitle_prefs = memory.ltm.subtitle_preferences
except Exception:
pass
rules = repo.load(release_group, subtitle_prefs).resolve(kb.default_rules())
matcher = SubtitleMatcher()
matched, unresolved = matcher.match(metadata.external_tracks, rules)
if unresolved:
logger.info(
f"ManageSubtitles: {len(unresolved)} unresolved track(s) — needs clarification"
)
return ManageSubtitlesResponse(
status="needs_clarification",
video_path=destination_video,
placed=[],
unresolved=[_to_unresolved_dto(t) for t in unresolved],
)
if not matched:
return ManageSubtitlesResponse(
status="ok",
video_path=destination_video,
placed=[],
skipped_count=metadata.total_count,
)
# --- Dry run: skip placement ---
if dry_run:
placed_dtos = []
for t in matched:
if not t.file_path:
continue
try:
filename = _build_dest_name(t, dest_path.stem)
except ValueError:
continue
placed_dtos.append(
PlacedSubtitle(
source=str(t.file_path),
destination=str(dest_path.parent / filename),
filename=filename,
)
)
return ManageSubtitlesResponse(
status="ok",
video_path=destination_video,
placed=placed_dtos,
skipped_count=0,
)
# --- Place ---
placer = SubtitlePlacer()
place_result = placer.place(matched, dest_path)
# --- Persist ---
if place_result.placed:
pairs = _pair_placed_with_tracks(place_result.placed, matched)
store.append_history(pairs, season, episode, release_group)
placed_dtos = [
PlacedSubtitle(
source=str(p.source),
destination=str(p.destination),
filename=p.filename,
)
for p in place_result.placed
]
return ManageSubtitlesResponse(
status="ok",
video_path=destination_video,
placed=placed_dtos,
skipped_count=place_result.skipped_count,
)
def _resolve_pattern(
self,
kb: SubtitleKnowledgeBase,
prober: FfprobeMediaProber,
scanner: PathlibFilesystemScanner,
store: SubtitleMetadataStore,
source_path: Path,
confirmed_pattern_id: str | None,
release_group: str | None,
):
# 1. Explicit override from caller
if confirmed_pattern_id:
p = kb.pattern(confirmed_pattern_id)
if p:
return p
logger.warning(f"ManageSubtitles: unknown pattern '{confirmed_pattern_id}'")
# 2. Previously confirmed in metadata store
stored_id = store.confirmed_pattern()
if stored_id:
p = kb.pattern(stored_id)
if p:
logger.debug(f"ManageSubtitles: using confirmed pattern '{stored_id}'")
return p
# 3. Auto-detect
release_root = source_path.parent
detector = PatternDetector(kb, prober, scanner)
result = detector.detect(release_root, source_path)
if result["detected"] and result["confidence"] >= 0.6:
logger.info(
f"ManageSubtitles: auto-detected pattern '{result['detected'].id}' "
f"(confidence={result['confidence']:.2f})"
)
return result["detected"]
# 4. Fallback — adjacent (safest default)
logger.info("ManageSubtitles: falling back to 'adjacent' pattern")
return kb.pattern("adjacent")
def _to_unresolved_dto(
track: SubtitleScanResult, min_confidence: float = 0.7
) -> UnresolvedTrack:
reason = "unknown_language" if track.language is None else "low_confidence"
return UnresolvedTrack(
raw_tokens=track.raw_tokens,
file_path=str(track.file_path) if track.file_path else None,
file_size_kb=track.file_size_kb,
reason=reason,
)
def _pair_placed_with_tracks(
placed: list[PlacedTrack],
tracks: list[SubtitleScanResult],
) -> list[tuple[PlacedTrack, SubtitleScanResult]]:
"""
Pair each PlacedTrack with its originating SubtitleScanResult by source path.
Falls back to positional matching if paths don't align.
"""
track_by_path = {t.file_path: t for t in tracks if t.file_path}
pairs = []
for p in placed:
track = track_by_path.get(p.source)
if track is None and tracks:
track = tracks[0] # positional fallback
if track:
pairs.append((p, track))
return pairs
+36
View File
@@ -0,0 +1,36 @@
"""move_dir use case — move a directory tree between configured roots."""
from __future__ import annotations
from pathlib import Path
from alfred.infrastructure.filesystem import FilesystemError, move_dir
from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import MoveDirResponse
def move_dir_use_case(
src: Path, dst: Path, roots: DirectoryRoots
) -> MoveDirResponse:
"""Move directory ``src`` to ``dst``. Both must be under configured roots."""
if not roots.contains(src):
return MoveDirResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Source is outside configured roots: {src}",
)
if not roots.contains(dst):
return MoveDirResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Destination is outside configured roots: {dst}",
)
try:
move_dir(src, dst)
except FilesystemError as e:
return MoveDirResponse(status="error", error=code_for(e), message=str(e))
return MoveDirResponse(status="ok", source=src, destination=dst)
@@ -0,0 +1,36 @@
"""move_file use case — move a file between configured roots."""
from __future__ import annotations
from pathlib import Path
from alfred.infrastructure.filesystem import FilesystemError, move_file
from ._errors import PATH_NOT_ALLOWED, code_for
from .directory_roots import DirectoryRoots
from .dto import MoveFileResponse
def move_file_use_case(
src: Path, dst: Path, roots: DirectoryRoots
) -> MoveFileResponse:
"""Move file ``src`` to ``dst``. Both must be under configured roots."""
if not roots.contains(src):
return MoveFileResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Source is outside configured roots: {src}",
)
if not roots.contains(dst):
return MoveFileResponse(
status="error",
error=PATH_NOT_ALLOWED,
message=f"Destination is outside configured roots: {dst}",
)
try:
move_file(src, dst)
except FilesystemError as e:
return MoveFileResponse(status="error", error=code_for(e), message=str(e))
return MoveFileResponse(status="ok", source=src, destination=dst)
@@ -0,0 +1,43 @@
"""Move media use case."""
import logging
from alfred.infrastructure.filesystem import FileManager
from .dto import MoveMediaResponse
logger = logging.getLogger(__name__)
class MoveMediaUseCase:
"""Use case for moving a media file to a destination (copy + delete source)."""
def __init__(self, file_manager: FileManager):
self.file_manager = file_manager
def execute(self, source: str, destination: str) -> MoveMediaResponse:
"""
Move a media file from source to destination.
Args:
source: Absolute path to the source file.
destination: Absolute path to the destination file.
Returns:
MoveMediaResponse with success or error information.
"""
result = self.file_manager.move_file(source, destination)
if result.get("status") == "ok":
return MoveMediaResponse(
status="ok",
source=result.get("source"),
destination=result.get("destination"),
filename=result.get("filename"),
size=result.get("size"),
)
return MoveMediaResponse(
status="error",
error=result.get("error"),
message=result.get("message"),
)
@@ -0,0 +1,464 @@
"""
Destination resolution — compute library paths for releases.
Four distinct use cases, one per release type:
- resolve_season_destination : season pack (folder move)
- resolve_episode_destination : single episode (file move)
- resolve_movie_destination : movie (file move)
- resolve_series_destination : complete series multi-season pack (folder move)
Each returns a dedicated DTO with only the fields that make sense for that type.
These use cases follow Option B of the snapshot-VO design: ``ParsedRelease``
arrives with ``title_sanitized`` already computed, and TMDB-supplied strings
are sanitized **at the use-case boundary** (here) before being passed into
``ParsedRelease`` builder methods. The builders themselves perform no I/O and
no sanitization.
"""
from __future__ import annotations
import logging
from dataclasses import dataclass
from pathlib import Path
from alfred.application.release_TO_CHECK import inspect_release
from alfred.domain.release import parse_release
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.release.value_objects import ParsedRelease
from alfred.domain.shared_TO_CHECK.ports import MediaProber
from alfred.infrastructure.persistence_TO_CHECK import get_memory
logger = logging.getLogger(__name__)
def _resolve_parsed(
release_name: str,
source_path: str | None,
kb: ReleaseKnowledge,
prober: MediaProber,
) -> ParsedRelease:
"""Pick the right entry point depending on whether we have a path.
When ``source_path`` is provided and points to something that exists,
we run the full inspection pipeline so probe data can refresh tech
fields (which feed every filename builder). Otherwise we fall back
to a parse-only path — same behavior as before.
"""
if source_path:
path = Path(source_path)
if path.exists():
return inspect_release(release_name, path, kb, prober).parsed
parsed, _ = parse_release(release_name, kb)
return parsed
def _find_existing_tvshow_folders(
tv_root: Path, tmdb_title_safe: str, tmdb_year: int
) -> list[str]:
"""Return folder names in tv_root that match title + year prefix."""
if not tv_root.exists():
return []
clean_title = tmdb_title_safe.replace(" ", ".")
prefix = f"{clean_title}.{tmdb_year}".lower()
return sorted(
entry.name
for entry in tv_root.iterdir()
if entry.is_dir() and entry.name.lower().startswith(prefix)
)
def _get_tv_root() -> Path | None:
memory = get_memory()
tv_root = memory.ltm.library_paths.get("tv_show")
return Path(tv_root) if tv_root else None
# ---------------------------------------------------------------------------
# Internal sentinel + series-folder resolver (shared by the 3 TV use cases)
# ---------------------------------------------------------------------------
@dataclass
class _Clarification:
"""Module-private sentinel signalling that user input is needed."""
question: str
options: list[str]
def _resolve_series_folder(
tv_root: Path,
tmdb_title: str,
tmdb_title_safe: str,
tmdb_year: int,
computed_name: str,
confirmed_folder: str | None,
) -> tuple[str, bool] | _Clarification:
"""
Resolve which series folder to use.
Returns:
(folder_name, is_new) if resolved unambiguously,
_Clarification(question, options) if the caller must ask the user.
"""
if confirmed_folder:
return confirmed_folder, not (tv_root / confirmed_folder).exists()
existing = _find_existing_tvshow_folders(tv_root, tmdb_title_safe, tmdb_year)
if not existing:
return computed_name, True
if len(existing) == 1 and existing[0] == computed_name:
return existing[0], False
options = existing + ([computed_name] if computed_name not in existing else [])
return _Clarification(
question=(
f"Un dossier série existe déjà pour '{tmdb_title}' "
f"mais son nom diffère du nom calculé ({computed_name}). "
f"Lequel utiliser ?"
),
options=options,
)
# ---------------------------------------------------------------------------
# DTOs
# ---------------------------------------------------------------------------
@dataclass
class _ResolvedDestinationBase:
"""
Shared shape across all resolution DTOs.
Holds the status flag and the fields used in non-ok states
(error / needs_clarification). Subclasses add their own ok-state fields
and a to_dict() that delegates the non-ok cases via _base_dict().
"""
status: str # "ok" | "needs_clarification" | "error"
# needs_clarification
question: str | None = None
options: list[str] | None = None
# error
error: str | None = None
message: str | None = None
def _base_dict(self) -> dict | None:
"""Return the dict for error/needs_clarification, or None for ok."""
if self.status == "error":
return {"status": self.status, "error": self.error, "message": self.message}
if self.status == "needs_clarification":
return {
"status": self.status,
"question": self.question,
"options": self.options or [],
}
return None
@dataclass
class ResolvedSeasonDestination(_ResolvedDestinationBase):
"""Paths for a season pack — folder move, no individual file paths."""
series_folder: str | None = None
season_folder: str | None = None
series_folder_name: str | None = None
season_folder_name: str | None = None
is_new_series_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"series_folder": self.series_folder,
"season_folder": self.season_folder,
"series_folder_name": self.series_folder_name,
"season_folder_name": self.season_folder_name,
"is_new_series_folder": self.is_new_series_folder,
}
@dataclass
class ResolvedEpisodeDestination(_ResolvedDestinationBase):
"""Paths for a single episode — file move."""
series_folder: str | None = None
season_folder: str | None = None
library_file: str | None = None # full path to destination .mkv
series_folder_name: str | None = None
season_folder_name: str | None = None
filename: str | None = None
is_new_series_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"series_folder": self.series_folder,
"season_folder": self.season_folder,
"library_file": self.library_file,
"series_folder_name": self.series_folder_name,
"season_folder_name": self.season_folder_name,
"filename": self.filename,
"is_new_series_folder": self.is_new_series_folder,
}
@dataclass
class ResolvedMovieDestination(_ResolvedDestinationBase):
"""Paths for a movie — file move."""
movie_folder: str | None = None
library_file: str | None = None
movie_folder_name: str | None = None
filename: str | None = None
is_new_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"movie_folder": self.movie_folder,
"library_file": self.library_file,
"movie_folder_name": self.movie_folder_name,
"filename": self.filename,
"is_new_folder": self.is_new_folder,
}
@dataclass
class ResolvedSeriesDestination(_ResolvedDestinationBase):
"""Paths for a complete multi-season series pack — folder move."""
series_folder: str | None = None
series_folder_name: str | None = None
is_new_series_folder: bool = False
def to_dict(self) -> dict:
return self._base_dict() or {
"status": self.status,
"series_folder": self.series_folder,
"series_folder_name": self.series_folder_name,
"is_new_series_folder": self.is_new_series_folder,
}
# ---------------------------------------------------------------------------
# Use cases
# ---------------------------------------------------------------------------
def resolve_season_destination(
release_name: str,
tmdb_title: str,
tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
confirmed_folder: str | None = None,
source_path: str | None = None,
) -> ResolvedSeasonDestination:
"""
Compute destination paths for a season pack.
Returns series_folder + season_folder. No file paths — the whole
source folder is moved as-is into season_folder.
When ``source_path`` points to the release on disk, the parser is
augmented with ffprobe data so tech tokens missing from the release
name (quality / codec) end up in the folder names.
"""
tv_root = _get_tv_root()
if not tv_root:
return ResolvedSeasonDestination(
status="error",
error="library_not_set",
message="TV show library path is not configured.",
)
parsed = _resolve_parsed(release_name, source_path, kb, prober)
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
resolved = _resolve_series_folder(
tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
)
if isinstance(resolved, _Clarification):
return ResolvedSeasonDestination(
status="needs_clarification",
question=resolved.question,
options=resolved.options,
)
series_folder_name, is_new = resolved
season_folder_name = parsed.season_folder_name()
series_path = tv_root / series_folder_name
season_path = series_path / season_folder_name
return ResolvedSeasonDestination(
status="ok",
series_folder=str(series_path),
season_folder=str(season_path),
series_folder_name=series_folder_name,
season_folder_name=season_folder_name,
is_new_series_folder=is_new,
)
def resolve_episode_destination(
release_name: str,
source_file: str,
tmdb_title: str,
tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
tmdb_episode_title: str | None = None,
confirmed_folder: str | None = None,
) -> ResolvedEpisodeDestination:
"""
Compute destination paths for a single episode file.
Returns series_folder + season_folder + library_file (full path to .mkv).
``source_file`` doubles as the inspection target — when it exists,
ffprobe enrichment refreshes tech tokens missing from the release name.
"""
tv_root = _get_tv_root()
if not tv_root:
return ResolvedEpisodeDestination(
status="error",
error="library_not_set",
message="TV show library path is not configured.",
)
parsed = _resolve_parsed(release_name, source_file, kb, prober)
ext = Path(source_file).suffix
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
tmdb_episode_title_safe = (
kb.sanitize_for_fs(tmdb_episode_title) if tmdb_episode_title else None
)
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
resolved = _resolve_series_folder(
tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
)
if isinstance(resolved, _Clarification):
return ResolvedEpisodeDestination(
status="needs_clarification",
question=resolved.question,
options=resolved.options,
)
series_folder_name, is_new = resolved
season_folder_name = parsed.season_folder_name()
filename = parsed.episode_filename(tmdb_episode_title_safe, ext)
series_path = tv_root / series_folder_name
season_path = series_path / season_folder_name
file_path = season_path / filename
return ResolvedEpisodeDestination(
status="ok",
series_folder=str(series_path),
season_folder=str(season_path),
library_file=str(file_path),
series_folder_name=series_folder_name,
season_folder_name=season_folder_name,
filename=filename,
is_new_series_folder=is_new,
)
def resolve_movie_destination(
release_name: str,
source_file: str,
tmdb_title: str,
tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
) -> ResolvedMovieDestination:
"""
Compute destination paths for a movie file.
Returns movie_folder + library_file (full path to .mkv).
``source_file`` doubles as the inspection target — when it exists,
ffprobe enrichment refreshes tech tokens missing from the release name.
"""
memory = get_memory()
movies_root = memory.ltm.library_paths.get("movie")
if not movies_root:
return ResolvedMovieDestination(
status="error",
error="library_not_set",
message="Movie library path is not configured.",
)
parsed = _resolve_parsed(release_name, source_file, kb, prober)
ext = Path(source_file).suffix
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
folder_name = parsed.movie_folder_name(tmdb_title_safe, tmdb_year)
filename = parsed.movie_filename(tmdb_title_safe, tmdb_year, ext)
folder_path = Path(movies_root) / folder_name
file_path = folder_path / filename
return ResolvedMovieDestination(
status="ok",
movie_folder=str(folder_path),
library_file=str(file_path),
movie_folder_name=folder_name,
filename=filename,
is_new_folder=not folder_path.exists(),
)
def resolve_series_destination(
release_name: str,
tmdb_title: str,
tmdb_year: int,
kb: ReleaseKnowledge,
prober: MediaProber,
confirmed_folder: str | None = None,
source_path: str | None = None,
) -> ResolvedSeriesDestination:
"""
Compute destination path for a complete multi-season series pack.
Returns only series_folder — the whole pack lands directly inside it.
When ``source_path`` points to the release on disk, ffprobe
enrichment refreshes tech tokens missing from the release name.
"""
tv_root = _get_tv_root()
if not tv_root:
return ResolvedSeriesDestination(
status="error",
error="library_not_set",
message="TV show library path is not configured.",
)
parsed = _resolve_parsed(release_name, source_path, kb, prober)
tmdb_title_safe = kb.sanitize_for_fs(tmdb_title)
computed_name = parsed.show_folder_name(tmdb_title_safe, tmdb_year)
resolved = _resolve_series_folder(
tv_root, tmdb_title, tmdb_title_safe, tmdb_year, computed_name, confirmed_folder
)
if isinstance(resolved, _Clarification):
return ResolvedSeriesDestination(
status="needs_clarification",
question=resolved.question,
options=resolved.options,
)
series_folder_name, is_new = resolved
series_path = tv_root / series_folder_name
return ResolvedSeriesDestination(
status="ok",
series_folder=str(series_path),
series_folder_name=series_folder_name,
is_new_series_folder=is_new,
)
@@ -1,9 +1,10 @@
"""Movie use cases.""" """Movie use cases."""
from .dto import SearchMovieResponse from .dto import MovieHit, SearchMovieResponse
from .search_movie import SearchMovieUseCase from .search_movie import SearchMovieUseCase
__all__ = [ __all__ = [
"SearchMovieUseCase", "MovieHit",
"SearchMovieResponse", "SearchMovieResponse",
"SearchMovieUseCase",
] ]
+40
View File
@@ -0,0 +1,40 @@
"""Movie application DTOs."""
from dataclasses import dataclass, field
@dataclass(frozen=True)
class MovieHit:
"""One movie hit, flattened for transport to the agent."""
tmdb_id: int
title: str
release_year: int | None = None
def to_dict(self) -> dict:
out: dict = {"tmdb_id": self.tmdb_id, "title": self.title}
if self.release_year is not None:
out["release_year"] = self.release_year
return out
@dataclass
class SearchMovieResponse:
"""Response from searching for a movie."""
status: str
hits: list[MovieHit] = field(default_factory=list)
error: str | None = None
message: str | None = None
def to_dict(self):
"""Convert to dict for agent compatibility."""
result: dict = {"status": self.status}
if self.error:
result["error"] = self.error
result["message"] = self.message
else:
result["hits"] = [h.to_dict() for h in self.hits]
return result
@@ -0,0 +1,60 @@
"""Search movie use case."""
import logging
from alfred.infrastructure.api_TO_CHECK.tmdb import (
TMDBAPIError,
TMDBClient,
TMDBConfigurationError,
)
from .dto import MovieHit, SearchMovieResponse
logger = logging.getLogger(__name__)
class SearchMovieUseCase:
"""List movies matching a free-text query via TMDB ``/search/movie``.
The use case is a thin orchestrator: it asks the client for hits,
flattens domain VOs into agent-friendly primitives, and wraps
errors. It deliberately does **not** look up ``imdb_id`` —
enrichment is the caller's job (via :meth:`TMDBClient.get_movie_info`
on a chosen ``tmdb_id``).
"""
def __init__(self, tmdb_client: TMDBClient):
self.tmdb_client = tmdb_client
def execute(self, media_title: str) -> SearchMovieResponse:
try:
results = self.tmdb_client.search_movies(media_title)
hits = [
MovieHit(
tmdb_id=r.tmdb_id.value,
title=str(r.title),
release_year=r.release_year.value if r.release_year else None,
)
for r in results
]
logger.info(f"search_movies({media_title!r}) → {len(hits)} hits")
return SearchMovieResponse(status="ok", hits=hits)
except TMDBConfigurationError as e:
logger.error(f"TMDB configuration error: {e}")
return SearchMovieResponse(
status="error", error="configuration_error", message=str(e)
)
except TMDBAPIError as e:
logger.error(f"TMDB API error: {e}")
return SearchMovieResponse(
status="error", error="api_error", message=str(e)
)
except ValueError as e:
logger.error(f"Validation error: {e}")
return SearchMovieResponse(
status="error", error="validation_failed", message=str(e)
)
@@ -0,0 +1,20 @@
"""Release application layer — orchestrators sitting between domain
parsing and infrastructure I/O.
Public surface:
- :func:`is_supported_video` / :func:`find_main_video` — pre-pipeline
filesystem helpers (extension-only filtering, top-level video pick).
- :func:`inspect_release` / :class:`InspectedResult` — full inspection
pipeline combining parse + filesystem refinement + probe enrichment.
"""
from .inspect import InspectedResult, inspect_release
from .supported_media import find_main_video, is_supported_video
__all__ = [
"InspectedResult",
"find_main_video",
"inspect_release",
"is_supported_video",
]
@@ -0,0 +1,67 @@
"""
detect_media_type — filesystem-based media type refinement.
Enriches a ParsedRelease.media_type with evidence from the actual source path
(file or folder). Called after parse_release() to produce a final classification.
Classification logic:
1. If source_path is a file — check its extension directly.
2. If source_path is a folder — collect all extensions inside (non-recursive
for the first level, then recursive if nothing conclusive found).
3. Decision:
- Any non_video extension AND no video extension → "other"
- Any video extension → keep parsed media_type ("movie" | "tv_show" | "unknown")
- No conclusive extension found → keep parsed media_type as-is
- Mixed (video + non_video) → "unknown"
"""
from __future__ import annotations
from pathlib import Path
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.release.value_objects import ParsedRelease
def detect_media_type(
parsed: ParsedRelease, source_path: Path, kb: ReleaseKnowledge
) -> str:
"""
Return a refined media_type string for the given source_path.
Does not mutate parsed — returns the new media_type value only.
The caller is responsible for updating the ParsedRelease if needed.
"""
extensions = _collect_extensions(source_path)
# Metadata extensions (.nfo, .srt, …) are always present alongside releases
# and must not influence the type decision.
conclusive = extensions - kb.metadata_extensions
has_video = bool(conclusive & kb.video_extensions)
has_non_video = bool(conclusive & kb.non_video_extensions)
if has_video and has_non_video:
return "unknown"
if has_non_video and not has_video:
return "other"
if has_video:
return parsed.media_type # trust token-level inference
# No conclusive extension — trust token-level inference
return parsed.media_type
def _collect_extensions(path: Path) -> set[str]:
"""Return the set of lowercase extensions found at path (file or folder)."""
if not path.exists():
return set()
if path.is_file():
return {path.suffix.lower()}
# Folder — scan first level only
exts: set[str] = set()
for child in path.iterdir():
if child.is_file():
exts.add(child.suffix.lower())
return exts
@@ -0,0 +1,74 @@
"""enrich_from_probe — fill missing ParsedRelease fields from MediaInfo."""
from __future__ import annotations
from dataclasses import replace
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.release.value_objects import ParsedRelease
from alfred.domain.shared_TO_CHECK.media import MediaInfo
def enrich_from_probe(
parsed: ParsedRelease, info: MediaInfo, kb: ReleaseKnowledge
) -> ParsedRelease:
"""
Return a new ParsedRelease with None fields filled from ffprobe MediaInfo.
Only overwrites fields that are currently None — token-level values
from the release name always take priority. ``ParsedRelease`` is
frozen; this returns a new instance via :func:`dataclasses.replace`.
Translation tables (ffprobe codec name → scene token, channel count
→ layout) live in ``kb.probe_mappings`` (loaded from
``alfred/knowledge/release/probe_mappings.yaml``). When ffprobe
reports a value with no mapping entry, the fallback is the uppercase
raw value so unknown codecs still surface in a predictable form.
"""
mappings = kb.probe_mappings
video_codec_map: dict[str, str] = mappings.get("video_codec", {})
audio_codec_map: dict[str, str] = mappings.get("audio_codec", {})
channel_map: dict[int, str] = mappings.get("audio_channels", {})
updates: dict[str, object] = {}
if parsed.quality is None and info.resolution:
updates["quality"] = info.resolution
if parsed.codec is None and info.video_codec:
updates["codec"] = video_codec_map.get(
info.video_codec.lower(), info.video_codec.upper()
)
# bit_depth: ffprobe exposes it via pix_fmt — not in MediaInfo yet, skip.
# Audio — use the default track, fallback to first
default_track = next((t for t in info.audio_tracks if t.is_default), None)
track = default_track or (info.audio_tracks[0] if info.audio_tracks else None)
if track:
if parsed.audio_codec is None and track.codec:
updates["audio_codec"] = audio_codec_map.get(
track.codec.lower(), track.codec.upper()
)
if parsed.audio_channels is None and track.channels:
updates["audio_channels"] = channel_map.get(
track.channels, f"{track.channels}ch"
)
# Languages — merge ffprobe languages with token-level ones
# "und" = undetermined, not useful
if info.audio_languages:
existing_upper = {lang.upper() for lang in parsed.languages}
new_languages = list(parsed.languages)
for lang in info.audio_languages:
if lang.lower() != "und" and lang.upper() not in existing_upper:
new_languages.append(lang)
existing_upper.add(lang.upper())
if len(new_languages) != len(parsed.languages):
updates["languages"] = tuple(new_languages)
if not updates:
return parsed
return replace(parsed, **updates)
@@ -0,0 +1,192 @@
"""Release inspection orchestrator — the canonical "look at this thing"
entry point.
``inspect_release`` is the single composition of the four layers we
care about for a freshly-arrived release:
1. **Parse the name** — :func:`alfred.domain.release.services.parse_release`
gives a ``ParsedRelease`` plus a ``ParseReport`` (confidence + road).
2. **Pick the main video** — :func:`find_main_video` runs a top-level
scan over the source path. If nothing qualifies the result still
completes; downstream callers decide what to do with a videoless
release.
3. **Refine the media type** — :func:`detect_media_type` uses the
on-disk extension mix to override any token-level guess (e.g. a
bare ``.iso`` folder becomes ``"other"``). The refined value is
patched onto ``parsed`` in place — same convention as
``analyze_release`` had before.
4. **Probe the video** — the injected :class:`MediaProber` fills in
missing technical fields via :func:`enrich_from_probe`. Skipped
when there is no main video or when ``media_type`` ended up in
``{"unknown", "other"}`` (the probe would tell us nothing useful).
The return type is :class:`InspectedResult`, a frozen VO that bundles
everything downstream callers need (``analyze_release`` tool,
``resolve_destination``, future workflow stages) without forcing them
to redo the same four calls.
Design notes:
- **Application layer.** This module touches both domain
(``parse_release``) and infrastructure (``MediaProber`` port). That
is exactly application's job — orchestrate.
- **Knowledge base is injected.** ``inspect_release`` takes ``kb`` and
``prober`` as parameters; no module-level singletons here. Callers
(the tool wrapper, tests) decide what to plug in.
- **Mutation is contained.** We still mutate ``parsed.media_type`` and
let ``enrich_from_probe`` fill its ``None`` fields, because
``ParsedRelease`` is intentionally a mutable dataclass. The outer
``InspectedResult`` is frozen so the *bundle* is immutable from the
caller's perspective.
- **Never raises.** Filesystem / probe errors surface as ``None``
fields on the result, never as exceptions — same contract as the
underlying adapters.
"""
from __future__ import annotations
from dataclasses import dataclass, replace
from pathlib import Path
from alfred.application.release_TO_CHECK.detect_media_type import detect_media_type
from alfred.application.release_TO_CHECK.enrich_from_probe import enrich_from_probe
from alfred.application.release_TO_CHECK.supported_media import find_main_video
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.parser.services import parse_release
from alfred.domain.release.value_objects import (
MediaTypeToken,
ParsedRelease,
ParseReport,
)
from alfred.domain.shared_TO_CHECK.media import MediaInfo
from alfred.domain.shared_TO_CHECK.ports import MediaProber
# Media types for which a probe carries no useful information.
_NON_PROBABLE_MEDIA_TYPES = frozenset({"unknown", "other"})
# Media types for which there's nothing for the organizer to do.
# ``other`` covers things like games / ISOs / archives sitting on the
# downloads folder. ``unknown`` does NOT belong here — those need a
# user decision, not a skip.
_SKIPPABLE_MEDIA_TYPES = frozenset({"other"})
# Roads that signal the parser couldn't reach a confident answer on its
# own. ``Road`` values are kept as strings on the report to avoid a
# cross-package import here.
_ASK_USER_ROADS = frozenset({"path_of_pain"})
@dataclass(frozen=True)
class InspectedResult:
"""The full picture of a release: parsed name + filesystem reality.
Bundles everything the downstream pipeline needs after a single
inspection pass:
- ``parsed`` — :class:`ParsedRelease`, with ``media_type`` already
refined by :func:`detect_media_type` and ``None`` tech fields
filled in by :func:`enrich_from_probe` when a probe ran.
- ``report`` — :class:`ParseReport` from the parser (confidence +
road, untouched by inspection).
- ``source_path`` — the path the inspector was pointed at (file or
folder), as supplied by the caller.
- ``main_video`` — the canonical video file inside ``source_path``,
or ``None`` if no eligible file was found.
- ``media_info`` — the :class:`MediaInfo` snapshot when a probe
succeeded; ``None`` when no video was probed (no main video, or
``media_type`` in ``{"unknown", "other"}``) or when ffprobe
failed.
- ``probe_used`` — ``True`` iff ``media_info`` is non-``None`` and
``enrich_from_probe`` actually ran. Explicit flag so callers
don't have to re-derive the condition.
- ``recommended_action`` — derived hint for the orchestrator (see
property docstring). Encodes the exclusion / clarification /
go-ahead decision in one place so downstream callers don't
re-implement the same checks.
"""
parsed: ParsedRelease
report: ParseReport
source_path: Path
main_video: Path | None
media_info: MediaInfo | None
probe_used: bool
@property
def recommended_action(self) -> str:
"""Return one of ``"skip"`` / ``"ask_user"`` / ``"process"``.
- ``"skip"`` — nothing to organize:
* the source has no main video file, **or**
* ``media_type`` is ``"other"`` (games / ISOs / archives).
- ``"ask_user"`` — a decision is required before any action:
* ``media_type`` is ``"unknown"`` (parser couldn't classify), **or**
* the parse landed on ``Road.PATH_OF_PAIN``
(low-confidence, malformed name, etc.).
- ``"process"`` — everything else: a confident parse with a
usable media type and a main video on disk. The orchestrator
can move straight to the planning step.
The check ordering matters: ``"skip"`` wins over ``"ask_user"``
because if there's no video to organize, no question to the
user can change that. ``"ask_user"`` then wins over
``"process"`` because a confident parse alone isn't enough if
the type or road still flag uncertainty.
"""
if self.main_video is None:
return "skip"
if self.parsed.media_type.value in _SKIPPABLE_MEDIA_TYPES:
return "skip"
if self.parsed.media_type.value == "unknown":
return "ask_user"
if self.report.road in _ASK_USER_ROADS:
return "ask_user"
return "process"
def inspect_release(
release_name: str,
source_path: Path,
kb: ReleaseKnowledge,
prober: MediaProber,
) -> InspectedResult:
"""Run the full inspection pipeline on ``release_name`` /
``source_path``.
See module docstring for the four-step flow. ``kb`` and ``prober``
are injected so the caller controls the knowledge base layering
and the probe adapter (real ffprobe in production, stubs in tests).
Never raises. A missing or unreadable ``source_path`` simply
results in ``main_video=None`` and ``media_info=None``.
"""
parsed, report = parse_release(release_name, kb)
# Step 2: refine media_type from the on-disk extension mix.
# detect_media_type tolerates non-existent paths (returns parsed.media_type
# untouched), so no need to guard here. ParsedRelease is frozen — use
# dataclasses.replace to rebind with the refined value.
refined_media_type = MediaTypeToken(detect_media_type(parsed, source_path, kb))
if refined_media_type != parsed.media_type:
parsed = replace(parsed, media_type=refined_media_type)
# Step 3: pick the canonical main video (top-level scan only).
main_video = find_main_video(source_path, kb)
# Step 4: probe + enrich, when it makes sense.
media_info: MediaInfo | None = None
probe_used = False
if main_video is not None and parsed.media_type not in _NON_PROBABLE_MEDIA_TYPES:
media_info = prober.probe(main_video)
if media_info is not None:
parsed = enrich_from_probe(parsed, media_info, kb)
probe_used = True
return InspectedResult(
parsed=parsed,
report=report,
source_path=source_path,
main_video=main_video,
media_info=media_info,
probe_used=probe_used,
)
@@ -0,0 +1,74 @@
"""Pre-pipeline exclusion — decide which files are worth parsing.
These helpers live one notch above the domain: they touch the
filesystem (``Path.iterdir``, ``Path.suffix``) but carry no parsing
logic of their own. The goal is to filter out non-video files and pick
the canonical "main video" from a release folder *before* anything
hits :func:`~alfred.domain.release.parse_release`.
Design notes (Phase A bis, 2026-05-20):
- **Extension is the sole eligibility criterion.** A file is supported
iff its suffix is in ``kb.video_extensions``. No size threshold, no
filename heuristics ("sample", "trailer", …). If a release packs a
bloated featurette or names its sample alphabetically before the
main feature, that's PATH_OF_PAIN territory — not this layer's job.
- **Top-level scan only.** ``find_main_video`` does not descend into
subdirectories. Releases that wrap the main video in ``Sample/`` or
similar are non-scene-standard and handled by the orchestrator
upstream.
- **Lexicographic tie-break.** When several candidates qualify
(legitimate for season packs), we return the first by alphabetical
order. Deterministic, no size-based ranking.
- **Direct ``Path`` I/O.** No ``FilesystemScanner`` port — this layer
is application, not domain. If isolation becomes necessary for
testing scale, we'll introduce a port then.
"""
from __future__ import annotations
from pathlib import Path
from alfred.domain.releases_TO_CHECK.ports.knowledge import ReleaseKnowledge
def is_supported_video(path: Path, kb: ReleaseKnowledge) -> bool:
"""Return True when ``path`` is a video file the parser should
consider.
The check is purely extension-based: ``path.suffix.lower()`` must
belong to ``kb.video_extensions``. ``path`` must also be a regular
file — directories and broken symlinks return False.
"""
if not path.is_file():
return False
return path.suffix.lower() in kb.video_extensions
def find_main_video(folder: Path, kb: ReleaseKnowledge) -> Path | None:
"""Return the canonical main video file inside ``folder``, or
``None`` if there isn't one.
Behavior:
- Top-level scan only — subdirectories are ignored.
- Eligibility is :func:`is_supported_video`.
- When several files qualify, the lexicographically first one wins.
- When ``folder`` itself is a video file, it is returned as-is
(single-file releases are valid).
- When ``folder`` doesn't exist or isn't a directory (and isn't a
video file either), returns ``None``.
"""
if folder.is_file():
return folder if is_supported_video(folder, kb) else None
if not folder.is_dir():
return None
candidates = sorted(
child for child in folder.iterdir() if is_supported_video(child, kb)
)
return candidates[0] if candidates else None
@@ -0,0 +1,116 @@
"""SubtitlePlacer — hard-links matched subtitle tracks next to the destination video."""
import logging
import os
from dataclasses import dataclass
from pathlib import Path
from alfred.domain.subtitles_TO_CHECK.entities import SubtitleScanResult
from alfred.domain.subtitles_TO_CHECK.value_objects import SubtitleType
logger = logging.getLogger(__name__)
def _build_dest_name(track: SubtitleScanResult, video_stem: str) -> str:
"""
Build the destination filename for a subtitle track.
Format: {video_stem}.{lang}.{ext}
{video_stem}.{lang}.sdh.{ext}
{video_stem}.{lang}.forced.{ext}
"""
if not track.language or not track.format:
raise ValueError("Cannot compute destination name: language or format missing")
ext = track.format.extensions[0].lstrip(".")
parts = [video_stem, track.language.code]
if track.subtitle_type == SubtitleType.SDH:
parts.append("sdh")
elif track.subtitle_type == SubtitleType.FORCED:
parts.append("forced")
return ".".join(parts) + "." + ext
@dataclass
class PlacedTrack:
source: Path
destination: Path
filename: str
@dataclass
class PlaceResult:
placed: list[PlacedTrack]
skipped: list[tuple[SubtitleScanResult, str]] # (track, reason)
@property
def placed_count(self) -> int:
return len(self.placed)
@property
def skipped_count(self) -> int:
return len(self.skipped)
class SubtitlePlacer:
"""
Hard-links matched SubtitleScanResult files next to a destination video.
Uses the same hard-link strategy as FileManager.copy_file:
instant, no data duplication, qBittorrent keeps seeding.
Embedded tracks are skipped — nothing to place on disk.
"""
def place(
self,
tracks: list[SubtitleScanResult],
destination_video: Path,
) -> PlaceResult:
placed: list[PlacedTrack] = []
skipped: list[tuple[SubtitleScanResult, str]] = []
dest_dir = destination_video.parent
for track in tracks:
if track.is_embedded:
logger.debug(f"SubtitlePlacer: skip embedded track ({track.language})")
skipped.append((track, "embedded — no file to place"))
continue
if not track.file_path:
skipped.append((track, "source file not set"))
continue
try:
dest_name = _build_dest_name(track, destination_video.stem)
except ValueError as e:
skipped.append((track, str(e)))
continue
dest_path = dest_dir / dest_name
try:
os.link(track.file_path, dest_path)
placed.append(
PlacedTrack(
source=track.file_path,
destination=dest_path,
filename=dest_name,
)
)
logger.info(f"SubtitlePlacer: placed {dest_name}")
except FileNotFoundError:
skipped.append((track, "source file not found"))
except FileExistsError:
logger.debug(f"SubtitlePlacer: skip {dest_name} — already exists")
skipped.append((track, "destination already exists"))
except OSError as e:
logger.warning(f"SubtitlePlacer: failed to place {dest_name}: {e}")
skipped.append((track, str(e)))
logger.info(
f"SubtitlePlacer: {len(placed)} placed, {len(skipped)} skipped "
f"for {destination_video.name}"
)
return PlaceResult(placed=placed, skipped=skipped)
@@ -2,7 +2,7 @@
import logging import logging
from infrastructure.api.qbittorrent import ( from alfred.infrastructure.api_TO_CHECK.qbittorrent import (
QBittorrentAPIError, QBittorrentAPIError,
QBittorrentAuthError, QBittorrentAuthError,
QBittorrentClient, QBittorrentClient,
@@ -2,7 +2,11 @@
import logging import logging
from infrastructure.api.knaben import KnabenAPIError, KnabenClient, KnabenNotFoundError from alfred.infrastructure.api_TO_CHECK.knaben import (
KnabenAPIError,
KnabenClient,
KnabenNotFoundError,
)
from .dto import SearchTorrentsResponse from .dto import SearchTorrentsResponse
@@ -0,0 +1,21 @@
"""TV-show orchestrators — operate on the Alfred-managed TV library tree.
The TV library is a directory of show folders (one per TV show), each
holding season folders containing video files. Modules here walk this
tree and reconstruct on-disk :class:`SeriesRelease` aggregates by
reusing the existing release pipeline (``inspect_release``) rather
than duplicating its parse/probe logic.
"""
from .dto import SearchShowResponse, ShowHit
from .search_show import SearchShowUseCase
from .walker import SeasonFolder, ShowTree, walk_show
__all__ = [
"SearchShowResponse",
"SearchShowUseCase",
"SeasonFolder",
"ShowHit",
"ShowTree",
"walk_show",
]
@@ -0,0 +1,39 @@
"""TV show application DTOs."""
from dataclasses import dataclass, field
@dataclass(frozen=True)
class ShowHit:
"""One TV-show hit, flattened for transport to the agent."""
tmdb_id: int
name: str
first_air_year: int | None = None
def to_dict(self) -> dict:
out: dict = {"tmdb_id": self.tmdb_id, "name": self.name}
if self.first_air_year is not None:
out["first_air_year"] = self.first_air_year
return out
@dataclass
class SearchShowResponse:
"""Response from searching for a TV show."""
status: str
hits: list[ShowHit] = field(default_factory=list)
error: str | None = None
message: str | None = None
def to_dict(self):
result: dict = {"status": self.status}
if self.error:
result["error"] = self.error
result["message"] = self.message
else:
result["hits"] = [h.to_dict() for h in self.hits]
return result
@@ -0,0 +1,59 @@
"""Search TV show use case."""
import logging
from alfred.infrastructure.api_TO_CHECK.tmdb import (
TMDBAPIError,
TMDBClient,
TMDBConfigurationError,
)
from .dto import SearchShowResponse, ShowHit
logger = logging.getLogger(__name__)
class SearchShowUseCase:
"""List TV shows matching a free-text query via TMDB ``/search/tv``.
Symmetric to :class:`alfred.application.movies.SearchMovieUseCase`:
thin orchestrator, flattens domain VOs into agent-friendly
primitives, no ``imdb_id`` enrichment (caller follows up with
:meth:`TMDBClient.get_tv_show_info` on a chosen ``tmdb_id``).
"""
def __init__(self, tmdb_client: TMDBClient):
self.tmdb_client = tmdb_client
def execute(self, show_title: str) -> SearchShowResponse:
try:
results = self.tmdb_client.search_shows(show_title)
hits = [
ShowHit(
tmdb_id=r.tmdb_id.value,
name=r.name,
first_air_year=r.first_air_year,
)
for r in results
]
logger.info(f"search_shows({show_title!r}) → {len(hits)} hits")
return SearchShowResponse(status="ok", hits=hits)
except TMDBConfigurationError as e:
logger.error(f"TMDB configuration error: {e}")
return SearchShowResponse(
status="error", error="configuration_error", message=str(e)
)
except TMDBAPIError as e:
logger.error(f"TMDB API error: {e}")
return SearchShowResponse(
status="error", error="api_error", message=str(e)
)
except ValueError as e:
logger.error(f"Validation error: {e}")
return SearchShowResponse(
status="error", error="validation_failed", message=str(e)
)
@@ -0,0 +1,208 @@
"""Show tree walker — minimal filesystem traversal of a TV show folder.
The walker is intentionally dumb: it lists season folders, classifies
each one as PACK or EPISODIC by **inspecting its filesystem
structure**, and hands the orchestrator a flat list of video files
per season. It does not parse release names, run ffprobe, or
classify subtitle files. All of that intelligence lives in the
existing release pipeline (``inspect_release`` + downstream
services); the walker just hands the orchestrator the paths to feed
into that pipeline.
Folder convention
-----------------
Inside an Alfred-managed library, a show root looks like::
Foundation/
Foundation.S01.1080p.WEB-DL.x265-GROUP/ ← PACK season
Foundation.S01E01.1080p.WEB-DL.x265.mkv ← flat video
Foundation.S01E02.1080p.WEB-DL.x265.mkv
...
Foundation.S02/ ← EPISODIC season
Foundation.S02E01.1080p.WEB-DL.x265-GROUP/ ← episode subfolder
Foundation.S02E01.1080p.WEB-DL.x265-GROUP.mkv
Foundation.S02E02.1080p.WEB-DL.x265-OTHER/
Foundation.S02E02.1080p.WEB-DL.x265-OTHER.mkv
The walker recognizes a season folder by a ``Sxx`` token anywhere in
its name (case-insensitive). It does **not** care about Plex-style
names (``Season 01``, ``Specials``) — the Alfred library uses
release-style folder names only.
PACK vs EPISODIC is a **structural distinction**, not a naming one:
* **PACK** — season folder contains N flat video files. No
subfolders.
* **EPISODIC** — season folder contains N subfolders, each holding
exactly one video.
A season folder that mixes the two layouts (some flat videos AND
some subfolders) is malformed: the walker reports
``mode=None`` and an empty ``video_files`` tuple so the
orchestrator can warn and skip it.
"""
from __future__ import annotations
import logging
import re
from dataclasses import dataclass
from pathlib import Path
from alfred.domain.releases_TO_CHECK.ports import ReleaseKnowledge
from alfred.domain.releases_TO_CHECK.value_objects import ReleaseMode
from alfred.domain.shared_TO_CHECK.ports import FilesystemScanner
_LOG = logging.getLogger(__name__)
# Matches any ``Sxx`` token (1-2 digits) bounded by non-alphanumerics.
# Examples that match: ``Foundation.S01.1080p`` , ``S2.Pack`` , ``BBC.s10.bluray``.
# Examples that don't: ``Sample`` , ``Soundtrack`` , ``2024.S0E1`` (no S+digits boundary).
_SEASON_TOKEN_RE = re.compile(r"(?<![A-Za-z0-9])s(\d{1,2})(?![A-Za-z0-9])", re.IGNORECASE)
@dataclass(frozen=True)
class SeasonFolder:
"""One season folder discovered inside a show root.
``mode`` is set by the walker from the FS structure:
* :attr:`ReleaseMode.PACK` — ``video_files`` lists the season
folder's flat videos.
* :attr:`ReleaseMode.EPISODIC` — ``video_files`` lists each
episode subfolder's single video.
* ``None`` — the folder is empty, malformed (mixed layout), or
otherwise unclassifiable. ``video_files`` is empty. The
orchestrator decides whether to warn/skip.
"""
season_dir: Path
mode: ReleaseMode | None
video_files: tuple[Path, ...]
@dataclass(frozen=True)
class ShowTree:
"""The full structural snapshot of a show on disk."""
show_root: Path
season_folders: tuple[SeasonFolder, ...]
def walk_show(
show_root: Path,
*,
scanner: FilesystemScanner,
kb: ReleaseKnowledge,
) -> ShowTree:
"""Walk ``show_root`` and return its structural tree.
The walker:
* lists direct children of ``show_root``,
* keeps the directories whose name contains a ``Sxx`` token,
* classifies each season folder as PACK / EPISODIC / unknown by
inspecting its direct children (videos vs subfolders),
* for EPISODIC, descends one extra level into each episode
subfolder to collect its single video,
* sorts season folders by name and video files by name within
each folder.
The walker never raises — empty / unreadable / malformed
directories surface as a ``SeasonFolder`` with ``mode=None`` and
an empty ``video_files`` tuple.
"""
video_exts = {ext.lower() for ext in kb.video_extensions}
season_folders: list[SeasonFolder] = []
for entry in scanner.scan_dir(show_root):
if not entry.is_dir or not _SEASON_TOKEN_RE.search(entry.name):
continue
season_folders.append(
_classify_season(entry.path, scanner=scanner, video_exts=video_exts)
)
return ShowTree(
show_root=show_root, season_folders=tuple(season_folders)
)
# --------------------------------------------------------------------------- #
# Season-folder classification #
# --------------------------------------------------------------------------- #
def _classify_season(
season_dir: Path,
*,
scanner: FilesystemScanner,
video_exts: set[str],
) -> SeasonFolder:
"""Inspect one season folder and decide PACK / EPISODIC / unknown.
Looks only at direct children. For EPISODIC, descends one extra
level into each subfolder to collect its single video. Mixed
layouts (flat videos + subfolders) are reported as ``mode=None``
so the orchestrator can skip them with a warning.
"""
flat_videos: list[Path] = []
subdirs: list[Path] = []
for child in scanner.scan_dir(season_dir):
if child.is_file and child.suffix.lower() in video_exts:
flat_videos.append(child.path)
elif child.is_dir:
subdirs.append(child.path)
# Anything else (non-video files like .nfo, .srt at the season
# root) is ignored — it doesn't affect classification.
has_flat = bool(flat_videos)
has_subdirs = bool(subdirs)
if has_flat and has_subdirs:
_LOG.warning(
"walker: season folder %s mixes flat videos and subfolders — "
"malformed layout, skipping",
season_dir,
)
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
if has_flat:
return SeasonFolder(
season_dir=season_dir,
mode=ReleaseMode.PACK,
video_files=tuple(sorted(flat_videos)),
)
if has_subdirs:
episode_videos: list[Path] = []
for sub in sorted(subdirs):
videos_in_sub = [
child.path
for child in scanner.scan_dir(sub)
if child.is_file and child.suffix.lower() in video_exts
]
if len(videos_in_sub) == 0:
_LOG.warning(
"walker: episode subfolder %s contains no video — skipping",
sub,
)
continue
if len(videos_in_sub) > 1:
_LOG.warning(
"walker: episode subfolder %s contains %d videos — "
"malformed, skipping season %s",
sub,
len(videos_in_sub),
season_dir,
)
return SeasonFolder(
season_dir=season_dir, mode=None, video_files=()
)
episode_videos.append(videos_in_sub[0])
return SeasonFolder(
season_dir=season_dir,
mode=ReleaseMode.EPISODIC,
video_files=tuple(episode_videos),
)
# No flat videos, no subdirs → empty season folder.
return SeasonFolder(season_dir=season_dir, mode=None, video_files=())
@@ -2,7 +2,6 @@
from .entities import Movie from .entities import Movie
from .exceptions import InvalidMovieData, MovieNotFound from .exceptions import InvalidMovieData, MovieNotFound
from .services import MovieService
from .value_objects import MovieTitle, Quality, ReleaseYear from .value_objects import MovieTitle, Quality, ReleaseYear
__all__ = [ __all__ = [
@@ -12,5 +11,4 @@ __all__ = [
"Quality", "Quality",
"MovieNotFound", "MovieNotFound",
"InvalidMovieData", "InvalidMovieData",
"MovieService",
] ]
+91
View File
@@ -0,0 +1,91 @@
"""Movie domain entities."""
from dataclasses import dataclass
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
from .value_objects import MovieTitle, ReleaseYear
@dataclass(frozen=True, eq=False)
class Movie:
"""
Movie aggregate root for the movies domain.
TMDB-only aggregate: carries identity (``tmdb_id`` + optional
``imdb_id``) plus the catalog facts that come from TMDB (``title``,
``release_year``). Filesystem-side concerns (file path, quality,
tracks, ``added_at``) live on :class:`alfred.domain.releases.entities.
MovieRelease`, the per-movie release aggregate persisted alongside.
Frozen: rebuild via ``dataclasses.replace`` to project metadata
updates (e.g. a TMDB refresh) onto a new instance.
Equality is identity-based on ``tmdb_id``: two ``Movie`` instances
are equal iff they share the same primary key. ``imdb_id`` is a
secondary anchor and not part of the identity.
"""
tmdb_id: TmdbId
title: MovieTitle
imdb_id: ImdbId | None = None
release_year: ReleaseYear | None = None
def __post_init__(self) -> None:
if not isinstance(self.tmdb_id, TmdbId):
raise ValueError(
f"tmdb_id must be TmdbId, got {type(self.tmdb_id)}"
)
if not isinstance(self.title, MovieTitle):
if isinstance(self.title, str):
object.__setattr__(self, "title", MovieTitle(self.title))
else:
raise ValueError(
f"title must be MovieTitle or str, got {type(self.title)}"
)
if self.imdb_id is not None and not isinstance(self.imdb_id, ImdbId):
raise ValueError(
f"imdb_id must be ImdbId or None, got {type(self.imdb_id)}"
)
def __eq__(self, other: object) -> bool:
if not isinstance(other, Movie):
return NotImplemented
return self.tmdb_id == other.tmdb_id
def __hash__(self) -> int:
return hash(self.tmdb_id)
# WRONG
def get_folder_name(self) -> str:
"""
Get the folder name for this movie.
Format: "Title (Year)"
Example: "Inception (2010)"
"""
if self.release_year:
return f"{self.title.value} ({self.release_year.value})"
return self.title.value
# WRONG
def get_filename(self) -> str:
"""
Get the suggested base filename (without extension) for this movie.
Format: ``Title.Year`` (quality lives on
:class:`alfred.domain.releases.entities.MovieRelease` now and is
appended by the release-aware caller — typically the rescan /
organize flow, after Phase 4).
Example: ``Inception.2010``.
"""
parts = [self.title.normalized()]
if self.release_year:
parts.append(str(self.release_year.value))
return ".".join(parts)
def __str__(self) -> str:
return f"{self.title.value} ({self.release_year.value if self.release_year else 'Unknown'})"
def __repr__(self) -> str:
return f"Movie(tmdb_id={self.tmdb_id}, title='{self.title.value}')"
@@ -1,6 +1,6 @@
"""Movie domain exceptions.""" """Movie domain exceptions."""
from ..shared.exceptions import DomainException, NotFoundError from ..shared_TO_CHECK.exceptions import DomainException, NotFoundError
class MovieNotFound(NotFoundError): class MovieNotFound(NotFoundError):
@@ -1,10 +1,9 @@
"""Movie domain value objects.""" """Movie domain value objects."""
import re
from dataclasses import dataclass from dataclasses import dataclass
from enum import Enum from enum import Enum
from ..shared.exceptions import ValidationError from ..shared_TO_CHECK.exceptions import ValidationError
class Quality(Enum): class Quality(Enum):
@@ -17,7 +16,7 @@ class Quality(Enum):
UNKNOWN = "unknown" UNKNOWN = "unknown"
@classmethod @classmethod
def from_string(cls, quality_str: str) -> "Quality": def from_string(cls, quality_str: str) -> Quality:
""" """
Parse quality from string. Parse quality from string.
@@ -56,22 +55,11 @@ class MovieTitle:
f"Movie title must be a string, got {type(self.value)}" f"Movie title must be a string, got {type(self.value)}"
) )
if len(self.value) > 500: if len(self.value) > 150:
raise ValidationError( raise ValidationError(
f"Movie title too long: {len(self.value)} characters (max 500)" f"Movie title too long: {len(self.value)} characters (max 150)"
) )
def normalized(self) -> str:
"""
Return normalized title for file system usage.
Removes special characters and replaces spaces with dots.
"""
# Remove special characters except spaces, dots, and hyphens
cleaned = re.sub(r"[^\w\s\.\-]", "", self.value)
# Replace spaces with dots
normalized = cleaned.replace(" ", ".")
return normalized
def __str__(self) -> str: def __str__(self) -> str:
return self.value return self.value
@@ -97,10 +85,6 @@ class ReleaseYear:
f"Release year must be an integer, got {type(self.value)}" f"Release year must be an integer, got {type(self.value)}"
) )
# Movies started around 1888, and we shouldn't have movies from the future
if self.value < 1888 or self.value > 2100:
raise ValidationError(f"Invalid release year: {self.value}")
def __str__(self) -> str: def __str__(self) -> str:
return str(self.value) return str(self.value)
@@ -0,0 +1,38 @@
"""Filesystem release aggregates — what the user owns on disk.
This bounded context is intentionally separated from
``alfred.domain.tv_shows`` / ``alfred.domain.movies`` (TMDB identity).
A :class:`SeriesRelease` describes the physical files on disk for one
show; a :class:`TVShow` describes the work as catalogued by TMDB. The
two are linked by :class:`~alfred.domain.shared.value_objects.TmdbId`
in the persistence layer, never by direct reference.
Not to be confused with ``alfred.domain.release`` (singular) which
parses release **names** (strings → tokens). The two packages may be
merged later; for now they coexist as separate concerns.
"""
from .builders import SeasonReleaseBuilder, SeriesReleaseBuilder
from .entities import (
EpisodeRelease,
MovieRelease,
SeasonRelease,
SeriesRelease,
TrackProfile,
)
from .repositories import MovieReleaseRepository, SeriesReleaseRepository
from .value_objects import EpisodeRange, ReleaseMode
__all__ = [
"EpisodeRange",
"EpisodeRelease",
"MovieRelease",
"MovieReleaseRepository",
"ReleaseMode",
"SeasonRelease",
"SeasonReleaseBuilder",
"SeriesRelease",
"SeriesReleaseBuilder",
"SeriesReleaseRepository",
"TrackProfile",
]
+243
View File
@@ -0,0 +1,243 @@
"""Builders for the filesystem release aggregates.
The aggregates are frozen — :class:`SeriesRelease`, :class:`SeasonRelease`,
and :class:`EpisodeRelease` are ``@dataclass(frozen=True)`` and offer no
mutation methods. All construction goes through these builders, which
assemble the aggregate piece by piece and emit a frozen instance via
``build()``.
Typical usage during a filesystem walk::
builder = SeriesReleaseBuilder(tmdb_id=TmdbId(84958), imdb_id=ImdbId("tt0804484"))
sb = builder.season_builder(SeasonNumber(1), folder="Show.S01", mode=ReleaseMode.PACK)
sb.add_episode(EpisodeRelease(
episodes=EpisodeRange(EpisodeNumber(1), EpisodeNumber(1)),
file_path=FilePath("Show.S01/Show.S01E01.mkv"),
tracks=TrackProfile(),
))
release = builder.build()
Builders are **single-use scratchpads**: they hold mutable state during
construction, then produce an immutable aggregate.
Invariants enforced at ``build()`` time:
* Seasons are emitted sorted by ``season_number``.
* Episodes within each season are emitted sorted by their
``EpisodeRange.start`` (so a season with ``E01-E03`` + ``E04`` is
emitted in that order).
* No two ``EpisodeRelease`` within a season may overlap (same TMDB
episode covered by two distinct files) — raises ``ValidationError``.
"""
from __future__ import annotations
from ..shared_TO_CHECK.exceptions import ValidationError
from ..shared_TO_CHECK.value_objects import ImdbId, TmdbId
from ..tv_shows.value_objects import SeasonNumber
from .entities import (
EpisodeRelease,
SeasonRelease,
SeriesRelease,
)
from .value_objects import ReleaseMode
# ════════════════════════════════════════════════════════════════════════════
# MovieReleaseBuilder
# ════════════════════════════════════════════════════════════════════════════
# ...
# ════════════════════════════════════════════════════════════════════════════
# SeasonReleaseBuilder
# ════════════════════════════════════════════════════════════════════════════
class SeasonReleaseBuilder:
"""
Mutable scratchpad for a :class:`SeasonRelease`.
Episodes are appended in arbitrary order; ``build()`` sorts them by
their range start before emitting the frozen aggregate and verifies
there are no overlapping ranges.
"""
def __init__(
self,
season_number: SeasonNumber | int,
*,
folder: str,
mode: ReleaseMode,
) -> None:
if isinstance(season_number, int):
season_number = SeasonNumber(season_number)
self._season_number: SeasonNumber = season_number
self._folder: str = folder
self._mode: ReleaseMode = mode
self._episodes: list[EpisodeRelease] = []
@classmethod
def from_existing(cls, season: SeasonRelease) -> SeasonReleaseBuilder:
"""Seed a builder from an existing frozen :class:`SeasonRelease`."""
builder = cls(
season.season_number,
folder=season.folder,
mode=season.mode,
)
builder._episodes = list(season.episodes)
return builder
@property
def season_number(self) -> SeasonNumber:
return self._season_number
@property
def mode(self) -> ReleaseMode:
return self._mode
def set_folder(self, folder: str) -> SeasonReleaseBuilder:
self._folder = folder
return self
def set_mode(self, mode: ReleaseMode) -> SeasonReleaseBuilder:
self._mode = mode
return self
def add_episode(self, episode: EpisodeRelease) -> SeasonReleaseBuilder:
"""Append a physical-file :class:`EpisodeRelease` to this season."""
self._episodes.append(episode)
return self
def build(self) -> SeasonRelease:
"""Emit a frozen :class:`SeasonRelease` with episodes sorted.
Raises :class:`ValidationError` if any two episode ranges overlap
(same TMDB slot claimed by two distinct files).
"""
ordered = tuple(
sorted(self._episodes, key=lambda ep: ep.episodes.start.value)
)
# Overlap check — ranges are inclusive on both ends, sorted by start.
for prev, curr in zip(ordered, ordered[1:], strict=False):
if curr.episodes.start.value <= prev.episodes.end.value:
raise ValidationError(
f"SeasonRelease season {self._season_number}: overlapping "
f"episode ranges {prev.episodes} and {curr.episodes}"
)
return SeasonRelease(
season_number=self._season_number,
folder=self._folder,
mode=self._mode,
episodes=ordered,
)
# ════════════════════════════════════════════════════════════════════════════
# SeriesReleaseBuilder
# ════════════════════════════════════════════════════════════════════════════
class SeriesReleaseBuilder:
"""
Mutable scratchpad for the :class:`SeriesRelease` aggregate root.
Seasons are tracked via internal :class:`SeasonReleaseBuilder`
instances keyed by :class:`SeasonNumber`.
"""
def __init__(
self,
*,
tmdb_id: TmdbId | int,
imdb_id: ImdbId | str | None = None,
) -> None:
if isinstance(tmdb_id, int):
tmdb_id = TmdbId(tmdb_id)
if isinstance(imdb_id, str):
imdb_id = ImdbId(imdb_id)
self._tmdb_id: TmdbId = tmdb_id
self._imdb_id: ImdbId | None = imdb_id
self._season_builders: dict[SeasonNumber, SeasonReleaseBuilder] = {}
@classmethod
def from_existing(cls, release: SeriesRelease) -> SeriesReleaseBuilder:
"""Seed a builder from an existing frozen :class:`SeriesRelease`."""
builder = cls(
tmdb_id=release.tmdb_id,
imdb_id=release.imdb_id,
)
for season in release.seasons:
builder._season_builders[season.season_number] = (
SeasonReleaseBuilder.from_existing(season)
)
return builder
# ── Top-level mutators ─────────────────────────────────────────────────
def set_imdb_id(self, imdb_id: ImdbId | str | None) -> SeriesReleaseBuilder:
if isinstance(imdb_id, str):
imdb_id = ImdbId(imdb_id)
self._imdb_id = imdb_id
return self
# ── Content ────────────────────────────────────────────────────────────
def season_builder(
self,
season_number: SeasonNumber | int,
*,
folder: str | None = None,
mode: ReleaseMode | None = None,
) -> SeasonReleaseBuilder:
"""
Return (creating if needed) the :class:`SeasonReleaseBuilder` for a
season.
``folder`` and ``mode`` are required when the builder does not yet
exist for this season; subsequent calls may pass them to override.
"""
if isinstance(season_number, int):
season_number = SeasonNumber(season_number)
sb = self._season_builders.get(season_number)
if sb is None:
if folder is None or mode is None:
raise ValidationError(
f"season_builder({season_number}): folder and mode "
f"are required to create a new season builder"
)
sb = SeasonReleaseBuilder(season_number, folder=folder, mode=mode)
self._season_builders[season_number] = sb
else:
if folder is not None:
sb.set_folder(folder)
if mode is not None:
sb.set_mode(mode)
return sb
def add_season(self, season: SeasonRelease) -> SeriesReleaseBuilder:
"""
Attach (or replace) a fully-built :class:`SeasonRelease`.
Replaces any existing season with the same number.
"""
self._season_builders[season.season_number] = (
SeasonReleaseBuilder.from_existing(season)
)
return self
# ── Emit ───────────────────────────────────────────────────────────────
def build(self) -> SeriesRelease:
"""Emit a frozen :class:`SeriesRelease` with seasons sorted by number."""
ordered_seasons = tuple(
self._season_builders[n].build()
for n in sorted(self._season_builders, key=lambda x: x.value)
)
return SeriesRelease(
tmdb_id=self._tmdb_id,
imdb_id=self._imdb_id,
seasons=ordered_seasons,
)

Some files were not shown because too many files have changed in this diff Show More