DDD-pure cleanup — entities and value objects no longer query the world
at read time.
FilePath: drop .exists() / .is_file() / .is_dir(). The VO is now a
pure address; ask the injected FilesystemScanner for live state.
Movie: drop .has_file() / .is_downloaded(). Invariant: when the
application sets file_path, it has already constated the file
exists; downstream readers trust the snapshot.
Episode: same — drop .has_file() / .is_downloaded().
SubtitlePlacer: drop the pre-check .exists() calls. The placer now
attempts os.link() and reports FileNotFoundError / FileExistsError
as skip reasons. Removes a TOCTOU race as a bonus.
Tests adjusted: the FilePath VO method tests are gone (the methods are
gone), test_has_file_false_when_no_path replaced by a plain assertion
on file_path is None. Placer tests are unchanged — the skip-reason
strings ('not found', 'already exists') match the new try/except paths.
The 'snapshot value objects' pattern (ProbedMediaInfo, TmdbMovieInfo)
that this cleanup enables is documented in refactor_domain_io.md, to
be applied when a future use case actually needs richer metadata —
not now, no speculative VOs.
Domain services no longer call subprocess or pathlib directly. Introduces
two Protocol ports in domain/shared/ports/:
MediaProber.list_subtitle_streams(video) -> list[SubtitleStreamInfo]
FilesystemScanner.scan_dir / stat / read_text -> list[FileEntry] | ...
Concrete adapters live in infrastructure/:
FfprobeMediaProber (wraps subprocess + ffprobe + JSON)
PathlibFilesystemScanner (wraps pathlib + os reads)
SubtitleIdentifier and PatternDetector now take (kb, prober, scanner) at
construction time. Their internals work over FileEntry snapshots and
SubtitleStreamInfo records — no more ad-hoc Path.is_file/iterdir/stat or
embedded subprocess.run loops. _count_entries now takes raw SRT text
(returned by scanner.read_text) so SRT-only entry counting stays out of
the FS layer.
manage_subtitles use case instantiates the two adapters once and injects
them into both services. Tests pass real adapters and patch
`alfred.infrastructure.probe.ffprobe_prober.subprocess.run` for the
ffprobe-failure cases. _classify_single tests build FileEntry via a
small helper.
Domain is now free of subprocess / direct filesystem reads in the
subtitle pipeline. The only remaining I/O hooks are FilePath VO
convenience methods (exists/is_file/is_dir) which stay as a deliberate
affordance on the value object.
The domain layer no longer reads YAML files. All knowledge loaders move
from `alfred/domain/*/knowledge/` to `alfred/infrastructure/knowledge/`:
domain/release/knowledge.py
→ infrastructure/knowledge/release.py
domain/shared/knowledge/language_registry.py
→ infrastructure/knowledge/language_registry.py
domain/subtitles/knowledge/{loader,base}.py
→ infrastructure/knowledge/subtitles/{loader,base}.py
Callers in domain/release/{services,value_objects}.py,
domain/subtitles/{aggregates,services/*}.py, and
application/filesystem/manage_subtitles.py updated to absolute imports.
Re-exports of KnowledgeLoader/SubtitleKnowledgeBase from
domain/subtitles/__init__.py dropped (no shim per project convention).
Tests follow the moved targets.
ParsedRelease.media_type is now MediaTypeToken (not str) and parse_path
is ParsePath (not str). __post_init__ keeps a tolerant constructor that
coerces raw strings via the enum, so callers passing 'movie'/'direct'
still work transparently. Since both enums inherit from str, existing
string comparisons and JSON serialization remain unchanged.
ParsedRelease accepted any string for media_type/parse_path and had no
validation on numeric ranges (season=-5 was silently accepted). Tighten
both ends:
- New str-backed Enums MediaTypeToken and ParsePath. Inherit from str so
every existing comparison ('== "movie"'), JSON serialization, and TMDB
DTO interop keeps working unchanged.
- ParsedRelease.__post_init__ now validates: raw/group non-empty, year in
1888-2100, season 0-100, episode 0-9999, episode_end >= episode,
media_type/parse_path against the enum allowlist.
- services.py uses the enum .value members everywhere instead of bare
string literals — kills the typo risk.
Two related DDD fixes for Movie and Episode entities:
- Identity equality: @dataclass(eq=False) with custom __eq__/__hash__.
Movie is identified by imdb_id, Episode by (season, episode) within
the TVShow aggregate. Auto-generated field-by-field equality was
incorrectly making two Movie instances with the same imdb_id but
different audio_tracks compare unequal — breaks dedup/caching.
- MediaWithTracks mixin: the 5 audio/subtitle helpers
(has_audio_in / audio_languages / has_subtitles_in / has_forced_subs /
subtitle_languages) were duplicated verbatim between Movie and Episode.
Extracted to shared/media/tracks_mixin.py; both entities now inherit.
Bonus: dropped the object.__setattr__ coercion dance in Movie.__post_init__
— the class isn't frozen so plain assignment is the right call.
AudioTrack, VideoTrack, SubtitleTrack and MediaInfo are snapshots of a
single ffprobe run — model them as proper immutable value objects.
- @dataclass(frozen=True) on all four
- MediaInfo track collections become tuple[...] instead of list[...]
- ffprobe adapter rewritten to build tuples up-front instead of
appending/setattr'ing on a constructed instance
SubtitleScanner was an earlier iteration superseded by SubtitleIdentifier
and never imported in production code (only by its own tests). Removing
both keeps the bounded context clean and shrinks the surface.
Low-risk cleanup items, no functional change to the parser. The
philosophy remains: keep the parser simple, the AI handles edge cases.
- Extract duplicated 'fs-safe title → dot-folder-name' regex into
to_dot_folder_name() in domain/shared/value_objects.py. Used by both
MovieTitle.normalized() and TVShow.get_folder_name() (item #5).
- ParsedRelease.languages now uses field(default_factory=list) instead
of a manual __post_init__ assigning [] via object.__setattr__ (#6).
- tv_shows/entities.py module docstring: prepend ASCII ownership tree
for quicker visual scan of the aggregate hierarchy (#7).
- file_extensions.yaml: split subtitle sidecars (.srt/.sub/.idx/.ass/.ssa)
into a dedicated 'subtitle:' category instead of lumping them under
'metadata:'. _METADATA_EXTENSIONS at the value_objects.py level remains
the union of both — detect_media_type behavior unchanged. New loader
load_subtitle_extensions() exposes the distinct subtitle set for future
callers in the subtitles domain (#20).
Suite: 1020 passed, 8 skipped.
10 pathological release names mined from the real downloads folder.
Each fixture locks in the current parse_release output (including
its silent losses and false positives) so future parser improvements
are intentional, not silent drift.
Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '|', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio
= full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception:
COMPLETE token correctly drives tv_complete + REPACK + 10bit)
Also adds shitty + path_of_pain to the per-bucket sanity assertion.
Suite: 1020 passed, 8 skipped.
Add 15 expected.yaml fixtures under tests/fixtures/releases/shitty/
covering the awkward but real-world release names from the downloads
folder. Each fixture locks in the current parse_release behavior so
future parser changes are intentional, not silent drift.
Cases captured:
- Angel INTEGRALE 3-level hierarchy (tv_complete media_type)
- Buffy custom French title with dots preserved
- Archer S14E09E10E11 multi-episode (E11 lost — tech debt)
- Notre Planète lowercase s01e01
- Vinyl ' - 1x01 - FHD' (stray dash artifact — tech debt)
- Deutschland.83 (year-suffix as part of title)
- Tatortreiniger S01-06 range (falls to movie — tech debt)
- Derry Girls duplicated title
- Jurassic Park bare folder (media_type=unknown)
- La Nuit au Musée bilingual MULTI
- Chérie j'ai agrandi (ASCII-stripped apostrophe, parses fine)
- Honey Don't (unescaped apostrophe — full AI-path degeneration)
- Hook MULTi.SUBS movie with Subs/ folder
- Predator Badlands space separators (group=UNKNOWN — tech debt)
- Westworld S04 Subs.Only (no video file)
Each fixture also captures the future 3-flow routing (library /
torrents / seed_hardlinks) ahead of the organize_media refactor.
Suite: 1011 passed, 8 skipped.
The blanket *.md ignore was hiding CHANGELOG.md, forcing 'git add -f' on
every update. Allow-list it so the file lives under normal git tracking.
CLAUDE.md stays local (user keeps it personal until a dedicated repo).
Captures 5 canonical releases from /mnt/testipool/downloads as parametrized
fixtures under tests/fixtures/releases/easy/. Each fixture declares the
release name, expected ParsedRelease fields, original tree, and the future
routing (library / torrents / seed_hardlinks) for the upcoming organize_media
refactor.
Today only the 'parsed' section is asserted; tree is materialized into a
tmp_path to catch typos. Routing is captured ahead of the planner work — it
becomes verifiable once organize_media lands.
Cases: back_in_action (movie), slow_horses_single_ep (TV single),
foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie +
KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir).
Also tracks CHANGELOG.md under [Unreleased] / Added.
Multi-week sprint: ISO 639-2/B language unification, release parser
unification + data-driven tokenizer, removal of fossil services
(movies/tv_shows/subtitles), subtitle services split into a package,
MediaInfo split, test suite expansion (990 passing).
See CHANGELOG.md [Unreleased] for the user-facing summary.
Several weeks of work accumulated without being committed. Grouped here for
clarity; see CHANGELOG.md [Unreleased] for the user-facing summary.
Highlights
----------
P1 #2 — ISO 639-2/B canonical migration
- New Language VO + LanguageRegistry (alfred/domain/shared/knowledge/).
- iso_languages.yaml as single source of truth for language codes.
- SubtitleKnowledgeBase now delegates lookup to LanguageRegistry; subtitles.yaml
only declares subtitle-specific tokens (vostfr, vf, vff, …).
- SubtitlePreferences default → ["fre", "eng"]; subtitle filenames written as
{iso639_2b}.srt (legacy fr.srt still read via alias).
- Scanner: dropped _LANG_KEYWORDS / _SDH_TOKENS / _FORCED_TOKENS /
SUBTITLE_EXTENSIONS hardcoded dicts.
- Fixed: 'hi' token no longer marks SDH (conflicted with Hindi alias).
- Added settings.min_movie_size_bytes (was a module constant).
P1 #3 — Release parser unification + data-driven tokenizer
- parse_release() is now the single source of truth for release-name parsing.
- alfred/knowledge/release/separators.yaml declares the token separators used
by the tokenizer (., space, [, ], (, ), _). New conventions can be added
without code changes.
- Tokenizer now splits on any configured separator instead of name.split('.').
Releases like 'The Father (2020) [1080p] [WEBRip] [5.1] [YTS.MX]' parse via
the direct path without sanitization fallback.
- Site-tag extraction always runs first; well-formedness only rejects truly
forbidden chars.
- _parse_season_episode() extended with NxNN / NxNNxNN alt forms.
- Removed dead helpers: _sanitize, _normalize.
Domain cleanup
- Deleted fossil services with zero production callers:
alfred/domain/movies/services.py
alfred/domain/tv_shows/services.py
alfred/domain/subtitles/services.py (replaced by subtitles/services/ package)
alfred/domain/subtitles/repositories.py
- Split monolithic subtitle services into a package (identifier, matcher,
placer, pattern_detector, utils) + dedicated knowledge/ package.
- MediaInfo split into dedicated package (alfred/domain/shared/media/:
audio, video, subtitle, info, matching).
Persistence cleanup
- Removed dead JSON repositories (movie/subtitle/tvshow_repository.py).
Tests
- Major expansion of the test suite organized to mirror the source tree.
- Removed obsolete *_edge_cases test files superseded by structured tests.
- Suite: 990 passed, 8 skipped.
Misc
- .gitignore: exclude env_backup/ and *.bak.
- Adjustments across agent/llm, app.py, application/filesystem, and
infrastructure/filesystem to align with the new domain layout.
- Extract MetadataStore from SubtitleMetadataStore (alfred/infrastructure/metadata/).
Generic load/save + typed update helpers (update_parse, update_probe, update_tmdb)
for the per-release .alfred/metadata.yaml.
- SubtitleMetadataStore becomes a thin facade — owns subtitle_history shape,
delegates I/O to MetadataStore.
- Agent._execute_tool_call auto-persists successful analyze_release / probe_media /
find_media_imdb_id results to the release's .alfred file. find_media_imdb_id
follows release_focus when it has no path argument.
- New tools:
· read_release_metadata(release_path) — cacheable, key=release_path.
Returns the .alfred content or has_metadata=false.
· query_library(name) — substring scan across configured library roots.
- Both new tools added to CORE_TOOLS (always visible).
Adds two STM components and a transparent cache hook in the agent loop so
read-only tools don't re-do work the agent already did in this session.
New STM components:
- ToolResultsCache — {tool_name: {key: result}}, session-scoped.
to_dict() exposes only the key inventory (not payloads) to keep the
prompt cheap.
- ReleaseFocus — current_release_path + working_set list, updated
automatically when a path-keyed inspector runs.
YAML spec layer:
- New optional 'cache: { key: <param_name> }' block in ToolSpec.
- Validated at load time: cache.key must be a declared parameter.
- Surfaced on Tool dataclass as cache_key: str | None.
Agent._execute_tool_call:
- Pre-exec cache lookup; hit short-circuits and adds _from_cache=true.
- Post-exec: stores successful results, updates release_focus for
path-keyed tools, refreshes episodic.last_search_results when
find_torrent's hit served the response (so get_torrent_by_index
keeps pointing at the right list).
Cacheable tools (5): analyze_release, probe_media, list_folder,
find_media_imdb_id, find_torrent.
Adds YAML specs for the 14 tools that were still description-from-docstring:
filesystem:
- set_path_for_folder, list_folder, analyze_release, probe_media,
move_media, manage_subtitles, create_seed_links, learn
api:
- find_media_imdb_id, find_torrent, get_torrent_by_index,
add_torrent_to_qbittorrent, add_torrent_by_index
language:
- set_language
Each spec follows the established shape (summary / description /
when_to_use / when_not_to_use / next_steps / parameters with
why_needed + example / returns) and the Python function docstring is
slimmed to a one-line pointer.
Registry now reports: 21 tools, 21 with YAML spec, 0 doc-only.
The Workflow STM component stored an active workflow as
{type, target, stage, started_at}. Now that start_workflow takes a
workflow_name and a params dict, those keys match what they actually
hold:
type -> name (the YAML workflow name, e.g. media.organize_media)
target -> params (the dict passed to start_workflow)
ShortTermMemory.start_workflow parameters renamed accordingly. All
consumers (prompt builder workflow scope + STM context, start/end
workflow tools) updated.
Introduce a scope-aware agent so the LLM never sees the full 21-tool
catalog at once. The system prompt now describes either:
- idle mode: core noyau (5 tools: set_language, set_path_for_folder,
list_folder, start_workflow, end_workflow) + a list of available
workflows with their goals;
- active mode: the noyau plus the tools declared by the active
workflow's YAML, with the step plan inlined into the prompt.
Pieces:
- alfred/agent/tools/workflow.py: start_workflow / end_workflow tools
(with YAML specs under tools/specs/) that drive memory.stm.workflow.
- alfred/agent/prompt.py: CORE_TOOLS constant, visible_tool_names(),
filtered build_tools_spec() / _format_tools_description(), and a new
_format_workflow_scope() section in the system prompt.
- alfred/agent/agent.py: WorkflowLoader wired into Agent, defensive
out-of-scope check in _execute_tool_call.
- alfred/agent/registry.py: registers the two new meta-tools (21 total,
7 with YAML spec).
- workflows/media.organize_media.yaml: tools/steps list refreshed to
match the current resolver split (analyze_release, probe_media,
resolve_*_destination, move_to_destination).
Rename workflow files and their 'name' field with a 'media.' domain
prefix to anticipate future multi-domain expansion (mail.*, calendar.*, ...).
- organize_media -> media.organize_media
- manage_subtitles -> media.manage_subtitles
WorkflowLoader picks them up unchanged (uses data['name']).
The ParameterSchema / REQUIRED_PARAMETERS / get_missing_required_parameters
machinery in alfred/agent/parameters.py was used in early prototypes for
the prompt-required-params check but has been unwired from production for
several refactors. The new YAML tool-spec layer (alfred/agent/tools/specs/)
covers the same need (rich, LLM-facing parameter descriptions) without
the parallel registration plumbing.
Tests in tests/test_config_edge_cases.py still reference the deleted
module — left untouched per the project policy of treating test sync
as a dedicated end-of-week task.
Introduce a first-class semantic layer for tool descriptions, separated
from Python signatures (which stay the source of truth for types and
required-ness).
New
- alfred/agent/tools/spec.py — ToolSpec / ParameterSpec / ReturnsSpec
dataclasses with strict YAML validation (ToolSpecError on malformed
or inconsistent specs). compile_description() builds the rich text
passed to the LLM as Tool.description, with sections for summary,
description, when_to_use, when_not_to_use, next_steps, and returns.
compile_parameter_description() injects the 'why_needed' field next
to each parameter so the LLM sees the *intent* of each argument.
- alfred/agent/tools/spec_loader.py — discovers tools/specs/*.yaml,
enforces filename ↔ spec.name match, rejects duplicates.
- alfred/agent/tools/specs/ — one YAML per tool:
* resolve_season_destination.yaml
* resolve_episode_destination.yaml
* resolve_movie_destination.yaml
* resolve_series_destination.yaml
* move_to_destination.yaml
Refactor
- alfred/agent/registry.py
* _create_tool_from_function now takes an optional ToolSpec.
When provided, the long description + per-parameter descriptions
come from the spec; types and required-ness still come from the
Python signature.
* Cross-validates spec.parameters against the function signature —
crashes on missing or extra entries.
* make_tools() loads all specs at startup and hands the right one
to each tool. Tools without a spec fall back to the old
docstring-only behaviour, so the 14 not-yet-migrated tools keep
working unchanged.
* Adds 'array' and 'object' to the Python→JSON type mapping and
handles Optional[X] / X | None annotations.
- alfred/agent/tools/filesystem.py
* Drops the '_tool' suffix on the 4 resolve_* wrappers (option 1:
alias the use-case imports as _resolve_*). Tool names exposed to
the LLM now match the underlying use case verbatim.
* Wrapper docstrings shrink to a one-liner pointing to the YAML
spec — no more duplicated when_to_use/Args/Returns in Python.
Verified
- make_tools() loads 19 tools (5 with YAML spec, 14 doc-only).
- Compiled descriptions render cleanly with all sections.
- New _Clarification sentinel and _resolve_series_folder() helper —
the three TV use cases now share one matching/clarification path
instead of triplicating the same if/elif/else block.
- New _ResolvedDestinationBase carrying status/question/options/error/
message plus a _base_dict() helper; the four concrete DTOs only
declare their own ok-state fields and a slim to_dict().
- No behaviour change: same outputs for ok/needs_clarification/error
cases (verified by import + DTO smoke tests).
Destination resolution
- Replace the single ResolveDestinationUseCase with four dedicated
functions, one per release type:
resolve_season_destination (pack season, folder move)
resolve_episode_destination (single episode, file move)
resolve_movie_destination (movie, file move)
resolve_series_destination (multi-season pack, folder move)
- Each returns a dedicated DTO carrying only the fields relevant to
that release type — no more polymorphic ResolvedDestination with
half the fields unused depending on the case.
- Looser series folder matching: exact computed-name match is reused
silently; any deviation (different group, multiple candidates) now
prompts the user with all options including the computed name.
Agent tools
- Four new tools wrapping the use cases above; old resolve_destination
removed from the registry.
- New move_to_destination tool: create_folder + move, chained — used
after a resolve_* call to perform the actual relocation.
- Low-level filesystem_operations module (create_folder, move via mv)
for instant same-FS renames (ZFS).
Prompt & persona
- New PromptBuilder (alfred/agent/prompt.py) replacing prompts.py:
identity + personality block, situational expressions, memory
schema, episodic/STM/config context, tool catalogue.
- Per-user expression system: knowledge/users/common.yaml +
{username}.yaml are merged at runtime; one phrase per situation
(greeting/success/error/...) is sampled into the system prompt.
qBittorrent integration
- Credentials now come from settings (qbittorrent_url/username/password)
instead of hardcoded defaults.
- New client methods: find_by_name, set_location, recheck — the trio
needed to update a torrent's save path and re-verify after a move.
- Host→container path translation settings (qbittorrent_host_path /
qbittorrent_container_path) for docker-mounted setups.
Subtitles
- Identifier: strip parenthesized qualifiers (simplified, brazil…) at
tokenization; new _tokenize_suffix used for the episode_subfolder
pattern so episode-stem tokens no longer pollute language detection.
- Placer: extract _build_dest_name so it can be reused by the new
dry_run path in ManageSubtitlesUseCase.
- Knowledge: add yue, ell, ind, msa, rus, vie, heb, tam, tel, tha,
hin, ukr; add 'fre' to fra; add 'simplified'/'traditional' to zho.
Misc
- LTM workspace: add 'trash' folder slot.
- Default LLM provider switched to deepseek.
- testing/debug_release.py: CLI to parse a release, hit TMDB, and
dry-run the destination resolution end-to-end.