Parallel to sync_show. Calls TMDBClient.get_movie_info,
combines the TmdbMovieInfo with the on-disk MovieRelease loaded
via DotAlfredMovieReleaseRepository.load_by_tmdb_id, and upserts
into DotAlfredMovieLibraryIndex.
Policy mirrors sync_show with two adaptations specific to movies:
* placeholder signature is name == metadata.path (auto-heal writes
them equal — the schema requires name to be non-empty so we can't
use name == "" as the spec originally suggested),
* when the per-movie sidecar is gone but the index entry remains,
sync warns and returns the existing entry unchanged (no upsert
possible without a release: index.upsert requires folder/imdb_id
from the MovieRelease itself).
Raises MovieNotFoundInLibrary when neither the index nor the sidecar
carries the tmdb_id.
New orchestrator alfred.application.tv_shows.sync.sync_show calls
TMDBClient.get_tv_show_info, combines the response with the on-disk
release loaded via DotAlfredSeriesReleaseRepository.load_by_tmdb_id,
and upserts the result into DotAlfredTVShowLibraryIndex.
Policy:
* placeholders (auto-healed entries, status=="unknown") always
refresh regardless of TTL,
* fresh entries within Settings.tmdb_cache_ttl_days are no-ops,
* stale entries past TTL refresh,
* force=True overrides both gates,
* indexed shows whose per-show sidecar is gone still get a fresh
TMDB pass — slot map clears until rescan repopulates it,
* truly absent shows raise ShowNotFoundInLibrary from the new
alfred.application.exceptions module.
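The gating policy above can be read as one small decision function. This is a minimal sketch only: needs_refresh, its parameters and the fetched_at timestamp are hypothetical names, not the orchestrator's API. Only the status == "unknown" placeholder signature, the TTL in days and force come from the description above, and treating an entry exactly at the TTL as still fresh is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_refresh(
    status: str,
    fetched_at: datetime,
    ttl_days: int,
    *,
    force: bool = False,
    now: Optional[datetime] = None,
) -> bool:
    """Return True when an indexed entry should be re-fetched from TMDB."""
    if force:
        return True  # force=True overrides both gates
    if status == "unknown":
        return True  # placeholder entry: refresh regardless of TTL
    now = now or datetime.now(timezone.utc)
    # Stale only once strictly past the TTL (boundary choice is assumed).
    return now - fetched_at > timedelta(days=ttl_days)
```

A caller would skip the TMDB round-trip whenever this returns False and return the existing entry unchanged.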
Series repo returns (release, folder) so the upcoming sync
orchestrator can feed the library index's upsert(..., path=...).
Movie repo returns the release alone (folder is on release.folder
by the one-folder-one-file convention) — kept as a semantic alias
of find_by_tmdb_id for symmetry with the series side.
Symmetric to TmdbShowInfo / get_tv_show_info — gives the upcoming
sync_movie orchestrator a typed cache snapshot for the v2 movie
library index.
* TmdbMovieInfo(tmdb_id, imdb_id, title, release_year)
* parse_movie_info(details, external_ids) — pure builder, parses
release_year from the first 4 chars of release_date (None on
missing/empty/non-numeric)
* TMDBClient.get_movie_info(tmdb_id) — aggregates
/movie/{id} + /movie/{id}/external_ids and feeds the parser
Tests cover happy path, missing/null/empty imdb_id, every
release_year edge (none/empty/short/non-numeric/missing key),
and the two required-field errors (id, title).
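The release_year rule is easy to state as code. The sketch below assumes TMDB's usual "YYYY-MM-DD" release_date string; parse_release_year is a hypothetical helper name (the real builder is parse_movie_info(details, external_ids)), and the ASCII check is an extra guard added here, not something the commit describes.

```python
from typing import Optional


def parse_release_year(release_date: Optional[str]) -> Optional[int]:
    """Year from the first 4 chars of release_date, or None when unusable."""
    if not release_date:
        return None  # missing key or empty string
    head = release_date[:4]
    # Reject short values and anything that is not four plain ASCII digits.
    if len(head) < 4 or not (head.isascii() and head.isdigit()):
        return None
    return int(head)
```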
The Phase 4 walker + rescan logic classified seasons by parser
output (does the filename carry Exx?), but PACK vs EPISODIC is a
structural distinction:
* PACK = season folder with N flat SxxEyy videos directly inside
* EPISODIC = season folder with N subfolders, each holding one video
Changes:
* walker.py: descends two levels under show_root and classifies
each season folder by FS structure. SeasonFolder now carries
mode: ReleaseMode | None. Mixed layouts (flat + subfolders) and
EPISODIC subfolders with >1 video log a warning and report
mode=None.
* rescan.py: trusts walker.mode; drops the bogus 'single un-
numbered video → PACK with empty episodes' branch. A season
with no parseable episodes is now skipped with a warning.
* Tests rewritten against the real model: PACK with flat numbered
files, EPISODIC with one-video-per-subfolder, malformed mixed
layout skipped, single-un-numbered-file skipped.
Suite: 1237 → 1245 passing.
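The structural rule can be sketched as a pure function over what the walker counts in a season folder. Everything here is illustrative: classify_season and its two inputs are hypothetical, the ReleaseMode values are stand-ins for the real enum, and treating an empty subfolder as malformed is an assumption the commit does not state.

```python
from enum import Enum
from typing import Optional, Sequence


class ReleaseMode(Enum):
    PACK = "pack"          # stand-in values, not the real enum's
    EPISODIC = "episodic"


def classify_season(
    flat_videos: int, videos_per_subfolder: Sequence[int]
) -> Optional[ReleaseMode]:
    """Classify a season folder from its on-disk shape, None when malformed."""
    has_flat = flat_videos > 0
    has_subfolders = len(videos_per_subfolder) > 0
    if has_flat and has_subfolders:
        return None  # mixed layout: flat videos plus subfolders
    if has_flat:
        return ReleaseMode.PACK
    if has_subfolders and all(n == 1 for n in videos_per_subfolder):
        return ReleaseMode.EPISODIC
    return None  # empty folder, or a subfolder not holding exactly one video
```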
Two small additions that close out Phase 4's loose ends.
Settings — tmdb_cache_ttl_days
    class Settings(BaseSettings):
        # --- DOT_ALFRED ---
        tmdb_cache_ttl_days: int = 14
Default 14 days, matching the dot_alfred_v2 master spec. Will drive
the Phase 5 TTL policy on TVShowLibraryIndexSidecar /
MovieLibraryIndexSidecar (decide when a TMDB-cached entry is stale
and triggers a refresh sync).
Anchor-mismatch warning
DotAlfredTVShowLibraryIndex._load_or_heal and DotAlfredMovieLibraryIndex
._load_or_heal now cross-check each indexed entry's metadata.path
against the on-disk folder layout right after a successful parse.
Drift (sidecar says folder X, X no longer exists under library_root)
is surfaced as a WARNING log — one per missing folder, with the
tmdb_id for cross-reference. No auto-heal on drift; the caller
decides (the heal path remains opt-in via index.heal()).
The warning fires only on the parsed-index path. The heal path
always synthesizes entries from real folder names, so it can never
drift — silent by construction.
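A minimal sketch of the drift check, written as a pure function so it runs without a library on disk. find_drift, the logger name and the message text are hypothetical; the real check lives inside _load_or_heal and inspects folders under library_root. What it keeps from the description is one WARNING per missing folder, tagged with the tmdb_id, and no healing.

```python
import logging
from typing import Iterable, List, Mapping

logger = logging.getLogger("alfred.sketch")  # hypothetical logger name


def find_drift(indexed: Mapping[int, str], on_disk: Iterable[str]) -> List[int]:
    """Return tmdb_ids whose indexed folder is absent; log one WARNING each."""
    present = set(on_disk)
    missing = []
    for tmdb_id, folder in indexed.items():
        if folder not in present:
            logger.warning(
                "index drift: tmdb_id=%s points at missing folder %r",
                tmdb_id,
                folder,
            )
            missing.append(tmdb_id)
    return missing  # nothing is rewritten: healing stays opt-in
```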
Tests
* TestTVShowLibraryIndexAnchorWarning — 3 scenarios:
warn-on-drift / no-warn-on-match / no-warn-on-heal.
* TestMovieLibraryIndexAnchorWarning — symmetric coverage.
Full suite: 1237 passed / 8 skipped / 4 xfailed.
Now that rescan_show + rescan_movie run on the v2 release repositories
(Phase 4 Steps 1-2), the v1 dot_alfred stack and its abstract domain
ports have zero callers. Delete them and lift the Phase 3 quarantines.
Deleted
* alfred/infrastructure/persistence/dot_alfred/bridge.py
* alfred/infrastructure/persistence/dot_alfred/repository.py (v1)
* alfred/infrastructure/persistence/dot_alfred/serializer.py (v1)
* alfred/infrastructure/persistence/dot_alfred/sidecar.py (v1)
* alfred/domain/tv_shows/repositories.py (TVShowRepository ABC)
* alfred/domain/movies/repositories.py (MovieRepository ABC)
* tests/infrastructure/persistence/dot_alfred/test_repository.py
* tests/infrastructure/persistence/dot_alfred/test_serializer.py
Rewrite
alfred/infrastructure/persistence/dot_alfred/__init__.py now re-
exports only the v2 surface: the four concrete repositories
(DotAlfredSeriesReleaseRepository, DotAlfredMovieReleaseRepository,
DotAlfredTVShowLibraryIndex, DotAlfredMovieLibraryIndex) plus
ShowFolderUnknown. DTO-level imports go through
alfred.infrastructure.persistence.dot_alfred.v2 directly.
No backwards-compat shims (per CLAUDE.md): the v1 names are gone,
not aliased. The test suite drops from 10 to 8 skips (the two Phase 3
module-level skips disappear with the quarantined files).
Full suite: 1233 passed / 8 skipped / 4 xfailed.
The MediaWithTracks mixin in alfred.domain.shared.media is now
orphaned (Episode lost its tracks in Phase 3, MovieRelease doesn't
inherit it). Parked for Phase 5, which will either mount it on
MovieRelease / SeasonRelease or delete it for good.
Mirror rescan_show for the movies library. Locates the main video via
find_video_file, runs inspect_release once (movies are one-folder-one-
main-file by convention), and writes a v2 MovieRelease sidecar via
DotAlfredMovieReleaseRepository.
Signature
    rescan_movie(
        movie_dir,
        *,
        tmdb_id: TmdbId,
        imdb_id: ImdbId | None = None,
        movie_repo: DotAlfredMovieReleaseRepository,
        prober,
        kb,
    ) -> MovieRelease
Behavior
* added_at = datetime.now(UTC) — the v2 sidecar records when the
release was last reconciled with disk, not filesystem mtime (which
drifts across moves and hard-links). Phase 3 made this field
required on MovieRelease.
* No TMDB call. Index auto-heals from the new sidecar on next read.
* MovieRescanFailed raised when no video is found inside movie_dir
(only explicit failure mode; all other adapter errors degrade
gracefully into empty / partial fields).
* file_path is recorded relative to movie_dir so the sidecar stays
portable across library moves.
Tests
tests/application/movies/test_rescan.py: 8 scenarios on the real v2
movie repo + real KB + stubbed prober. Covers track flattening,
sidecar round-trip, prober returning None, video in subfolder,
explicit no-video failure, imdb_id optional.
Full suite: 1233 passed / 10 skipped / 4 xfailed.
Rewrite rescan_show to build a SeriesRelease (Phase 1 v2 aggregate)
and persist it via DotAlfredSeriesReleaseRepository. The orchestrator
keeps reusing inspect_release as the single source of parse/probe
truth — only the assembly target changes (SeriesRelease/SeasonRelease/
EpisodeRelease instead of TVShow/Season/Episode).
New signature
    rescan_show(
        show_root,
        *,
        tmdb_id: TmdbId,
        imdb_id: ImdbId | None = None,
        series_repo: DotAlfredSeriesReleaseRepository,
        scanner,
        prober,
        kb,
    ) -> SeriesRelease
Identity is TMDB-anchored (tmdb_id required, no coercion); imdb_id is
optional. No TMDB call from rescan — the library index auto-heals
from the new sidecar on its next read.
PACK vs EPISODIC
* Single-video + season-parsed + no-episode → SeasonRelease(
mode=PACK, folder=<season folder>, episodes=()). The slot map stays
empty until the Phase 5 TMDB sync supplies episode_count. We do
not fabricate an EpisodeRange we cannot prove on disk.
* Otherwise → EPISODIC: every file with (season, episode) becomes an
EpisodeRelease with EpisodeRange(start, end) = (E, E). Multi-episode
files (S01E01E02) still record only the first slot — Parser does
not yet expose episode_end (existing tech debt, unchanged).
Package move
The orchestrator moves from alfred/application/library/ to
alfred/application/tv_shows/ for symmetry with alfred/application/
movies/ (Step 2). walker.py + its tests move with it. The empty
library/ package is deleted.
Tests
tests/application/tv_shows/test_rescan.py rewritten end-to-end on
the real v2 repository, real KB, real scanner, stubbed prober.
9 happy-path + edge-case scenarios cover EPISODIC track flattening,
PACK empty-episodes semantics, sidecar round-trip, imdb_id optional,
empty show root, season folder with no videos, prober returning None.
test_walker.py moved verbatim (import path updated).
Full suite: 1214 passed / 10 skipped / 4 xfailed. The three v1
dot_alfred quarantines from Phase 3 stay in place until Step 3.
Phase 3 prep: Movie aggregate is about to become TMDB-only (no
filesystem fields). added_at is a release-time observation, not a
TMDB-aggregate concern, so it moves to MovieRelease +
MovieReleaseSidecar.
- Add added_at: datetime (required) to MovieRelease with a
type-check in __post_init__.
- Add added_at: datetime (required) to MovieReleaseSidecar.
- Bump SCHEMA_VERSION 1 → 2 with a version-history note.
- Bridge round-trips added_at via Pydantic mode="json" (datetime
→ ISO 8601 string).
- Tests: update MovieRelease fixtures, add a validator test, add
an added_at round-trip test, switch hard-coded `1` assertions
to SCHEMA_VERSION for future-proofing.
No v1 sidecars in the wild yet — no migration code needed.
First step of specs/dot_alfred_v2.md. Introduces a separate bounded
context (alfred/domain/releases/) for the filesystem-side aggregates,
disjoint from TMDB identity which stays in tv_shows/ and movies/.
The link between the two worlds is TmdbId, used as the natural key
in the persistence layer (no domain-level reference).
New package alfred/domain/releases/:
- value_objects: EpisodeRange (covers SxxE01E02E03 multi-episode
files via start/end inclusive range, with count/numbers/is_single
helpers), ReleaseMode enum (PACK = N video files direct in the
season folder, EPISODIC = N sub-folders).
- entities: TrackProfile, EpisodeRelease, SeasonRelease (with
episode_count() summing each EpisodeRange.count()), SeriesRelease
(tmdb_id primary anchor, optional imdb_id secondary), MovieRelease.
All frozen dataclasses.
- builders: SeasonReleaseBuilder + SeriesReleaseBuilder mirroring
the v1 TVShowBuilder pattern. Builders sort episodes by range
start on emit and reject overlapping ranges (two files claiming
the same TMDB slot). from_existing() seeds a builder from an
existing frozen aggregate for round-trip edits.
- repositories: abstract ports (SeriesReleaseRepository,
MovieReleaseRepository); concrete .alfred sidecar impls arrive
in Phase 2.
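To make the range semantics concrete, here is a sketch of an inclusive EpisodeRange plus the sort-and-reject-overlap step the builders perform. It is an assumption-laden illustration: count() is a method per the description above, but whether numbers and is_single are methods or properties, the validation rules and the sort_and_check helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class EpisodeRange:
    start: int
    end: int  # inclusive, so S01E01E02E03 is (1, 3)

    def __post_init__(self) -> None:
        if self.start < 1 or self.end < self.start:
            raise ValueError(f"invalid episode range {self.start}-{self.end}")

    def count(self) -> int:
        return self.end - self.start + 1

    def numbers(self) -> Tuple[int, ...]:
        return tuple(range(self.start, self.end + 1))

    def is_single(self) -> bool:
        return self.start == self.end


def sort_and_check(ranges: Iterable[EpisodeRange]) -> List[EpisodeRange]:
    """Sort by range start and reject two files claiming the same slot."""
    ordered = sorted(ranges, key=lambda r: r.start)
    # After sorting by start, any overlap shows up between neighbours.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start <= prev.end:
            raise ValueError(f"overlapping ranges: {prev} and {cur}")
    return ordered
```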
New shared VO alfred/domain/shared/value_objects.py::TmdbId — positive
int, rejects bool/str/float, symmetric with the existing ImdbId VO.
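The bool rejection deserves a note: bool is a subclass of int in Python, so a plain isinstance(value, int) check would accept True. A minimal sketch of such a value object follows; the field name and the exception types are assumptions, not necessarily what the real TmdbId uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TmdbId:
    value: int

    def __post_init__(self) -> None:
        # bool passes isinstance(..., int), so it is rejected explicitly.
        if isinstance(self.value, bool) or not isinstance(self.value, int):
            raise TypeError(
                f"TmdbId must be an int, got {type(self.value).__name__}"
            )
        if self.value <= 0:
            raise ValueError(f"TmdbId must be positive, got {self.value}")
```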
73 unit tests cover VO validation, entity invariants, builder sort
+ overlap detection, and from_existing() round-trips.
v1 code paths are untouched at this stage; the new domain coexists
with the old TVShow aggregate until Phase 3 refactors it.
Step 4 of specs/dot_alfred.md — rebuild a TVShow aggregate from disk
by reusing the existing release pipeline (inspect_release) on every
video file in a show folder, then persist via the .alfred repository.
- alfred/application/library/walker.py — pure structural walk
(season folders detected via \bS\d{1,2}\b regex, video files
filtered against kb.video_extensions, no recursion).
- alfred/application/library/rescan.py — orchestrator that ingests
each season folder, infers PACK vs EPISODIC from on-disk file
count + parser output, and assembles via TVShowBuilder. Episode
paths stored relative to show_root. Logs + skips corrupt input
(no season parsed, mixed season numbers, unparseable episodes).
- Season now inherits MediaWithTracks: PACK seasons carry
season-level audio_tracks / subtitle_tracks; EPISODIC seasons
leave them empty (tracks live per-episode). SeasonBuilder gains
set_audio_tracks / set_subtitle_tracks; bridge writes/reads them
in the PACK branch via shared _synth_* helpers.
Out of scope, tracked as tech debt: adjacent .srt capture, multi-
episode (episode_end), TMDB-driven PACK detection (the current
heuristic '1 file == PACK' is a placeholder until ShowTracker lands).
18 new tests (11 walker + 7 rescan integration) on tmp_path with
the Foundation layout. Full suite: 1149 passed.
Step 3 of specs/dot_alfred.md. Concrete TVShowRepository
implementation reading and writing per-show .alfred YAML files under
a configurable library_root. Writes are atomic (.alfred.tmp +
os.replace), reads tolerate corrupted/wrong-schema sidecars (log +
skip), and the repo never invents a folder name — save(show)
requires the target folder to exist beforehand (raises
ShowFolderUnknown otherwise), matching the spec's
MediaOrganizer-then-sidecar split.
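The write path can be sketched as follows. The function name is hypothetical, and FileNotFoundError stands in for ShowFolderUnknown; the .alfred / .alfred.tmp names and the os.replace step come from the description above. The demo at the bottom runs in a throwaway directory.

```python
import os
import tempfile
from pathlib import Path


def write_sidecar_atomically(folder: Path, text: str, name: str = ".alfred") -> Path:
    """Write text to folder/name through a temp file and os.replace."""
    if not folder.is_dir():
        # The repository raises ShowFolderUnknown here; this is a stand-in.
        raise FileNotFoundError(f"target folder does not exist: {folder}")
    target = folder / name
    tmp = folder / (name + ".tmp")
    tmp.write_text(text, encoding="utf-8")
    # Readers see either the old file or the new one, never a partial write
    # (tmp and target sit in the same folder, hence the same filesystem).
    os.replace(tmp, target)
    return target


# Demo: after a successful write no .alfred.tmp is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    show_dir = Path(tmp_dir)
    written = write_sidecar_atomically(show_dir, "schema_version: 1\n")
    content = written.read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in show_dir.iterdir())
```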
Cold folders without a sidecar are skipped by find_all and yield
None from find_by_imdb_id — the upcoming rescan_show tool (step 4)
will own the opt-in rebuild path.
A small bridge module translates between the rich domain TVShow
(AudioTrack/SubtitleTrack with full ffprobe minutiae) and the
compact sidecar shape (language-only audio, embedded-only subs with
type derived from is_forced). The bridge is intentionally lossy on
probe details the sidecar does not store, per the spec's
factual-only philosophy.
20 integration tests on tmp_path: round-trip save/find,
cold-folder/unknown-id returns, find_all skipping
(corrupted/schema-violating sidecars), delete/exists, atomic write
(no .alfred.tmp leftover), overwrite, and folder-name fallbacks
(get_folder_name guess + full-scan rescue when renamed).
Step 2 of the specs/dot_alfred.md plan. Pure-dict in/out
(serialize(sidecar) -> dict, deserialize(data) -> ShowSidecar);
YAML I/O lives in the repository layer (step 3) and is kept out
for trivial testability.
DTOs mirror the YAML schema field-for-field:
- ShowSidecar (root: imdb_id, tmdb_id, schema_version, seasons)
- SeasonSidecar (number, path, optional audio/subtitles, optional episodes)
- EpisodeSidecar (number, path, optional audio/subtitles)
- SubtitleEntry (language, source, type)
The sidecar acts as a scan cache: it stores only what is genuinely
costly to recompute — folder/file paths (skipping the FS walk) and
probed track metadata (skipping ffprobe). Release identifiers
(group, source, quality, codec) live in folder/file names and are
derived on demand by the parser; they are deliberately absent from
the schema and rejected as unknown keys on deserialize.
The serializer is strict on schema: unknown keys at any level raise
SidecarSchemaError, missing required fields raise clearly, and bool
cannot sneak in as a season/episode number. Optional fields
(tmdb_id, empty audio/subtitles/episodes) are omitted from the
output rather than emitted as null / [].
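A sketch of the strictness rules at the episode level. The key set mirrors EpisodeSidecar above; check_episode and dump_episode are hypothetical helpers, the SidecarSchemaError defined here is a stand-in class, and raising it for missing fields is an assumption (the commit only says missing fields raise clearly).

```python
class SidecarSchemaError(ValueError):
    """Stand-in for the serializer's schema error."""


_EPISODE_KEYS = frozenset({"number", "path", "audio", "subtitles"})


def check_episode(data: dict) -> dict:
    """Validate one episode mapping; return it unchanged when valid."""
    unknown = set(data) - _EPISODE_KEYS
    if unknown:
        raise SidecarSchemaError(f"unknown keys: {sorted(unknown)}")
    for key in ("number", "path"):
        if key not in data:
            raise SidecarSchemaError(f"missing required field: {key}")
    number = data["number"]
    # bool is an int subclass, so True would otherwise pass as episode 1.
    if isinstance(number, bool) or not isinstance(number, int):
        raise SidecarSchemaError(f"number must be an int, got {number!r}")
    return data


def dump_episode(number: int, path: str, audio=(), subtitles=()) -> dict:
    """Emit only populated optional fields (no null, no empty list)."""
    out = {"number": number, "path": path}
    if audio:
        out["audio"] = list(audio)
    if subtitles:
        out["subtitles"] = list(subtitles)
    return out
```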
Tests cover round-trip equivalence (DTO → dict → DTO and DTO → YAML
text → DTO), the Foundation S01 PACK case (real-world fixture with
mixed sub types — superset captured at season scope), and a
Breaking Bad S05 EPISODIC case. An on-disk tmp_path fixture
recreates the Foundation folder structure with placeholder files,
ready to be reused by the upcoming repository walk tests in step 3.
The TVShow aggregate is now fully immutable. TVShow, Season and Episode
are @dataclass(frozen=True), children stored as ordered tuples sorted
by number. All construction goes through TVShowBuilder / SeasonBuilder
(new module), which expose from_existing() to seed from a current
frozen aggregate and apply modifications.
ShowTracker-territory fields are stripped from the domain: ShowStatus,
CollectionStatus, expected_seasons/episodes, aired_episodes,
collection_status(), is_complete_series(), missing_episodes(),
is_ongoing(), is_ended(), Season.name, the aired<=expected validation,
and the TMDB status string mapping. These will reappear in a dedicated
ShowTracker layer (to be designed) combining the .alfred sidecar with
live TMDB data.
New SeasonMode enum (PACK / EPISODIC) computed at read time from the
season's structural shape — never stored, the YAML sidecar encodes the
mode via presence/absence of the episodes: block.
Test suite for the domain entirely rewritten to cover frozen invariants,
builder ordering, last-write-wins, from_existing round-trip, and
SeasonMode derivation. Full suite still green (1078 passed).
The old name conflated 'might become a placed subtitle' with 'what a
scan pass produced'. The class is the output of a scan/identify pass —
language/format may still be None while classification is in progress,
confidence reflects classifier certainty, raw_tokens holds filename
fragments under analysis. SubtitleScanResult says that directly.
Pure rename + refreshed docstring; no behavior change. Touches the
domain entity, the matcher/identifier/utils services, the
manage_subtitles use case, the placer, the metadata store, the
shared-media cross-ref comment, and 7 test modules.
Add a derived 'recommended_action' property on InspectedResult that
collapses the orchestrator's go / wait / skip decision into one value:
- 'skip' → no main_video, or media_type == 'other'
- 'ask_user' → media_type == 'unknown', or road == 'path_of_pain'
- 'process' → confident parse with a main video on disk
The ordering is part of the contract (skip > ask_user > process) —
documented in the property docstring.
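The precedence can be written out as a plain function. This is a sketch over the three inputs only; the real value is a property on InspectedResult, and the argument types used here (strings, an optional path string) are simplifications.

```python
from typing import Optional


def recommended_action(
    media_type: str, road: str, main_video: Optional[str]
) -> str:
    """Collapse (media_type, road, main_video) into skip / ask_user / process."""
    # skip outranks everything else
    if main_video is None or media_type == "other":
        return "skip"
    # ask_user outranks process
    if media_type == "unknown" or road == "path_of_pain":
        return "ask_user"
    return "process"
```

Note that an unknown media type with no main video yields 'skip', not 'ask_user': that is the ordering at work.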
Until now every consumer (workflows, the agent, the orchestrator
sketch) had to re-derive this from the road / media_type / main_video
triple, with subtle drift between sites. One place, one rule.
Exposed through the analyze_release tool so the LLM can route on it.
Spec YAML updated to describe the new field.
Suite: 1083 passed (+6 new tests in tests/application/test_inspect.py
covering the four branches and the precedence rules).
ParsedRelease is now @dataclass(frozen=True). The enrichment passes that
used to patch fields in place now produce new instances:
- enrich_from_probe(parsed, info, kb) returns a new ParsedRelease via
dataclasses.replace (no allocation when no field changed).
- inspect_release rebinds 'parsed' after detect_media_type (wrapped in
MediaTypeToken — the strict isinstance check now also runs on
replace) and after enrich_from_probe.
languages becomes a tuple[str, ...] so the VO is properly immutable.
Parser pipeline packs languages as a tuple in the assemble dict.
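The rebinding pattern looks roughly like this. Parsed is a three-field stand-in for ParsedRelease and fill_quality is a hypothetical single-field version of enrich_from_probe; only the use of dataclasses.replace, the fill-only-None rule and the return-the-same-instance shortcut come from the description above.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Parsed:
    title: str
    quality: Optional[str] = None
    languages: Tuple[str, ...] = ()  # tuple keeps the frozen VO immutable


def fill_quality(parsed: Parsed, probed_quality: Optional[str]) -> Parsed:
    """Fill quality from the probe only when the parser left it empty."""
    if parsed.quality is not None or probed_quality is None:
        return parsed  # nothing to change: hand back the same instance
    return replace(parsed, quality=probed_quality)


parsed = Parsed(title="Dune")
parsed = fill_quality(parsed, "2160p")  # callers rebind the name
```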
Callers updated: inspect_release, testing/recognize_folders_in_downloads.py.
Tests updated: 22 enrich_from_probe call sites rebound, language
assertions switched to tuple literals, test_release_fixtures normalizes
result['languages'] back to list for YAML-fixture comparison.
Suite: 1077 passed.
Remove the module-level _KB / _PROBER singletons from
alfred/application/filesystem/resolve_destination.py. The four
resolve_{season,episode,movie,series}_destination use cases now take
kb: ReleaseKnowledge and prober: MediaProber as required arguments,
matching the shape of inspect_release.
The singletons now live at the agent-tools frontier
(alfred/agent/tools/filesystem.py), where the LLM-facing wrappers
instantiate YamlReleaseKnowledge / FfprobeMediaProber once and thread
them through. The wrappers' Python signatures are unchanged — the
inspect-based JSON-schema generator in agent/registry.py still sees the
same LLM-passable params.
analyze_release drops the dirty 'from ... import _KB' indirection.
Tests inject their own stubs by keyword (prober=_StubProber(...)) via
thin convenience wrappers, replacing the prior
monkeypatch.setattr(rd, '_PROBER', ...) pattern.
testing/debug_release.py: instantiate YamlReleaseKnowledge() /
FfprobeMediaProber() inline at the two call sites.
Suite: 1077 passed.
ParsePath collided with pathlib.Path in mental models, and was one
letter from the parse_path attribute that stores its value — confusion
on confusion. Road (EASY/SHITTY/PATH_OF_PAIN) is the parser-confidence
axis; TokenizationRoute (DIRECT/SANITIZED/AI) is the tokenization-method
axis. They're orthogonal and the new name makes that obvious.
Field name parse_path stays — it's the right name for the attribute
that *holds* the route. String values ("direct", "sanitized", "ai")
stay too, so YAML fixtures and the analyze_release tool spec are
unchanged. Only the type symbol changes:
- value_objects.py: class rename + docstring spelling out orthogonality
with Road.
- services.py: 3 call sites.
- scoring.py: docstring cross-reference updated.
- tests/domain/release/test_parser_v2_scoring.py: import + 3 call sites.
The three module-level dicts in enrich_from_probe (ffprobe codec name
to scene token, channel count to layout) were exactly the kind of
domain lookup table CLAUDE.md says belongs in YAML, not in Python.
Move them to alfred/knowledge/release/probe_mappings.yaml, load
through a new ReleaseKnowledge.probe_mappings port field, and add a
kb parameter to enrich_from_probe so the consumer reads the maps via
the same injection pattern as everything else.
- New knowledge file: alfred/knowledge/release/probe_mappings.yaml
- New loader: load_probe_mappings() in infrastructure/knowledge/release.py
(normalizes channel-count keys back to int).
- Port: ReleaseKnowledge gains probe_mappings: dict.
- Adapter: YamlReleaseKnowledge populates it at __init__.
- Consumer: enrich_from_probe(parsed, info, kb) reads the three sub-maps
from kb.probe_mappings; unknown codecs still fall back to uppercase
raw value, same behaviour as before.
- Call sites updated: inspect_release passes kb through; the testing
script gets its kb wiring (it was already broken since the
ReleaseKnowledge refactor); all 22 enrich_from_probe call sites in
tests/application/test_enrich_from_probe.py pass _KB.
ParsedRelease.tech_string was a stored str field re-computed in two
places (assemble() at parse time, enrich_from_probe() after the probe).
The second site was a reactive fix (e79ca46) for filename builders that
saw a stale value. Turn it into an @property so it stays in sync with
quality/source/codec by construction.
- Drop the field from the dataclass + the key from assemble()'s dict.
- Drop tech_string="" from parse_release's malformed-name fallback.
- Drop the manual recomputation at the end of enrich_from_probe.
- Inject the property into asdict() result in the fixtures runner
(same treatment as is_season_pack).
- Update tests that passed tech_string= to the constructor; rewrite the
TestTechString case that mutated p.tech_string manually.
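A reduced sketch of the derived field. Parsed is a stand-in with only the three inputs; the join rule (quality.source.codec, dots, None skipped) is the one the commits describe.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Parsed:
    quality: Optional[str] = None
    source: Optional[str] = None
    codec: Optional[str] = None

    @property
    def tech_string(self) -> str:
        # Derived on every access, so it cannot go stale after enrichment.
        parts = (self.quality, self.source, self.codec)
        return ".".join(p for p in parts if p is not None)


before = Parsed(quality="1080p", source="WEBRip")
after = replace(before, codec="x265")  # e.g. filled in from the probe
```

Because a property is not a dataclass field, dataclasses.asdict() leaves it out, which is why the fixtures runner has to inject it by hand.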
The fields were already typed as MediaTypeToken / ParsePath, but a
tolerant __post_init__ coerced raw strings into their enum form. With
MediaTypeToken(str, Enum) (and likewise ParsePath), the coercion served no
purpose: callers that passed '.value' got the enum back anyway, and
callers that passed an unknown string got a ValidationError just as
they do now.
Strict mode: constructor rejects non-enum values directly. The two
in-tree builders (parse_release() and the parser pipeline) already
produce enum values; all .value sites have been removed. Drops the
unused _VALID_MEDIA_TYPES / _VALID_PARSE_PATHS lookup tables.
The field was called normalised but did not perform the normalization
its name suggested (dots instead of spaces). In practice it holds the
raw name minus the site tag and apostrophes, and is used only by
season_folder_name() via _strip_episode_from_normalized. Renamed to
'clean', which describes what it actually contains; docstring corrected.
Six possible levels (global, release_group, movie, show, season,
episode) were passed as free-form str, with the docstring comment as
the only documentation. Introduce RuleScopeLevel(str, Enum): still
serializable to YAML, but the fixed set is now enforced by the type.
to_dict() explicitly emits .value to stay safe for YAML writers.
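A sketch of the str-backed enum and of why to_dict() emits .value. The six string values are the levels listed above; the member names, the scope_to_dict helper and its identifier field are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class RuleScopeLevel(str, Enum):
    GLOBAL = "global"
    RELEASE_GROUP = "release_group"
    MOVIE = "movie"
    SHOW = "show"
    SEASON = "season"
    EPISODE = "episode"


def scope_to_dict(level: RuleScopeLevel, identifier: Optional[str] = None) -> dict:
    # Emit the plain string, not the enum member, so a YAML writer never
    # has to know how to represent RuleScopeLevel.
    return {"level": level.value, "identifier": identifier}
```

Reading back is the reverse lookup, RuleScopeLevel("season"), which raises ValueError for anything outside the fixed set.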
Apostrophes are in the forbidden-chars list, which made any release
with a title like "Don't" or "L'avare" short-circuit to the AI
fallback (parse_path=ai, everything UNKNOWN). They are now stripped
up front from the name before the well-formed check and tokenize,
so the parse completes normally. The raw name is preserved on the
VO; only the title field loses its apostrophe.
parse_path becomes 'sanitized' when an apostrophe was stripped, to
surface that the parser cleaned something up.
Fixtures updated:
- shitty/honey_uhd_hdr/ — went from total UNKNOWN to a clean parse
(title=Honey.Dont, year=2025, quality=2160p, source=WEBRip,
codec=x265, group=Amen).
- path_of_pain/the_prodigy_full_chaos/ — went from total failure to
partial success (title, year, source, codec extracted). Remaining
gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked
separately in tech debt.
`Der.Tatortreiniger.S01-06.GERMAN...` previously parsed as a movie
with 'S01-06' glued to the title. The parser now matches the
season-range form in _parse_season_episode (returning season=first,
episode=None), and the assemble step detects the range token to
promote media_type to 'tv_complete'.
The first season is exposed as `season` so `is_season_pack`
fires (season is not None and episode is None) — useful for routing
to a series root folder.
Fixture shitty/tatortreiniger_flat_multiseason/ updated:
- title: Der.Tatortreiniger.S01-06 → Der.Tatortreiniger
- season: null → 1
- media_type: movie → tv_complete
- is_season_pack: false → true
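For illustration, a season-range token could be matched like this. The regex and the helper name are hypothetical (the real logic sits in _parse_season_episode and returns season=first, episode=None); accepting an optional second 'S' and rejecting descending ranges are choices made for this sketch.

```python
import re
from typing import Optional, Tuple

# Matches S01-06 and S01-S06; anchored so S01E02 is not a range.
_SEASON_RANGE = re.compile(r"^S(\d{1,2})-S?(\d{1,2})$", re.IGNORECASE)


def parse_season_range(token: str) -> Optional[Tuple[int, int]]:
    """Return (first, last) for a season-range token, else None."""
    match = _SEASON_RANGE.match(token)
    if match is None:
        return None
    first, last = int(match.group(1)), int(match.group(2))
    if last < first:
        return None  # descending range: treat as not a range
    return first, last
```

The caller would expose first as the season, leave episode as None, and use the presence of a range to promote media_type to 'tv_complete'.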
Releases using ' - ' as a separator (Vinyl - 1x01 - FHD) tokenize to
['Vinyl', '-', '1x01', '-', 'FHD'] — the standalone '-' tokens were
ending up in title_parts and leaked into the joined title
('Vinyl.-'). We can't add '-' to the separator list (it would break
codec-GROUP), so we filter at assembly: a TITLE token with no
alphanumeric characters carries no title content.
Side win: same logic eliminates the UTF-8 wide-pipe '|' from the
khruangbin_yt_wide_pipe fixture title.
Fixtures updated:
- shitty/vinyl_1x01_format/expected.yaml (title: Vinyl.- → Vinyl)
- path_of_pain/khruangbin_yt_wide_pipe/expected.yaml (| dropped)
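The assembly-time filter amounts to a one-line predicate. The helper names below are hypothetical and the token list is a simplified version of the Vinyl example; U+FF5C is used as one example of a wide pipe, which is an assumption about the fixture's exact character.

```python
from typing import Iterable


def has_title_content(token: str) -> bool:
    """A title token counts only if it has at least one alphanumeric char."""
    return any(ch.isalnum() for ch in token)


def join_title(title_tokens: Iterable[str]) -> str:
    return ".".join(t for t in title_tokens if has_title_content(t))


title = join_title(["Vinyl", "-"])
```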
S14E09E10E11 previously parsed to episode=9, episode_end=10 — E11
was silently dropped. The parser now takes episodes[-1] as
episode_end so the full chain is captured (episode=9, episode_end=11).
Intermediate values stay implied.
Fixture shitty/archer_multi_episode/ updated from anti-regression of
the bug to anti-regression of the fix.
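A sketch of the chained-episode rule. The regex and the function are hypothetical and much simpler than the real parser; what they illustrate is taking the last element of the chain as episode_end. Returning None as episode_end for a single-episode file is an assumption.

```python
import re
from typing import Optional, Tuple

_SXXEYY = re.compile(r"S(\d{1,2})((?:E\d{1,3})+)", re.IGNORECASE)


def parse_episode_chain(name: str) -> Optional[Tuple[int, int, Optional[int]]]:
    """Return (season, episode, episode_end) from an SxxEyy[Ezz...] marker."""
    match = _SXXEYY.search(name)
    if match is None:
        return None
    season = int(match.group(1))
    episodes = [int(n) for n in re.findall(r"E(\d{1,3})", match.group(2), re.IGNORECASE)]
    # Last link of the chain closes the range; intermediates stay implied.
    episode_end = episodes[-1] if len(episodes) > 1 else None
    return season, episodes[0], episode_end
```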
Mirror the MediaProber / FilesystemScanner pattern for language lookup:
- New Protocol `LanguageRepository` in alfred.domain.shared.ports
covering from_iso, from_any, all, __contains__, __len__ — the
surface previously coupled to the concrete LanguageRegistry.
- SubtitleKnowledgeBase types its `language_registry` parameter
against the Protocol; the concrete LanguageRegistry stays in
infrastructure as the YAML-backed adapter and remains the default
when no repository is injected.
- New unit tests in tests/infrastructure/test_language_registry.py
cover the adapter surface (from_iso, from_any, membership,
case-insensitivity, non-string inputs).
Behaviour is unchanged for existing callers. The split opens the
door to in-memory fakes in future tests without loading the full
ISO 639 YAML.
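As an illustration of what the split enables, here is a structural Protocol with an in-memory fake. The Protocol below is deliberately reduced: it covers three of the five listed members and returns plain display names, whereas the real port's return types are not shown in this log. InMemoryLanguages and display_name are hypothetical.

```python
from typing import Dict, Optional, Protocol


class LanguageRepository(Protocol):
    """Reduced stand-in for the port (from_any and all omitted)."""

    def from_iso(self, code: str) -> Optional[str]: ...
    def __contains__(self, code: object) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguages:
    """Test fake: satisfies the Protocol structurally, no YAML involved."""

    def __init__(self, names: Dict[str, str]) -> None:
        self._names = {code.lower(): name for code, name in names.items()}

    def from_iso(self, code: str) -> Optional[str]:
        if not isinstance(code, str):
            return None  # non-string input is tolerated, not an error
        return self._names.get(code.lower())

    def __contains__(self, code: object) -> bool:
        return isinstance(code, str) and code.lower() in self._names

    def __len__(self) -> int:
        return len(self._names)


def display_name(repo: LanguageRepository, code: str) -> str:
    return repo.from_iso(code) or "unknown"


fake = InMemoryLanguages({"fr": "French", "de": "German"})
```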
Both helpers are inspection-pipeline pieces, not filesystem use cases —
they belong next to inspect_release, not next to move_media /
resolve_destination / list_folder.
The move also kills the lazy import that was hiding inside
_resolve_parsed: alfred.application.filesystem.resolve_destination
no longer triggers a cycle through alfred.application.filesystem
__init__ when loading inspect_release. Top-level import restored.
Call sites updated: inspect.py, test_detect_media_type.py,
test_enrich_from_probe.py, testing/recognize_folders_in_downloads.py.
Module docstrings + test-file docstrings updated to match the new
location.
The four resolve_*_destination use cases now route through a private
_resolve_parsed helper that picks the right entry point:
- source path provided AND it exists -> inspect_release(name, path)
runs the full pipeline (parse + media-type refinement + probe
+ enrich), so missing tech tokens (quality, codec, ...) get
filled by ffprobe and the refreshed tech_string lands in the
destination folder / file names.
- source path not provided, or not present on disk -> parse_release(name)
only, same behavior as before. Back-compat: tests using fake /dl/*.mkv
paths still pass unchanged.
resolve_episode_destination / resolve_movie_destination reuse their
existing source_file parameter as the inspection target. The two
folder-move use cases (season / series) gain a new OPTIONAL
source_path parameter — threaded through the agent tool wrappers
and documented in the YAML specs.
The lazy import inside _resolve_parsed avoids a circular import:
inspect_release imports detect_media_type / enrich_from_probe from
the same application.filesystem package whose __init__ re-exports
resolve_destination.
Three new tests in TestProbeEnrichmentWiring with a stub MediaProber
prove the wiring: movie picks up probe quality, season picks it up
via source_path, and a missing path correctly skips probe (back-compat
guard).
enrich_from_probe fills None fields on ParsedRelease (quality, source,
codec, audio_*, languages) but left tech_string at its parser-time
value — so the filename builders (movie_folder_name, episode_filename,
…) saw stale tech tokens even after a successful probe.
Re-derive tech_string the same way the parser does — quality.source.codec
joined by dots, skipping None — at the end of enrich_from_probe. Token-
level values still win because enrich only fills None fields.
Four new tests in TestTechString cover: enrichment rebuilds it,
existing source survives, no-info input leaves it untouched, fully
empty parsed produces ''.
New application-layer entry point that composes the four inspection
layers in one call:
1. parse_release(name, kb)             -> (ParsedRelease, ParseReport)
2. detect_media_type(parsed, path, kb) -> patch parsed.media_type
3. find_main_video(path, kb)           -> Path | None (top-level scan)
4. prober.probe(video) + enrich        -> when video exists and
                                          media_type not in {unknown, other}
Returns a frozen InspectedResult(parsed, report, source_path,
main_video, media_info, probe_used). kb and prober are injected — no
module-level singletons in inspect.py.
analyze_release tool now delegates to inspect_release; its output
gains two fields, confidence (0-100) and road (easy/shitty/path_of_pain),
surfaced from ParseReport so the LLM can route by confidence. Spec
updated to document them.
12 new tests covering happy paths, probe gating (no video, media_type
'other', probe failure), mutation contract (detect refining
parsed.media_type, enrich filling None fields), resilience
(nonexistent path), and frozen contract. Suite: 1058 passing.
Add probe(video) -> MediaInfo | None to the MediaProber Protocol and
implement it on FfprobeMediaProber. The standalone
alfred/infrastructure/filesystem/ffprobe.py module is removed; all
callers (analyze_release / probe_media tools, testing scripts) now go
through the adapter.
Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py
(patching subprocess.run at the adapter module level).
Unblocks the upcoming inspect_release orchestrator, which needs the
port — not a free function — to compose parse + main-video selection
+ probe in one shot.
Add the application-layer helpers that decide which files are worth
parsing, sitting one notch above parse_release.
- is_supported_video(path, kb): extension-only check against
kb.video_extensions. Lowercased suffix lookup. Directories and
broken symlinks return False.
- find_main_video(folder, kb): top-level scan only (no recursion into
subdirectories — releases that wrap their video in Sample/ are
PATH_OF_PAIN territory). Lexicographically-first eligible file wins
when several qualify (deterministic, no size-based ranking). A bare
file as folder argument is supported for single-file releases.
No size threshold and no filename heuristics ('sample' / 'trailer'):
the parser's job is to extract structure, not to second-guess
non-standard release shapes. PoP catches the rest.
17 tests under tests/application/test_supported_media.py.
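The selection rules above can be sketched as follows. The real helpers take the kb port; here a plain set of extensions is passed instead, and it is assumed to hold lowercase suffixes with a leading dot.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def is_supported_video(path: Path, video_extensions: Iterable[str]) -> bool:
    # is_file() follows symlinks, so directories and broken symlinks are rejected.
    if not path.is_file():
        return False
    return path.suffix.lower() in set(video_extensions)


def find_main_video(folder: Path, video_extensions: Iterable[str]) -> Optional[Path]:
    extensions = set(video_extensions)
    # A bare file is accepted for single-file releases.
    if folder.is_file():
        return folder if is_supported_video(folder, extensions) else None
    if not folder.is_dir():
        return None
    # Top-level scan only: iterdir() does not descend into Sample/ and friends.
    candidates = [p for p in folder.iterdir() if is_supported_video(p, extensions)]
    # Lexicographically-first file name wins; no size or name heuristics.
    candidates.sort(key=lambda p: p.name)
    return candidates[0] if candidates else None


def demo() -> Optional[str]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "b.mkv").touch()
        (root / "a.MKV").touch()
        (root / "notes.txt").touch()
        (root / "Sample").mkdir()
        (root / "Sample" / "0.mkv").touch()  # ignored: not top-level
        found = find_main_video(root, {".mkv", ".mp4"})
        return found.name if found else None
```

In this demo layout the result is a.MKV: the uppercase suffix still matches after lowercasing, and the file inside Sample/ is never considered.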
Wire the scoring foundations into the parser entry point. parse_release
now returns a tuple — the structural ParsedRelease and a diagnostic
ParseReport carrying confidence (0-100), road
(EASY / SHITTY / PATH_OF_PAIN), the residual UNKNOWN tokens, and the
list of critical fields that couldn't be filled.
EASY is decided structurally (a group schema matched), independently
of the score. SHITTY vs PATH_OF_PAIN is decided by score against the
60 cutoff from scoring.yaml. Malformed names (forbidden chars) emit a
zero-confidence PoP report and short-circuit to parse_path=AI as
before.
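A small sketch of the road decision described above. The enum values follow the easy/shitty/path_of_pain names used by the tool output; treating a score equal to the cutoff as SHITTY is an assumption, as is hard-coding 60 instead of reading scoring.yaml.

```python
from enum import Enum


class Road(str, Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"


# Per the commit text the cutoff lives in scoring.yaml; hard-coded for the sketch.
SHITTY_CUTOFF = 60


def decide_road(score: int, schema_matched: bool, cutoff: int = SHITTY_CUTOFF) -> Road:
    # EASY is structural: a matched group schema wins regardless of the score.
    if schema_matched:
        return Road.EASY
    # Assumption: the boundary is inclusive (score == cutoff counts as SHITTY).
    return Road.SHITTY if score >= cutoff else Road.PATH_OF_PAIN
```

With these assumptions, decide_road(10, True) is EASY even though the score is low, which is the separation between structure and confidence that the commit describes.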
ParsePath stays as-is (DIRECT / SANITIZED / AI) — it records *how* we
tokenized, not how confident we are. The two dimensions are now
properly separated.
Call sites propagated:
- alfred/application/filesystem/resolve_destination.py (4 occurrences)
- alfred/agent/tools/filesystem.py
- tests/domain/test_release.py
- tests/domain/test_release_fixtures.py
- tests/application/test_detect_media_type.py
New tests/domain/release/test_parser_v2_scoring.py (22 cases) locks
ParseReport validation, compute_score arithmetic, decide_road
thresholding, the collector helpers, and the end-to-end tuple contract.
Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty: each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, the
leftmost contiguous UNKNOWN run becomes the title, done.
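The dict-driven pass can be illustrated roughly like this. It is a sketch under assumptions: roles are plain strings, buckets are an ordered list of (role, predicate) pairs rather than the real kb port, and the year and SxxExx checks use string operations in keeping with the project's no-regex convention.

```python
from typing import Callable, List, Sequence, Tuple

Bucket = Tuple[str, Callable[[str], bool]]


def in_set(values: set) -> Callable[[str], bool]:
    return lambda tok: tok.lower() in values


def is_year(tok: str) -> bool:
    return len(tok) == 4 and tok.isascii() and tok.isdigit() and 1900 <= int(tok) <= 2099


def is_sxxexx(tok: str) -> bool:
    t = tok.lower()
    if not t.startswith("s"):
        return False
    season, sep, episode = t[1:].partition("e")
    return season.isdigit() and (sep == "" or episode.isdigit())


def annotate_shitty(tokens: Sequence[str], buckets: Sequence[Bucket]) -> List[Tuple[str, str]]:
    roles = []
    for tok in tokens:
        role = "UNKNOWN"
        for name, matches in buckets:
            if matches(tok):
                role = name
                break  # first match wins
        roles.append(role)
    # The leftmost contiguous UNKNOWN run becomes the title.
    i = 0
    while i < len(roles) and roles[i] != "UNKNOWN":
        i += 1
    while i < len(roles) and roles[i] == "UNKNOWN":
        roles[i] = "TITLE"
        i += 1
    return list(zip(tokens, roles))


# Hypothetical bucket contents, for the demo only.
BUCKETS: List[Bucket] = [
    ("RESOLUTION", in_set({"720p", "1080p", "2160p"})),
    ("SOURCE", in_set({"web-dl", "webrip", "bluray"})),
    ("CODEC", in_set({"x264", "x265", "hevc"})),
    ("DISTRIBUTOR", in_set({"nf", "amzn"})),
    ("YEAR", is_year),
    ("SEASON_EP", is_sxxexx),
]


def demo() -> List[Tuple[str, str]]:
    return annotate_shitty("Some.Movie.2019.1080p.WEB-DL.x264.extra".split("."), BUCKETS)
```

In the demo, Some and Movie form the title run, while the trailing extra token stays UNKNOWN because only the leftmost run is promoted.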
SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.
- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
(deutschland franchise box, sleaford YT slug, super_mario bilingual,
predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows xfail_reason; the parametrized suite wires the
pytest.mark.xfail(strict=False) automatically
Introduce a separate dimension for streaming-platform tags (NF, AMZN,
DSNP, HMAX, ATVP, …) so they stop polluting the encoding-source field.
WEB-DL is the source; the platform that released it is the distributor.
- new distributors.yaml knowledge file
- ReleaseKnowledge port exposes distributors set
- TokenRole.DISTRIBUTOR + ParsedRelease.distributor field
- removed NF/AMZN/DSNP/HMAX/ATVP from sources.yaml
- notre_planete fixture now records distributor: NF
The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.
Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
body to be fully consumed. Tokens passed over between schema chunks
remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
a given role, skipping already-annotated tokens. Lets optional and
mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
kb.editions are matched first (longest-first ordering preserved from
the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
of a matched sequence with extra['sequence']=<canonical value> and
trailing members with extra['sequence_member']='True' so assemble
skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
'.' separator splits the layout into two tokens. Strips a trailing
'-GROUP' suffix on the second before joining.
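To make the channel-pair case concrete, here is a hedged sketch operating on plain strings rather than Token objects. The set of known layouts and the return shape (index of the first token plus the joined layout) are assumptions for illustration, not the signature of _detect_channel_pairs.

```python
from typing import List, Tuple

# Assumption: a small fixed set of layouts; the real list would come from the kb.
KNOWN_LAYOUTS = {"2.0", "5.1", "7.1"}


def detect_channel_pairs(tokens: List[str]) -> List[Tuple[int, str]]:
    """Find layouts such as '5.1' that a '.' separator split into two tokens."""
    hits = []
    for i in range(len(tokens) - 1):
        first = tokens[i]
        # Strip a trailing '-GROUP' suffix on the second half before joining.
        second = tokens[i + 1].split("-", 1)[0]
        layout = f"{first}.{second}"
        if layout in KNOWN_LAYOUTS:
            hits.append((i, layout))
    return hits


def demo() -> List[Tuple[int, str]]:
    return detect_channel_pairs("Movie.2019.1080p.BluRay.DTS.5.1-GRP".split("."))
```

For the demo name, splitting on '.' yields the tokens 5 and 1-GRP at positions 5 and 6, and the pair is reported as (5, "5.1").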
Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
bit_depth, hdr_format, edition. Each role-handler skips
sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
COLLECTION} + no season → tv_complete (mirrors legacy).
Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed,
8 skipped.
Refs: project_release_parser_v2_specs (memory)
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.
Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
shape, fallback to any non-source dashed token), GroupSchema lookup
via the kb port, then lockstep walk of tokens against schema chunks.
Optional chunks skip on mismatch, mandatory mismatches return None so
the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
dict (title joined by '.', group from the codec-GROUP token's extras).
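The right-to-left group detection might look roughly like the sketch below. It works on plain strings, takes the codec and source sets as arguments instead of the kb port, and returns None when no group is found; all three choices are assumptions for illustration.

```python
from typing import Optional, Sequence, Set


def detect_group(tokens: Sequence[str], codecs: Set[str], sources: Set[str]) -> Optional[str]:
    # Pass 1 (priority): a trailing codec-GROUP token such as 'x264-KONTRAST'.
    for tok in reversed(tokens):
        head, dash, tail = tok.rpartition("-")
        if dash and tail and head.lower() in codecs:
            return tail
    # Pass 2 (fallback): any dashed token that is not itself a source ('WEB-DL').
    for tok in reversed(tokens):
        head, dash, tail = tok.rpartition("-")
        if dash and head and tail and tok.lower() not in sources:
            return tail
    return None


# Hypothetical knowledge sets, for the demo only.
CODECS = {"x264", "x265", "h264", "hevc"}
SOURCES = {"web-dl", "webrip", "bluray"}
```

For example, detect_group(["Movie", "2019", "WEB-DL", "x264-KONTRAST"], CODECS, SOURCES) returns "KONTRAST", while a name whose only dashed token is WEB-DL yields None.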
Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.
Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
*.yaml at construction time; learned overrides in
data/knowledge/release/release_groups/ also picked up.
Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
canonical chunk_order. ELiTE marks source as optional (Foundation.S02
has no WEBRip token).
Services:
- parse_release tries the v2 path first; on None falls through to the
legacy implementation untouched.
Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
detection (codec-GROUP, dashed-source skip, no-dash → unknown),
schema-driven annotation (movie, TV episode, season pack with
optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
still produced by the legacy path. Verified via spy on v2.assemble.
Suite: 1007 passed, 8 skipped.
Refs: project_release_parser_v2_specs (memory)
New package alfred/domain/release/parser/ lays the foundation for the
release parser refactor (specs in memory). Exposes:
- Token: frozen VO carrying text + stream index + TokenRole + extra dict.
with_role() returns a new instance (no mutation).
- TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/
GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/
EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families.
- pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix.
- pipeline.tokenize(): release name -> list[Token] (all UNKNOWN),
string-ops split on kb.separators (no regex, per CLAUDE.md).
- pipeline.annotate(): documented stub. Walk order recorded in docstring
(group right-to-left, then season/episode, year, tech, title).
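As an illustration of the scaffolding listed above, the following sketch shows an immutable Token with with_role(), site-tag stripping, and a regex-free tokenize. The TokenRole members are reduced to three, and the default separators are an assumption standing in for kb.separators.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Dict, List, Optional, Sequence, Tuple


class TokenRole(str, Enum):
    # Reduced set for the sketch; the real enum has many more members.
    UNKNOWN = "unknown"
    TITLE = "title"
    YEAR = "year"


@dataclass(frozen=True)
class Token:
    text: str
    index: int
    role: TokenRole = TokenRole.UNKNOWN
    extra: Dict[str, str] = field(default_factory=dict)

    def with_role(self, role: TokenRole, **extra: str) -> "Token":
        # Returns a new instance; the original token is left untouched.
        return replace(self, role=role, extra={**self.extra, **extra})


def strip_site_tag(name: str) -> Tuple[str, Optional[str]]:
    """Pull a [site.tag] prefix or suffix; returns (clean_name, tag_or_None)."""
    name = name.strip()
    if name.startswith("["):
        end = name.find("]")
        if end != -1:
            return name[end + 1:].lstrip(" ._-"), name[1:end]
    if name.endswith("]"):
        start = name.rfind("[")
        if start != -1:
            return name[:start].rstrip(" ._-"), name[start + 1:-1]
    return name, None


def tokenize(name: str, separators: Sequence[str] = (".", " ", "_")) -> List[Token]:
    clean, _tag = strip_site_tag(name)
    # String-ops split: normalise every separator to the first one, then split.
    for sep in separators[1:]:
        clean = clean.replace(sep, separators[0])
    parts = [p for p in clean.split(separators[0]) if p]
    return [Token(text=p, index=i) for i, p in enumerate(parts)]
```

tokenize("[site.tag] Some.Movie.2019") yields three UNKNOWN tokens, and calling with_role(TokenRole.YEAR) on the last one produces a new Token without changing the list.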
Legacy parse_release in release.services remains the live implementation
until the annotate step lands. Scaffolding tests verify Token API,
site-tag stripping (prefix/suffix), and tokenize output shape.
Refs: project_release_parser_v2_specs (memory)
- test_release.py / test_release_fixtures.py: module-level
_KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it
into parse_release. test_show_folder_name_strips_windows_chars renamed
to test_show_folder_name_uses_already_safe_title to reflect the
Option B contract (caller sanitizes via kb.sanitize_for_fs).
- test_detect_media_type.py: same _KB pattern, all
detect_media_type(parsed, path) calls now pass kb.
- test_filesystem_extras.py: find_video_file(path) calls now pass kb.
- test_enrich_from_probe.py: _bare() helper adds the new
title_sanitized field.
- test_resolve_destination.py: drop _sanitize import + TestSanitize
class (helper deleted), add tmdb_title_safe arg to
_resolve_series_folder calls.
987 passed, 8 skipped.
aggregates.py used to call SubtitleKnowledgeBase().default_rules() via a
DEFAULT_RULES() helper, which silently pulled the infrastructure layer
(YAML loader) into the domain on every resolve.
Make the dependency explicit: resolve() now takes the default rules as
a parameter, and the caller (the ManageSubtitles use case) loads them
from the KB once and passes them in. Domain stays I/O-free.
- Drop DEFAULT_RULES helper and the SubtitleKnowledgeBase import from
alfred/domain/subtitles/aggregates.py
- SubtitleRuleSet.resolve(default_rules: SubtitleMatchingRules)
- manage_subtitles use case passes kb.default_rules() at the call site
- Tests use a local SubtitleMatchingRules stand-in instead of relying
on KB defaults
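The shape of the change, sketched with hypothetical fields (languages and min_confidence are invented for the example; the real SubtitleMatchingRules may carry different data): the domain object receives the defaults as an argument and performs no I/O of its own.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SubtitleMatchingRules:
    # Hypothetical fields, for illustration only.
    languages: Tuple[str, ...]
    min_confidence: float


@dataclass(frozen=True)
class SubtitleRuleSet:
    # None means "no override at this level, fall back to the defaults".
    languages: Optional[Tuple[str, ...]] = None
    min_confidence: Optional[float] = None

    def resolve(self, default_rules: SubtitleMatchingRules) -> SubtitleMatchingRules:
        # Pure function of its inputs: no knowledge-base or YAML access here.
        return SubtitleMatchingRules(
            languages=self.languages if self.languages is not None else default_rules.languages,
            min_confidence=(
                self.min_confidence
                if self.min_confidence is not None
                else default_rules.min_confidence
            ),
        )


# In the use case, the defaults would be loaded once (kb.default_rules())
# and passed in; a local stand-in plays that role here, as in the tests.
DEFAULTS = SubtitleMatchingRules(languages=("fr", "en"), min_confidence=0.7)
```

A rule set that only overrides languages, such as SubtitleRuleSet(languages=("en",)), resolves to those languages with the default min_confidence.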
The placer performs filesystem I/O (os.link) — it belongs in the
application layer, not the domain. Domain services should be pure.
- Move alfred/domain/subtitles/services/placer.py to
alfred/application/subtitles/placer.py
- Move tests/domain/test_subtitle_placer.py to
tests/application/test_subtitle_placer.py
- Update all callers (manage_subtitles use case, metadata store, tests)
- Drop placer re-exports from domain.subtitles.services.__init__
DDD-pure cleanup — entities and value objects no longer query the world
at read time.
FilePath: drop .exists() / .is_file() / .is_dir(). The VO is now a
pure address; ask the injected FilesystemScanner for live state.
Movie: drop .has_file() / .is_downloaded(). Invariant: when the
application sets file_path, it has already verified that the file
exists; downstream readers trust the snapshot.
Episode: same — drop .has_file() / .is_downloaded().
SubtitlePlacer: drop the pre-check .exists() calls. The placer now
attempts os.link() and reports FileNotFoundError / FileExistsError
as skip reasons. Removes a TOCTOU race as a bonus.
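The attempt-then-report pattern can be sketched like this. The function name place and the exact reason strings are illustrative, chosen only so that they contain the 'not found' and 'already exists' fragments mentioned below.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def place(source: Path, destination: Path) -> Optional[str]:
    """Hard-link source to destination; return None on success or a skip reason."""
    try:
        # No pre-check with exists(): the link call itself is the check,
        # so nothing can change between "check" and "use".
        os.link(source, destination)
    except FileNotFoundError:
        # Also raised when the destination's parent directory is missing.
        return "source not found"
    except FileExistsError:
        return "destination already exists"
    return None


def demo() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "movie.fr.srt"
        src.write_text("subtitle")
        first = place(src, root / "linked.srt")          # succeeds
        second = place(src, root / "linked.srt")         # target now exists
        third = place(root / "gone.srt", root / "x.srt")  # source missing
        return first, second, third
```

On a filesystem that supports hard links, the demo returns None for the first call and a skip reason for each of the other two.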
Tests adjusted: the FilePath VO method tests are gone (the methods are
gone), test_has_file_false_when_no_path replaced by a plain assertion
on file_path is None. Placer tests are unchanged — the skip-reason
strings ('not found', 'already exists') match the new try/except paths.
The 'snapshot value objects' pattern (ProbedMediaInfo, TmdbMovieInfo)
that this cleanup enables is documented in refactor_domain_io.md, to
be applied when a future use case actually needs richer metadata —
not now, no speculative VOs.