ParsedRelease.tech_string was a stored str field re-computed in two
places (assemble() at parse time, enrich_from_probe() after the probe).
The second site was a reactive fix (e79ca46) for filename builders that
saw a stale value. Turn it into an @property so it stays in sync with
quality/source/codec by construction.
- Drop the field from the dataclass + the key from assemble()'s dict.
- Drop tech_string="" from parse_release's malformed-name fallback.
- Drop the manual recomputation at the end of enrich_from_probe.
- Inject the property into asdict() result in the fixtures runner
(same treatment as is_season_pack).
- Update tests that passed tech_string= to the constructor; rewrite the
TestTechString case that mutated p.tech_string manually.
Consolidate the five domain-purity refactors of the session under
[Unreleased]: RuleScopeLevel enum, FilePath VO post_init, Language
strict + from_raw, ParsedRelease.normalised → clean, ParsedRelease
enum strictness. Removes the duplicate min_movie_size_bytes entry
(now sits under its proper Removed section).
The fields were already typed as MediaTypeToken / ParsePath, but a
tolerant __post_init__ coerced raw strings into their enum form. With
MediaTypeToken(str, Enum) (and ParsePath idem), the coercion served no
purpose — callers that pass '.value' got back the enum anyway, and
callers that pass an unknown string got a ValidationError just like
they would now.
Strict mode: constructor rejects non-enum values directly. The two
in-tree builders (parse_release() and the parser pipeline) already
produce enum values; all .value sites have been removed. Drops the
unused _VALID_MEDIA_TYPES / _VALID_PARSE_PATHS lookup tables.
Le champ s'appelait normalised mais ne faisait pas la normalisation
suggérée par son nom (dots instead of spaces). En pratique il contient
raw - site_tag - apostrophes, qui sert uniquement à season_folder_name()
via _strip_episode_from_normalized. Renommé en 'clean' qui décrit ce
qu'il contient réellement, docstring corrigée.
object.__setattr__ inside __post_init__ on a frozen dataclass is a
code smell — it bypasses the immutability guarantee to mutate fields
mid-construction. Split the responsibilities:
* Direct constructor is strict — rejects un-normalized input (uppercase
iso, whitespace in aliases, etc.) so once a Language exists in the
system, its fields are guaranteed canonical.
* Language.from_raw() factory handles arbitrary YAML/user input — it
lowercases the iso, dedups/normalizes aliases, then constructs.
Only caller that built from raw data (LanguageRegistry loading YAML)
moves to from_raw(). Test fixtures already pass normalized data so
they keep using the direct constructor.
Custom __init__ on a @dataclass(frozen=True) is a code smell — it
bypasses the generated dataclass __init__ and re-implements the
str/Path coercion + frozen-aware setattr by hand. Replaced with a
single __post_init__ that performs the same normalization. Same
public API (FilePath(str) and FilePath(Path) both work), same
behavior, no callers touched.
Six niveaux possibles (global, release_group, movie, show, season,
episode) étaient passés en str libre, le commentaire docstring servant
de seule documentation. Introduit RuleScopeLevel(str, Enum) — toujours
sérialisable en YAML, mais le set fixe est désormais imposé par le
typage. to_dict() sort explicitement .value pour rester safe côté
écrivains YAML.
Le champ + son validator étaient orphelins depuis la suppression
de MovieService.validate_movie_file. L'exclusion par extension
(application/release/supported_media.py) + le PoP couvrent désormais
la règle 'vrai film vs sample'. Si on a un jour besoin d'un seuil de
taille, il ira dans data/knowledge/, pas dans settings.
Six small files (audio, video, subtitle, info, matching, tracks_mixin
+ __init__) collapsed into one ~250 LoC media.py module. Python treats
media.py and media/__init__.py interchangeably, so the 12 import sites
that read 'from alfred.domain.shared.media import ...' continue to work
without changes.
Reasoning: the whole bounded context fits on one screen; splitting into
sub-modules added more navigation friction than it saved. Tests stay
green (1077 passed).
Apostrophes are in the forbidden-chars list, which made any release
with a title like "Don't" or "L'avare" short-circuit to the AI
fallback (parse_path=ai, everything UNKNOWN). They are now stripped
up front from the name before the well-formed check and tokenize,
so the parse completes normally. The raw name is preserved on the
VO; only the title field loses its apostrophe.
parse_path becomes 'sanitized' when an apostrophe was stripped, to
surface that the parser cleaned something up.
Fixtures updated:
- shitty/honey_uhd_hdr/ — went from total UNKNOWN to a clean parse
(title=Honey.Dont, year=2025, quality=2160p, source=WEBRip,
codec=x265, group=Amen).
- path_of_pain/the_prodigy_full_chaos/ — went from total failure to
partial success (title, year, source, codec extracted). Remaining
gaps (1080i, multi-word audio, Blu-ray-with-dash) are tracked
separately in tech debt.
`Der.Tatortreiniger.S01-06.GERMAN...` previously parsed as a movie
with 'S01-06' glued to the title. The parser now matches the
season-range form in _parse_season_episode (returning season=first,
episode=None), and the assemble step detects the range token to
promote media_type to 'tv_complete'.
The first season is exposed as `season` so `is_season_pack`
fires (season is not None and episode is None) — useful for routing
to a series root folder.
Fixture shitty/tatortreiniger_flat_multiseason/ updated:
- title: Der.Tatortreiniger.S01-06 → Der.Tatortreiniger
- season: null → 1
- media_type: movie → tv_complete
- is_season_pack: false → true
Releases using ' - ' as a separator (Vinyl - 1x01 - FHD) tokenize to
['Vinyl', '-', '1x01', '-', 'FHD'] — the standalone '-' tokens were
ending up in title_parts and leaked into the joined title
('Vinyl.-'). We can't add '-' to the separator list (it would break
codec-GROUP), so we filter at assembly: a TITLE token with no
alphanumeric characters carries no title content.
Side win: same logic eliminates the UTF-8 wide-pipe '|' from the
khruangbin_yt_wide_pipe fixture title.
Fixtures updated:
- shitty/vinyl_1x01_format/expected.yaml (title: Vinyl.- → Vinyl)
- path_of_pain/khruangbin_yt_wide_pipe/expected.yaml (| dropped)
S14E09E10E11 previously parsed to episode=9, episode_end=10 — E11
was silently dropped. The parser now takes episodes[-1] as
episode_end so the full chain is captured (episode=9, episode_end=11).
Intermediate values stay implied.
Fixture shitty/archer_multi_episode/ updated from anti-regression of
the bug to anti-regression of the fix.
Mirror the MediaProber / FilesystemScanner pattern for language lookup:
- New Protocol `LanguageRepository` in alfred.domain.shared.ports
covering from_iso, from_any, all, __contains__, __len__ — the
surface previously coupled to the concrete LanguageRegistry.
- SubtitleKnowledgeBase types its `language_registry` parameter
against the Protocol; the concrete LanguageRegistry stays in
infrastructure as the YAML-backed adapter and remains the default
when no repository is injected.
- New unit tests in tests/infrastructure/test_language_registry.py
cover the adapter surface (from_iso, from_any, membership,
case-insensitivity, non-string inputs).
Behaviour is unchanged for existing callers. The split opens the
door to in-memory fakes in future tests without loading the full
ISO 639 YAML.
Inspection pipeline groundwork:
- MediaProber.probe() port extension (full media inspection on the port)
- inspect_release orchestrator + InspectedResult frozen VO
- enrich_from_probe now refreshes tech_string
- resolve_*_destination use cases consume inspect_release
- detect_media_type & enrich_from_probe moved to application/release
Both helpers are inspection-pipeline pieces, not filesystem use cases —
they belong next to inspect_release, not next to move_media /
resolve_destination / list_folder.
The move also kills the lazy import that was hiding inside
_resolve_parsed: alfred.application.filesystem.resolve_destination
no longer triggers a cycle through alfred.application.filesystem
__init__ when loading inspect_release. Top-level import restored.
Call sites updated: inspect.py, test_detect_media_type.py,
test_enrich_from_probe.py, testing/recognize_folders_in_downloads.py.
Module docstrings + test-file docstrings updated to match the new
location.
The four resolve_*_destination use cases now route through a private
_resolve_parsed helper that picks the right entry point:
- source path provided AND it exists -> inspect_release(name, path)
runs the full pipeline (parse + media-type refinement + probe
+ enrich), so missing tech tokens (quality, codec, ...) get
filled by ffprobe and the refreshed tech_string lands in the
destination folder / file names.
- source path missing or absent -> parse_release(name) only,
same behavior as before. Back-compat: tests using fake /dl/*.mkv
paths still pass unchanged.
resolve_episode_destination / resolve_movie_destination reuse their
existing source_file parameter as the inspection target. The two
folder-move use cases (season / series) gain a new OPTIONAL
source_path parameter — threaded through the agent tool wrappers
and documented in the YAML specs.
The lazy import inside _resolve_parsed avoids a circular import:
inspect_release imports detect_media_type / enrich_from_probe from
the same application.filesystem package whose __init__ re-exports
resolve_destination.
Three new tests in TestProbeEnrichmentWiring with a stub MediaProber
prove the wiring: movie picks up probe quality, season picks it up
via source_path, and a missing path correctly skips probe (back-compat
guard).
enrich_from_probe fills None fields on ParsedRelease (quality, source,
codec, audio_*, languages) but left tech_string at its parser-time
value — so the filename builders (movie_folder_name, episode_filename,
…) saw stale tech tokens even after a successful probe.
Re-derive tech_string the same way the parser does — quality.source.codec
joined by dots, skipping None — at the end of enrich_from_probe. Token-
level values still win because enrich only fills None fields.
Four new tests in TestTechString cover: enrichment rebuilds it,
existing source survives, no-info input leaves it untouched, fully
empty parsed produces ''.
New application-layer entry point that composes the four inspection
layers in one call:
1. parse_release(name, kb) -> (ParsedRelease, ParseReport)
2. detect_media_type(parsed, path, kb) -> patch parsed.media_type
3. find_main_video(path, kb) -> Path | None (top-level scan)
4. prober.probe(video) + enrich -> when video exists and
media_type not in
{unknown, other}
Returns a frozen InspectedResult(parsed, report, source_path,
main_video, media_info, probe_used). kb and prober are injected — no
module-level singletons in inspect.py.
analyze_release tool now delegates to inspect_release; its output
gains two fields, confidence (0-100) and road (easy/shitty/path_of_pain),
surfaced from ParseReport so the LLM can route by confidence. Spec
updated to document them.
12 new tests covering happy paths, probe gating (no video, media_type
'other', probe failure), mutation contract (detect refining
parsed.media_type, enrich filling None fields), resilience
(nonexistent path), and frozen contract. Suite: 1058 passing.
Add probe(video) -> MediaInfo | None to the MediaProber Protocol and
implement it on FfprobeMediaProber. The standalone
alfred/infrastructure/filesystem/ffprobe.py module is removed; all
callers (analyze_release / probe_media tools, testing scripts) now go
through the adapter.
Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py
(patching subprocess.run at the adapter module level).
Unblocks the upcoming inspect_release orchestrator, which needs the
port — not a free function — to compose parse + main-video selection
+ probe in one shot.
Add the application-layer helpers that decide which files are worth
parsing, sitting one notch above parse_release.
- is_supported_video(path, kb): extension-only check against
kb.video_extensions. Lowercased suffix lookup. Directories and
broken symlinks return False.
- find_main_video(folder, kb): top-level scan only (no recursion into
subdirectories — releases that wrap their video in Sample/ are
PATH_OF_PAIN territory). Lexicographically-first eligible file wins
when several qualify (deterministic, no size-based ranking). A bare
file as folder argument is supported for single-file releases.
No size threshold and no filename heuristics ('sample' / 'trailer'):
the parser's job is to extract structure, not to second-guess
non-standard release shapes. PoP catches the rest.
17 tests under tests/application/test_supported_media.py.
Wire the scoring foundations into the parser entry point. parse_release
now returns a tuple — the structural ParsedRelease and a diagnostic
ParseReport carrying confidence (0-100), road
(EASY / SHITTY / PATH_OF_PAIN), the residual UNKNOWN tokens, and the
list of critical fields that couldn't be filled.
EASY is decided structurally (a group schema matched), independently
of the score. SHITTY vs PATH_OF_PAIN is decided by score against the
60 cutoff from scoring.yaml. Malformed names (forbidden chars) emit a
zero-confidence PoP report and short-circuit to parse_path=AI as
before.
ParsePath stays as-is (DIRECT / SANITIZED / AI) — it records *how* we
tokenized, not how confident we are. The two dimensions are now
properly separated.
Call sites propagated:
- alfred/application/filesystem/resolve_destination.py (4 occurrences)
- alfred/agent/tools/filesystem.py
- tests/domain/test_release.py
- tests/domain/test_release_fixtures.py
- tests/application/test_detect_media_type.py
New tests/domain/release/test_parser_v2_scoring.py (22 cases) locks
ParseReport validation, compute_score arithmetic, decide_road
thresholding, the collector helpers, and the end-to-end tuple contract.
Add the building blocks for Phase A scoring without yet wiring them
into parse_release. Nothing changes at runtime — parse_release still
returns a single ParsedRelease — but the pieces needed to upgrade it
in a follow-up commit are now in place.
- alfred/knowledge/release/scoring.yaml: weights / penalties /
thresholds. Title and media_type are heavy (30 / 20), structural
fields medium (year 15, season 10), tech fields light (5 each).
Unknown-token penalty 5 capped at -30. SHITTY/PoP cutoff at 60.
- load_scoring() loader with safe defaults baked in: a missing or
partial YAML only de-tunes, never breaks.
- ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge
populates it from load_scoring().
- New parser/scoring.py module with Road enum (EASY / SHITTY /
PATH_OF_PAIN, distinct from ParsePath which records the tokenization
route), and pure functions: compute_score, decide_road,
collect_unknown_tokens, collect_missing_critical.
- ParseReport frozen VO in value_objects.py — exported alongside
ParsedRelease.
Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty: each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, the
leftmost contiguous UNKNOWN run becomes the title, done.
SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.
- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
(deutschland franchise box, sleaford YT slug, super_mario bilingual,
predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows xfail_reason; the parametrized suite wires the
pytest.mark.xfail(strict=False) automatically
Introduce a separate dimension for streaming-platform tags (NF, AMZN,
DSNP, HMAX, ATVP, …) so they stop polluting the encoding-source field.
WEB-DL is the source; the platform that released it is the distributor.
- new distributors.yaml knowledge file
- ReleaseKnowledge port exposes distributors set
- TokenRole.DISTRIBUTOR + ParsedRelease.distributor field
- removed NF/AMZN/DSNP/HMAX/ATVP from sources.yaml
- notre_planete fixture now records distributor: NF
The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.
Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
body to be fully consumed. Tokens passed over between schema chunks
remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
a given role, skipping already-annotated tokens. Lets optional and
mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
kb.editions are matched first (longest-first ordering preserved from
the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
of a matched sequence with extra['sequence']=<canonical value> and
trailing members with extra['sequence_member']='True' so assemble
skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
'.' separator splits the layout into two tokens. Strips a trailing
'-GROUP' suffix on the second before joining.
Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
bit_depth, hdr_format, edition. Each role-handler skips
sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
COLLECTION} + no season → tv_complete (mirrors legacy).
Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed,
8 skipped.
Refs: project_release_parser_v2_specs (memory)
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.
Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
shape, fallback to any non-source dashed token), GroupSchema lookup
via the kb port, then lockstep walk of tokens against schema chunks.
Optional chunks skip on mismatch, mandatory mismatches return None so
the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
dict (title joined by '.', group from the codec-GROUP token's extras).
Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.
Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
*.yaml at construction time; learned overrides in
data/knowledge/release/release_groups/ also picked up.
Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
canonical chunk_order. ELiTE marks source as optional (Foundation.S02
has no WEBRip token).
Services:
- parse_release tries the v2 path first; on None falls through to the
legacy implementation untouched.
Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
detection (codec-GROUP, dashed-source skip, no-dash → unknown),
schema-driven annotation (movie, TV episode, season pack with
optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
still produced by the legacy path. Verified via spy on v2.assemble.
Suite: 1007 passed, 8 skipped.
Refs: project_release_parser_v2_specs (memory)
New package alfred/domain/release/parser/ lays the foundation for the
release parser refactor (specs in memory). Exposes:
- Token: frozen VO carrying text + stream index + TokenRole + extra dict.
with_role() returns a new instance (no mutation).
- TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/
GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/
EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families.
- pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix.
- pipeline.tokenize(): release name -> list[Token] (all UNKNOWN),
string-ops split on kb.separators (no regex, per CLAUDE.md).
- pipeline.annotate(): documented stub. Walk order recorded in docstring
(group right-to-left, then season/episode, year, tech, title).
Legacy parse_release in release.services remains the live implementation
until the annotate step lands. Scaffolding tests verify Token API,
site-tag stripping (prefix/suffix), and tokenize output shape.
Refs: project_release_parser_v2_specs (memory)
Final DDD purification of the release parser. Domain layer no longer
imports anything from infrastructure, no YAML at import time, and
ParsedRelease's filesystem-builders are pure (Option B).
- ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter
- parse_release(name, kb) explicit injection
- ParsedRelease.title_sanitized field; builders accept already-safe strings
- Callers (resolve_destination, detect_media_type, find_video,
analyze_release) thread the kb through
- 987 tests pass
- test_release.py / test_release_fixtures.py: module-level
_KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it
into parse_release. test_show_folder_name_strips_windows_chars renamed
to test_show_folder_name_uses_already_safe_title to reflect the
Option B contract (caller sanitizes via kb.sanitize_for_fs).
- test_detect_media_type.py: same _KB pattern, all
detect_media_type(parsed, path) calls now pass kb.
- test_filesystem_extras.py: find_video_file(path) calls now pass kb.
- test_enrich_from_probe.py: _bare() helper adds the new
title_sanitized field.
- test_resolve_destination.py: drop _sanitize import + TestSanitize
class (helper deleted), add tmdb_title_safe arg to
_resolve_series_folder calls.
987 passed, 8 skipped.
Wires the new explicit-kb signatures into every caller of the release
parser and the filesystem-extension helpers.
- application/filesystem/resolve_destination.py: module-level singleton
_KB: ReleaseKnowledge = YamlReleaseKnowledge(); each use case now calls
parse_release(release_name, _KB) and sanitizes TMDB strings via
_KB.sanitize_for_fs(...) before passing them to the pure ParsedRelease
builders. Local _sanitize helper + _WIN_FORBIDDEN regex dropped.
- application/filesystem/detect_media_type.py: signature is now
detect_media_type(parsed, source_path, kb); uses kb.metadata_extensions,
kb.video_extensions, kb.non_video_extensions.
- infrastructure/filesystem/find_video.py: find_video_file(path, kb) uses
kb.video_extensions instead of an imported constant.
- agent/tools/filesystem.py::analyze_release imports the application _KB
singleton and passes it through to parse_release / detect_media_type /
find_video_file.
Removes the last domain → infrastructure leak in the release parser.
services.py:
- parse_release(name, kb) takes the knowledge as an explicit parameter.
- Every helper (_tokenize, _is_well_formed, _extract_tech,
_extract_languages, _extract_audio, _extract_video_meta,
_extract_edition, _extract_title, _infer_media_type) takes kb.
- No more module-level YAML loading.
value_objects.py — Option B:
- Sanitization happens once at parse time; ParsedRelease now carries
a title_sanitized: str field alongside title.
- Builder methods (show_folder_name, episode_filename, movie_folder_name,
movie_filename) become pure: they accept already-sanitized
tmdb_title_safe / tmdb_episode_title_safe arguments. Callers at the
use-case boundary sanitize via kb.sanitize_for_fs(...) before passing in.
- All domain-knowledge constants removed (_RESOLUTIONS, _SOURCES, _CODECS,
_AUDIO, _VIDEO_META, _EDITIONS, _HDR_EXTRA, _MEDIA_TYPE_TOKENS,
_LANGUAGE_TOKENS, _FORBIDDEN_CHARS, _*_EXTENSIONS, _WIN_FORBIDDEN_TABLE,
_sanitize_for_fs). The module is now pure DDD.
Adds the port/adapter pair that lets the release domain consume parsing
knowledge without importing infrastructure or loading YAML at import time.
- alfred/domain/release/ports/knowledge.py declares the read-only query
surface: token sets (resolutions, sources, codecs, language_tokens,
forbidden_chars, hdr_extra), structured dicts (audio, video_meta,
editions, media_type_tokens), separators list, file-extension sets,
and sanitize_for_fs(text).
- alfred/infrastructure/knowledge/release_kb.py loads every YAML once
at construction and exposes them as attributes, with an immutable
str.maketrans table backing sanitize_for_fs.
No domain code is wired to the port yet — that lands in the next commit.
Extract all I/O (subprocess, filesystem, YAML loading) from the domain
layer via ports/adapters. domain/subtitles/ now has zero imports from
infrastructure/. The remaining domain → infra leak (release knowledge
loaded at import time) is documented in tech-debt for a dedicated branch.
Domain services (SubtitleIdentifier, PatternDetector) used to import the
concrete SubtitleKnowledgeBase class directly from infrastructure for
their type hint. With this commit they depend on a structural Protocol
in alfred/domain/subtitles/ports/knowledge.py declaring just the 7
read-only query methods the domain actually consumes.
The concrete YAML-backed SubtitleKnowledgeBase in infrastructure remains
the sole adapter — no rename, no shim. With this change
alfred/domain/subtitles/ has zero imports from alfred/infrastructure/.
Also extend the changelog entry covering the full domain-io-extraction
branch.
aggregates.py used to call SubtitleKnowledgeBase().default_rules() via a
DEFAULT_RULES() helper, which silently pulled the infrastructure layer
(YAML loader) into the domain on every resolve.
Make the dependency explicit: resolve() now takes the default rules as
a parameter, and the caller (the ManageSubtitles use case) loads them
from the KB once and passes them in. Domain stays I/O-free.
- Drop DEFAULT_RULES helper and the SubtitleKnowledgeBase import from
alfred/domain/subtitles/aggregates.py
- SubtitleRuleSet.resolve(default_rules: SubtitleMatchingRules)
- manage_subtitles use case passes kb.default_rules() at the call site
- Tests use a local SubtitleMatchingRules stand-in instead of relying
on KB defaults
The placer performs filesystem I/O (os.link) — it belongs in the
application layer, not the domain. Domain services should be pure.
- Move alfred/domain/subtitles/services/placer.py to
alfred/application/subtitles/placer.py
- Move tests/domain/test_subtitle_placer.py to
tests/application/test_subtitle_placer.py
- Update all callers (manage_subtitles use case, metadata store, tests)
- Drop placer re-exports from domain.subtitles.services.__init__
DDD-pure cleanup — entities and value objects no longer query the world
at read time.
FilePath: drop .exists() / .is_file() / .is_dir(). The VO is now a
pure address; ask the injected FilesystemScanner for live state.
Movie: drop .has_file() / .is_downloaded(). Invariant: when the
application sets file_path, it has already constated the file
exists; downstream readers trust the snapshot.
Episode: same — drop .has_file() / .is_downloaded().
SubtitlePlacer: drop the pre-check .exists() calls. The placer now
attempts os.link() and reports FileNotFoundError / FileExistsError
as skip reasons. Removes a TOCTOU race as a bonus.
Tests adjusted: the FilePath VO method tests are gone (the methods are
gone), test_has_file_false_when_no_path replaced by a plain assertion
on file_path is None. Placer tests are unchanged — the skip-reason
strings ('not found', 'already exists') match the new try/except paths.
The 'snapshot value objects' pattern (ProbedMediaInfo, TmdbMovieInfo)
that this cleanup enables is documented in refactor_domain_io.md, to
be applied when a future use case actually needs richer metadata —
not now, no speculative VOs.
Domain services no longer call subprocess or pathlib directly. Introduces
two Protocol ports in domain/shared/ports/:
MediaProber.list_subtitle_streams(video) -> list[SubtitleStreamInfo]
FilesystemScanner.scan_dir / stat / read_text -> list[FileEntry] | ...
Concrete adapters live in infrastructure/:
FfprobeMediaProber (wraps subprocess + ffprobe + JSON)
PathlibFilesystemScanner (wraps pathlib + os reads)
SubtitleIdentifier and PatternDetector now take (kb, prober, scanner) at
construction time. Their internals work over FileEntry snapshots and
SubtitleStreamInfo records — no more ad-hoc Path.is_file/iterdir/stat or
embedded subprocess.run loops. _count_entries now takes raw SRT text
(returned by scanner.read_text) so SRT-only entry counting stays out of
the FS layer.
manage_subtitles use case instantiates the two adapters once and injects
them into both services. Tests pass real adapters and patch
`alfred.infrastructure.probe.ffprobe_prober.subprocess.run` for the
ffprobe-failure cases. _classify_single tests build FileEntry via a
small helper.
Domain is now free of subprocess / direct filesystem reads in the
subtitle pipeline. The only remaining I/O hooks are FilePath VO
convenience methods (exists/is_file/is_dir) which stay as a deliberate
affordance on the value object.