173 Commits

Author SHA1 Message Date
francwa 5d7b214af2 Merge branch 'refactor/language-port' 2026-05-20 23:20:18 +02:00
francwa 18267d0165 refactor(language): LanguageRepository port + SubtitleKnowledgeBase wired to it
Mirror the MediaProber / FilesystemScanner pattern for language lookup:

- New Protocol `LanguageRepository` in alfred.domain.shared.ports
  covering from_iso, from_any, all, __contains__, __len__ — the
  surface previously coupled to the concrete LanguageRegistry.
- SubtitleKnowledgeBase types its `language_registry` parameter
  against the Protocol; the concrete LanguageRegistry stays in
  infrastructure as the YAML-backed adapter and remains the default
  when no repository is injected.
- New unit tests in tests/infrastructure/test_language_registry.py
  cover the adapter surface (from_iso, from_any, membership,
  case-insensitivity, non-string inputs).

Behaviour is unchanged for existing callers. The split opens the
door to in-memory fakes in future tests without loading the full
ISO 639 YAML.
2026-05-20 23:18:25 +02:00
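A minimal sketch of what the port described in the commit above might look like, together with the kind of in-memory fake it is meant to enable. The method names come from the commit message; the `Language` value object, its fields, and the `InMemoryLanguages` class are hypothetical stand-ins, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Language:
    """Hypothetical value object; the real one may carry more fields."""
    iso: str
    name: str


class LanguageRepository(Protocol):
    """Read-only lookup surface listed in the commit message."""

    def from_iso(self, code: str) -> Optional[Language]: ...
    def from_any(self, raw: object) -> Optional[Language]: ...
    def all(self) -> Iterable[Language]: ...
    def __contains__(self, raw: object) -> bool: ...
    def __len__(self) -> int: ...


class InMemoryLanguages:
    """Test fake: satisfies the Protocol structurally, no YAML involved."""

    def __init__(self, languages: Iterable[Language]) -> None:
        self._by_iso = {lang.iso.lower(): lang for lang in languages}

    def from_iso(self, code: str) -> Optional[Language]:
        return self._by_iso.get(code.lower())

    def from_any(self, raw: object) -> Optional[Language]:
        if not isinstance(raw, str):
            return None  # non-string inputs yield None instead of raising
        key = raw.strip().lower()
        for lang in self._by_iso.values():
            if key in (lang.iso.lower(), lang.name.lower()):
                return lang
        return None

    def all(self) -> list:
        return list(self._by_iso.values())

    def __contains__(self, raw: object) -> bool:
        return self.from_any(raw) is not None

    def __len__(self) -> int:
        return len(self._by_iso)


repo = InMemoryLanguages([Language("fr", "French"), Language("en", "English")])
```

Because `Protocol` typing is structural, the fake does not need to inherit from the port; any consumer typed against `LanguageRepository` accepts it as is.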
francwa 19fe8a519a Merge branch 'feat/release-inspect-orchestrator'
Inspection pipeline groundwork:
- MediaProber.probe() port extension (full media inspection on the port)
- inspect_release orchestrator + InspectedResult frozen VO
- enrich_from_probe now refreshes tech_string
- resolve_*_destination use cases consume inspect_release
- detect_media_type & enrich_from_probe moved to application/release
2026-05-20 09:31:22 +02:00
francwa a0d1846ff2 refactor(release): move detect_media_type & enrich_from_probe to application/release
Both helpers are inspection-pipeline pieces, not filesystem use cases —
they belong next to inspect_release, not next to move_media /
resolve_destination / list_folder.

The move also kills the lazy import that was hiding inside
_resolve_parsed: alfred.application.filesystem.resolve_destination
no longer triggers a cycle through alfred.application.filesystem
__init__ when loading inspect_release. Top-level import restored.

Call sites updated: inspect.py, test_detect_media_type.py,
test_enrich_from_probe.py, testing/recognize_folders_in_downloads.py.
Module docstrings + test-file docstrings updated to match the new
location.
2026-05-20 09:29:58 +02:00
francwa 0fb59a4581 feat(filesystem): wire inspect_release into resolve_destination
The four resolve_*_destination use cases now route through a private
_resolve_parsed helper that picks the right entry point:

  - source path provided AND it exists -> inspect_release(name, path)
    runs the full pipeline (parse + media-type refinement + probe
    + enrich), so missing tech tokens (quality, codec, ...) get
    filled by ffprobe and the refreshed tech_string lands in the
    destination folder / file names.

  - source path omitted or nonexistent  -> parse_release(name) only,
    same behavior as before. Back-compat: tests using fake /dl/*.mkv
    paths still pass unchanged.

resolve_episode_destination / resolve_movie_destination reuse their
existing source_file parameter as the inspection target. The two
folder-move use cases (season / series) gain a new OPTIONAL
source_path parameter — threaded through the agent tool wrappers
and documented in the YAML specs.

The lazy import inside _resolve_parsed avoids a circular import:
inspect_release imports detect_media_type / enrich_from_probe from
the same application.filesystem package whose __init__ re-exports
resolve_destination.

Three new tests in TestProbeEnrichmentWiring with a stub MediaProber
prove the wiring: movie picks up probe quality, season picks it up
via source_path, and a missing path correctly skips probe (back-compat
guard).
2026-05-20 09:26:30 +02:00
francwa e79ca462b8 fix(release): refresh tech_string after enrich_from_probe
enrich_from_probe fills None fields on ParsedRelease (quality, source,
codec, audio_*, languages) but left tech_string at its parser-time
value — so the filename builders (movie_folder_name, episode_filename,
…) saw stale tech tokens even after a successful probe.

Re-derive tech_string the same way the parser does — quality.source.codec
joined by dots, skipping None — at the end of enrich_from_probe. Token-
level values still win because enrich only fills None fields.

Four new tests in TestTechString cover: enrichment rebuilds it,
existing source survives, no-info input leaves it untouched, fully
empty parsed produces ''.
2026-05-20 09:26:09 +02:00
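The re-derivation rule described in the commit above (quality.source.codec joined by dots, skipping None) can be sketched in a few lines. The function name `build_tech_string` is hypothetical; the real code performs this step at the end of enrich_from_probe.

```python
from typing import Optional


def build_tech_string(
    quality: Optional[str], source: Optional[str], codec: Optional[str]
) -> str:
    """Join the non-None tech tokens with dots, in quality.source.codec order."""
    return ".".join(part for part in (quality, source, codec) if part is not None)
```

With every field unset the join runs over an empty sequence and yields the empty string, which matches the "fully empty parsed produces ''" test case.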
francwa 03aa844d7d feat(release): inspect_release orchestrator + InspectedResult VO
New application-layer entry point that composes the four inspection
layers in one call:

  1. parse_release(name, kb)              -> (ParsedRelease, ParseReport)
  2. detect_media_type(parsed, path, kb)  -> patch parsed.media_type
  3. find_main_video(path, kb)            -> Path | None (top-level scan)
  4. prober.probe(video) + enrich         -> when video exists and
                                             media_type not in
                                             {unknown, other}

Returns a frozen InspectedResult(parsed, report, source_path,
main_video, media_info, probe_used). kb and prober are injected — no
module-level singletons in inspect.py.

analyze_release tool now delegates to inspect_release; its output
gains two fields, confidence (0-100) and road (easy/shitty/path_of_pain),
surfaced from ParseReport so the LLM can route by confidence. Spec
updated to document them.

12 new tests covering happy paths, probe gating (no video, media_type
'other', probe failure), mutation contract (detect refining
parsed.media_type, enrich filling None fields), resilience
(nonexistent path), and frozen contract. Suite: 1058 passing.
2026-05-20 09:15:29 +02:00
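The four-step composition in the commit above could be sketched as follows. This is an illustration under assumptions, not the project's implementation: the collaborators are passed in as plain callables (the real code injects kb and prober), the function name `inspect_release_sketch` is hypothetical, and `probe_used` is assumed to mean "the probe returned usable info".

```python
from dataclasses import dataclass
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class InspectedResult:
    parsed: Any
    report: Any
    source_path: Path
    main_video: Optional[Path]
    media_info: Optional[Any]
    probe_used: bool


def inspect_release_sketch(
    name: str,
    source_path: Path,
    *,
    parse: Callable[[str], tuple],
    detect: Callable[[Any, Path], None],
    find_video: Callable[[Path], Optional[Path]],
    probe: Callable[[Path], Optional[Any]],
    enrich: Callable[[Any, Any], None],
) -> InspectedResult:
    parsed, report = parse(name)        # 1. parse
    detect(parsed, source_path)         # 2. refine parsed.media_type in place
    video = find_video(source_path)     # 3. top-level scan for the main video
    media_info = None
    # 4. probe only when a video exists and the media type is worth probing
    if video is not None and parsed.media_type not in {"unknown", "other"}:
        media_info = probe(video)
        if media_info is not None:
            enrich(parsed, media_info)  # fills fields the parser left empty
    return InspectedResult(
        parsed, report, source_path, video, media_info, media_info is not None
    )


# Usage with stub collaborators (all values are made up for illustration).
parsed = SimpleNamespace(media_type="movie", quality=None)
result = inspect_release_sketch(
    "Some.Movie.2020-GRP",
    Path("/dl/Some.Movie.2020-GRP"),
    parse=lambda n: (parsed, {"confidence": 80}),
    detect=lambda p, path: None,
    find_video=lambda path: path / "movie.mkv",
    probe=lambda video: {"quality": "1080p"},
    enrich=lambda p, info: setattr(p, "quality", info["quality"]),
)
```

Passing the collaborators as arguments keeps the orchestrator free of module-level singletons, which is the property the commit calls out.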
francwa c303efea48 refactor(probe): consolidate full probe() into MediaProber port
Add probe(video) -> MediaInfo | None to the MediaProber Protocol and
implement it on FfprobeMediaProber. The standalone
alfred/infrastructure/filesystem/ffprobe.py module is removed; all
callers (analyze_release / probe_media tools, testing scripts) now go
through the adapter.

Tests for the probe path moved to tests/infrastructure/test_ffprobe_prober.py
(patching subprocess.run at the adapter module level).

Unblocks the upcoming inspect_release orchestrator, which needs the
port — not a free function — to compose parse + main-video selection
+ probe in one shot.
2026-05-20 09:11:24 +02:00
francwa 5db350a1df Merge branch 'feat/release-parser-scoring' 2026-05-20 08:47:38 +02:00
francwa 12dc796ea2 docs(changelog): freeze confidence scoring + exclusion work block 2026-05-20 08:47:29 +02:00
francwa 9ddd85929e feat(release): pre-pipeline exclusion helpers
Add the application-layer helpers that decide which files are worth
parsing, sitting one notch above parse_release.

- is_supported_video(path, kb): extension-only check against
  kb.video_extensions. Lowercased suffix lookup. Directories and
  broken symlinks return False.
- find_main_video(folder, kb): top-level scan only (no recursion into
  subdirectories — releases that wrap their video in Sample/ are
  PATH_OF_PAIN territory). Lexicographically-first eligible file wins
  when several qualify (deterministic, no size-based ranking). A bare
  file as folder argument is supported for single-file releases.

No size threshold and no filename heuristics ('sample' / 'trailer'):
the parser's job is to extract structure, not to second-guess
non-standard release shapes. PoP catches the rest.

17 tests under tests/application/test_supported_media.py.
2026-05-20 01:34:32 +02:00
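A minimal sketch of the two helpers described in the commit above, assuming the extension set is passed in directly instead of read from kb.video_extensions. The extension values and the demo file names are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed stand-in for kb.video_extensions.
VIDEO_EXTENSIONS = frozenset({".mkv", ".mp4", ".avi"})


def is_supported_video(path: Path, video_extensions=VIDEO_EXTENSIONS) -> bool:
    # is_file() follows symlinks, so directories and broken symlinks give False.
    return path.is_file() and path.suffix.lower() in video_extensions


def find_main_video(folder: Path, video_extensions=VIDEO_EXTENSIONS) -> Optional[Path]:
    if folder.is_file():
        # A bare file is accepted for single-file releases.
        return folder if is_supported_video(folder, video_extensions) else None
    if not folder.is_dir():
        return None
    # Top-level only: iterdir() does not recurse into Sample/ and friends.
    candidates = sorted(
        (p for p in folder.iterdir() if is_supported_video(p, video_extensions)),
        key=lambda p: p.name,
    )
    return candidates[0] if candidates else None


# Demo on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b.mkv").touch()
    (root / "a.MKV").touch()
    (root / "notes.nfo").touch()
    (root / "Sample").mkdir()
    (root / "Sample" / "0.mkv").touch()
    main = find_main_video(root)
    main_name = main.name if main else None
```

Sorting by file name gives the deterministic, size-independent pick the commit describes; the nested Sample/0.mkv is never considered.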
francwa ed7680b58f docs(changelog): log parse-confidence scoring + ParseReport tuple 2026-05-20 01:21:47 +02:00
francwa b4c9efd13b feat(release): parse_release returns (ParsedRelease, ParseReport)
Wire the scoring foundations into the parser entry point. parse_release
now returns a tuple — the structural ParsedRelease and a diagnostic
ParseReport carrying confidence (0-100), road
(EASY / SHITTY / PATH_OF_PAIN), the residual UNKNOWN tokens, and the
list of critical fields that couldn't be filled.

EASY is decided structurally (a group schema matched), independently
of the score. SHITTY vs PATH_OF_PAIN is decided by score against the
60 cutoff from scoring.yaml. Malformed names (forbidden chars) emit a
zero-confidence PoP report and short-circuit to parse_path=AI as
before.

ParsePath stays as-is (DIRECT / SANITIZED / AI) — it records *how* we
tokenized, not how confident we are. The two dimensions are now
properly separated.

Call sites propagated:
- alfred/application/filesystem/resolve_destination.py (4 occurrences)
- alfred/agent/tools/filesystem.py
- tests/domain/test_release.py
- tests/domain/test_release_fixtures.py
- tests/application/test_detect_media_type.py

New tests/domain/release/test_parser_v2_scoring.py (22 cases) locks
ParseReport validation, compute_score arithmetic, decide_road
thresholding, the collector helpers, and the end-to-end tuple contract.
2026-05-20 01:21:30 +02:00
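The road decision described in the commit above can be sketched as a small pure function. The enum values mirror the easy/shitty/path_of_pain strings used elsewhere in this log; the `schema_matched` parameter name and the inclusive comparison at the cutoff are assumptions.

```python
from enum import Enum


class Road(str, Enum):
    EASY = "easy"
    SHITTY = "shitty"
    PATH_OF_PAIN = "path_of_pain"


def decide_road(score: int, schema_matched: bool, cutoff: int = 60) -> Road:
    if schema_matched:
        return Road.EASY  # structural decision: the score is not consulted
    # Assumption: a score exactly at the cutoff still counts as SHITTY.
    return Road.SHITTY if score >= cutoff else Road.PATH_OF_PAIN
```

Keeping EASY independent of the score means a known group schema always wins, even when the name carries many unknown tokens.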
francwa 98c688f29b feat(release): foundations for parse-confidence scoring
Add the building blocks for Phase A scoring without yet wiring them
into parse_release. Nothing changes at runtime — parse_release still
returns a single ParsedRelease — but the pieces needed to upgrade it
in a follow-up commit are now in place.

- alfred/knowledge/release/scoring.yaml: weights / penalties /
  thresholds. Title and media_type are heavy (30 / 20), structural
  fields medium (year 15, season 10), tech fields light (5 each).
  Unknown-token penalty of 5 per token, capped at -30 in total. SHITTY/PoP cutoff at 60.
- load_scoring() loader with safe defaults baked in: a missing or
  partial YAML only de-tunes, never breaks.
- ReleaseKnowledge port grows a 'scoring: dict' field. YamlReleaseKnowledge
  populates it from load_scoring().
- New parser/scoring.py module with Road enum (EASY / SHITTY /
  PATH_OF_PAIN, distinct from ParsePath which records the tokenization
  route), and pure functions: compute_score, decide_road,
  collect_unknown_tokens, collect_missing_critical.
- ParseReport frozen VO in value_objects.py — exported alongside
  ParsedRelease.
2026-05-20 01:21:17 +02:00
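To make the arithmetic in the commit above concrete, here is a hedged sketch of a scoring function using the listed weights. Which tech fields count, the `filled_fields` parameter, and the clamping to 0-100 are assumptions; the real compute_score reads its numbers from scoring.yaml.

```python
# Weights from the commit message; the choice of tech fields is assumed.
WEIGHTS = {
    "title": 30,
    "media_type": 20,
    "year": 15,
    "season": 10,
    "quality": 5,
    "source": 5,
    "codec": 5,
}
UNKNOWN_PENALTY = 5
UNKNOWN_PENALTY_CAP = 30


def compute_score(filled_fields: set, unknown_tokens: int) -> int:
    """Sum the weights of filled fields, subtract the capped unknown-token penalty."""
    gained = sum(weight for name, weight in WEIGHTS.items() if name in filled_fields)
    penalty = min(unknown_tokens * UNKNOWN_PENALTY, UNKNOWN_PENALTY_CAP)
    return max(0, min(100, gained - penalty))
```

For example, a name with title, media_type and year filled and two unknown tokens scores 30 + 20 + 15 - 10 = 55, which falls below the 60 cutoff.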
francwa fcd80763e2 Merge branch 'refactor/release-parser-v2' 2026-05-20 01:08:20 +02:00
francwa 629387591f docs(changelog): freeze release parser v2 work block (2026-05-20) 2026-05-20 01:08:17 +02:00
francwa 230a7ab88a docs(changelog): log SHITTY simplification + distributor split 2026-05-20 01:03:52 +02:00
francwa 3737f66851 refactor(release): simplify SHITTY to dict-driven token tagging
Replace the ~480-line legacy heuristic block in services.py with a
small dict-driven pass in pipeline._annotate_shitty: each token is
looked up against the kb buckets (resolutions / sources / codecs /
distributors / year / sxxexx) with first-match-wins semantics, the
leftmost contiguous UNKNOWN run becomes the title, done.

SHITTY's scope is intentionally narrow — releases that *look* like
scene names but don't have a registered group schema. Anything more
exotic (parenthesized tech, bare-dashed title fragments, YT slugs,
franchise boxes) is PATH OF PAIN territory and stays out of here.

- annotate() no longer returns None; SHITTY is the always-on fallback
- services.py shrunk from ~525 to ~85 lines (legacy extractors gone)
- 4 fixtures get xfail markers documenting PoP-grade pathologies
  (deutschland franchise box, sleaford YT slug, super_mario bilingual,
  predator space-separators — the last one moved from shitty/ → pop/)
- ReleaseFixture grows xfail_reason; the parametrized suite wires the
  pytest.mark.xfail(strict=False) automatically
2026-05-20 01:03:25 +02:00
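The dict-driven pass described in the commit above might look roughly like this. The bucket contents, the role names as plain strings, and the four-digit year check are assumptions for illustration; the real pass works on Token objects and kb buckets.

```python
def classify(token: str, buckets: dict) -> str:
    """Return the role of the first bucket containing the token."""
    lowered = token.lower()
    for role, values in buckets.items():  # dict order is the priority order
        if lowered in values:
            return role  # first match wins
    if len(token) == 4 and token.isdigit():
        return "year"  # assumed year check
    return "unknown"


def annotate_shitty(tokens: list, buckets: dict) -> list:
    roles = [classify(tok, buckets) for tok in tokens]
    if "unknown" in roles:
        # The leftmost contiguous run of unknown tokens becomes the title.
        i = roles.index("unknown")
        while i < len(roles) and roles[i] == "unknown":
            roles[i] = "title"
            i += 1
    return list(zip(tokens, roles))


BUCKETS = {
    "resolution": {"1080p", "720p"},
    "source": {"web-dl", "bluray"},
    "codec": {"x264", "x265"},
}
tagged = annotate_shitty("The.Movie.2019.1080p.WEB-DL.x264.extra".split("."), BUCKETS)
```

Only the first unknown run is promoted to the title, so a stray trailing token such as `extra` stays unknown instead of being glued onto it.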
francwa fd3bd1ad8c feat(release): distinguish streaming distributors from sources
Introduce a separate dimension for streaming-platform tags (NF, AMZN,
DSNP, HMAX, ATVP, …) so they stop polluting the encoding-source field.
WEB-DL is the source; the platform that released it is the distributor.

- new distributors.yaml knowledge file
- ReleaseKnowledge port exposes distributors set
- TokenRole.DISTRIBUTOR + ParsedRelease.distributor field
- removed NF/AMZN/DSNP/HMAX/ATVP from sources.yaml
- notre_planete fixture now records distributor: NF
2026-05-20 01:03:11 +02:00
francwa 7dc7f0c241 feat(release): v2 enricher pass for audio/video-meta/edition/language
The EASY pipeline now extracts the full ParsedRelease surface from
known-group releases, not just the structural backbone. Behavior is
unchanged for releases that don't carry these tokens.

Pipeline (parser/pipeline.py):
- Structural walk (renamed _annotate_structural): no longer requires
  body to be fully consumed. Tokens passed over between schema chunks
  remain UNKNOWN so the enricher pass can claim them.
- _find_chunk(): scans forward in the body for the next token matching
  a given role, skipping already-annotated tokens. Lets optional and
  mandatory chunks both tolerate intercalated enricher tokens.
- _annotate_enrichers(): new non-positional pass. Walks UNKNOWN tokens
  and tags AUDIO_CODEC / AUDIO_CHANNELS / BIT_DEPTH / HDR / EDITION /
  LANGUAGE. Multi-token sequences from kb.audio / kb.video_meta /
  kb.editions are matched first (longest-first ordering preserved from
  the YAML), single tokens after.
- _apply_sequences(): mutates the token list, tagging the first token
  of a matched sequence with extra['sequence']=<canonical value> and
  trailing members with extra['sequence_member']='True' so assemble
  skips them.
- _detect_channel_pairs(): handles the '5.1' / '7.1' case where the
  '.' separator splits the layout into two tokens. Strips a trailing
  '-GROUP' suffix on the second before joining.

Assemble:
- New fields populated: languages (list), audio_codec, audio_channels,
  bit_depth, hdr_format, edition. Each role-handler skips
  sequence_member tokens.
- media_type heuristic extended: edition in {COMPLETE, INTEGRALE,
  COLLECTION} + no season → tv_complete (mirrors legacy).

Tests:
- 4 new TestEnrichers cases covering bit_depth+audio_codec+channels,
  HDR sequence + edition sequence + TrueHD.Atmos + 7.1, multi-language
  with DTS-HD.MA sequence, TV episode with single language.
- All 14 v2 tests + 30 fixture tests still green. Suite: 1011 passed,
  8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:26:05 +02:00
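The channel-pair case from the commit above ('5.1' split into two tokens by the '.' separator) can be illustrated with a small sketch. The function name and return shape are hypothetical, and it works on plain strings instead of Token objects.

```python
def detect_channel_pairs(tokens: list) -> list:
    """Return (index_of_first_token, layout) for adjacent tokens forming 5.1 or 7.1."""
    known_layouts = {"5.1", "7.1"}  # the two layouts named in the commit
    found = []
    for i in range(len(tokens) - 1):
        # Drop a trailing '-GROUP' suffix on the second token before joining.
        second = tokens[i + 1].split("-", 1)[0]
        layout = f"{tokens[i]}.{second}"
        if layout in known_layouts:
            found.append((i, layout))
    return found
```

On a name such as `Movie.2020.DDP.5.1-KONTRAST`, the tokens `5` and `1-KONTRAST` are rejoined into the layout `5.1` while the group suffix is left for the group detection step.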
francwa 075a827b0e feat(release): wire v2 EASY path for known release groups
The annotate-based v2 pipeline now handles releases ending in -KONTRAST,
-ELiTE, or -RARBG. Unknown groups still fall through to the legacy
SHITTY heuristic in services.py — nothing changes for them.

Pipeline (alfred/domain/release/parser/pipeline.py):
- tokenize(): string-ops separator split, strips [site.tag] first.
- annotate(): right-to-left group detection (priority to codec-GROUP
  shape, fallback to any non-source dashed token), GroupSchema lookup
  via the kb port, then lockstep walk of tokens against schema chunks.
  Optional chunks skip on mismatch, mandatory mismatches return None so
  the caller falls back gracefully. CODEC pre-consumed by a codec-GROUP
  trailing token correctly skips the CODEC chunk in the body walk.
- assemble(): folds annotated tokens into a ParsedRelease-compatible
  dict (title joined by '.', group from the codec-GROUP token's extras).

Schema (alfred/domain/release/parser/schema.py):
- GroupSchema + SchemaChunk frozen value objects.
- TokenRole.GROUP added.

Port + adapter:
- ReleaseKnowledge.group_schema(name) lookup added (case-insensitive).
- YamlReleaseKnowledge loads alfred/knowledge/release/release_groups/
  *.yaml at construction time; learned overrides in
  data/knowledge/release/release_groups/ also picked up.

Knowledge:
- release_groups/kontrast.yaml, elite.yaml, rarbg.yaml declare the
  canonical chunk_order. ELiTE marks source as optional (Foundation.S02
  has no WEBRip token).

Services:
- parse_release tries the v2 path first; on None falls through to the
  legacy implementation untouched.

Tests:
- tests/domain/release/test_parser_v2_easy.py (10 cases) cover group
  detection (codec-GROUP, dashed-source skip, no-dash → unknown),
  schema-driven annotation (movie, TV episode, season pack with
  optional source, unknown group returns None), and field assembly.
- Existing tests/domain/test_release_fixtures.py (30 cases) stay green:
  5 EASY fixtures now produced by v2, 25 SHITTY/PATH OF PAIN fixtures
  still produced by the legacy path. Verified via spy on v2.assemble.

Suite: 1007 passed, 8 skipped.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:21:11 +02:00
francwa a2c917618f feat(release): scaffold v2 parser package (annotate-based pipeline)
New package alfred/domain/release/parser/ lays the foundation for the
release parser refactor (specs in memory). Exposes:

- Token: frozen VO carrying text + stream index + TokenRole + extra dict.
  with_role() returns a new instance (no mutation).
- TokenRole: str-backed enum split into structural (TITLE/YEAR/SEASON_EP/
  GROUP), technical (RESOLUTION/SOURCE/CODEC/AUDIO_*/BIT_DEPTH/HDR/
  EDITION/LANGUAGE), and meta (SITE_TAG/UNKNOWN) families.
- pipeline.strip_site_tag(): pulls a [site.tag] prefix or suffix.
- pipeline.tokenize(): release name -> list[Token] (all UNKNOWN),
  string-ops split on kb.separators (no regex, per CLAUDE.md).
- pipeline.annotate(): documented stub. Walk order recorded in docstring
  (group right-to-left, then season/episode, year, tech, title).

Legacy parse_release in release.services remains the live implementation
until the annotate step lands. Scaffolding tests verify Token API,
site-tag stripping (prefix/suffix), and tokenize output shape.

Refs: project_release_parser_v2_specs (memory)
2026-05-20 00:12:33 +02:00
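A rough sketch of the Token value object and the regex-free tokenizer described in the commit above. The reduced TokenRole enum, the field name `index`, and the default separator tuple are assumptions; the real tokenizer reads kb.separators.

```python
from dataclasses import dataclass, field, replace
from enum import Enum


class TokenRole(str, Enum):
    # Reduced to three members for the sketch.
    UNKNOWN = "unknown"
    TITLE = "title"
    YEAR = "year"


@dataclass(frozen=True)
class Token:
    text: str
    index: int
    role: TokenRole = TokenRole.UNKNOWN
    extra: dict = field(default_factory=dict)

    def with_role(self, role: TokenRole) -> "Token":
        # Frozen instance: return a copy instead of mutating.
        return replace(self, role=role)


def tokenize(name: str, separators: tuple = (".", "_", " ")) -> list:
    # Normalize every separator to the first one, then split: string ops only.
    primary = separators[0]
    for sep in separators[1:]:
        name = name.replace(sep, primary)
    parts = [part for part in name.split(primary) if part]
    return [Token(text=part, index=i) for i, part in enumerate(parts)]


tokens = tokenize("The.Movie_2019 1080p")
titled = tokens[0].with_role(TokenRole.TITLE)
```

`dataclasses.replace` builds the new instance through the constructor, so the original token keeps its UNKNOWN role.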
francwa 9f10f4e0ad Merge branch 'refactor/domain-release-knowledge'
Final DDD purification of the release parser. Domain layer no longer
imports anything from infrastructure, no YAML at import time, and
ParsedRelease's filesystem-builders are pure (Option B).

- ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter
- parse_release(name, kb) explicit injection
- ParsedRelease.title_sanitized field; builders accept already-safe strings
- Callers (resolve_destination, detect_media_type, find_video,
  analyze_release) thread the kb through
- 987 tests pass
2026-05-19 22:05:36 +02:00
francwa cd814c7922 docs(changelog): log refactor/domain-release-knowledge work block 2026-05-19 22:05:29 +02:00
francwa 6802933acd test(release): adapt suite to explicit ReleaseKnowledge injection
- test_release.py / test_release_fixtures.py: module-level
  _KB = YamlReleaseKnowledge() + thin _parse(name) helper threading it
  into parse_release. test_show_folder_name_strips_windows_chars renamed
  to test_show_folder_name_uses_already_safe_title to reflect the
  Option B contract (caller sanitizes via kb.sanitize_for_fs).
- test_detect_media_type.py: same _KB pattern, all
  detect_media_type(parsed, path) calls now pass kb.
- test_filesystem_extras.py: find_video_file(path) calls now pass kb.
- test_enrich_from_probe.py: _bare() helper adds the new
  title_sanitized field.
- test_resolve_destination.py: drop _sanitize import + TestSanitize
  class (helper deleted), add tmdb_title_safe arg to
  _resolve_series_folder calls.

987 passed, 8 skipped.
2026-05-19 22:05:26 +02:00
francwa bf37a9d09e refactor(release): thread ReleaseKnowledge through callers
Wires the new explicit-kb signatures into every caller of the release
parser and the filesystem-extension helpers.

- application/filesystem/resolve_destination.py: module-level singleton
  _KB: ReleaseKnowledge = YamlReleaseKnowledge(); each use case now calls
  parse_release(release_name, _KB) and sanitizes TMDB strings via
  _KB.sanitize_for_fs(...) before passing them to the pure ParsedRelease
  builders. Local _sanitize helper + _WIN_FORBIDDEN regex dropped.
- application/filesystem/detect_media_type.py: signature is now
  detect_media_type(parsed, source_path, kb); uses kb.metadata_extensions,
  kb.video_extensions, kb.non_video_extensions.
- infrastructure/filesystem/find_video.py: find_video_file(path, kb) uses
  kb.video_extensions instead of an imported constant.
- agent/tools/filesystem.py::analyze_release imports the application _KB
  singleton and passes it through to parse_release / detect_media_type /
  find_video_file.
2026-05-19 22:05:19 +02:00
francwa 4a74fff9cc refactor(release): purify domain — parse_release(name, kb) + ParsedRelease Option B
Removes the last domain → infrastructure leak in the release parser.

services.py:
- parse_release(name, kb) takes the knowledge as an explicit parameter.
- Every helper (_tokenize, _is_well_formed, _extract_tech,
  _extract_languages, _extract_audio, _extract_video_meta,
  _extract_edition, _extract_title, _infer_media_type) takes kb.
- No more module-level YAML loading.

value_objects.py — Option B:
- Sanitization happens once at parse time; ParsedRelease now carries
  a title_sanitized: str field alongside title.
- Builder methods (show_folder_name, episode_filename, movie_folder_name,
  movie_filename) become pure: they accept already-sanitized
  tmdb_title_safe / tmdb_episode_title_safe arguments. Callers at the
  use-case boundary sanitize via kb.sanitize_for_fs(...) before passing in.
- All domain-knowledge constants removed (_RESOLUTIONS, _SOURCES, _CODECS,
  _AUDIO, _VIDEO_META, _EDITIONS, _HDR_EXTRA, _MEDIA_TYPE_TOKENS,
  _LANGUAGE_TOKENS, _FORBIDDEN_CHARS, _*_EXTENSIONS, _WIN_FORBIDDEN_TABLE,
  _sanitize_for_fs). The module is now pure DDD.
2026-05-19 22:05:10 +02:00
francwa c3a3cb50c9 refactor(release): introduce ReleaseKnowledge Protocol port + YamlReleaseKnowledge adapter
Adds the port/adapter pair that lets the release domain consume parsing
knowledge without importing infrastructure or loading YAML at import time.

- alfred/domain/release/ports/knowledge.py declares the read-only query
  surface: token sets (resolutions, sources, codecs, language_tokens,
  forbidden_chars, hdr_extra), structured dicts (audio, video_meta,
  editions, media_type_tokens), separators list, file-extension sets,
  and sanitize_for_fs(text).
- alfred/infrastructure/knowledge/release_kb.py loads every YAML once
  at construction and exposes them as attributes, with an immutable
  str.maketrans table backing sanitize_for_fs.

No domain code is wired to the port yet — that lands in the next commit.
2026-05-19 22:05:01 +02:00
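The `str.maketrans` table mentioned in the commit above could back sanitize_for_fs along these lines. The forbidden-character set and the choice to delete (not replace) those characters are assumptions; the real list is loaded from YAML.

```python
# Assumed Windows-forbidden set; the project loads its own list from YAML.
FORBIDDEN_CHARS = '<>:"/\\|?*'

# Third argument of str.maketrans: characters mapped to None, i.e. deleted.
_TABLE = str.maketrans("", "", FORBIDDEN_CHARS)


def sanitize_for_fs(text: str) -> str:
    """Strip filesystem-hostile characters in a single translate() pass."""
    return text.translate(_TABLE)
```

Building the table once and reusing it keeps the per-call work to one `translate()` pass over the string.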
francwa 14941d47c0 Merge branch 'refactor/domain-io-extraction'
Extract all I/O (subprocess, filesystem, YAML loading) from the domain
layer via ports/adapters. domain/subtitles/ now has zero imports from
infrastructure/. The remaining domain → infra leak (release knowledge
loaded at import time) is documented in tech-debt for a dedicated branch.
2026-05-19 15:16:59 +02:00
francwa df798f55cc refactor(subtitles): introduce SubtitleKnowledge Protocol port
Domain services (SubtitleIdentifier, PatternDetector) used to import the
concrete SubtitleKnowledgeBase class directly from infrastructure for
their type hint. With this commit they depend on a structural Protocol
in alfred/domain/subtitles/ports/knowledge.py declaring just the 7
read-only query methods the domain actually consumes.

The concrete YAML-backed SubtitleKnowledgeBase in infrastructure remains
the sole adapter — no rename, no shim. With this change
alfred/domain/subtitles/ has zero imports from alfred/infrastructure/.

Also extend the changelog entry covering the full domain-io-extraction
branch.
2026-05-19 15:15:43 +02:00
francwa 535935cc73 docs(changelog): summarize refactor/domain-io-extraction work block 2026-05-19 15:11:17 +02:00
francwa 6e252d1e81 refactor(subtitles): inject default rules into SubtitleRuleSet.resolve()
aggregates.py used to call SubtitleKnowledgeBase().default_rules() via a
DEFAULT_RULES() helper, which silently pulled the infrastructure layer
(YAML loader) into the domain on every resolve.

Make the dependency explicit: resolve() now takes the default rules as
a parameter, and the caller (the ManageSubtitles use case) loads them
from the KB once and passes them in. Domain stays I/O-free.

- Drop DEFAULT_RULES helper and the SubtitleKnowledgeBase import from
  alfred/domain/subtitles/aggregates.py
- SubtitleRuleSet.resolve(default_rules: SubtitleMatchingRules)
- manage_subtitles use case passes kb.default_rules() at the call site
- Tests use a local SubtitleMatchingRules stand-in instead of relying
  on KB defaults
2026-05-19 15:10:06 +02:00
francwa 903e9e7117 refactor(subtitles): move SubtitlePlacer to application layer
The placer performs filesystem I/O (os.link) — it belongs in the
application layer, not the domain. Domain services should be pure.

- Move alfred/domain/subtitles/services/placer.py to
  alfred/application/subtitles/placer.py
- Move tests/domain/test_subtitle_placer.py to
  tests/application/test_subtitle_placer.py
- Update all callers (manage_subtitles use case, metadata store, tests)
- Drop placer re-exports from domain.subtitles.services.__init__
2026-05-19 15:07:39 +02:00
francwa 9556bf9e08 refactor(domain): strip live filesystem I/O from VOs and entities
DDD-pure cleanup — entities and value objects no longer query the world
at read time.

  FilePath: drop .exists() / .is_file() / .is_dir(). The VO is now a
    pure address; ask the injected FilesystemScanner for live state.
  Movie:    drop .has_file() / .is_downloaded(). Invariant: when the
    application sets file_path, it has already verified that the file
    exists; downstream readers trust the snapshot.
  Episode:  same — drop .has_file() / .is_downloaded().
  SubtitlePlacer: drop the pre-check .exists() calls. The placer now
    attempts os.link() and reports FileNotFoundError / FileExistsError
    as skip reasons. Removes a TOCTOU race as a bonus.

Tests adjusted: the FilePath VO method tests are gone (the methods are
gone), test_has_file_false_when_no_path replaced by a plain assertion
on file_path is None. Placer tests are unchanged — the skip-reason
strings ('not found', 'already exists') match the new try/except paths.

The 'snapshot value objects' pattern (ProbedMediaInfo, TmdbMovieInfo)
that this cleanup enables is documented in refactor_domain_io.md, to
be applied when a future use case actually needs richer metadata —
not now, no speculative VOs.
2026-05-19 14:58:59 +02:00
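The "attempt, then report" pattern that replaces the .exists() pre-checks in the commit above can be sketched like this. The function name and return convention are hypothetical; the skip-reason strings are the ones quoted in the commit.

```python
import os
from pathlib import Path
from typing import Optional


def place_subtitle(source: Path, destination: Path) -> Optional[str]:
    """Hard-link source to destination; return a skip reason, or None on success."""
    try:
        os.link(source, destination)
    except FileNotFoundError:
        return "not found"
    except FileExistsError:
        return "already exists"
    return None
```

Since the link call itself reports the failure, no window remains between a check and the operation in which another process could create or remove the file.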
francwa e6ee700825 refactor(subtitles): inject MediaProber/FilesystemScanner ports into domain services
Domain services no longer call subprocess or pathlib directly. Introduces
two Protocol ports in domain/shared/ports/:

  MediaProber.list_subtitle_streams(video) -> list[SubtitleStreamInfo]
  FilesystemScanner.scan_dir / stat / read_text  -> list[FileEntry] | ...

Concrete adapters live in infrastructure/:

  FfprobeMediaProber          (wraps subprocess + ffprobe + JSON)
  PathlibFilesystemScanner    (wraps pathlib + os reads)

SubtitleIdentifier and PatternDetector now take (kb, prober, scanner) at
construction time. Their internals work over FileEntry snapshots and
SubtitleStreamInfo records — no more ad-hoc Path.is_file/iterdir/stat or
embedded subprocess.run loops. _count_entries now takes raw SRT text
(returned by scanner.read_text) so SRT-only entry counting stays out of
the FS layer.

manage_subtitles use case instantiates the two adapters once and injects
them into both services. Tests pass real adapters and patch
`alfred.infrastructure.probe.ffprobe_prober.subprocess.run` for the
ffprobe-failure cases. _classify_single tests build FileEntry via a
small helper.

Domain is now free of subprocess / direct filesystem reads in the
subtitle pipeline. The only remaining I/O hooks are FilePath VO
convenience methods (exists/is_file/is_dir) which stay as a deliberate
affordance on the value object.
2026-05-19 14:52:24 +02:00
francwa ced72547f7 refactor(knowledge): extract YAML loaders from domain to infrastructure
The domain layer no longer reads YAML files. All knowledge loaders move
from `alfred/domain/*/knowledge/` to `alfred/infrastructure/knowledge/`:

  domain/release/knowledge.py
    → infrastructure/knowledge/release.py
  domain/shared/knowledge/language_registry.py
    → infrastructure/knowledge/language_registry.py
  domain/subtitles/knowledge/{loader,base}.py
    → infrastructure/knowledge/subtitles/{loader,base}.py

Callers in domain/release/{services,value_objects}.py,
domain/subtitles/{aggregates,services/*}.py, and
application/filesystem/manage_subtitles.py updated to absolute imports.
Re-exports of KnowledgeLoader/SubtitleKnowledgeBase from
domain/subtitles/__init__.py dropped (no shim per project convention).
Tests follow the moved targets.
2026-05-19 14:35:18 +02:00
francwa f338b08706 refactor(release): type media_type/parse_path as true enums
ParsedRelease.media_type is now MediaTypeToken (not str) and parse_path
is ParsePath (not str). __post_init__ keeps a tolerant constructor that
coerces raw strings via the enum, so callers passing 'movie'/'direct'
still work transparently. Since both enums inherit from str, existing
string comparisons and JSON serialization remain unchanged.
2026-05-19 14:21:27 +02:00
francwa da484d7474 refactor(release): typed enums + __post_init__ validation on ParsedRelease
ParsedRelease accepted any string for media_type/parse_path and had no
validation on numeric ranges (season=-5 was silently accepted). Tighten
both ends:

- New str-backed Enums MediaTypeToken and ParsePath. Inherit from str so
  every existing comparison ('== "movie"'), JSON serialization, and TMDB
  DTO interop keeps working unchanged.
- ParsedRelease.__post_init__ now validates: raw/group non-empty, year in
  1888-2100, season 0-100, episode 0-9999, episode_end >= episode,
  media_type/parse_path against the enum allowlist.
- services.py uses the enum .value members everywhere instead of bare
  string literals — kills the typo risk.
2026-05-19 14:17:56 +02:00
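The two commits above (str-backed enums, then tolerant coercion in __post_init__) can be illustrated together. This is a reduced sketch: the class name, the subset of fields, and the frozen dataclass are assumptions, while the enum values and numeric ranges are taken from this log.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MediaTypeToken(str, Enum):
    MOVIE = "movie"
    TV_COMPLETE = "tv_complete"
    UNKNOWN = "unknown"
    OTHER = "other"


@dataclass(frozen=True)
class ParsedReleaseSketch:
    raw: str
    media_type: MediaTypeToken
    year: Optional[int] = None
    season: Optional[int] = None

    def __post_init__(self) -> None:
        if not self.raw:
            raise ValueError("raw must be non-empty")
        # Lookup by value accepts 'movie' or MediaTypeToken.MOVIE alike and
        # raises ValueError for anything outside the enum.
        object.__setattr__(self, "media_type", MediaTypeToken(self.media_type))
        if self.year is not None and not 1888 <= self.year <= 2100:
            raise ValueError(f"year out of range: {self.year}")
        if self.season is not None and not 0 <= self.season <= 100:
            raise ValueError(f"season out of range: {self.season}")


release = ParsedReleaseSketch(raw="Some.Movie.1999-GRP", media_type="movie", year=1999)
```

Because the enum inherits from str, `release.media_type == "movie"` still holds after coercion, which is why existing string comparisons keep working.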
francwa 481eeb5afd refactor(domain): identity-based equality + dedup track helpers
Two related DDD fixes for Movie and Episode entities:

- Identity equality: @dataclass(eq=False) with custom __eq__/__hash__.
  Movie is identified by imdb_id, Episode by (season, episode) within
  the TVShow aggregate. Auto-generated field-by-field equality was
  incorrectly making two Movie instances with the same imdb_id but
  different audio_tracks compare unequal — breaks dedup/caching.

- MediaWithTracks mixin: the 5 audio/subtitle helpers
  (has_audio_in / audio_languages / has_subtitles_in / has_forced_subs /
  subtitle_languages) were duplicated verbatim between Movie and Episode.
  Extracted to shared/media/tracks_mixin.py; both entities now inherit.

Bonus: dropped the object.__setattr__ coercion dance in Movie.__post_init__
— the class isn't frozen so plain assignment is the right call.
2026-05-19 14:17:47 +02:00
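Identity-based equality as described in the commit above might be sketched like this for Movie. The fields other than imdb_id are illustrative, and the IDs in the usage lines are placeholder values.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # suppress the generated field-by-field __eq__
class Movie:
    imdb_id: str
    title: str
    audio_tracks: list = field(default_factory=list)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Movie):
            return NotImplemented
        return self.imdb_id == other.imdb_id  # identity, not attributes

    def __hash__(self) -> int:
        return hash(self.imdb_id)


first = Movie("tt0000001", "Example", ["en"])
second = Movie("tt0000001", "Example", ["en", "fr"])
other = Movie("tt0000002", "Example", ["en"])
```

Here `first == second` holds even though their audio tracks differ, and a set containing both collapses to one element, which is what dedup and caching rely on.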
francwa 7cd24f3a31 refactor(domain): freeze media track value objects
AudioTrack, VideoTrack, SubtitleTrack and MediaInfo are snapshots of a
single ffprobe run — model them as proper immutable value objects.

- @dataclass(frozen=True) on all four
- MediaInfo track collections become tuple[...] instead of list[...]
- ffprobe adapter rewritten to build tuples up-front instead of
  appending/setattr'ing on a constructed instance
2026-05-19 14:17:27 +02:00
francwa eb8995cfc3 refactor(subtitles): drop dead scanner module
SubtitleScanner was an earlier iteration superseded by SubtitleIdentifier
and never imported in production code (only by its own tests). Removing
both keeps the bounded context clean and shrinks the surface.
2026-05-19 14:17:15 +02:00
francwa f6eef59fca refactor: tech debt mini-pass (items 5, 6, 7, 20)
Low-risk cleanup items, no functional change to the parser. The
philosophy remains: keep the parser simple, the AI handles edge cases.

- Extract duplicated 'fs-safe title → dot-folder-name' regex into
  to_dot_folder_name() in domain/shared/value_objects.py. Used by both
  MovieTitle.normalized() and TVShow.get_folder_name() (item #5).
- ParsedRelease.languages now uses field(default_factory=list) instead
  of a manual __post_init__ assigning [] via object.__setattr__ (#6).
- tv_shows/entities.py module docstring: prepend ASCII ownership tree
  for quicker visual scan of the aggregate hierarchy (#7).
- file_extensions.yaml: split subtitle sidecars (.srt/.sub/.idx/.ass/.ssa)
  into a dedicated 'subtitle:' category instead of lumping them under
  'metadata:'. _METADATA_EXTENSIONS in value_objects.py remains the
  union of both, so detect_media_type behavior is unchanged. New loader
  load_subtitle_extensions() exposes the distinct subtitle set for future
  callers in the subtitles domain (#20).

Suite: 1020 passed, 8 skipped.
2026-05-18 16:24:28 +02:00
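
Item #6 in the commit above can be illustrated with a small sketch. It assumes ParsedRelease is a frozen dataclass (the earlier use of object.__setattr__ suggests so); the class name ParsedReleaseSketch and the title field are placeholders, not the real definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParsedReleaseSketch:
    title: str
    # default_factory runs once per instance, so every object gets its
    # own fresh list and no __post_init__ workaround is needed.
    languages: list[str] = field(default_factory=list)
```

Each instance owns a separate list, so mutating one release's languages never leaks into another.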
francwa 273510dff8 test(fixtures): seed PATH OF PAIN bucket with 10 worst-case fixtures
10 pathological release names mined from the real downloads folder.
Each fixture locks in the current parse_release output (including
its silent losses and false positives) so future parser improvements
are intentional, not silent drift.

Cases:
- Khruangbin yt-dlp slug (UTF-8 wide pipe '|', YT ID as group)
- Deutschland 83-86-89 franchise box (group=S03 misdetection)
- Chérie Le BéBé (accented chars preserved, VFF language)
- Jimmy Carr 8-word stand-up special title
- [ OxTorrent.vc ] prefix + XviD codec (site_tag prefix)
- Prodiges S12E01 with episode title + air-date silently lost
- The Prodigy: apostrophe + Blu-ray dash + 1080i + multi-word audio
  = full AI-path degeneration (everything UNKNOWN)
- Sleaford Mods yt-dlp slug (YT ID glued to year)
- Super Mario Bros [FR-EN] (bilingual tag mistaken for group)
- Gilmore Girls Complete S01-S07 (the well-behaved exception:
  COMPLETE token correctly drives tv_complete + REPACK + 10bit)

Also adds shitty + path_of_pain to the per-bucket sanity assertion.

Suite: 1020 passed, 8 skipped.
2026-05-18 15:57:56 +02:00
francwa c1831e3f46 test(fixtures): drop derry_duplicate_naming (was a copy-paste artifact)
The release name mixed two distinct releases, so it is not a real-world
case worth an anti-regression fixture. SHITTY bucket now holds 14 fixtures
(down from 15).
2026-05-18 15:51:11 +02:00
francwa aa182458b8 test(fixtures): seed SHITTY release bucket with 15 anti-regression cases
Add 15 expected.yaml fixtures under tests/fixtures/releases/shitty/
covering the awkward but real-world release names from the downloads
folder. Each fixture locks in the current parse_release behavior so
future parser changes are intentional, not silent drift.

Cases captured:
- Angel INTEGRALE 3-level hierarchy (tv_complete media_type)
- Buffy custom French title with dots preserved
- Archer S14E09E10E11 multi-episode (E11 lost — tech debt)
- Notre Planète lowercase s01e01
- Vinyl ' - 1x01 - FHD' (stray dash artifact — tech debt)
- Deutschland.83 (year-suffix as part of title)
- Tatortreiniger S01-06 range (falls to movie — tech debt)
- Derry Girls duplicated title
- Jurassic Park bare folder (media_type=unknown)
- La Nuit au Musée bilingual MULTI
- Chérie j'ai agrandi (ASCII-stripped apostrophe, parses fine)
- Honey Don't (unescaped apostrophe — full AI-path degeneration)
- Hook MULTi.SUBS movie with Subs/ folder
- Predator Badlands space separators (group=UNKNOWN — tech debt)
- Westworld S04 Subs.Only (no video file)

Each fixture also captures the future 3-flow routing (library /
torrents / seed_hardlinks) ahead of the organize_media refactor.

Suite: 1011 passed, 8 skipped.
2026-05-18 15:48:41 +02:00
francwa 774f71c8cc chore(gitignore): track CHANGELOG.md explicitly
The blanket *.md ignore was hiding CHANGELOG.md, forcing 'git add -f' on
every update. Allow-list it so the file lives under normal git tracking.
CLAUDE.md stays local (user keeps it personal until a dedicated repo).
2026-05-18 15:39:04 +02:00
francwa 7bc50fd5b8 test: add real-world release fixtures (EASY bucket)
Captures 5 canonical releases from /mnt/testipool/downloads as parametrized
fixtures under tests/fixtures/releases/easy/. Each fixture declares the
release name, expected ParsedRelease fields, original tree, and the future
routing (library / torrents / seed_hardlinks) for the upcoming organize_media
refactor.

Today only the 'parsed' section is asserted; tree is materialized into a
tmp_path to catch typos. Routing is captured ahead of the planner work — it
becomes verifiable once organize_media lands.

Cases: back_in_action (movie), slow_horses_single_ep (TV single),
foundation_season_pack (S02 + .nfo noise), long_walk_with_noise (movie +
KONTRAST.TOP.txt), sinners_yts (YTS bracket-heavy + Subs/ dir).

Also tracks CHANGELOG.md under [Unreleased] / Added.
2026-05-18 15:36:19 +02:00
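
The "tree is materialized into a tmp_path" step from the commit above might look something like the sketch below. The fixture schema is not shown in the log, so the list-of-relative-paths format, the trailing-slash convention for directories, and the materialize_tree name are all assumptions.

```python
from pathlib import Path


def materialize_tree(root: Path, tree: list[str]) -> list[Path]:
    """Create every relative entry of `tree` under `root`.

    Entries ending in '/' become directories; all others become
    empty files, with parent directories created as needed.
    """
    created = []
    for entry in tree:
        target = root / entry
        if entry.endswith("/"):
            target.mkdir(parents=True, exist_ok=True)
        else:
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
        created.append(target)
    return created
```

In a pytest test, root would typically be the built-in tmp_path fixture, so a malformed path in a fixture file fails at creation time rather than going unnoticed.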
francwa f17abdbaec chore: cleanup — remove shims, fix ruff warnings, ignore noisy rules
- Removed backward-compat shims _sanitise_for_fs / _strip_episode_from_normalised
  in domain/release/value_objects.py (zero callers).
- Fixed ruff warnings across the codebase:
    * PLW1510: explicit check=False on subprocess.run calls
    * PLC0415: promoted lazy imports to module top where no cycle exists
      (manage_subtitles, placer, qbittorrent/client, file_manager)
    * E402: fixed module-level import ordering in language_registry.py and
      subtitles/knowledge/loader.py
    * F841 / B007: removed unused locals (identifier.py)
    * C416: replaced unnecessary set comprehension with set() in
      release/knowledge.py
- Ruff config: ignore PLR0911/PLR0912 globally (noisy on mappers and
  orchestrator use-cases) and PLW0603 (intentional for the memory singleton).
- Updated tech debt memory: P1 done, ShowStatus actually complete (was a
  stale note).
2026-05-18 00:02:45 +02:00
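
For the PLW1510 fix in the commit above, a minimal sketch of the pattern follows. The run_quietly wrapper is hypothetical; only subprocess.run and its check, capture_output and text parameters come from the standard library.

```python
import subprocess
import sys


def run_quietly(args: list[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    # check=False is already the default, but stating it makes the intent
    # explicit: a non-zero exit is handled by the caller, not as an exception.
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.returncode


if __name__ == "__main__":
    print(run_quietly([sys.executable, "-c", "raise SystemExit(3)"]))
```

Passing check=True instead would raise subprocess.CalledProcessError on a non-zero exit, so the rule mainly forces each call site to state which behavior it wants.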
francwa 1d50b63af2 Merge branch 'dev/sprint-cleanup'
Multi-week sprint: ISO 639-2/B language unification, release parser
unification + data-driven tokenizer, removal of fossil services
(movies/tv_shows/subtitles), subtitle services split into a package,
MediaInfo split, test suite expansion (990 passing).

See CHANGELOG.md [Unreleased] for the user-facing summary.
2026-05-17 23:42:05 +02:00
francwa 891ba502a2 chore: apply pre-commit auto-fixes (trim trailing whitespace, EOF) 2026-05-17 23:41:54 +02:00